
Predictions of scientific findings, using all data available, for all to share.

Troy Sadkowsky

Sunday, 05 Jul 2009 03:15 UTC

I enjoyed a presentation by Sander Greenland at the 42nd SER meeting in Anaheim last week. I took three main points from it. First, causal diagrams should not be called “causal” unless they include all related variables. Second, it is difficult to include all related variables when performing statistical analysis (though some statistical methods are better suited to this than others). Third, all analyses should be transparent, with all data sets and methods made publicly available.

So, can someone confidently say that A causes B? I don’t think so. If that were possible, the person would have to know what the infinite future holds. There are too many future unknowns, changing variables and influencing factors, and together they make any confidence interval meaningless. Can someone say that A caused B? This removes the future aspect, but it is still difficult to identify all related variables, so the claim becomes that person’s perception rather than a scientific finding. What I took from Sander Greenland’s presentation is that it is better to state our scientific findings as predictions: the likelihood that the same result will occur again under the same conditions within a certain time period.

How difficult is it to include all related variables when performing statistical analyses? I believe it is practically impossible (consider the butterfly effect). It is not practical for a researcher to measure all related variables, let alone include them in a statistical analysis, and attempts to group variables into a relevant context can introduce subjectivity. What I took from Sander Greenland’s presentation is that researchers should collect every variable they can within their time and budget constraints, and choose statistical methods that can use all of them in the analysis (e.g. Bayesian statistics).
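To make the cost of a missing variable concrete, here is a minimal simulation sketch. Everything in it is my own assumption for illustration, not something from the presentation: the variable names x, y and z, the made-up effect sizes (2 for x and 3 for z), and the simulated data. It fits a straight line of y on x twice, once ignoring a related variable z and once adjusting for it by first removing z's linear contribution from both x and y.

```python
import random


def slope(xs, ys):
    """Least-squares slope of ys on xs (fit includes an intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def residuals(xs, ys):
    """What is left of ys after a least-squares line on xs is removed."""
    n = len(xs)
    b = slope(xs, ys)
    a = sum(ys) / n - b * sum(xs) / n
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def simulate(n=20000, seed=0):
    """Return (naive, adjusted) estimates of the effect of x on y.

    Assumed data-generating process (hypothetical):
        z ~ N(0, 1)              related variable that is easy to miss
        x = z + noise            x depends on z
        y = 2*x + 3*z + noise    true effect of x on y is 2
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y = [2.0 * xi + 3.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

    # Naive analysis: z is left out entirely.
    naive = slope(x, y)

    # Adjusted analysis: strip the part of x and y explained by z,
    # then relate what remains.
    adjusted = slope(residuals(z, x), residuals(z, y))
    return naive, adjusted


if __name__ == "__main__":
    naive, adjusted = simulate()
    print("estimate ignoring z:     ", round(naive, 2))
    print("estimate adjusting for z:", round(adjusted, 2))
```

Under these assumptions the estimate that ignores z should land near 3.5 rather than the true value of 2, while the adjusted estimate should land near 2. The catch, of course, is that the adjustment only works for variables you knew about and measured.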

What is the value of a study? A scientific study will always have its main hypothesis, and the value of obtaining evidence for or against that hypothesis should always equal or exceed the cost of performing the study. However, a study can deliver value beyond its hypothesis. Sander Greenland emphasized this point by stating that we should make all raw data sets public, with comprehensive details on how the analysis was performed. This opens up the potential to vastly increase the value of the study through data reuse.

Updated 08 Jul 2009 02:53 UTC

