Do you need statistics to understand your data?

In almost every case, the sheer volume of raw data collected in an experiment makes it nigh on impossible to extract any meaning from it without statistics. Last year, when we got our first big table of data to run through SPSS, I remember staring blankly at the mass of numbers for some time, hoping it would magically tell me what to do or what it meant. In the 1999 film ‘The Matrix’ the characters can interpret 10 screens of green, falling code at once just by looking at them; compared to that, how hard can a few hundred memory scores be, right? Well, the answer is: very hard! I’ve yet to come across anyone who could find the kind of information we need for research without using statistical tools.
What we are looking for in research are patterns or connections between variables, and to find these we need statistics. How do we tell if there is a difference between groups? We could draw a graph or chart. How do we know whether that difference is statistically significant rather than something chance alone could easily produce? We can calculate some lovely p-values. Of course, there is always the danger of mistaking the need for statistics for the need for a statistical program like SPSS. All SPSS really gives us is another long list of numbers, which on their own can be almost as unintelligible as the raw data. Without an understanding of the research in question and a good knowledge of the data gathered, statistical values are pretty much meaningless. There is no use in creating those graphs or calculating those p-values if we don’t know how they fit in with the rest of the data and what they mean for the research in general. So, while we need statistics to understand our data, we also need to understand the data itself.
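
To make the "is it just chance?" question a bit more concrete, here is a minimal sketch of one way to get a p-value without SPSS: a permutation test. This is not the method we used in class; it is simply an easy one to follow. The memory scores, the group names and the function name `permutation_p_value` are all made up for illustration. The idea is to shuffle the group labels many times and count how often a shuffled difference in means is at least as large as the one actually observed.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the estimated probability of seeing a difference at least as
    large as the observed one if the group labels were arbitrary.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign scores to groups at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction: the observed arrangement counts as one shuffle,
    # so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_shuffles + 1)


# Hypothetical memory scores, invented purely for this example.
control = [12, 15, 11, 14, 13, 12, 16, 10]
rehearsal = [16, 18, 15, 17, 19, 14, 18, 17]

if __name__ == "__main__":
    p = permutation_p_value(control, rehearsal)
    print(f"difference in means = {mean(rehearsal) - mean(control):.2f}")
    print(f"estimated p-value   = {p:.4f}")
```

Even here the number that comes out is only as meaningful as our understanding of what the two groups are and how the scores were collected, which is rather the point of this post.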
Imagine the level of statistics needed for the Human Genome Project, which mapped the sequence of human DNA: estimates put the number of nucleotide base pairs in the human genome at over 3 billion, and some centres aiding the project were processing up to 100,000 draft sequences every day for nearly a decade. Without statistics it would have been impossible for them to account for overlaps and the recurring sequences that make up more than 50% of human DNA. But, equally, without the scientists’ understanding and careful interpretation of the data, the sequencing could have been a shambles. There can be no doubt that, while statistics is vital for research, you cannot fully understand your data with statistics alone.

(www.nature.com/nature/journal/v409/n6822/pdf/409860a0.pdf)
(http://en.wikipedia.org/wiki/Human_Genome_Project)

4 Comments

  1. This was a really balanced blog; I haven’t read another blog where anyone else has placed so much importance on the data itself. This is certainly a good thing, because ultimately the raw data is the first and most basic set of results gained from an experiment. It is the essential numbers, and not the added values like Spearman’s rho, that tell us the actual worthwhile information. And like you said, if you screw up the data the whole statistical outcome will be inaccurate. Using the Human Genome Project as an example of real-life scientific work that requires incredibly brainy statisticians was well done too. Another example I could think of is the work being done at CERN with particle accelerators, including the famous Large Hadron Collider. Colliding minute particles must surely need very precise measurements and spot-on calculations. According to Wikipedia (http://en.wikipedia.org/wiki/CERN), CERN employ around 8000 scientists, which suggests an awful lot of data output that certainly needs statistical analysis.

  2. I, like you, think that statistics are important to understanding our data. Provided we have entered the data correctly and performed our experiment accurately, statistics can help ensure that the results of our analysis are correct. From this we get a more in-depth idea of the results we have. Unfortunately though, I do not think that statistics show everything. A test run in SPSS will not, on its own, recognise or point out any outliers. These can radically change our results, and if we based our understanding of our data purely on statistical analyses then we would not see important factors like this. So overall, I think statistics are the main factor in understanding our data, but looking at the raw data in its original form is also very important.

