John Wilkins, who is known to lurk in the nether regions of NN, asked about a blog post about biodiversity and entropy. It looks like this is about to get me ranting…
Meaningless Numbers
- Date:
- Tuesday, 11 November 2008 - 20:28 UTC
John was asking about a couple of posts about measures of biodiversity. Biodiversity is problematic, because we only have a vague idea about what it means, and it’s difficult to pin down the precise meaning. In part this is because it has several concepts associated with it: a more diverse community could
- have more species,
- not be dominated by one or two species (i.e. be more even), or
- have species which are more different from each other (e.g. they are not all fruit flies).
Balancing these is going to be difficult. Ecologists have devised a bunch of measures of diversity; things like species richness, Shannon’s and Simpson’s indices and Fisher’s α. Of course, these all give different numbers, and for the same set of communities, they can give different rankings for diversity. What’s going on?
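To make the ranking problem concrete, here is a minimal sketch with three made-up communities. The abundance counts are invented for illustration, and "Simpson" here means the Gini-Simpson form (one minus the sum of squared proportions), which is only one of several versions in use.

```python
import math

def richness(counts):
    """Number of species with at least one individual."""
    return sum(1 for c in counts if c > 0)

def shannon(counts):
    """Shannon index: H = -sum(p * ln p) over the species present."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def simpson(counts):
    """Gini-Simpson index: 1 - sum(p^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Hypothetical abundance counts, 100 individuals in each community.
communities = {
    "A": [50] + [1] * 50,  # 51 species, one dominant, many singletons
    "B": [20] * 5,         # 5 species, perfectly even
    "C": [91] + [1] * 9,   # 10 species, one heavily dominant
}

def rank(index):
    """Community names ordered from most to least diverse under `index`."""
    return sorted(communities, key=lambda name: index(communities[name]), reverse=True)

for index in (richness, shannon, simpson):
    print(index.__name__, rank(index))
```

With these counts, richness orders the communities A, C, B; Shannon gives A, B, C; and Simpson gives B, A, C. Three indices, three different rankings of the same data.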
The problem, I’ll grandly claim, is that the measures have no biological basis. Fisher’s α, for example, is based on fitting a curve to the species abundance data, and the curve is chosen because it has convenient statistical properties¹. The other indices are chosen for similar reasons: they have convenient mathematical interpretations, but nothing biological. No wonder, then, it’s difficult to understand what the numbers are telling us about the biology: they simply aren’t connected.
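For the curious, here is a rough sketch of how Fisher’s α is commonly computed in practice, assuming the standard log-series relation S = α ln(1 + N/α), where S is the number of species and N the number of individuals. The function name and the bisection solver are my own choices for illustration, not anything prescribed by the method.

```python
import math

def fishers_alpha(n_species, n_individuals, iterations=200):
    """Solve S = alpha * ln(1 + N / alpha) for alpha by bisection."""
    if not 0 < n_species < n_individuals:
        raise ValueError("need 0 < n_species < n_individuals")

    def excess(alpha):
        # Positive when this alpha predicts more species than observed.
        return alpha * math.log1p(n_individuals / alpha) - n_species

    # The predicted species count rises with alpha, so bracket the root first.
    lo, hi = 1e-12, 1.0
    while excess(hi) < 0:
        hi *= 2.0

    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if excess(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical sample: 50 species among 1000 individuals.
print(fishers_alpha(50, 1000))
```

Note that in this formulation α depends only on S and N: two samples with the same totals get the same α however the individuals are divided among the species.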
Now before it looks like I’m dumping on ecologists, I have to point out that other sciences do similar things. Economists specialise in producing indices intended to measure competitiveness, or the rate of turnover of virtual widgets. Some of these indices no doubt work well, but even things like inflation depend on the items that are put into the calculations. Followers of Mrs Thatcher’s government may well remember the way they kept on changing the definitions of unemployment: vagueness even appears in the definition of something as apparently certain as that. Closer to home, impact factors are statistics whose intentions are good, but I don’t think anyone trusts them. And p-values are just as bad: they say something statistical, but very little of scientific substance.
Why this plethora of meaningless statistics? As scientists, we want to quantify things; putting a number on a phenomenon is seen as a good thing. It gives us comfort that we can tame a problem through measurement: we have something solid that we can use in our discussions of the phenomenon. And often, I think, this is a good thing. The problems start when the numbers we use to measure do not match up with what we are measuring. If we want to know who is more intelligent, statisticians or sociologists, there is no point in measuring the probability that we would get the difference we observe if there were no difference in reality. We know there will be a difference; what really matters is how big it is, not whether we’ve got the data to detect it.
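A small sketch of the point about p-values, under some loud assumptions: the one-point difference, the standard deviation of 15 and the sample sizes are all invented, and the test is a simple two-sample z-test with a known common standard deviation.

```python
import math

def two_sided_p(diff, sd, n):
    """Two-sided z-test p-value for a difference in means between two
    groups of n individuals each, with a known common standard deviation."""
    standard_error = sd * math.sqrt(2.0 / n)
    z = abs(diff) / standard_error
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))

# Hypothetical numbers: a one-point difference in mean score, SD of 15.
DIFF, SD = 1.0, 15.0

for n in (100, 10_000, 1_000_000):
    effect_size = DIFF / SD  # standardised difference; does not depend on n
    print(n, round(effect_size, 3), two_sided_p(DIFF, SD, n))
```

The effect size is the same tiny number in every row, while the p-value goes from unremarkable to vanishingly small as the sample grows. The p-value is telling us about the amount of data, not about the size of the difference.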
The problem with biodiversity measures is slightly different. In this case, the statistics pretend to be measures of biodiversity, but alas they do little more than pretend. In their desire for numbers, ecologists have been seduced by mathematistry.
The effect of all this is poor science: we are producing numbers with little meaning, and then trying to make inferences or decisions based on them. This can’t produce good science, except by chance².
So, what are we to do? The first step is to admit we have a problem. Boys and girls, don’t use statistics unless you understand what they measure. If you don’t understand it, just say “no”. If you do understand the number, but it still doesn’t make any sense, put it down and calmly walk away. Or ask around and see if anyone else can explain what the number means. And don’t be fobbed off with a vague “well, it measures biodiversity”. Especially if it’s a p-value.
1 For those who care, it’s the limit of the negative binomial distribution, as the shape goes to zero.
2 Actually, things may not be quite this bad: the numbers will have some correlation with what we want to measure, so we are more likely to make a good decision because the measures are closer to the Truth than if they were totally random. But I promised a rant, so I’m going to relegate any nuance to the footnotes.
Last updated: Tuesday, 11 Nov 2008 - 20:28 UTC
Comments
“Mathematistry” – like it! I thought at first you had made it up but got 125 hits in Google.
But nothing about entropy? I was looking forward to that bit…
Phylogenetic uniqueness, Bob, you know it makes sense.
Vane-Wright, R.I., C.J. Humphries and P.H. Williams. 1991. What to protect? Systematics and the agony of choice. Biol. Conserv. 55, 235-254.
and lots more where that came from.