Linkage disequilibrium and Bonferroni
Paolo Franchini
Monday, 01 June 2009 14:02 UTC
Hi all,
I prepared a manuscript in which I used the Bonferroni correction when testing for linkage disequilibrium in my dataset. Because my dataset is quite large (12 microsatellite loci and 10 populations), the correction lowered the significance threshold from 0.05 to 0.00007. The referees told me that this approach is very conservative: at that significance level, the power to detect disequilibrium may be virtually nonexistent. Which test can I use to investigate the presence of linkage in my data? Or, if I can't, how should I discuss linkage?
Thanks!!
Paolo
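As a rough check on the numbers in the question, here is a minimal Python sketch of the Bonferroni arithmetic. It assumes (this is not stated in the post) that one disequilibrium test was run for every pair of loci in every population, giving C(12, 2) × 10 = 66 × 10 = 660 tests. Under that assumption the corrected threshold is 0.05/660 ≈ 7.6 × 10⁻⁵, in line with the 0.00007 quoted above.

```python
from math import comb

# Assumption (not stated in the post): one LD test per pair of loci
# in each population.
n_loci = 12
n_populations = 10
alpha = 0.05

n_pairs = comb(n_loci, 2)            # number of distinct pairs of loci
n_tests = n_pairs * n_populations    # total number of LD tests
bonferroni_threshold = alpha / n_tests

print(n_pairs, n_tests)               # 66 660
print(f"{bonferroni_threshold:.2e}")  # 7.58e-05
```

The more tests there are, the smaller each per-test threshold becomes, which is why the referees worry about power.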
-
Replies
-
The Bonferroni correction is only a crude approximation to the actual p-value of your combined results. In fact no approximation is needed, because the p-value of your combined results can be calculated exactly. Find the one-tailed p-value for each of the loci, and recall that under the null hypothesis each p-value is a uniform random variable on the interval (0,1) (and, for this calculation, the tests must be independent of one another). To calculate the exact p-value of the whole set of 12 loci, you need the probability of getting results as extreme as, or more extreme than, your observed dataset. The product of your observed p-values is p1 × p2 × … × p12 = k (p1 is the p-value of the first locus, p2 of the second, and so on). So the question is: what is the probability of obtaining a set of p-values whose product is less than or equal to k?

Because p-values are uniform random variables on (0,1), this problem has a simple geometric interpretation. The space that describes your experiment has 12 dimensions, one for each locus, and each p lies in (0,1), so your problem defines a 12-dimensional unit cube (a "hypercube"). The question becomes: what proportion of the points in this hypercube have a p-value product as extreme as, or more extreme than, the one you observed (k)? The surface p1 × p2 × … × p12 = k is a 12-dimensional hyperbola inside the unit cube, and the proportion of points lying below it (and inside the cube) is just the volume of that region. This integral can be calculated exactly, and the resulting simple formula is given on my website, www.loujost.com. (It is too hard to insert the formula in this forum.)
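A minimal Python sketch of the exact calculation, assuming the p-values are independent and uniform on (0, 1) under the null. The closed form used here, P = k Σ_{i=0}^{n−1} (−ln k)^i / i!, is the standard result for the product of n independent uniforms. It is written out for illustration and is not copied from the website above, whose notation may differ. The second function estimates the same hypercube volume by simulation as a sanity check. The function names and the p-values in the usage example are hypothetical.

```python
import math
import random

def combined_p_value(p_values):
    """Exact probability that the product of n independent Uniform(0,1)
    p-values is <= k, where k is the observed product.

    Uses the standard closed form P = k * sum_{i=0}^{n-1} (-ln k)^i / i!
    (assumed form; the website's notation may differ)."""
    n = len(p_values)
    # Work with logs so the product of many small p-values does not underflow.
    log_k = sum(math.log(p) for p in p_values)
    x = -log_k  # x = -ln k >= 0
    total = sum(x ** i / math.factorial(i) for i in range(n))
    return math.exp(log_k) * total

def monte_carlo_volume(k, n, draws=100_000, seed=1):
    """Estimate the fraction of the n-dimensional unit hypercube in which
    the product of the coordinates is <= k (the volume described above)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        product = 1.0
        for _ in range(n):
            product *= rng.random()  # one uniform "p-value" per dimension
        if product <= k:
            hits += 1
    return hits / draws

# Hypothetical p-values for 12 loci, for illustration only.
example_p = [0.20, 0.35, 0.08, 0.50, 0.12, 0.60,
             0.25, 0.40, 0.05, 0.30, 0.70, 0.15]
k = math.prod(example_p)
print("exact:    ", combined_p_value(example_p))
print("simulated:", monte_carlo_volume(k, len(example_p)))
```

The exact and simulated values should agree to within Monte Carlo error; the log-space sum in `combined_p_value` avoids numerical trouble when k is very small.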
This formula turns out to be equivalent to Fisher's apparently quite different method for combining p-values, so you can cite him. He did not derive my formula, but if you use his method, compute the chi-square statistic (−2 times the sum of the natural logarithms of the p-values, with 2n degrees of freedom), and look up its p-value in a table, you get the same number as my formula. I haven't had time to publish my formula, so you'll need to cite Fisher (or my website).
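The equivalence with Fisher's method can be checked in a few lines, assuming n independent p-values and using the standard closed form of the chi-square survival function for even degrees of freedom. A product of p-values at most k is the same event as Fisher's statistic being at least −2 ln k:

```latex
\begin{align*}
X^2 &= -2\sum_{i=1}^{n} \ln p_i = -2\ln k,
  \qquad X^2 \sim \chi^2_{2n} \text{ under } H_0,\\
\Pr\left(\chi^2_{2n} \ge x\right) &= e^{-x/2}\sum_{i=0}^{n-1}\frac{(x/2)^i}{i!}
  \quad\text{(even degrees of freedom)},\\
\Pr\left(X^2 \ge -2\ln k\right) &= e^{\ln k}\sum_{i=0}^{n-1}\frac{(-\ln k)^i}{i!}
  = k\sum_{i=0}^{n-1}\frac{(-\ln k)^i}{i!}.
\end{align*}
```

The last line is the exact hypercube volume, so the table lookup and the direct formula give the same p-value.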
-
I have followed most of that but still struggle with some of the maths! Can you confirm what i stands for in the formula on your website? (I thought it was the number of p-values you have, but that would be the same as n.) Thanks for helping the biologist struggling with maths!
Natalie
-
Hi Natalie,
i is the index of the summation. It is not a single number. When someone writes a summation, just below the sigma sign there is usually something like "i=1". This means the first term in the summation is what you get when you substitute "1" for "i" in the formula that follows the sigma. Then you put "2" into that formula and add the result to the value you got when you put "1" in the formula. You keep going until i equals the number above the sigma, which is n; that is the last term in the summation. If you don't understand this, ask a friend who knows math to explain it by example. It is easier to see by example than by explanation if you are unfamiliar with the concept, and it is an essential one.
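To make the "by example" idea concrete, here is a small Python sketch that evaluates a summation one term at a time, with i taking the values 1, 2, …, n in turn. As the formula it uses the standard combined p-value written with i running from 1 to n, k Σ_{i=1}^{n} (−ln k)^{i−1} / (i−1)!. This is an assumed form chosen to match the description in the reply and may not match the notation on the website exactly. The helper name and the values k = 0.125, n = 3 are hypothetical.

```python
import math

def summation_terms(k, n):
    """Hypothetical helper: return each term of
    k * sum_{i=1}^{n} (-ln k)^(i-1) / (i-1)!  as a separate list item."""
    x = -math.log(k)
    terms = []
    for i in range(1, n + 1):  # i = 1, then 2, ..., up to n (the number above the sigma)
        term = k * x ** (i - 1) / math.factorial(i - 1)
        terms.append(term)
    return terms

terms = summation_terms(k=0.125, n=3)
for i, term in enumerate(terms, start=1):
    print(f"i = {i}: term = {term:.4f}")
print(f"sum = {sum(terms):.4f}")
# i = 1: term = 0.1250
# i = 2: term = 0.2599
# i = 3: term = 0.2703
# sum = 0.6552
```

Here n is fixed (the number of p-values), while i changes from term to term; the summation is simply the total of the printed terms.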
-