Population Genetics forum: topic

Calculation Jost D

Penny Nelson

Thursday, 12 Mar 2009 23:43 UTC

I have calculated the D and ΔST for my data. GST = 0.09 while D = 0.2! This is quite a difference. The ΔST was 1.3; does that mean that I only have 1.3 effective populations? Is anyone else calculating these stats, and can I use the D for pairwise population differentiation? I see that there is an online Jost D calculator at www.ngcrawford.com/django/jost/ which calculates pairwise population differentiation locus by locus.

Thanks

  • Replies

    • Are Gst and D directly comparable? I guess I should read Jost’s paper again, but I thought they weren’t.

    • The interpretation of GST as a measure of relative differentiation between subpopulations is one of the most surprising and easily disproven myths of population genetics (and not the only such myth). The problems with GST have long been known (see articles by Nagylaki, Charlesworth, Hedrick, and others cited in my paper) but everyone inexplicably still uses GST for this purpose. This interpretation of GST rests on a mathematical mistake, the belief that heterozygosity can be additively partitioned into independent within- and between-group components. Heterozygosity is a non-additive measure. This is well-known in other fields but not in biology.

      If you do some numerical experiments it is easy to convince yourself that D rather than GST measures relative differentiation. Use your favorite software and make up some allele counts so that every subpopulation is genetically completely different (no allele is shared among subpopulations). Put in lots of more or less equally common alleles (the error of GST increases when within-group allele diversity is higher). Say, ten or twenty alleles in each subpopulation (like microsatellite data). Calculate GST (using alleles as the unit of analysis). You will find that GST is close to zero. You can make GST as close to zero as you want by increasing the diversity of these completely differentiated subpopulations. My D, on the other hand, will always give exactly 1.0000 for subpopulations with no genetic overlap. (Note that some programs calculating D use estimators to correct for small-sample bias; this is usually the best approach when estimating D from sample data, but the exact equation for D based on true population frequencies may be more useful when trying to understand the meaning of D. A basic Excel worksheet for this is available at www.loujost.com.) My D is based on a mathematically rigorous partitioning, unlike GST. Many conclusions in the literature (including the standard rule of thumb that 1 migrant per generation is enough to prevent important differentiation) are mistaken because they are based on GST as a measure of differentiation.
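      For anyone without software at hand, the experiment above can be sketched in a few lines of Python. This is only an illustration, not the Excel worksheet or any published program: it assumes equally weighted subpopulations, uses true frequencies with no small-sample correction, and takes D = [(HT - HS)/(1 - HS)] * [n/(n - 1)] with n subpopulations. The helper names (gst_and_d, disjoint_subpops) are made up for this example.

```python
def heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2), for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def gst_and_d(subpops):
    """Return (GST, D) for a list of dicts mapping allele name -> frequency.

    Assumption: every subpopulation gets equal weight and the frequencies
    are the true population frequencies (no sampling correction).
    """
    n = len(subpops)
    # Mean within-subpopulation heterozygosity.
    h_s = sum(heterozygosity(s.values()) for s in subpops) / n
    # Heterozygosity of the pooled population (average frequency per allele).
    alleles = set().union(*subpops)
    pooled = [sum(s.get(a, 0.0) for s in subpops) / n for a in alleles]
    h_t = heterozygosity(pooled)
    gst = (h_t - h_s) / h_t
    d = ((h_t - h_s) / (1.0 - h_s)) * (n / (n - 1))
    return gst, d


def disjoint_subpops(n_subpops, alleles_each):
    """Hypothetical helper: subpopulations that share no alleles at all,
    each with `alleles_each` equally common private alleles."""
    return [
        {f"pop{i}_a{j}": 1.0 / alleles_each for j in range(alleles_each)}
        for i in range(n_subpops)
    ]


# Two completely differentiated subpopulations, with increasing diversity.
for k in (1, 2, 10, 20, 100):
    gst, d = gst_and_d(disjoint_subpops(2, k))
    print(f"{k:>3} alleles per subpopulation: GST = {gst:.4f}, D = {d:.4f}")
```

      With ten private alleles in each of two subpopulations, GST comes out at roughly 0.05 even though the subpopulations share nothing, while D stays at 1 (up to floating-point rounding).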

      For calculating D, you may also want to use a program by Anne Chao and students. This one gives confidence intervals for D. The program is called SPADE (http://chao.stat.nthu.edu.tw/softwareCE.html).

      By the way, this doesn’t mean that GST should be thrown in the trashbox. It is still an excellent tool for calculating migration rate. However, migration rate is not so closely connected to differentiation as people thought. It is very important to distinguish these two tasks. Lots of people focus on migration rate, not because they really care about migration rate but rather because they believe migration is a proxy for differentiation. See my paper for the equations that connect migration rate and real differentiation. Real differentiation depends approximately on the ratio of mutation rate to migration rate, rather than being independent of mutation rate as GST would suggest. A dependence on mutation rate actually makes good sense, and it is a consequence of the mathematics of the finite island model applied to D.

      In the next few weeks several articles about D vs GST will be coming out in Molecular Ecology. One is a meta-analysis of several years of published data using both GST and D. That article by Heller and Siegismund (subtitled “How wrong have we been?”) finds that conclusions in the real literature would often have been changed if D had been used instead of GST. Another article (by Ryman and Leimar) defends GST as a measure of differentiation, but their arguments are all really arguments defending GST’s usefulness as a tool for estimating migration rate. Numerical experiments of the kind I mentioned at the beginning of this post should convince anyone that GST cannot be used as a measure of relative differentiation of allele frequencies between subpopulations.

      I’ll be glad to help anyone trying to come to grips with this new math…

      Lou

    • How does D compare to Fst, as calculated by David Balding’s approach (for example)?

    • I am not familiar with his approach. Can you give me a citation? In general, no Gst-like or Fst-like measure is a measure of differentiation. All of them have a negative dependence on within-group heterozygosity, causing them to approach zero when diversity is high, even when demes are completely differentiated.

      Here is another fun numerical experiment if you have a simulation program for the finite island model (infinite alleles version). Start with two demes with large populations that are fixed for different alleles. Set migration rate to zero and input a high mutation rate. D at time t=0 will be unity since the demes are fixed for different alleles. Since there is no migration, and since we are using the infinite alleles model, both demes will increase their diversity through mutation, but both will remain completely differentiated (since the infinite alleles model assumes each mutation produces a novel allele). D correctly remains at 1.00, but GST or Fst drop over time to a value near zero.
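      If you have no simulation program handy, a toy version of this experiment can be written directly. The sketch below is an assumption-laden stand-in, not any particular published simulator: it uses haploid Wright-Fisher demes of a modest fixed size (200 gene copies, chosen for speed rather than realism), zero migration, and infinite-alleles mutation in which every mutation gets a brand-new integer label. All function names and parameter values are made up for illustration.

```python
import random
from collections import Counter


def next_generation(deme, mu, next_id, rng):
    """One Wright-Fisher generation for a haploid deme (list of allele labels).

    Each offspring copies a random parent; with probability mu it mutates to
    a never-before-seen allele (infinite alleles model).
    """
    size = len(deme)
    offspring = []
    for _ in range(size):
        allele = deme[rng.randrange(size)]
        if rng.random() < mu:
            allele = next_id[0]  # new, globally unique allele label
            next_id[0] += 1
        offspring.append(allele)
    return offspring


def gst_and_d(demes):
    """GST and D from the current allele frequencies, equal deme weights."""
    n = len(demes)
    freqs = [{a: c / len(d) for a, c in Counter(d).items()} for d in demes]
    h_s = sum(1.0 - sum(p * p for p in f.values()) for f in freqs) / n
    pooled = Counter()
    for f in freqs:
        for allele, p in f.items():
            pooled[allele] += p / n
    h_t = 1.0 - sum(p * p for p in pooled.values())
    gst = (h_t - h_s) / h_t
    d = ((h_t - h_s) / (1.0 - h_s)) * (n / (n - 1))
    return gst, d


def simulate(size=200, mu=0.05, generations=200, seed=1):
    """Two demes fixed for different alleles, migration rate zero."""
    rng = random.Random(seed)
    demes = [[0] * size, [1] * size]
    next_id = [2]  # mutable counter shared by both demes
    history = [gst_and_d(demes)]
    for _ in range(generations):
        demes = [next_generation(d, mu, next_id, rng) for d in demes]
        history.append(gst_and_d(demes))
    return history


history = simulate()
for t in (0, 10, 50, 200):
    gst, d = history[t]
    print(f"t = {t:>3}: GST = {gst:.4f}, D = {d:.4f}")
```

      Because the demes never share an allele, D stays at 1 throughout, while GST starts at 1 and drops as mutation builds up diversity within each deme.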

    • “I am not familiar with his approach. Can you give me a citation?”

      He’s written about it in a few places; this is one:
      Beaumont MA; Balding DJ. (Apr 2004). Identifying adaptive genetic divergence among populations from genome scans. Mol Ecol. 13:969-980.

      “In general, no Gst-like or Fst-like measure is a measure of differentiation.”

      Even ones based on Malecot’s definition of Fst? Or the more recent definition based on coalescence time? As far as I can see, both will give Fst=1 in your example.

    • I took a quick look at the Beaumont and Balding paper just now, and it does not have a “new” definition of Fst, but rather applies a Bayesian statistical analysis to detect the existence of directional selection. It doesn’t really address the question of measuring the magnitude of differentiation between demes. It focusses on p-values, not on actual magnitudes. This is a different question, and it has the problems that you discuss so well in your blog on statistical significance. Also, it restricts itself to the bi-allelic case. As I mentioned above, the errors of FST are most prominent when diversity is high (many alleles with nonvanishing frequencies). The bi-allelic case necessarily has low diversity.
      Rousset’s (2004) definition of Fst is based on Malecot’s equations, and this and all other similar definitions have the problem I mentioned. Try my example on your software and see for yourself. Let us know the results.

    • I was thinking that, given most papers include GST or some other similar measure, I should at least include the calculation for it, then cite why it doesn’t work and include the D (Jost) as well. I am looking forward to the review.

      I would be grateful if you could clarify for me (maths dud!)

      The D outlined in equation 11 ranges between 0 and 1, and when it approaches 1 it indicates total differentiation. So it has the same range and interpretation as GST (though the interpretation based on GST is erroneous, especially when markers with high variability are used). [I love figure 2!…pictures at least make sense to me]. Is that right?

      The D can be calculated for the entire set of subpopulations giving an overall D and can also be calculated for each subpopulation pair giving a value of 1 when they are entirely differentiated.

      For diversity I should use the number of effective alleles in each subpopulation, because cumulatively they should reflect the total diversity within the populations sampled.

      What is the relationship between differentiation and distance? I will re-read. Should Nei’s pairwise genetic distance give me the same ranking of similarity as pairwise differentiation?

      I will try it.

      Thanks – I wish there were more bumbling students on this forum… it’s a little embarrassing…

    • I wish there were more people on this forum, period! If we make it interesting, I imagine people will come.

      Here is another picture for you: imagine two demes that are fixed for different alleles. This could arise from a population crash followed by explosive repopulation from a small number of survivors. The two demes are completely differentiated. Both GST and D are 1. Suppose the migration rate is zero because of some barrier. Now, over time, mutations accumulate in the demes. These mutations are likely to be unique in each deme, so the demes remain completely differentiated (migration remains at zero). A valid measure of differentiation would be 1.00 for all times (at least until mutations started to repeat themselves across demes). A graph of GST and D versus time (using Malecot’s equations and the infinite allele model, with m=0) reveals a perfectly horizontal line for D, as expected of a good differentiation measure. However, the graph of FST or GST falls sharply.
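      The two curves in this picture can be tabulated without drawing anything. The sketch below assumes the standard textbook recurrence for expected within-deme identity under the infinite alleles model, J(t+1) = (1 - u)^2 [1/(2N) + (1 - 1/(2N)) J(t)], for two equally weighted diploid demes of size N with m = 0; it may differ in detail from the exact equations used for the graph described above. The function name and parameter values are illustrative only.

```python
def gst_and_d_over_time(n_e=1000, mu=0.01, generations=2000):
    """Expected GST and D through time for two isolated demes (m = 0).

    Assumptions (labelled, not taken from the thread): diploid demes of
    size n_e, infinite alleles mutation at rate mu, equal deme weights,
    and both demes initially fixed for different alleles.
    """
    j_within = 1.0   # each deme starts fixed, so identity within demes is 1
    j_between = 0.0  # no shared alleles, and none can arise when m = 0
    rows = []
    for t in range(generations + 1):
        h_s = 1.0 - j_within
        # Two copies from the pooled population come from the same deme
        # half the time and from different demes half the time.
        h_t = 1.0 - (j_within + j_between) / 2.0
        gst = (h_t - h_s) / h_t
        d = 2.0 * (h_t - h_s) / (1.0 - h_s)  # n/(n-1) = 2 for two demes
        rows.append((t, gst, d))
        # Drift raises identity, mutation lowers it.
        j_within = (1.0 - mu) ** 2 * (
            1.0 / (2 * n_e) + (1.0 - 1.0 / (2 * n_e)) * j_within
        )
    return rows


rows = gst_and_d_over_time()
for t in (0, 10, 100, 500, 2000):
    _, gst, d = rows[t]
    print(f"t = {t:>4}: GST = {gst:.4f}, D = {d:.4f}")
```

      Raising mu or n_e in this sketch pushes the GST curve lower still, while the D column stays at 1.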

      Your description is right, and yes, D turns out to have a very close relationship with Nei’s genetic distance and the ranking should be the same.

    • Can anyone tell me how comparable G’ST (Hedrick 2005) and D are for standardizing GST values? I am currently working on a review/meta-analysis of GST vs G’ST.

    • Hey, I’ve updated my calculator to calculate Dest (Dr. Jost’s estimator) and Hedrick’s G’st for multilocus data sets. I’ve also just added bootstrapping functionality to determine variance/standard error. You can access it at:

      http://www.ngcrawford.com/django/jost/

      If you have questions or feature requests let me know.

      - Nick
