BLAIR DNA Project
Interpreting DNA Test Results
Below is a presentation I gave at the Guild of One-Name Studies DNA Seminar, at Cheltenham UK, on February 20, 2010. I hope others will find it useful.
Almost the first question a Project Administrator gets from a participant is "I've just got my test results, what do they mean?" Most people who have run a DNA Project for any length of time will probably agree that interpreting DNA test results is not always easy or straightforward, and that it usually gets more difficult as a project grows.
I don't pretend to be an expert on the subject, but I'll do my best to explain the method I use in my Blair DNA Project.
I'm going to discuss the limitations of DNA testing, some of the things that DNA testing can do, how DNA test results are used, and finally the methods I use to group participants.
DNA testing has little value on its own, but when used with other conventional methods of tracing your ancestry, it becomes a valuable tool.
It's important for both Project Administrators and participants to understand the limitations of DNA testing. There are a number of things that DNA testing can NOT do.
It cannot tell you who your ancestors are. A person's test results on their own have virtually no value. You can't plug your DNA results into some magic formula and find out who your ancestors were. You need to compare your results with the test results of others to determine whether you may somehow be connected.
It cannot tell two participants with matching (or near-matching) test results who their common ancestor is. Even with an exact match, the results won't tell you who your common ancestor is.
DNA testing cannot even tell two participants with matching (or near-matching) test results exactly how far back their common ancestor lived.
Finally, it cannot prove a connection between two participants that is suspected from their paper trails.
So what CAN DNA testing do?
If two participants closely match on their test results, this is an indication that they share a common ancestor. Just how strong this connection is depends on the strength of the DNA match, which I'll discuss later in the presentation.
It can also give you a rough idea of how far back your common ancestor lived. This also depends on the strength of the DNA match, as well as on the paper trails.
While DNA testing cannot prove that suspected lines are connected, it can provide evidence to support that premise. How strong this evidence is depends on the paper trails and the strength of the DNA match.
Finally, DNA testing can prove that two individuals or suspected lines are NOT connected. No matter how good the paper trail may be, if there are too many DNA mutations, they cannot be related.
Genetic genealogy is something of a dichotomy. On one hand, we use the extremely accurate and precise science of DNA testing to get our test results; on the other hand, we then interpret those results using approximate mutation rates and probability.
One of the primary purposes of DNA testing is to determine if two participants share a common ancestor and hopefully, by using conventional research, determine who that common ancestor is.
The most common method to do this is to calculate the estimated Time to Most Recent Common Ancestor (TMRCA). In other words, estimate how many generations you have to go back to find the common ancestor of two participants.
To do this you have to use statistics and probabilities.
The actual calculations of the TMRCA are far too complex to discuss here, but they depend on knowing the number of mutations and the rate of mutation.
There are two common ways to count the number of mutations.
The first method, called the infinite alleles model, scores each marker as either a match or a non-match: any difference at a marker, however large, is counted as a single mutation.
The second method (often called the stepwise mutation model) considers the actual difference between the values of markers that do not match. These differences are then added together to give the genetic distance.
In the example shown, XXX mismatches YYY on 3 markers, but two of those markers differ by 2, giving a genetic distance of 2 + 2 + 1 = 5.
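To make the two counting methods concrete, here is a minimal Python sketch. The marker values for XXX and YYY are hypothetical, chosen only so the numbers reproduce the example above (3 mismatched markers, genetic distance 5), and both participants are assumed to have tested the same markers. Testing companies may score a few multi-copy markers differently from the rest, which this sketch ignores.

```python
def infinite_alleles_count(a: dict[str, int], b: dict[str, int]) -> int:
    """Count every mismatched marker as one mutation, however large the difference."""
    return sum(1 for marker in a if a[marker] != b[marker])


def genetic_distance(a: dict[str, int], b: dict[str, int]) -> int:
    """Stepwise count: add up the absolute differences in marker values."""
    return sum(abs(a[marker] - b[marker]) for marker in a)


# Hypothetical marker values, chosen to reproduce the XXX/YYY example.
xxx = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11, "DYS385a": 11}
yyy = {"DYS393": 13, "DYS390": 26, "DYS19": 16, "DYS391": 10, "DYS385a": 11}

print(infinite_alleles_count(xxx, yyy))  # 3
print(genetic_distance(xxx, yyy))        # 5  (2 + 2 + 1)
```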
The second factor in computing the Time to the Most Recent Common Ancestor is the mutation rate of the markers, which has a marked effect on the calculation: doubling the average mutation rate effectively cuts the TMRCA in half. For example, a 36 for 37 match with an average mutation rate of 0.002 (mutations per marker per generation) produces a 50% chance of sharing a common ancestor within 12 generations. The same 36 for 37 match with an average mutation rate of 0.004 produces a 50% chance of sharing a common ancestor within 6 generations.
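The effect of the mutation rate can be seen with a deliberately simplified model. The sketch below is an illustration under stated assumptions, not FTDNATip's method: it uses one average rate for every marker, counts a marker as mismatched if it mutated at least once along either line (so back-mutations are ignored), and assumes every generation from 1 to 200 is equally likely before looking at the results. The function names are hypothetical.

```python
from math import comb


def likelihood(markers: int, mismatches: int, rate: float, generations: int) -> float:
    """Binomial probability of `mismatches` differing markers out of `markers`
    when each man descends `generations` generations from the common ancestor."""
    # Chance that one marker changed somewhere in the 2 * generations transmissions.
    p = 1.0 - (1.0 - rate) ** (2 * generations)
    return comb(markers, mismatches) * p**mismatches * (1.0 - p) ** (markers - mismatches)


def prob_within(markers: int, mismatches: int, rate: float, within: int,
                max_gen: int = 200) -> float:
    """Probability the common ancestor lived within `within` generations,
    assuming a uniform prior over generations 1..max_gen."""
    weights = [likelihood(markers, mismatches, rate, g) for g in range(1, max_gen + 1)]
    return sum(weights[:within]) / sum(weights)


def median_generation(markers: int, mismatches: int, rate: float) -> int:
    """Smallest G giving at least a 50% chance that the ancestor lived within G generations."""
    g = 1
    while prob_within(markers, mismatches, rate, g) < 0.5:
        g += 1
    return g


for rate in (0.002, 0.004):
    print(f"rate {rate}: 50% within {median_generation(37, 1, rate)} generations")
```

For a 36 for 37 match (1 mismatch on 37 markers), the medians this sketch prints land close to the 12 and 6 generations quoted above. The halving follows because the model depends almost entirely on the product of the rate and the number of generations.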
One of the most important things to remember is that mutations occur at random. There's no way to predict which marker will mutate or when a marker will mutate.
The mutation rates used to calculate the Time to Most Recent Common Ancestor are estimated based on past observations.
Despite the critical nature of mutation rates in calculating the TMRCA, there is still no real consensus on either individual marker or average mutation rates. Various studies and companies have attempted to determine the mutation rates of individual markers as well as average mutation rates for groups of markers, but the rates seem to vary from study to study and from company to company. Even with all the DNA testing that's been done, given the random nature of mutations, there is still not enough data to reach agreement on mutation rates.
Family Tree DNA (FTDNA) has developed a program, called FTDNATip, that automatically calculates the TMRCA between two participants in a surname project. It uses individual marker mutation rates that FTDNA has developed. FTDNATip has been criticized by many for using mutation rates that are too high, which produces overly optimistic results, that is, common ancestors that appear more recent than they probably are.
There are links to several sites with differing mutation rates at the bottom of this page.
This is a reproduction of an actual FTDNATip report of two participants in the Blair DNA Project. There are two things you should notice:
First, the two participants mismatch on 2 markers but have a genetic distance of 3, meaning that on one of those markers their values differ by two.
Second, the probabilities are stated as WITHIN a certain number of generations, e.g., a 68.67% probability of sharing a common ancestor WITHIN 8 generations.
TMRCA is a very broad estimate based on uncertain mutation rates. The probabilities will vary depending on the mutation rates used.
It's imperative that you realize the probabilities are WITHIN x number of generations. If the results state 89% within 12 generations, it means there's an 89% probability that your common ancestor lived sometime between generation 1 and generation 12. It does not mean there is an 89% probability that your common ancestor lived exactly 12 generations ago.
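The difference between "within" and "exactly" is easy to see by printing both numbers from the same kind of simplified model: one average rate for all markers and a uniform prior over 1 to 200 generations. Both are assumptions, this is not FTDNATip's calculation, and `posterior_by_generation` is a hypothetical helper name.

```python
from itertools import accumulate


def posterior_by_generation(markers: int, mismatches: int, rate: float,
                            max_gen: int = 200) -> list[float]:
    """Probability that the common ancestor lived exactly g generations ago, g = 1..max_gen.

    The binomial coefficient is omitted because it cancels out when the
    weights are normalized.
    """
    weights = []
    for g in range(1, max_gen + 1):
        # Chance that a marker changed somewhere in the 2 * g transmissions
        # separating the two participants.
        p = 1.0 - (1.0 - rate) ** (2 * g)
        weights.append(p**mismatches * (1.0 - p) ** (markers - mismatches))
    total = sum(weights)
    return [w / total for w in weights]


exactly = posterior_by_generation(37, 1, 0.002)
within = list(accumulate(exactly))  # within[g - 1] = P(ancestor in generations 1..g)

for g in (4, 8, 12, 16, 20):
    print(f"generation {g:2}: within {within[g - 1]:6.1%}, exactly {exactly[g - 1]:5.1%}")
```

Under this model no single generation receives more than a few percent, while the cumulative "within" figure climbs steadily toward 100%.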
The TMRCA calculation is based solely on the number of mutations and the mutation rates. It doesn't know your surname or the surname of the participant you're comparing your results to.
It does not know anything about your genealogies. It does NOT know that your paper trail goes back 3, 4, or 10 generations without reaching a common ancestor.
All it knows is that you and the person you're compared with differ by X mutations and that you are using a mutation rate of Y.
For those who don't like dealing with statistics, FTDNA provides "Guides for Interpreting Genetic Distance within Surname Projects", which are descriptive rather than mathematical. There are guides for 12, 25, 37, and 67 marker tests. Links to these guides are provided at the bottom of this page.
Placing your participants into various groups based on their DNA test results is one of the most important things a Project Administrator can do. I strongly recommend that you start doing this as soon as matches start to appear. As your project grows, it becomes much easier to add new matches to an existing group or to start a new group.
I group participants primarily based on the strength of their DNA match, but I also consider their paper trail to a lesser degree.
The strength of the DNA match depends on 1) the number of markers tested, 2) the number of markers that match, and 3) the existence of rare or unusual marker values.
A 36 for 37 marker match with a rare value on one or more of the markers is a stronger match than a 24 for 25 marker match with common values on all the markers.
I normally only group participants who have tested at least 25 markers. I will include a participant who has tested only 12 markers if he is an exact match and shares a known common ancestor with someone who is already in a group.
As a rule of thumb, I consider two participants to be a match if there is about a 50% chance they share a common ancestor within 12 generations. I use FTDNATip and normally consider 23 of 25, 33 of 37, and 61 of 67 a match close enough to include in a group.
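The rule of thumb above can be written down as a small function. This sketch encodes only the thresholds stated in the text (23 of 25, 33 of 37, 61 of 67, plus the 12-marker exception); it does not compute any probabilities, and the function and parameter names are hypothetical. It also assumes both participants are compared on the same panel, so a 25-marker tester and a 37-marker tester would be compared on 25 markers.

```python
# Minimum number of matching markers, per panel size, treated as close enough to group.
MIN_MATCHING = {25: 23, 37: 33, 67: 61}


def close_enough_to_group(markers_compared: int, markers_matching: int,
                          shares_known_ancestor: bool = False) -> bool:
    """Apply the grouping rule of thumb described in the text.

    A 12-marker participant is only grouped on an exact match backed by a known
    common ancestor with someone already in the group.
    """
    if markers_compared == 12:
        return markers_matching == 12 and shares_known_ancestor
    if markers_compared not in MIN_MATCHING:
        raise ValueError(f"no rule of thumb given for {markers_compared} markers")
    return markers_matching >= MIN_MATCHING[markers_compared]


print(close_enough_to_group(37, 34))        # True: 3 mismatches
print(close_enough_to_group(67, 60))        # False: 7 mismatches
print(close_enough_to_group(12, 12, True))  # True: exact 12-marker match plus a known ancestor
```

A fixed threshold cannot capture the other factors discussed here, such as shared rare values or the paper trails used as a tie breaker.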
As groups grow in size, it becomes increasingly difficult to calculate all the marker mismatches between members of a group. Fortunately, there is an online utility that will do all the number crunching for you. McGee's Y-DNA Comparison Utility allows you to copy data from a spreadsheet or other source, paste it into the program, and produce a chart like the one shown here. This particular chart has been somewhat enhanced, but the McGee program gives you all the data in an almost identical format.
By creating this matrix you can see the exact number of mismatches between any two participants.
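A mismatch matrix like the one McGee's utility produces can be sketched in a few lines. This is not McGee's code; it simply counts mismatched markers (the infinite alleles count) for every pair. The kit names, the six markers, and their values are hypothetical, and every haplotype is assumed to list the same markers in the same order.

```python
# Hypothetical group: an ancestral haplotype plus three kits, six markers each
# (DYS393, DYS390, DYS19, DYS391, DYS385a, DYS385b, in that order).
group = {
    "Anc02": [13, 24, 14, 11, 11, 14],
    "Kit A": [13, 24, 14, 11, 11, 14],
    "Kit B": [13, 25, 14, 11, 11, 14],
    "Kit C": [13, 24, 15, 10, 11, 14],
}


def mismatches(a: list[int], b: list[int]) -> int:
    """Number of markers whose values differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def mismatch_matrix(haplotypes: dict[str, list[int]]) -> list[list[int]]:
    """Mismatch count for every pair of kits, rows and columns in insertion order."""
    names = list(haplotypes)
    return [[mismatches(haplotypes[r], haplotypes[c]) for c in names] for r in names]


def print_matrix(haplotypes: dict[str, list[int]]) -> None:
    """Print the matrix with kit names as row and column labels."""
    names = list(haplotypes)
    width = max(len(name) for name in names) + 2
    print(" " * width + "".join(name.rjust(width) for name in names))
    for name, row in zip(names, mismatch_matrix(haplotypes)):
        print(name.ljust(width) + "".join(str(n).rjust(width) for n in row))


print_matrix(group)
```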
Note that in addition to the 7 actual participants, I've included a hypothetical Anc02.
One of the things I do for each of my groups is create an Ancestral Haplotype for that group. He's known as Anc01, Anc02, etc., and is the hypothetical "common ancestor" of the participants in the group. Although it's impossible to know his actual DNA results, it is possible to deduce his most likely test results from the results of his descendants. In its simplest form, the ancestral haplotype is just the most frequent value at each marker among the participants in the group. This example illustrates the ancestral haplotype of 4 fictitious participants. Note that although each participant mismatches each of the other participants on 2 markers, they all match the hypothetical ancestor on 24 of 25 markers.
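The simplest form of ancestral haplotype, the most frequent value at each marker, takes only a short computation. The sketch below rebuilds the situation just described: four fictitious participants whose 25-marker values (hypothetical, not real Blair results) each differ from a common base on one different marker. Ties between equally frequent values are broken arbitrarily here, where a real group needs the administrator's judgment.

```python
from collections import Counter


def modal_haplotype(haplotypes: list[list[int]]) -> list[int]:
    """Most frequent value at each marker across the group.

    Counter.most_common breaks ties by first appearance, which is arbitrary.
    """
    return [Counter(values).most_common(1)[0][0] for values in zip(*haplotypes)]


def matching_markers(a: list[int], b: list[int]) -> int:
    """Number of markers whose values agree."""
    return sum(1 for x, y in zip(a, b) if x == y)


# Hypothetical 25-marker base values; each fictitious participant differs from
# them by one step on a different marker (positions 0, 5, 10 and 15).
base = [13, 24, 14, 11, 11, 14, 12, 12, 12, 13, 13, 29, 17,
        9, 10, 11, 11, 25, 15, 19, 29, 15, 15, 17, 17]
participants = []
for position in (0, 5, 10, 15):
    haplotype = base.copy()
    haplotype[position] += 1
    participants.append(haplotype)

ancestor = modal_haplotype(participants)
print(ancestor == base)                                       # True
print([matching_markers(p, ancestor) for p in participants])  # [24, 24, 24, 24]
print(matching_markers(participants[0], participants[1]))     # 23
```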
As you add more participants to a group, it is possible that the ancestral haplotype will change. If a group contains a large number of participants who share a known common ancestor with a distinct marker value, you may have to make adjustments so you do not skew the haplotype.
Sharing a rare value on one or more of your markers can be a strong indication that participants share a common ancestor, provided the rest of their DNA results support that conclusion. It can be especially valuable in the case of borderline groupings.
In Group 3 of the Blair DNA Project we have 17 participants with a value of 26 on DYS#390, a value that occurs only about 1% of the time. Fourteen of these same participants also have values of 12/14 on DYS#385a/b, a combination that occurs less than 4% of the time.
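Checking two participants for shared rare values can be automated once you have a frequency table. In this sketch the only frequencies are the approximate figures quoted above for DYS#390 = 26 and DYS#385a/b = 12/14 (the 4% figure is an upper bound). Any value missing from the table is treated as common, the 5% cut-off is an arbitrary assumption, and the two participants are hypothetical.

```python
# Approximate population frequencies quoted in the text, keyed by (marker, value).
# The 4% figure is an upper bound ("less than 4%").
FREQUENCY = {
    ("DYS390", "26"): 0.01,
    ("DYS385a/b", "12/14"): 0.04,
}


def shared_rare_values(a: dict[str, str], b: dict[str, str],
                       threshold: float = 0.05) -> list[tuple[str, str]]:
    """(marker, value) pairs both participants share whose frequency is below threshold.

    Values missing from FREQUENCY are treated as common (frequency 1.0).
    """
    return [
        (marker, value)
        for marker, value in a.items()
        if b.get(marker) == value and FREQUENCY.get((marker, value), 1.0) < threshold
    ]


# Hypothetical participants; values are strings so that "12/14" fits the same table.
kit_1 = {"DYS393": "13", "DYS390": "26", "DYS385a/b": "12/14", "DYS19": "14"}
kit_2 = {"DYS393": "13", "DYS390": "26", "DYS385a/b": "12/14", "DYS19": "15"}

print(shared_rare_values(kit_1, kit_2))  # [('DYS390', '26'), ('DYS385a/b', '12/14')]
```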
Several websites have developed frequency distributions for the various marker values. The addresses of the sites listed here are at the bottom of this page. I used the Sorenson Molecular Genealogy Foundation website.
One of the major reasons for DNA testing is to either support or refute conventional research, so using conventional research to place someone in a DNA group may seem illogical. Conventional research should ONLY be used as a tie breaker.
No matter how good the paper trails may be, if the DNA results don't match, I won't put the participants in the same group.
But what if the DNA results are inconclusive or borderline? Then I look at the conventional research. Do the participants claim to share a common ancestor? If so, how far back is this common ancestor? How complete are their paper trails? Are there any inconsistencies in their paper trails that would make them suspect? Whether I include them in the same group depends on the answers to all of these questions.
Sometimes, instead of asking "What is the probability that these two participants ARE related?", it's better to ask "What is the probability that these two participants are NOT related?"
DNA References for
McGee's Y-DNA Comparison Utility - http://www.mymcgee.com/tools/yutility.html
Sorenson Molecular Genealogy Foundation (SMGF) - Y-Chromosome Database - http://www.smgf.org/pages/ydatabase.jspx
Marker Mutation Rates
WorldFamilies.net Marker & Mutation Comparison - http://www.worldfamilies.net/marker
Leo Little - Mutation Rate Effects - http://freepages.genealogy.rootsweb.ancestry.com/~geneticgenealogy/ratestuff.htm
Wikipedia - List of DYS markers - http://en.wikipedia.org/wiki/List_of_DYS_markers
Marker-to-DYS Conversion Chart with Mutation Rates - http://micbarnette.bravepages.com/dys_conversion_chart.html
Clan Donald USA TMRCA Calculator - http://dna-project.clan-donald-usa.org/tmrca.htm
TMRCA Calculator - http://www.dnacalculator.org/tmrcaCalculator.php
Moses Walker TMRCA Calculator - http://www.moseswalker.com/mrca/calculator.asp?q=2
FTDNA Interpreting Genetic Distance within Surname Projects
Frequency Distribution of Marker Values
Sorenson Molecular Genealogy Foundation (SMGF) - Y-Chromosome Marker Details - http://www.smgf.org/ychromosome/marker_details.jspx
Y-Base Statistics - http://www.ybase.org/statistics.asp
Leo Little - Data from FTDNA and Y-search - http://freepages.genealogy.rootsweb.ancestry.com/~geneticgenealogy/yfreq.htm
This WebPage was last updated 01/17/2013
© January 1, 2010, blairdna.com
and its Allies