DNA 101 is an attempt to take the extremely
complex and confusing subject of genetics and DNA and simplify
it into layman's terms. This page addresses DNA only as it applies
to Y-Chromosome testing and genealogy. Technical terms are
defined in this same context.
This page is broken down into the following sections:
DNA
Chromosomes
The Y-Chromosome
Testing the Y-Chromosome
Reading the Test Results
What Does it Mean
Putting It All Together
Definitions
Links
DNA

Deoxyribonucleic acid (DNA) is the chemical inside the nucleus
of all cells that carries the genetic instructions for making living organisms.
A DNA molecule consists of two strands that wrap around each other to resemble a
twisted ladder. The sides are made of sugar and phosphate molecules. The “rungs”
are made of nitrogen-containing chemicals called bases. Each strand is a chain of
units called nucleotides, and each nucleotide is composed of one sugar molecule,
one phosphate molecule, and a base. Four different bases
are present in DNA: adenine (A), thymine (T), cytosine (C), and guanine (G). The
particular order of the bases arranged along the sugar-phosphate backbone is
called the DNA sequence; the sequence specifies the exact genetic instructions
required to create a particular organism with its own unique traits.
The two strands of the DNA molecule are held together by weak bonds
between their bases. The four bases pair in a set manner: adenine (A) pairs with thymine
(T), while cytosine (C) pairs with guanine (G). These pairs of
bases are known as Base Pairs (bp).
These Base Pairs (bp) are the basis of Y-chromosome testing.
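The pairing rule above can be sketched in a few lines of Python. This is only an illustration: the function name `complement` is my own, and it simply lists the partner base at each position without modeling the direction of the strands.

```python
# Pairing rules from the text: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base that pairs with each base of `strand`, position by position."""
    return "".join(PAIRS[base] for base in strand.upper())


print(complement("GATA"))  # CTAT
```

Each letter of the input together with the letter at the same position in the output forms one base pair.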
Chromosomes
Chromosomes are paired threadlike "packages" of long segments of
DNA contained within the nucleus of each cell. In humans there are 23 pairs of
chromosomes. In 22 pairs, both members are essentially identical, one deriving from
the individual's mother, the other from the father. The 23rd pair is different. In
females this pair has two like chromosomes called "X".
In males it comprises one "X" and one "Y," two very dissimilar
chromosomes. It is these chromosome differences which determine sex.
The Y-Chromosome
Human sex is determined by the X and Y chromosomes. A female has 2
X-Chromosomes and a male has an X and a Y-Chromosome. When a child is conceived it
gets one sex chromosome from its mother and one from its father. The
chromosome from the mother will always be an X, but the chromosome from the father
may be either X or Y. If the child gets the father's X, she will be a girl; if the child
gets the Y, he will be a boy.
This Y-Chromosome has certain unique features:
- Most of the Y-Chromosome is inherited as an integral
unit passed without alteration from father to sons, and to their sons, and so on,
unaffected by exchange or any other influence of the X-Chromosome that came
from the mother. It is the only nuclear chromosome that escapes the continual
reshuffling of parental genes during the process of sex cell production.
It is these unique features that make the Y-Chromosome useful
to genealogists.
Testing the Y-Chromosome
The Y-Chromosome has definable segments of DNA with known
genetic characteristics. These segments are known as Markers. These markers occur
at an identifiable physical location on a chromosome known as a
Locus. Each marker
is designated by a number (known as DYS#), according to international conventions.
You will often find the terms Marker and
Locus used interchangeably, but
technically the Marker is what is tested and the
Locus is where the marker is
located on the chromosome.
Although there are several types of markers used in DNA
studies, the Y-Chromosome test uses only one type. The marker used is called a
Short Tandem Repeat (STR). STRs are short sequences of DNA (usually 2, 3, 4, or 5
base pairs long) that are repeated numerous times in a head-to-tail manner. The
16 base pair sequence "gatagatagatagata" represents 4
repeats of the sequence "gata". The number of repeats at a marker is referred to as its
Allele. The
variation in the number of repeats at each marker enables discrimination between
individuals.
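To make the idea of counting repeats concrete, here is a minimal Python sketch. It is not how a testing laboratory measures an STR; the function name `count_tandem_repeats` is hypothetical, and the sketch simply finds the longest run of back-to-back copies of a short sequence.

```python
def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Return the longest run of back-to-back copies of `motif` in `sequence`."""
    sequence, motif = sequence.lower(), motif.lower()
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Keep stepping forward one motif at a time while copies continue.
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


print(count_tandem_repeats("gatagatagatagata", "gata"))  # 4
```

The value printed, 4, is the number that would be reported for this marker in the example above.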
Reading the Test Results
The table below is a shortened version of the actual table used
to show our DNA test results. It shows 12 of the 25 markers that most of the
participants had tested.
The numbers (1-12) across the top of the table are the
marker
numbers. They have no significance other than as an easy way to refer to the
marker. Note: FamilyTree DNA refers to these numbers as Locus. The second set of
numbers across the top of the matrix are DYS# (the actual marker names).
The numbers down the left side of the table identify the
participant in the DNA project. The numbers down the right side of the table
identify the participant's oldest known ancestor.
The rest of the numbers are the alleles (the number of repeats) for
each participant at the specified marker.
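The layout described above can be pictured as a small lookup structure. In this sketch the participant numbers and allele values are made up, and the three DYS names are used only as illustrations; it is not the project's actual data.

```python
# Hypothetical header rows: marker number -> DYS name (illustrative only).
MARKERS = {1: "DYS393", 2: "DYS390", 3: "DYS19"}

# Hypothetical body of the table: participant number -> allele per marker.
RESULTS = {
    101: {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    102: {"DYS393": 13, "DYS390": 25, "DYS19": 14},
}


def allele(participant: int, marker_number: int) -> int:
    """Look up the repeat count for a participant at a numbered marker."""
    return RESULTS[participant][MARKERS[marker_number]]


print(allele(102, 2))  # 25
```

Reading the table works the same way: find the participant's row, find the marker's column, and the number where they meet is the repeat count.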
What Does it Mean
An individual's test results have little meaning on their own.
You cannot take these numbers, plug them into some formula and find out who your
ancestors are. The value of the test results depends on how your results
compare to other test results. And even when you match someone else, it will only
indicate that you and the person you match share a common ancestor. Depending on the
number of markers tested and the number of matches it will indicate with a certain
degree of probability how long ago this common ancestor existed. It will NOT show
exactly who this ancestor is.
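Comparing two sets of results amounts to counting the markers on which the repeat numbers agree. The sketch below uses made-up values for two hypothetical participants, and the function name `count_matches` is my own.

```python
def count_matches(result_a: dict, result_b: dict) -> tuple:
    """Return (matching markers, markers compared) for two sets of results."""
    # Only markers that both people had tested can be compared.
    shared = sorted(set(result_a) & set(result_b))
    matches = sum(1 for marker in shared if result_a[marker] == result_b[marker])
    return matches, len(shared)


# Made-up allele values for two hypothetical participants.
person_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
person_b = {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 11}

matches, compared = count_matches(person_a, person_b)
print(f"{matches} of {compared} markers match")  # 3 of 4 markers match
```

A result such as "24 of 25" is this same count taken over all 25 markers.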
As discussed above, the Y-Chromosome is passed from father to son.
The vast majority of the time the father passes an exact copy of his Y-Chromosome
to his son. This means that the markers of the son are identical to those of his
father. However, on rare occasions there is a mutation or change in one of the
markers. The change is either an insertion or a deletion. An insertion is when an
additional repeat is added to a marker. A deletion is when one of the repeats is deleted.
Mutations occur at random. This means it is possible for two distant cousins
to match exactly on all markers while two brothers might not match exactly. Because of the
random nature of mutations we must use statistics and probability to estimate the
Time to the Most Recent Common Ancestor (TMRCA). The actual
calculations of TMRCA are mathematically complex and depend on knowing the rate of mutation
and the true number of mutations. At this time there is not enough data to accurately determine
either of these factors, so certain assumptions have to be made. A discussion of these assumptions
and the actual calculations is beyond the scope of this webpage. For those wishing to read more about
the various models used, I recommend
Time to Most Recent Common Ancestry Calculator by Bruce Walsh. The simplest and one of the
most commonly used models makes the following assumptions:
- Rate of Mutation = .002. This assumes that any given marker
has a .002 chance of mutating with each generation. In other words, we could
expect any given marker to mutate about once in 500 generations. The rate of .002 is
considered conservative and is the average of a number of studies. It will result in
a TMRCA estimate that is longer than one based on a higher mutation rate.
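The arithmetic behind this assumption is simple enough to check directly. The sketch below also assumes that markers mutate independently of one another, which is my simplification rather than something stated above; the function names are hypothetical.

```python
MUTATION_RATE = 0.002  # per marker, per generation (the assumption stated above)


def expected_generations_per_mutation(rate: float = MUTATION_RATE) -> float:
    """Average number of generations between mutations at a single marker."""
    return 1.0 / rate


def chance_of_exact_copy(markers: int, rate: float = MUTATION_RATE) -> float:
    """Chance that none of `markers` markers mutates in one generation,
    assuming each marker mutates independently (an assumption of this sketch)."""
    return (1.0 - rate) ** markers


print(expected_generations_per_mutation())  # 500.0
print(round(chance_of_exact_copy(25), 3))   # 0.951
```

Under these assumptions a son matches his father on all 25 markers about 95% of the time, which is why an exact copy is the usual case and a mutation the exception.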
Based on the above assumptions we derive the cumulative
probability table below. This table simply lists the number of generations
corresponding to the 50%, 90% and 95% probability levels for various numbers of matches.
This table tells us that if we match on 24 of 25 markers there
is a 50% probability that the most recent common ancestor is 17 generations or less,
a 90% probability that TMRCA is 40 generations or less, and a 95% probability
that TMRCA is 48 generations or less. The 95% Confidence Interval is the range,
from a lower value to an upper value, that encompasses 95% of the probability for the TMRCA. If we
match on 24 of 25 markers, 95% of the possible TMRCA values fall between 2 and
57 generations.
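For readers who want to see where numbers like these can come from, here is a minimal Python sketch of one simple model. It is not necessarily the exact calculation behind this page. It assumes a mutation rate of .002, that markers mutate independently, that a marker still matches after t generations with probability exp(-2 * rate * t), and that every TMRCA is equally likely before testing. The function name `tmrca_cdf` is hypothetical.

```python
from math import comb, exp


def tmrca_cdf(generations, markers, mismatches, rate=0.002):
    """Probability that the common ancestor lived within `generations` generations.

    Sketch assumptions: independent markers, a marker still matches after t
    generations with probability exp(-2 * rate * t), and a flat prior on t.
    """
    matches = markers - mismatches
    if matches <= 0 or mismatches < 0:
        raise ValueError("need at least one matching marker")

    def area(t):
        # Integral from 0 to t of q**matches * (1 - q)**mismatches, where
        # q = exp(-2 * rate * t). Passing t=None integrates to infinity.
        total = 0.0
        for j in range(mismatches + 1):
            decay = 2.0 * rate * (matches + j)
            part = 1.0 if t is None else 1.0 - exp(-decay * t)
            total += comb(mismatches, j) * (-1) ** j * part / decay
        return total

    return area(generations) / area(None)


for g in (17, 40, 48):
    print(g, round(tmrca_cdf(g, markers=25, mismatches=1), 2))
```

With a 24 of 25 match this sketch gives roughly 0.50, 0.90 and 0.95 at 17, 40 and 48 generations, which is close to the figures quoted above.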
As you can see from the above table, testing more markers reduces the number
of generations to TMRCA. The chart below shows how increasing the number of markers
tested decreases the number of generations to TMRCA when all markers match.
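The effect of testing more markers can be sketched for the all-markers-match case. This is an illustration under stated assumptions, not the source of the chart: a mutation rate of .002, independent markers, a marker still matching after t generations with probability exp(-2 * rate * t), and every TMRCA equally likely before testing. The function name is hypothetical.

```python
from math import log


def generations_for_exact_match(markers, probability, rate=0.002):
    """Generations within which the common ancestor falls with `probability`
    when all `markers` match, under the sketch assumptions listed above."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be between 0 and 1")
    # With every marker matching, the chance the ancestor lived within t
    # generations is 1 - exp(-2 * rate * markers * t); solve that for t.
    return -log(1.0 - probability) / (2.0 * rate * markers)


for markers in (12, 25):
    row = [round(generations_for_exact_match(markers, p), 1) for p in (0.5, 0.9, 0.95)]
    print(markers, row)
```

Because the number of markers sits in the denominator, doubling the markers tested roughly halves the number of generations at every probability level.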
Putting It All Together
DNA testing can be a valuable tool in genealogical research
when it is combined with conventional research. Test results can be used to
confirm a suspected connection between two families or disprove a connection.
Although it is impossible to pinpoint a common ancestor from the test results
alone, with a proper paper trail you may be able to do so. My own experience with
DNA testing demonstrates this. I have been working with another individual to trace
his ancestry. He had traced his line back to his great-great-grandfather, born in Vermont
in 1823. My line goes back to 1700 Scotland, through Vermont. I had always thought our
lines were connected, but there were gaps that could not be filled and other
possible lines to consider. DNA test results showed an exact 25-marker match,
leaving virtually no doubt we shared a common ancestor. But the results alone
could not tell us who this ancestor was. It was the other information, collected by
conventional genealogical research, that allowed us to determine who our common
ancestor had to be.
Definitions
Allele: One of the variant forms of a gene at a particular
locus, or location, on a chromosome. Different alleles produce variation in
inherited characteristics. For STR markers, each allele is the number of
repeats of the short base sequence.
Base Pair: Two bases that form a "rung of the DNA ladder." A
DNA nucleotide is made of a molecule of sugar, a molecule of phosphoric acid, and
a molecule called a base. The bases are the "letters" that spell out the genetic
code. In DNA, the code letters are A, T, G, and C, which stand for the chemicals
adenine, thymine, guanine, and cytosine, respectively. In base pairing, adenine
always pairs with thymine, and guanine always pairs with cytosine.
Chromosome: One of the threadlike "packages" of genes and other
DNA in the nucleus of a cell.
DNA: The chemical inside the nucleus of a cell that carries
the genetic instructions for making living organisms.
DYS#: D=DNA, Y=Y chromosome, S=a unique DNA segment. A
label for genetic markers on the Y chromosome. Each marker is designated by a
number, according to international conventions. At present, virtually all the
DYS designations are given to STR markers (a class often used in genetic genealogy).
Gene: The functional and physical unit of heredity passed from
parent to offspring. Genes are pieces of DNA, and most genes contain the
information for making a specific protein.
Genome: All the DNA contained in an organism or a cell, which
includes both the chromosomes within the nucleus and the DNA in mitochondria.
Locus: A point in the genome, identified by a marker, which can
be mapped by some means. It does not necessarily correspond to a gene. A single
gene may have several loci within it (each defined by different markers) and these
markers may be separated in genetic or physical mapping experiments. In such
cases, it is useful to define these different loci, but normally the gene name
should be used to designate the gene itself, as this usually will convey the
most information.
Marker: Also known as a genetic marker, a segment of DNA with an
identifiable physical location on a chromosome whose inheritance can be
followed. A marker can be a gene, or it can be some section of DNA with no known
function. Because DNA segments that lie near each other on a chromosome tend to be
inherited together, markers are often used as indirect ways of tracking the
inheritance pattern of genes that have not yet been identified, but whose approximate
locations are known.
Microsatellite: Repetitive stretches of short sequences of
DNA used as genetic markers to track inheritance in families.
Mutation: A permanent structural alteration in DNA.
Short Tandem Repeats (STR): A genetic marker consisting of
multiple copies of an identical DNA sequence arranged in direct succession in
a particular region of a chromosome. Occasionally, one will mutate by the gain
or loss of one repeat. (Also known as a microsatellite.)
Links
International Society of Genetic Genealogy (ISOGG) - The first
society founded to promote the use of DNA testing in genealogy! With links to a
wealth of genetic genealogy tools and information.
Contexo.Info - A website about the foundations of molecular genetics and biology.
An excellent site for those who are looking for more details on DNA.
Time to Most Recent Common Ancestry Calculator by Bruce Walsh -
The goal is to use genetic markers (here on the Y chromosome) to estimate the
TMRCA, the Time to the Most Recent Common Ancestor (MRCA), which is how many
generations the two Y chromosomes are from a common ancestor. This site explains the
various models used to determine TMRCA.
The National Human Genome Research Institute - The National
Human Genome Research Institute (NHGRI) created the Talking Glossary of Genetic
Terms to help people without scientific backgrounds understand the terms and
concepts used in genetic research.
Human Genome Project Information - The Human Genome Project
(HGP) is an international effort to discover all the approximately 30,000 to
35,000 human genes (the human genome), make them accessible for further
biological study, and determine the complete sequence of the 3 billion DNA
subunits (bases).
Primer on Molecular Genetics - This primer was prepared by
Denise Casey, Human Genome Management Information System, Oak Ridge National
Laboratory, for the 1991-92 DOE Human Genome Program Report.
Primer on Molecular Genetics (pdf format) - This is an Adobe PDF version of the primer above.
Why Y? The Y Chromosome in the Study of Human Evolution, Migration and Prehistory -
Neil Bradman and Mark Thomas of The Centre for Genetic Anthropology at University College London reveal
the power of modern genetic analysis for exploring the role of fathers in human history.
Genetics & Genealogy: Y Chromosome DNA and the Y Line - by Thomas
H. Roderick, PhD, Center for Human Genetics. A discussion of the Y-Chromosome
and its role as a tool for genealogists.
Short Tandem Repeat DNA Internet DataBase - While the use of STRs
for genetic mapping and identity testing has become widespread among DNA typing
laboratories, there is no single place where information may be found regarding
STR systems. This web site is an attempt to bring together the abundant literature
on the subject in a cohesive fashion to make future work in this field easier.
Facts and sequence information on each STR system, population data, commonly used
multiplex STR systems, PCR primers and conditions, and a review of various
technologies for analysis of STR alleles have been included in this database.
GENEALOGY-DNA-L - This mailing list is for anyone with DNA (i.e., anyone!) who would
like to discuss methods and share results of DNA testing as applied to genealogical research.
Genetic Genealogy and Telephone Tag - A simplified explanation of how Y-DNA mutates.
© November 1, 2002, blairgenealogy.com
All Rights Reserved
This page may not be copied, reproduced, or displayed without
written permission of the Blair DNA Project Coordinator.
Links to this page are authorized and welcomed