The Grantner Family

Haplotypes, Haplogroups, Haplotrees and DNA Matches


Haplotypes

Y-DNA Haplotypes are determined by STRs (Short Tandem Repeats). An individual's Y-DNA 12-marker haplotype would be represented as:

STRs are sequences of the nucleotides Adenine (A), Guanine (G), Cytosine (C) and Thymine (T). Each DYS marker (DYS#) has a known repeat motif and a known range for the number of repeats, which varies between individuals. The alleles in the above haplotype are the numbers of repeats for the associated DYS#. DYS393, for example, has a motif of AGAT and the number of repeats is 9 to 17. An allele of 13 fits within this range.
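The idea of an allele as a repeat count can be sketched in a few lines of Python. This is only an illustration: the function name `longest_tandem_repeat` and the flanking letters in the toy sequence are made up, and only the AGAT motif and the 9 to 17 range come from the text above.

```python
def longest_tandem_repeat(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in sequence."""
    sequence = sequence.upper()
    motif = motif.upper()
    best = 0
    # Try every start position and count consecutive copies of the motif.
    for start in range(len(sequence) - len(motif) + 1):
        count = 0
        position = start
        while sequence.startswith(motif, position):
            count += 1
            position += len(motif)
        best = max(best, count)
    return best


# Toy sequence: invented flanking letters around 13 copies of AGAT.
toy_sequence = "CCG" + "AGAT" * 13 + "TTC"
allele = longest_tandem_repeat(toy_sequence, "AGAT")
print(allele)               # prints 13
print(9 <= allele <= 17)    # prints True
```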

The mutation rate for an STR is very slow (about once every 250-500 generations), so most family members will have the same values. But STRs change often enough that people who do not share a common ancestor in their male line within the last several thousand years will have different values.
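As a rough sketch of why close relatives usually share the same values, the snippet below estimates the chance that a set of markers is passed down unchanged. It assumes, for illustration only, that "once every 250-500 generations" means each marker mutates independently with a probability of 1/500 to 1/250 per father-to-son transmission. The function name `prob_unchanged` is hypothetical.

```python
def prob_unchanged(markers: int, generations: int, rate_per_marker: float) -> float:
    """Chance that none of the markers mutates over the given number of
    father-to-son transmissions, assuming independent mutations."""
    return (1.0 - rate_per_marker) ** (markers * generations)


# Assumed per-marker rates taken from the 250-500 generation figure above.
for rate in (1 / 500, 1 / 250):
    for generations in (1, 10, 40):
        p = prob_unchanged(12, generations, rate)
        print(f"rate={rate:.4f} generations={generations:>2} unchanged={p:.3f}")
```

Under these assumptions a 12-marker haplotype is very likely to survive a single generation intact, and the chance of an unchanged copy falls steadily as the number of generations grows.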

It is this property of relatively slow changes that makes the comparison of Y-DNA haplotypes useful in determining that two individuals have a MRCA (Most Recent Common Ancestor) in relatively recent history. There is more about haplotype matching, and the implications, below.

In addition to STR (Short Tandem Repeat) mutations, there are also SNP (Single Nucleotide Polymorphism) mutations. SNP mutations occur at a much lower rate. The next section will talk more of SNPs; suffice it now to say that Y-DNA SNPs determine Y-DNA haplogroups and mtDNA SNPs determine both mtDNA haplotypes and mtDNA haplogroups.

Unlike Y-DNA, mtDNA does not contain STR markers. The commercial mtDNA test is basically a giant multi-nucleotide SNP test that checks 1050 nucleotide locations if both HVR1 (540 nucleotides) and HVR2 (510 nucleotides) are tested. Your actual mtDNA haplotype would be all 1050 alleles (DNA letters, i.e., A, T, G, or C for each nucleotide location). But the standard convention for reporting the results of an mtDNA test is to report only the mutations/differences as compared to the mtDNA Cambridge Reference Sequence (CRS), which was the first mtDNA molecule fully sequenced. Therefore, a typical mtDNA haplotype would be shown as:

The above entry 16189C indicates that the individual has Cytosine (C) at location 16189 rather than Thymine (T), which the CRS has. The complete CRS can be viewed here.
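The reporting convention can be illustrated with a small Python sketch. The helper names `parse_difference` and `list_differences` are hypothetical, and the reference bases used here are placeholders except for the T at location 16189, which is taken from the text above. Insertions and deletions are not handled.

```python
import re
from typing import Dict, List, Tuple


def parse_difference(entry: str) -> Tuple[int, str]:
    """Split an entry such as '16189C' into (location, base)."""
    match = re.fullmatch(r"(\d+)([ACGT])", entry.strip().upper())
    if match is None:
        raise ValueError(f"not a simple substitution entry: {entry!r}")
    return int(match.group(1)), match.group(2)


def list_differences(reference: Dict[int, str], sample: Dict[int, str]) -> List[str]:
    """Report only the locations where the sample differs from the reference."""
    return [
        f"{location}{sample[location]}"
        for location in sorted(reference)
        if location in sample and sample[location] != reference[location]
    ]


# Placeholder reference fragment; only the T at 16189 comes from the text.
toy_reference = {16188: "C", 16189: "T", 16190: "C"}
toy_sample = {16188: "C", 16189: "C", 16190: "C"}

print(list_differences(toy_reference, toy_sample))  # prints ['16189C']
print(parse_difference("16189C"))                   # prints (16189, 'C')
```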

SNP mutations occur at a much lower rate than STR mutations. The use of mtDNA haplotypes for genealogy is therefore limited due to the slow mutation rates involved, resulting in a lack of resolution.

Haplogroups and Haplotrees

The Y chromosome contains two types of ancestral markers. Short Tandem Repeats (STRs) are the basis for a haplotype and trace recent ancestry. The second type of ancestral marker, Single Nucleotide Polymorphisms (SNPs), documents ancient ancestry. SNPs are small "mistakes" that occur in DNA and are passed on to future generations. SNP mutations are rare. They happen at a rate of approximately one mutation every few hundred generations.

A SNP is a change in the Y-DNA sequence in which a single nucleotide (A, T, G, or C) is altered. SNPs are unique to specific haplogroups, so SNP tests such as the Backbone and Deep SNP tests are used to identify haplogroups and their subclades respectively. Haplogroups, defined by SNPs, are the branches of the tree of mankind. SNPs are named with a letter code and a number. The letter indicates the lab or research team that discovered the SNP. The number indicates the order in which it was discovered. For example, M173 is the 173rd SNP documented by the Human Population Genetics Laboratory at Stanford University, which uses the letter M.
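The naming convention can be shown with a short Python sketch. The function name `parse_snp_name` is hypothetical, and the lookup table contains only the one lab prefix mentioned above; other prefixes are left unresolved rather than guessed.

```python
import re
from typing import Tuple

# Only the prefix named in the text is listed here.
KNOWN_PREFIXES = {
    "M": "Human Population Genetics Laboratory at Stanford University",
}


def parse_snp_name(name: str) -> Tuple[str, int]:
    """Split a SNP name such as 'M173' into (letter code, number)."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", name.strip())
    if match is None:
        raise ValueError(f"unrecognised SNP name: {name!r}")
    return match.group(1).upper(), int(match.group(2))


prefix, number = parse_snp_name("M173")
print(prefix, number)                                # prints M 173
print(KNOWN_PREFIXES.get(prefix, "unknown prefix"))
```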

A Y-DNA haplogroup is defined as all of the male descendants of the single person who first showed a particular SNP mutation. A SNP mutation identifies a group who share a common ancestor far back in time, since SNPs rarely mutate. Each member of a particular haplogroup has the same SNP mutation.

One way to think about haplogroups is as major branches on the family tree of Homo sapiens. These haplogroup branches characterize the early migrations of population groups. As a result, haplogroups are usually associated with a geographic region. If haplogroups are the branches of the tree, then the haplotypes represent the leaves of the tree. All of the haplotypes that belong to a particular haplogroup are leaves on the same branch. Both mtDNA and Y-DNA tests provide haplogroup information, but remember that the haplogroup nomenclature is different for each.

Haplogroups are progressive SNP mutations from Y-Chromosomal Adam as shown in the following chart from the Family Tree DNA (FTDNA) site:

The above tree also shows the defining SNP for each haplogroup.

As noted above, the haplogroup branches characterize the early migrations of population groups. A Y-DNA migration map can be seen here.

The SNPs revealed by mtDNA testing are used to define mtDNA Haplogroups. Even though these haplogroups may use the same letter designations as for Y-DNA haplogroups, it is important to realize that they are not connected. A map of the mtDNA haplogroups is:

Haplotype Matches

Y-DNA Matches

An exact or close match in the Y-DNA haplotypes of two individuals provides information about their MRCA (Most Recent Common Ancestor).

For genealogists, if you match another person exactly, you have a 99.9% likelihood of sharing a common ancestor with that person. This individual is described scientifically as the Most Recent Common Ancestor (MRCA). Population geneticists then apply a term known as the Most Likely Estimate (MLE) of the Time when your MRCA would have lived (TMRCA). However, that is only an estimate and in each individual case the actual generation could be nearer or further from the person tested. For the purposes of scientific discussion, our population geneticist feels that 25 years best expresses a typical generation prior to the Dark Ages and 25 to 30 years per generation for the period thereafter.

Since we are all related to one another if we go back far enough in time, it is important to consider only very close matches when we are using DNA to resolve genealogical questions. Assuming that the goal is to find related individuals through DNA testing, the more exact the match, the greater the probability of a "recent" MRCA. In addition, testing more markers (i.e., a 37-marker test versus a 12-marker test) increases the precision of the test. Finally, the surname comes into play. A perfect match between two individuals with the same surname is much more significant than a perfect match between two individuals with different surnames.

Just as there are surnames which are very common, (such as Smith and Jones), and surnames which are uncommon, there are Haplotypes (a set of results that characterize you on the Y-Chromosome) with a high frequency of occurrence (aka common), and Haplotypes with a low frequency of occurrence (aka uncommon). The 12 Marker result from the Y-chromosome test is called a Haplotype, and can help determine if your DNA sample is common or uncommon.

When you compare a 12 Marker result to another 12 marker result of someone with the same surname, and the results match 12/12, there is a 99% probability that you two are related within the time frame included in the MRCA table below. If the match is 11/12, there's still a high probability that you are related ... if the 11/12 match is within the same surname. If you compare a 25 Marker result to another 25 marker result for the same surname, and the results match 25/25, then there is also a 99% confidence that the two individuals are related…and at a much closer time interval than with the 12 marker test.
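A match count such as 12/12 or 11/12 is simply the number of markers with identical values out of the number compared. The Python sketch below illustrates this. The function name `count_matches` is hypothetical, and the marker names and allele values are illustrative only, not real test results.

```python
from typing import Dict, Tuple


def count_matches(a: Dict[str, int], b: Dict[str, int]) -> Tuple[int, int]:
    """Return (markers with equal values, markers compared) for two haplotypes."""
    shared = [marker for marker in a if marker in b]
    matching = sum(1 for marker in shared if a[marker] == b[marker])
    return matching, len(shared)


# Illustrative 12-marker haplotype; the values are invented for this example.
person_a = {
    "DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11,
    "DYS385a": 11, "DYS385b": 14, "DYS426": 12, "DYS388": 12,
    "DYS439": 12, "DYS389i": 13, "DYS392": 13, "DYS389ii": 29,
}
person_b = dict(person_a)
person_b["DYS390"] = 25  # one marker differs by a single repeat

matching, compared = count_matches(person_a, person_b)
print(f"{matching}/{compared}")  # prints 11/12
```

The count alone says nothing about surnames or how common the haplotype is, so it should be read together with the surname considerations described in this section.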

If, on the other hand, you compare the 12 marker result to someone else who does not have the same surname, but the scores match, you are most likely not recently related. When we use the term recently related, we are talking about a time frame within the last 1000 years or 40 generations, a time depth that accommodates the earliest known use of surnames.

Though it doesn't seem to be highlighted in the literature, it seems to me that the uniqueness of the surname is also relevant. It makes sense that a perfect match between two individuals with the surname Smith would not be as conclusive as a perfect match between two individuals with the surname McZyglagraf.

According to current theories, we are all related. The degree of relatedness depends on the time frame, or the number of generations between the participants and the common ancestor.

Since we are all descended from one person, and then from a few families, and as time goes by those families keep branching out until we reach our own family nest, it is natural that the fewer markers we check, the less unique they are, and the more markers we test, the more unique the whole string of markers is. To go to extremes, if we tested only one marker, we would almost certainly match millions of individuals who have shared that marker for thousands of years. If, on the other hand, we test many markers, we will match very few people who share those same markers. Those would be the ones who are closely related to us.

This is valid when checking our matches on 12, 25 or 37 markers. The likelihood that we will match other individuals on 12 markers is far greater than on 25 or 37, especially if our family descends from a population group that came from one or a few prolific families thousands of years ago (which is the case for Western Europe). The total population of Europe was 60,000 people at the end of the last Ice Age, about 10,000 years ago. Now Europe has a population of 300 million people. This increase is almost entirely due to natural population growth rather than immigration from other continents. Keeping this in mind, it is reasonable that many people alive today in Europe will match other Europeans through ancestors who lived before the adoption of surnames, and when you match someone who has a different surname, your first thought should be that the ‘connection’ is distant rather than recent.

Our bodies work as copy machines when it comes to our Y-DNA. You can have a copy machine make 1,000 copies without a problem, and then the 1,001st copy may have an "o" that looks more like an "e". When we use this copy to make additional ones, all the new ones will have an "e" instead of an "o". This is a simple way to explain how mutations occur in our Y-DNA when it is transferred (copied) from father to son. Mutations don't happen frequently. On the contrary, they happen very seldom, but they happen randomly in time, which means that I could be one mutation off from my father. That is why most exact or close 12-marker matches between different surnames go away when more markers are compared: more mutations show up, which means the common ancestor lived far back in time.

Many surnames are much older than a few hundred years and two people may share a surname but only match 24/25 or even 23/25. In these cases, as the graph shows, the MLE of when their MRCA lived could be much further back in time. That is, you are related, but probably much more distantly.

As we increase the number of markers tested, there is a decreasing rate of improvement. For most cases, the ideal number seems to be 37 markers. If more precision is required or desired, an upgrade to 67 markers would be appropriate.

For more information about time to the MRCA, see Bruce Walsh's Time to Most Recent Common Ancestry Calculator.

mtDNA Matches

Because mtDNA mutations are very rare, a nearly perfect match in test results between two individuals is not as helpful as it is in the patrilineal (Y-DNA) case above. In the matrilineal case, it takes a perfect match to be very helpful.

An exact match on the first hypervariable region, HVR1, is sometimes called a low resolution match. When the two individuals also share the same haplogroup, it indicates a possible connection. An HVR1 match has a 50% chance of a common ancestor within fifty-two generations. That is about 1,300 years. When the two individuals have different haplogroups, the match is due to convergent evolution, in which two different lineages mutate by coincidence to look alike.

An exact match on both the first and second hypervariable regions, HVR1 and HVR2, is sometimes called a high resolution match. When the two individuals also share the same haplogroup, it indicates a possible connection. A match on HVR1 and HVR2 has a 50% chance of a common ancestor within twenty-eight generations. That is about 700 years. When the two individuals have different haplogroups, the match is due to convergent evolution, as described above.
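The year figures quoted on this page follow from multiplying generations by 25 years per generation. The short Python sketch below reproduces that arithmetic. The function names are hypothetical, and 25 years is the generation length used in the figures above, not a fixed rule.

```python
YEARS_PER_GENERATION = 25  # generation length used in the figures on this page


def generations_to_years(generations: float, years_per_generation: float = YEARS_PER_GENERATION) -> float:
    """Convert a number of generations into an approximate number of years."""
    return generations * years_per_generation


def years_to_generations(years: float, years_per_generation: float = YEARS_PER_GENERATION) -> float:
    """Convert a span of years into an approximate number of generations."""
    return years / years_per_generation


print(generations_to_years(52))    # prints 1300 (HVR1 match)
print(generations_to_years(28))    # prints 700 (HVR1 and HVR2 match)
print(years_to_generations(1000))  # prints 40.0 (the surname time frame)
```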

The Mitochondrial DNA Full Genomic Sequence (FGS) test, sometimes called Mega, will:
Return results for all three parts of the mitochondrial DNA:
    HVR1 – 16001 to 16569
    HVR2 – 00001 to 00574
    The Coding Region – 00575 to 16000
Improve the quality of the matches for geographic and ethnic origins.
Provide the highest level of confidence in a match for genealogical research.
Allow the assignment to a subclade within your haplogroup.
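The three ranges listed above can be turned into a small lookup, sketched here in Python. The function name `mtdna_region` is hypothetical; the boundaries are the ones given in the list.

```python
def mtdna_region(location: int) -> str:
    """Name the part of the mitochondrial DNA that contains a location."""
    if not 1 <= location <= 16569:
        raise ValueError(f"location out of range: {location}")
    if location <= 574:
        return "HVR2"
    if location <= 16000:
        return "Coding Region"
    return "HVR1"


print(mtdna_region(16189))  # prints HVR1
print(mtdna_region(263))    # prints HVR2
print(mtdna_region(8860))   # prints Coding Region
```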

