Wednesday, August 15, 2018

DNA genealogy 'splainer 1: Short Tandem Repeats

Most people know DNA contains four bases: adenine, thymine, guanine, and cytosine, or ATGC. Each of us has about three billion of these base pairs, in a sequence that is more or less unique. You could certainly construct a family tree by collecting DNA from each of us and matching it up (of which more later). But it would be expensive (a full sequence still costs more than $1000) and would take a huge amount of computation.

So what we use most of the time are things called short tandem repeats, or STRs. These are non-coding stretches of DNA, more or less tiny parasites in our genome. They consist of short repeated sequences of bases. For example, an STR called DYS393 is (AGAT)n: that is, it's AGATAGATAGATAGAT... where n, the number of repeats, is the count we care about. It has a particular location on the Y-chromosome. Y-chromosomes are good for genealogy because in principle they follow the male line, along with the surname, and don't recombine (or scramble) with other chromosomes. And because STRs are non-coding and have a tendency to replicate or dereplicate, their counts change fairly frequently, sometimes as often as once every 8 generations.
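To make the "count number" idea concrete, here is a minimal Python sketch that counts how many times a motif such as AGAT repeats back to back in a stretch of sequence. The function name `count_tandem_repeats` and the flanking bases in the example are made up for illustration; real STR calling from lab data is more involved than this.

```python
def count_tandem_repeats(sequence: str, motif: str = "AGAT") -> int:
    """Return the longest run of back-to-back copies of `motif` in `sequence`.

    The result is measured in repeat units, so (AGAT)13 gives 13.
    """
    if not motif:
        raise ValueError("motif must be a non-empty string")

    best = 0
    # Try every start position; simple rather than fast, which is fine
    # for short illustrative sequences.
    for start in range(len(sequence) - len(motif) + 1):
        run = 0
        pos = start
        # Walk forward one motif at a time while the repeat continues.
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


# Hypothetical example: 13 copies of AGAT between invented flanking bases.
example = "GTGGTC" + "AGAT" * 13 + "ATGTAT"
print(count_tandem_repeats(example))  # 13
```

The number it prints is the kind of value a Y-DNA test reports for each STR.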

Humans have DYS393 count numbers of 9 to 17. A typical commercial Y-DNA test includes as few as 11 STRs or as many as 111. Eleven probably isn't enough. So let's do an example. This is a set of count numbers from seven different STRs, from six individuals, all named Harbison, all apparently related. I've omitted the STRs that don't vary within this set.
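Dropping the STRs that don't vary is easy to do mechanically. The sketch below is one way to do it in Python, assuming each person's result is stored as a dictionary of marker name to count. The people and the counts are invented placeholders, not the Harbison data, and `variable_markers` is a hypothetical helper name.

```python
from typing import Dict, List

# Invented placeholder results: NOT the real Harbison counts.
haplotypes: Dict[str, Dict[str, int]] = {
    "person_1": {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    "person_2": {"DYS393": 13, "DYS390": 25, "DYS19": 14},
    "person_3": {"DYS393": 14, "DYS390": 24, "DYS19": 14},
}


def variable_markers(results: Dict[str, Dict[str, int]]) -> List[str]:
    """Return the markers, tested in everyone, whose counts are not all equal."""
    if not results:
        return []
    people = list(results.values())
    # Only compare markers that every person was actually tested for.
    shared = set(people[0])
    for person in people[1:]:
        shared &= set(person)
    # A marker is informative if at least two people differ on it.
    return sorted(m for m in shared if len({p[m] for p in people}) > 1)


print(variable_markers(haplotypes))  # ['DYS390', 'DYS393']
```

In this made-up set DYS19 is 14 for everyone, so it tells us nothing about who is closer to whom and gets dropped.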

So how do we turn this into a tree? You could probably do it by hand, but in the next posts I'll describe the formal mathematics.
