Friday, December 12, 2014

TIMBER! Forestry practices at AncestryDNA

How is AncestryDNA's new TIMBER procedure selecting our matches?  This is the story of two matches I had which, before Autosomalgeddon, appeared to be identical.

I became aware of these two matches because they showed up on GEDmatch, and both matched me for 18.9 cM on the far end of Chr. 11, where I have my only sub-Saharan African (SSA) segment. These new Ancestry matches - R. M. and B. D. - were both estimated as Distant, 5th-8th cousin, Low confidence matches under Ancestry version 1. At GEDmatch:

me to R.M. Chr. 11 126390369 to 134436845 18.9 cM 2583 SNPs

me to B.D. Chr. 11 126357648 to 134436845 18.9 cM 2574 SNPs

R. M. and B. D. also matched each other, and both matched other matches I have at GEDmatch.
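For anyone who wants to check how closely the two segments line up, here is a minimal sketch in Python. It uses only the start and end positions from the two GEDmatch lines above; the names Segment, overlap_bp and shared_fraction are my own illustrative helpers and do not come from GEDmatch or AncestryDNA. It compares base-pair positions only, so it says nothing about cM or about how TIMBER itself scores a segment.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """A matching segment: chromosome plus start/end base-pair positions."""
    chrom: str
    start: int
    end: int


def overlap_bp(a: Segment, b: Segment) -> int:
    """Number of base pairs covered by both segments (0 if they do not overlap)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def shared_fraction(a: Segment, b: Segment) -> float:
    """Shared base pairs divided by the full span covered by either segment."""
    shared = overlap_bp(a, b)
    if shared == 0:
        return 0.0
    span = max(a.end, b.end) - min(a.start, b.start)
    return shared / span


# Positions copied from the GEDmatch comparison lines above.
rm = Segment("11", 126390369, 134436845)  # me to R. M.
bd = Segment("11", 126357648, 134436845)  # me to B. D.

shared = overlap_bp(rm, bd)
fraction = shared_fraction(rm, bd)
print(f"Shared span: {shared} bp ({fraction:.2%} of the combined span)")
```

The two segments end at the same position and their starts differ by only about 33 thousand base pairs, so by position they are very nearly the same segment.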

I have been interested in this segment for a few years, and am tracking several other matches to it from 23andMe and FTDNA's Family Finder, but B. D. was particularly interesting since she is predominantly sub-Saharan African. All the other matches I had previously on this segment were mostly European, although they all shared the SSA segment at the end of Chr. 11. I have not tracked down a common ancestor, but several (including me) have Collins ancestors, two (including R. M.) have Sexton ancestors, one of the Collins appears to have Lumbee connections, and all those for whom I have any information have Appalachian ancestry. B. D. had a great-grandmother from Roanoke, Virginia, who was believed to have Native American ancestry, which fits the mixed-race pattern.

Then came the new, revised match list.  Now R. M. is estimated as a Distant, 5th-8th cousin Good match.  But B. D. is no longer a match at all.

I'm puzzled by this - the segment appears to be the same.  

We were all AncestryDNA matches before the rollout of version 2, and the version 1 set of AncestryDNA matches was phased, so the lack of phasing at GEDmatch isn't the problem. In addition, I have a sister who is also at GEDmatch, and B. D. also has a sister there (both sisters tested at 23andMe). My sister matches B. D. in exactly the same place, B. D.'s sister matches me in exactly the same place, and the two sisters match each other. If a lack of phasing produced a pseudo-segment, it produced exactly the same segment in all four of us.

So I think TIMBER is the most likely reason for the change, but I am still puzzled. What could make the algorithm treat what is apparently the same segment differently in different people?

Is this significant? While R. M., B. D. and I all appear to have the same number of matches on our shared segment, over the whole genome R. M. matches 1638 segments and I match 1343 segments, while B. D. matches only 442 (these counts are from GEDmatch).

Could having relatively few matches overall affect the way TIMBER treats a particular match?  After the rollout of version 2 I did note several African-Americans saying they had lost mostly European matches.