Friday, December 21, 2012

Do you know what I know?

Adam Foye’s SNPs have earned a spot on the 12 SNPs of Christmas 2012, even though by agreement we can’t name them publicly yet. The elves at SNPedia were one of the 23 teams taking part this year in the CLARITY Challenge. The challenge, sponsored by Boston Children’s Hospital, aimed to yield useful information for three families with undiagnosable genetic disorders and to document the process used. In each family, the genomes of the affected child (Adam Foye in the first family) and both parents were fully sequenced by Complete Genomics, and exome sequenced on a Life Technologies ABI SOLiD. In one family, several extended family members were also exome sequenced. The recent press release illustrates the difficulty of the challenge: despite the high volume of state-of-the-art data, so far only one of the three families has received a high-confidence diagnosis.

Adam Foye has mutations in the gene Gap junction beta-2 (GJB2) on chromosome 13. This means each of his parents carries one copy of a non-functional version of this gene and he inherited both of the non-functional forms. This resulted in sensorineural hearing loss and the need to wear hearing aids. GJB2 mutations are well known in the literature, and many have dbSNP identifiers. Nine of the most common are included in the DNA-chip based test from 23andMe.


This enables the identification of some, but not all, causes of this type of hearing loss. Carrier status, such as that of Adam Foye’s parents, could also have been determined. Without genotyping the parents, however, the test wouldn’t have been conclusive. We wouldn’t have been able to see whether both mutations were on the same strand (the same copy of the chromosome), leaving one functional copy of the gene, or on different strands, damaging both copies.
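To make the same-strand versus different-strand question concrete, here is a minimal sketch of how parental genotypes can settle it for two heterozygous sites. It is only an illustration: the function names, the two-letter genotype strings, and the example alleles are our own assumptions, not the 23andMe or Promethease implementation, and it gives up whenever the parents don't make the answer obvious.

```python
from typing import Optional, Tuple

# A site is (child, mother, father, risk_allele); genotypes are two-letter
# strings such as "AG". This layout is an assumption made for illustration.
Site = Tuple[str, str, str, str]


def inherited_from(child: str, mother: str, father: str, allele: str) -> Optional[str]:
    """Return which parent passed on `allele`, or None if it cannot be decided."""
    if child.count(allele) != 1:
        return None  # only heterozygous sites in the child are handled here
    in_mother = allele in mother
    in_father = allele in father
    if in_mother and not in_father:
        return "mother"
    if in_father and not in_mother:
        return "father"
    # Both parents carry it (ambiguous) or neither does (possible de novo
    # mutation or genotyping error): no call.
    return None


def phase_pair(site1: Site, site2: Site) -> str:
    """Classify two heterozygous sites as 'same strand', 'different strands', or 'unknown'."""
    origin1 = inherited_from(*site1)
    origin2 = inherited_from(*site2)
    if origin1 is None or origin2 is None:
        return "unknown"
    return "same strand" if origin1 == origin2 else "different strands"


# Hypothetical example: one risk allele comes from each parent, so both
# copies of the gene are affected.
print(phase_pair(("AG", "AG", "AA", "G"), ("CT", "CC", "CT", "T")))
```

When both parents are heterozygous at a site the sketch returns "unknown"; resolving those cases takes population-level phasing of the kind a large database makes possible.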

Last week 23andMe announced that they’ve begun to perform phasing of their data. By using data from either of his parents (or from their entire database), they can assign most variations as being either on the same or different strands. For the moment phased data isn’t downloadable by their customers, but maybe that will change. If not, there will be inexpensive genomic sequencing in a few years.

Unfortunately it wasn’t hearing loss that triggered Adam’s inclusion in the CLARITY Challenge; it was a more pressing medical concern, a form of muscular dystrophy. For this condition genotyping chip tests would have been insufficient. As publicly announced in CLARITY press releases, mutations in the Titin (TTN) gene were reported by several teams. This gene is named for Titan, the giant Greek god, and the name fits. It produces the largest known protein at 34,350 amino acids in length and has the most (363!) exons.

The winning team has been announced; it’s the coalition of five respected research institutes (Brigham and Women’s Hospital, Massachusetts General Hospital, Partners Laboratory for Molecular Medicine, Brown University, and Utrecht University), and although we were not part of that coalition, we are grateful to have participated. This sort of analysis is not the core strength of SNPedia or our analysis tool, Promethease. We are best suited for reporting on published variants in DTC samples, and communicating information about them to a wider audience. Much of CLARITY emphasized searching for novel mutations and communicating them to clinicians. But the gap between clinicians and the public continues to narrow, and we were curious to see where our tools might be applicable.

Even before this contest, Promethease was able to work directly with much of the data provided by Complete Genomics. Their dbSNP file concisely describes all genotypes already known to dbSNP. That will never be able to catch a new de novo mutation, but with the October completion of the 1000 Genomes Project, dbSNP now covers over 38 million validated mutations. As 2012 draws to a close, 35,600 of them have associated literature in SNPedia. These are the SNPs which are easiest to interpret. For GJB2 we currently have 53 annotated SNPs, and for TTN 5. With time those numbers will continue to rise.

The exome data was more challenging. It was just a collection of short DNA reads, not yet assembled or aligned to the reference genome. Pre-assembled data would have been easier to work with, but might have also contained some mis-assemblies. The reference genome is not a universal truth. In SNPedia's Nightmare Before Christmas we discussed how it varies over time as the community learns more. Promethease doesn’t do this sort of de novo assembly, and we are not likely to add that feature anytime soon.

Completing this challenge pushed us into new capabilities such as support for a new Family Trio report, which works with all 38M SNPs, not just the 35k in SNPedia. While it was developed to help us process the CLARITY data, it became publicly available as of Promethease v0.1.144 over 3 months ago. It also made us aware of areas for improvement, such as in detecting CNVs or compound heterozygosity.
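For readers wondering what a trio comparison can look like in principle, here is a small sketch of one basic check: whether a child's genotype at a site can be built from one allele of each parent. This is not the Promethease Family Trio report itself; the function names, the genotype strings, and the placeholder site labels are assumptions made purely for illustration.

```python
from itertools import product
from typing import Dict, List, Tuple

# Maps a site label to (child, mother, father) genotypes, each a two-letter
# string such as "AG". Labels and layout are hypothetical.
Trio = Dict[str, Tuple[str, str, str]]


def mendelian_consistent(child: str, mother: str, father: str) -> bool:
    """True if the child's genotype can be formed from one maternal and one paternal allele."""
    possible = {tuple(sorted(pair)) for pair in product(mother, father)}
    return tuple(sorted(child)) in possible


def inconsistent_sites(trio: Trio) -> List[str]:
    """List the sites where the child's genotype cannot be explained by the parents."""
    return [
        label
        for label, (child, mother, father) in trio.items()
        if not mendelian_consistent(child, mother, father)
    ]


# Placeholder labels, not real dbSNP identifiers.
example = {
    "site_1": ("AG", "AA", "GG"),  # one allele from each parent
    "site_2": ("AG", "AA", "AA"),  # G is absent from both parents
    "site_3": ("CC", "CT", "CT"),  # both parents can pass on C
}
print(inconsistent_sites(example))
```

A flagged site is only a candidate: it could be a de novo mutation, but it could just as easily be a genotyping or sequencing error, so it needs follow-up before anyone draws conclusions.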

In time we expect to be able to share more results of the CLARITY Challenge and highlight the inspiring work of the other teams. Most importantly, even though the contest submission period has closed, we are hoping that the availability of genomic sequence from these families along with improved methods may yet yield answers for the other two families.
