Tuesday, December 25, 2012

All I want for Christmas is a cure for HIV

Commonly known as CCR5-Δ32 (CCR5 delta 32), rs333 isn’t a true SNP - it’s a deletion of 32 nucleotides in the CCR5 chemokine receptor gene. It was considered important enough, though, for dbSNP to assign it one of the earliest three-digit SNP identifiers, and for 23andMe to add it to their custom content under the name i3003626. What makes this variation so interesting is that people who carry two copies of the deletion are highly resistant to HIV infection. Wikipedia has a great explanation of this variation and its likely spread from survivors of earlier European plagues.

Timothy Ray Brown was suffering from both HIV and leukemia, and during the course of treatment for his leukemia, researchers in Berlin realized there might be a way to take advantage of rs333’s ability to resist HIV. After chemotherapy and radiation treatments, the researchers transplanted bone marrow from a donor with 2 copies of the rs333 deletion. These donor cells were able to reconstitute his blood.

The transplant took place in 2007, and doctors waited and watched for years while his body cleaned out the HIV and repaired itself. When they finally published in 2010 and 2011, they concluded that the 'SNP transplant' had cured their patient’s HIV (PMID 21148083, “Evidence for the cure of HIV infection by CCR5Δ32/Δ32 stem cell transplantation”).

Now in 2012 a Phase 2 clinical trial has begun recruiting patients. The protocol indicates donors “must be a 7/8 or 8/8 match at HLA-A, -B, -C, and -DRB1”. This is the same sort of donor matching we described five days ago in Do you see what I MHC? Perhaps the trial could also record the rs2281389 and rs1800795 genotypes.
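To make the "7/8 or 8/8" criterion concrete, here is a minimal Python sketch that counts allele matches across the four loci named in the protocol. It assumes each person's HLA type is written as two allele names per locus (the types below are made up), and it treats a match as a simple unordered overlap; real donor selection also weighs typing resolution and the direction of any mismatch, which this ignores.

```python
from collections import Counter

# The four loci named in the trial protocol (2 alleles each, so 8 in total).
LOCI = ("HLA-A", "HLA-B", "HLA-C", "HLA-DRB1")

def hla_match_score(donor, recipient):
    """Count matched alleles (0 to 8) across the four loci.

    Each argument maps a locus to a pair of allele names. Alleles are
    compared as unordered multisets, so allele order within a locus
    does not matter.
    """
    score = 0
    for locus in LOCI:
        shared = Counter(donor[locus]) & Counter(recipient[locus])
        score += sum(shared.values())
    return score

def eligible_donor(donor, recipient, minimum=7):
    """True if the donor meets a 7/8 (or better) threshold."""
    return hla_match_score(donor, recipient) >= minimum

# Made-up example types; the donor differs at one HLA-B allele.
recipient = {"HLA-A": ("A*01:01", "A*02:01"), "HLA-B": ("B*07:02", "B*08:01"),
             "HLA-C": ("C*07:01", "C*07:01"), "HLA-DRB1": ("DRB1*15:01", "DRB1*03:01")}
donor = {"HLA-A": ("A*02:01", "A*01:01"), "HLA-B": ("B*07:02", "B*44:02"),
         "HLA-C": ("C*07:01", "C*07:01"), "HLA-DRB1": ("DRB1*03:01", "DRB1*15:01")}
print(hla_match_score(donor, recipient), eligible_donor(donor, recipient))  # 7 True
```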

Despite the urgent need for a cure for HIV infection, “the risks associated with chemotherapy and radiation, and the relatively low frequency of ccr5Δ32 homozygous individuals, makes it unlikely that allogeneic HSC transplants using cells from ccr5Δ32 homozygous donors will become a widespread treatment option, and has prompted attempts to mimic the genetic knockout” (PMID 22470838).

So this type of transplant only directly applies to the fortunately rare cases where a patient has to undergo chemotherapy and radiation to treat their leukemia, enabling a stem cell transplant to repopulate their blood system. It isn’t a practical solution for large numbers of patients. But a ‘SNP transplant’ can be a cure.

Monday, December 24, 2012

Deck the halls with boughs of HOXB

Prostate cancer ranks as the most commonly diagnosed non-skin cancer in Caucasian men, and it’s the most heritable common cancer. Genome-wide association studies (GWAS) have yielded perhaps 30 – 40 SNPs each associated with a (rather small) increase in risk, and so most inherited risk remains unexplained. A paper published early this year (PMID 22236224) about rs138213197 may have begun a new chapter in the search for inherited prostate cancer risk, and it’s indicative of how new methods can bring about highly significant discoveries.

For almost a decade after linkage evidence emerged indicating that a predisposition to prostate cancer was due to variants in the chromosome 17q21-22 region, GWAS studies yielded little from the region. The labs of K. Cooney (U Mich) and W. Isaacs (JHU) turned to sequencing: specifically, they looked at over 2000 exons from 200 genes spanning around 15.5 Mb of 17q21-22, and they did this for almost 100 prostate cancer patients. In these exons, each patient had around 12 novel variants, along with almost 700 already reported in dbSNP. HOXB13 variant rs138213197(T) stood out, both because it occurred in several patients and because HOXB13 is a transcription factor involved in prostate development. Based on a subsequent case-control study, in which this SNP was seen in 72 of ~5,000 patients but in only 1 of 1400 controls, the researchers concluded that although the variant is clearly rare, it is recurrent. A carrier’s odds of developing prostate cancer are much higher - 10 to 20 fold - than a non-carrier’s. Since the publication of this finding early this year, at least five other labs have already confirmed the rs138213197 association with prostate cancer risk in independent populations, totaling over 20,000 patients. While the odds for increased risk are a bit lower (ranging from 3 to 8 fold) in these studies, they remain far higher than for previously known markers.
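The fold-change figures come from a simple 2x2 case-control table. As a rough sketch (our calculation, not the paper's), here is the odds ratio from the rounded counts quoted above (72 of ~5,000 cases, 1 of ~1,400 controls), with a 95% confidence interval from Woolf's log method, a standard approximation we chose for illustration.

```python
import math

def odds_ratio_with_ci(case_carriers, cases, control_carriers, controls, z=1.96):
    """Odds ratio for carriers vs non-carriers, with a Woolf (log-scale) CI.

    The 2x2 table is:
        a = carrier cases,     b = non-carrier cases
        c = carrier controls,  d = non-carrier controls
    All four cells must be non-zero for this approximation.
    """
    a = case_carriers
    b = cases - case_carriers
    c = control_carriers
    d = controls - control_carriers
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Rounded counts from the text: 72 of ~5,000 cases and 1 of ~1,400 controls.
estimate, lower, upper = odds_ratio_with_ci(72, 5000, 1, 1400)
print(f"odds ratio ~{estimate:.1f}, 95% CI ~{lower:.1f} to {upper:.0f}")
```

The point estimate of about 20 lines up with the "10 to 20 fold" above, but with a single carrier among the controls the interval runs from roughly 3 to well over 100, so the lower 3 to 8 fold estimates from the larger replication studies fall comfortably inside it.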

In addition to its implications for prostate cancer diagnosis and treatment, the finding of a “rare but recurrent” variant with such a strong effect on cancer risk supports the concept that a significant fraction of the heritability of common diseases will indeed be due to quite rare variants. These variants will not have been detectable by GWAS studies (or the meta-analyses based on them, for that matter), but will now come to light thanks to exome and genome sequencing. Whether it is indeed due to our species’ population growth from a few million to 7 billion in just 400 generations or to other reasons (1), we each carry a different set of quite rare alleles with quite profound effects.

Sunday, December 23, 2012

Do they know it’s Christmas?

Myriad is in the middle of a multi-year defense of their BRCA gene patents. It has reached the Supreme Court, and in 2013 we’re likely to get a significant ruling on the patentability of human genes. This year 23andMe stepped into the battle for the first time. Instead of patenting a whole gene, 23andMe was granted a patent on a specific SNP, rs10513789, and its use in predicting Parkinson’s risk. This may not have been too much of a shock to experts like Dan Vorhaus (who promptly posted a detailed and informative discussion), but some customers felt this should have earned 23andMe a spot on Santa’s Naughty List. This prompted 23andMe to clarify that they “will not prevent others from accessing their genetic data or its interpretation”.

That’s probably reason enough to earn a spot here on our top 12 SNPs list, but this SNP wasn’t done making news. Late this year National Geographic began their second Genographic project. This time they’re checking 150,000 markers which are informative of ancestry. Medically relevant SNPs aren’t supposed to be part of the collection, but some SNPs have both medical and genealogical associations. In fact, nearly 500 of the NatGeo SNPs are already in SNPedia, many with medical consequences. To our great surprise, one of them turns out to be … yes, you guessed it … 23andMe’s very own rs10513789.

Saturday, December 22, 2012

I'm dreaming of a white (matter) Christmas

In the almost 20 years since the discovery of the ApoE4 allele’s association with increased risk for late-onset Alzheimer’s disease, not a single additional SNP was discovered with a comparably strong effect … until this year. And the SNP discovered has interesting implications not only for Alzheimer’s, but also for a theory about diseases in general.

Publishing back to back in the November issue of NEJM (PMID 23150908, PMID 23150934), both the deCODE and Alzheimer Genetic Analysis Group teams found that a single copy of rs75932628(T), a SNP in the TREM2 gene, increases risk for Alzheimer’s about three-fold. This is comparable to the increased risk associated with an APOE4 allele. But it has a frequency of under 1%, too low to have been used or detected in most genome wide association (GWAS) studies, unlike the APOE4 allele’s frequency of about 15%.

The TREM2 gene encodes a protein known as “triggering receptor expressed on myeloid cells 2”, involved in microglial activity and inflammation. The rs75932628(T) allele somehow leads to reduced TREM2 activity, apparently reducing the microglial-cell based removal of beta amyloid. In addition to the quest for drugs that will reduce amyloid deposition directly, there’s now likely to be more support for drugs stimulating TREM2 activity, as well as other molecules of the inflammatory cascade with roles in amyloid clearance (PMID 23237888).

It also turns out that homozygous recessive loss-of-function mutations in TREM2 have previously been associated with Nasu-Hakola disease, a very rare disorder involving early-onset dementia and bone fractures. Both sets of authors noted this additional example in which inheriting two loss-of-function mutations leads to a severe, early-onset disease, while inheriting only one leads to the late onset of an apparently different disease. It may well be that single inherited loss-of-function SNPs will turn out to be the most common underlying risk factors for late-onset diseases (PMID 20813421).

Exome and genome sequencing are proving effective at uncovering these moderate risk, low frequency variants. In 2013 (and beyond) we hope to see more studies of late-onset diseases in the heterozygous relatives of children diagnosed with loss-of-function homozygous recessive diseases.

Friday, December 21, 2012

Do you know what I know?

Adam Foye’s SNPs have earned a spot on the 12 SNPs of Christmas 2012, even though by agreement we can’t name them publicly yet. The elves at SNPedia were one of the 23 teams taking part this year in the CLARITY Challenge. Sponsored by Boston Children’s Hospital, the challenge aimed to yield useful information for three families with undiagnosed genetic disorders, and to document the process used. In each family, the genomes of the affected child (Adam Foye in the first family) and both parents were fully sequenced by Complete Genomics, and exome sequenced on a Life Technologies ABI SOLiD. In one family, several extended family members were also exome sequenced. The recent press release illustrates the difficulty of the challenge: despite the high volume of state-of-the-art data, so far only one of the three families has received a high-confidence diagnosis.

Adam Foye has mutations in the gene Gap junction beta-2 (GJB2) on chromosome 13. This means each of his parents carries one copy of a non-functional version of this gene and he inherited both of the non-functional forms. This resulted in sensorineural hearing loss and the need to wear hearing aids. GJB2 mutations are well known in the literature, and many have dbSNP identifiers. Nine of the most common are included in the DNA-chip based test from 23andMe.

This enables the identification of some, but not all, causes of this type of hearing loss. Carrier status, such as that of Adam Foye’s parents, could also have been determined. Without genotyping the parents, however, the test wouldn’t have been conclusive: we wouldn’t have been able to see whether both mutations were on the same copy of chromosome 13, leaving one functional copy of the gene, or on different copies, damaging both copies of the gene.

Last week 23andMe announced that they’ve begun to perform phasing of their data. By using data from either parent (or from their entire database), they can assign most variants as being on either the same or different copies of a chromosome. For the moment phased data isn’t downloadable by their customers, but maybe that will change. If not, there will be inexpensive genomic sequencing in a few years.
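The simplest form of phasing uses a parent-child trio. Here is a minimal Python sketch for two heterozygous variants in the same gene; it is our illustration, not 23andMe's method (which isn't described here), and it assumes single-letter alleles, correct parentage and no genotyping errors. Each variant is traced to the parent who must have passed it on: if both came from the same parent they sit on the same chromosome copy (cis), otherwise on different copies (trans), which for two loss-of-function mutations means no working copy remains. The genotypes are made up, not Adam Foye's data.

```python
def parent_of_origin(child, mother, father, alt):
    """Return 'mother' or 'father' for the parent who transmitted `alt`
    to a heterozygous child, or None if it cannot be decided.

    Genotypes are two-letter strings such as "GA"; `alt` is the variant allele.
    """
    if child.count(alt) != 1:
        raise ValueError("this sketch only handles heterozygous children")
    in_mother = alt in mother
    in_father = alt in father
    if in_mother and not in_father:
        return "mother"
    if in_father and not in_mother:
        return "father"
    return None  # both parents carry it, or neither does (possible de novo)

def phase_two_variants(site1, site2):
    """Classify two heterozygous variants as 'cis', 'trans' or 'unknown'.

    Each site is a tuple (child, mother, father, alt).
    """
    origin1 = parent_of_origin(*site1)
    origin2 = parent_of_origin(*site2)
    if origin1 is None or origin2 is None:
        return "unknown"
    return "cis" if origin1 == origin2 else "trans"

# Hypothetical genotypes at two sites in one gene.
site_a = ("GA", "GA", "GG", "A")   # variant A can only have come from the mother
site_b = ("CT", "CC", "CT", "T")   # variant T can only have come from the father
print(phase_two_variants(site_a, site_b))  # trans
```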

Unfortunately it wasn’t hearing loss that triggered Adam’s inclusion in the CLARITY Challenge; it was a more pressing medical concern, a form of muscular dystrophy. For this condition genotyping chip tests would have been insufficient. As publicly announced in CLARITY press releases, mutations in the Titin (TTN) gene were reported by several teams. This gene is named for Titan, the giant Greek god, and the name fits. It produces the largest known protein at 34,350 amino acids in length and has the most (363!) exons.

The winning team has been announced; it’s a coalition of five respected research institutes (Brigham and Women’s Hospital, Massachusetts General Hospital, Partners Laboratory for Molecular Medicine, Brown University, and Utrecht University), and although we were not part of that coalition, we are grateful to have participated. This sort of analysis is not the core strength of SNPedia or our analysis tool, Promethease. We are best suited for reporting on published variants in DTC samples, and communicating information about them to a wider audience. Much of CLARITY emphasized searching for novel mutations and communicating them to clinicians. But the gap between clinicians and the public continues to narrow, and we were curious to see where our tools might be applicable.

Even before this contest, Promethease was able to work directly with much of the data provided by Complete Genomics. Their dbSNP file concisely describes all genotypes already known to dbSNP. Such a file will never catch a new de novo mutation, but with the October completion of the 1000 Genomes Project, dbSNP now covers over 38 million validated variants. As 2012 draws to a close, 35,600 of them have associated literature in SNPedia. These are the SNPs which are easiest to interpret. For GJB2 we currently have 53 annotated SNPs, and for TTN 5. With time those numbers will continue to rise.

The exome data was more challenging. It was just a collection of short DNA reads, not yet assembled or aligned to the reference genome. Pre-assembled data would have been easier to work with, but might have also contained some mis-assemblies. The reference genome is not a universal truth. In SNPedia's Nightmare Before Christmas we discussed how it varies over time as the community learns more. Promethease doesn’t do this sort of de-novo assembly, and we are not likely to add that feature anytime soon.

Completing this challenge pushed us into new capabilities such as support for a new Family Trio report, which works with all 38M snps, not just the 35k in SNPedia. While it was developed to help us process the CLARITY data, it became publicly available as of Promethease v0.1.144 over 3 months ago. It also made us aware of areas for improvement, such as in detecting CNVs or compound heterozygosity.

In time we expect to be able to share more results of the CLARITY Challenge and highlight the inspiring work of the other teams. Most importantly, even though the contest submission period has closed, we are hoping that the availability of genomic sequence from these families along with improved methods may yet yield answers for the other two families.

Thursday, December 20, 2012

Do you see what I MHC?

Two independent SNPs discovered this year, rs2281389 and rs1800795, may indicate that the last decade of inconsistent genome-wide association studies hunting for ways to reduce blood cell transplant failures was not for naught. 

Humans have a lot of genetic variation in the human leukocyte antigen (HLA) and this variation creates a somewhat unique signature on the surface of your cells. Your immune system relies on this signature to distinguish its own cells from foreign invaders, and then to target foreigners for attack. This is a great system when you’re trying to fight off a cold, but a problem when you’re trying to find a donor for an organ or tissue transplant. This is why donor registries screen for similar HLA signatures, to minimize the odds of a conflict. There are immune suppressing drugs that lower the odds of rejection, but these come with their own risks. When you’re in need of an organ or tissue donation, you need your immune system working well to help you, not for it to be suppressed.

Graft versus host disease (GVHD) is when the donated cells recognize the recipient as foreign and begin to attack. This affects up to 80% of patients and is a major cause of transplant failure, early complications and death. In the 2012 paper PMID 22837536 researchers looked for SNPs spanning the Major Histocompatibility Complex (MHC) region which didn’t match between 4,000 donor/recipient pairs who were perfectly matched across the 5 major HLA loci. The most significant SNP found, rs2281389, raised the risk and severity of acute GVHD by about 40%. Curiously, it’s the mismatch itself - not any of the possible genotypes - that increases GVHD risk.
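Because the risk tracks the mismatch itself rather than any particular genotype, the check a matching program needs is a comparison rather than a lookup. A minimal Python sketch, assuming SNPedia-style "A;G" genotype strings and treating any genotype difference as a mismatch (the paper may define mismatch more narrowly, for example by direction); the alleles shown are placeholders, not rs2281389's actual alleles.

```python
def genotype_mismatch(donor_genotype, recipient_genotype):
    """True when donor and recipient genotypes differ at a SNP.

    Genotypes are written like "A;G"; allele order is ignored, so "A;G"
    and "G;A" count as the same genotype.
    """
    return sorted(donor_genotype.split(";")) != sorted(recipient_genotype.split(";"))

# Hypothetical donor/recipient genotype pairs at one SNP (placeholder alleles).
pairs = [("A;G", "G;A"), ("A;A", "A;G"), ("G;G", "G;G")]
flagged = [genotype_mismatch(donor, recipient) for donor, recipient in pairs]
print(flagged)  # [False, True, False]
```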

Outside the MHC, an independent comprehensive study (PMID 22282500) of 1300 allogeneic hematopoietic cell transplantation (HCT) donors and recipients using previously reported GVHD-associated SNPs concluded that the best replicating - and almost only - variant was IL6 gene SNP rs1800795. This SNP was associated with a 20%-50% increased risk for GVHD, and here, there may be a biological explanation since the rs1800795(G) allele has been associated with increased serum levels of IL-6 and several autoimmune and inflammatory diseases.

Upcoming clinical trials will determine if using these SNPs for improved genetic matching of donors and recipients will lower the incidence of GVHD. And when full genome sequence becomes a routine part of one’s medical record, these types of studies, along with predictive HLA-typing, will greatly increase the number of potential donors while minimizing rejection risks. As with many Christmas gifts, the new ones will complement and co-exist with the old ones.

Wednesday, December 19, 2012

Christmas in the Heart

rs11591147 is a SNP that warms the heart. This PCSK9 gene SNP has been known since 2005 to be associated with reduced average low density lipoprotein cholesterol (LDL-C), but this year it became the poster child of a set of 9 SNPs used to tackle a key question: to reduce heart disease risk, just how early in life should we start paying attention to LDL-C levels?

The meta-analysis of over 300,000 (!) patients published by Ference et al. [PMID:23083789] concludes that a person whose lifetime average LDL exposure was reduced by about 40 mg/dl due to one or more of these 9 SNPs has their coronary heart disease risk cut in half. Their models suggest that the earlier you experience lower LDL levels, the better. If you weren’t born with enough of the LDL-lowering variants like rs11591147(T), the sooner you lower your LDL, the better, either by changing your lifestyle (through your diet) or by taking statins. Atherosclerosis isn’t just for old folks – it’s a progressive disease that begins in childhood.

This reminds us of rs4149056, a SNPedia Top 10 SNP pick of 2010. rs4149056(C) alleles and especially rs4149056(C;C) genotypes are associated with higher risk for side effects (muscle pain and degradation) when taking the statin simvastatin. This year, the Clinical Pharmacogenomics Implementation Consortium (CPIC) issued a clinical guideline with dosing recommendations for simvastatin when rs4149056 status is available [PMID: 22617227]. Pharmacogenomics continues to be one of the areas where genomics is directly improving patient health and well-being, and that too warms our hearts.

Tuesday, December 18, 2012

Hurðaskellir

rs63750847 is a rare SNP, but a very good one. According to the Nature paper PMID 22801501 published in August, this variant in the Amyloid Precursor Protein (APP) gene reduces the risk of Alzheimer’s five-fold. It’s present in less than 1% of Icelandic and Scandinavian populations, and even rarer in North Americans. So while most of us aren’t fortunate enough to carry the rare protective allele, seeing such a strong protective effect may suggest new avenues for medicines and treatments.

The finding was published by deCODE, which has made numerous scientific contributions to the field, especially this year. CEO Kari Stefansson has previously expressed frustration that 23andMe’s DTC testing service was given the 2008 Invention of the Year by TIME Magazine, despite deCODE’s extensive research contributions and earlier - by a day or two - DTC launch date.
Perhaps there is some karma here, as deCODE’s latest discovery has won them high praise and was a factor in their $415M acquisition by Amgen. Good research takes time - it took almost 20 years to make the subtle but fundamental shift from the scientific understanding that this SNP was not associated with excessive amyloid deposition [PMID 8170579] to deCODE’s recent conclusion that it actually protects against it. Amgen has stated that it won’t be offering deCODE’s DTC genomic testing, though, so even if 23andMe wasn’t the first DTC genomics testing company, they are the last of the pioneers to still be standing.

Monday, December 17, 2012

SNPedia’s Nightmare Before Christmas

rs17602729 is the "most prevalent genetic disease mutation" in Caucasians according to PMID 11331279, and it appears to cause muscle pain after exercise in some people. Despite this relatively minor medical impact, it represents SNPedia’s ‘Nightmare before Christmas’.

As our understanding of the genome improves, the scientific community occasionally updates the reference standards. In August 23andMe updated from human reference genome build 36 to build 37.3. If you downloaded your raw data in July, this snp was at position 115037580, but after August it was at 115236057. SNPedia made similar changes back in 2010, but in 2012 we discovered that this snp hadn’t just changed position, it had also changed its orientation. With this newest assembly, dbSNP flipped the SNP to the opposite strand of DNA, changing the normal G and variant A into normal C and variant T.
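Translating between the two orientations is just base complementation (A pairs with T, C with G). A minimal Python sketch, assuming SNPedia-style "G;A" genotype strings; the function names are ours, for illustration only.

```python
# Watson-Crick complement of each DNA base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def flip_genotype(genotype):
    """Rewrite a genotype such as "G;A" as read from the opposite strand."""
    return ";".join(COMPLEMENT[allele] for allele in genotype.split(";"))

def sorted_alleles(genotype):
    """Alleles in a canonical order, so "A;G" and "G;A" compare equal."""
    return sorted(genotype.split(";"))

def same_genotype_any_strand(g1, g2):
    """True if g1 and g2 agree directly or after flipping g1 to the other strand."""
    return (sorted_alleles(g1) == sorted_alleles(g2)
            or sorted_alleles(flip_genotype(g1)) == sorted_alleles(g2))

# rs17602729 genotypes before and after the orientation change described above.
print(flip_genotype("G;G"))                    # C;C
print(flip_genotype("G;A"))                    # C;T
print(same_genotype_any_strand("G;A", "T;C"))  # True
```

One caveat on this approach: it cannot settle A/T or C/G SNPs, because their complement is the same pair of letters, so for those the strand has to come from the reference build's annotation rather than from the alleles alone.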

Confused? You’re not alone. ClinVar is the new NIH database of variation which affects human health. If you download their raw data and search for rs17602729 you will see that it shows all 4 alleles, with the G as normal. This is probably an artifact of the orientation change, but it’s likely that someone out there does have each allele. And each one of these nucleotide changes causes a different change to the amino acid. Since the most common variant creates a premature stop codon, the other amino acid substitutions probably yield viable proteins with subtly different effects. Getting this SNP, and others that pose similar challenges, straightened out in 2012 would take a Christmas miracle.

Sunday, December 16, 2012

O come all ye faithful

rs10937823 is one of the nicest gifts given to SNPedia this year. The minor rs10937823(T) allele has been associated with bipolar disorder in at least four independent studies as of last count, albeit with some inconsistency between populations as to which allele is the risk allele.

So why is it a gift? Because of who gave it to us. SNPedia exists and expands thanks in large part to the contributions of a community of people who feel it’s important to make scientific and medical findings about the genome accessible to all. Researchers, professors, physicians, techies, homemakers, students and many more contribute. Some are better at adding papers and p-values, while others improve the grammar and spelling or rewrite to make the ideas accessible to a wider audience. Everyone contributes a little, and we’re all better off for it.

Our information about rs10937823 was primarily contributed by a med student in Stanford University’s Gene210 class. In this groundbreaking course from Professor Stuart Kim, students analyze their own genomes as they prepare to become doctors and researchers in a post-genome world.

Here’s a big Thank You! to our community – those who believe that if the genome matters, then getting the information out in ways that can be readily used matters too. Every time a SNPedia page is edited an angel gets its wings.

Saturday, December 15, 2012

(A;A) Christmas Platypus

rs55705857 is strongly associated with the most common form of primary brain cancer, glioma, and it also has the distinction of being one of the most strongly cancer-associated SNPs ever found in a SNP survey.

The study published in Nature Genetics this fall [PMID 22922872] by researchers at UCSF and the Mayo Clinic found that rs55705857(G) allele carriers are at 6 times higher risk for glioma than non-carriers, particularly for subtypes known to harbor IDH1 or IDH2 somatic mutations. It is comforting that while the variant is common (between 2 and 8% of us carry this allele), gliomas are rare (diagnosed in around 3 people per 100,000 every year), so most carriers will never develop such tumors. Unfortunately, for those who do, the disease is often fatal, as it was for a good friend and colleague, Neil Ghiso.
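A back-of-the-envelope calculation shows why a six-fold risk still leaves most carriers unaffected. The sketch below rests on our own simplifying assumptions, not the paper's: 5% of people carry the allele (the middle of the 2 to 8% range), the overall incidence is 3 per 100,000 per year, and the six-fold figure can be treated as a relative risk, a reasonable approximation for a rare disease.

```python
def split_incidence(overall, carrier_fraction, relative_risk):
    """Split an overall incidence into non-carrier and carrier incidences.

    Solves overall = (1 - p) * r0 + p * (relative_risk * r0) for r0,
    where p is the carrier fraction and r0 the non-carrier incidence.
    """
    p = carrier_fraction
    baseline = overall / (1 - p + p * relative_risk)
    return baseline, baseline * relative_risk

baseline, carrier = split_incidence(3 / 100_000, 0.05, 6)
print(f"non-carriers: {baseline * 100_000:.1f} per 100,000 per year")  # 2.4
print(f"carriers:     {carrier * 100_000:.1f} per 100,000 per year")   # 14.4
```

Under these assumptions even carriers face roughly a 1 in 7,000 chance per year.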

The chromosomal region (8q24) this SNP is located in has previously yielded SNPs associated with ovarian and prostate cancer, but with much lower odds ratios. And while it’s clear the region is important in some regulatory manner, it’s not yet clear how. Perhaps another one of 2012’s top scientific stories – the first major release by the ENCODE project of data on functional elements in the genome – will help explain this.

And here’s where the platypus comes in.  Even though it’s far from a coding region, sequencing shows that the common rs55705857(A) allele is invariant in all mammals, from humans through to, yes, the platypus. Here’s to the Christmas Platypus!

Friday, December 14, 2012

The 12 SNPs of Christmas 2012

It’s been another busy year for SNPedia. More and more folks are using SNPedia, and with the features added to Promethease, there’s more for every person to learn about themselves. Our genomes become more interesting and informative every year – DNA is the ultimate gift from our parents that keeps on giving.

Like the Christmas carol, for each of the next 12 days we’ll be singing about a SNP that caught our attention in 2012. In some ways this is a Top 12 List, but it’s also a way for us to call attention to, and thank, many of you in our community. So here we go!

On the first day of Christmas … we’ve got to think of Santa and his rs9939609 genotype. This SNP is in the FTO gene, and it has long been associated with obesity, and in some populations, with Type-2 Diabetes. But just last month the Meyre Lab (McMaster Univ.) reported the results of studying over 6,000 patients with depression in their Nature paper [PMID 23164817], and lo and behold, rs9939609 is also associated with resistance to depression. In fact, the minor A allele increases the likelihood of both being obese AND being jolly. Perhaps Santa carries the A allele?! We’ve been unable to get his DNA tested so far, but with the Ho Ho FTO theme in mind, will be leaving him a very special batch of cookies.
  1. The 12 SNPs of Christmas 2012
  2. (A;A) Christmas Platypus
  3. O come all ye faithful
  4. SNPedia’s Nightmare Before Christmas
  5. Hurðaskellir
  6. Christmas in the Heart
  7. Do you see what I MHC?
  8. Do you know what I know?
  9. I'm dreaming of a white (matter) Christmas
  10. Do they know it’s Christmas?
  11. Deck the halls with boughs of HOXB
  12. All I want for Christmas is a cure for HIV

Saturday, February 11, 2012


In March 2010, scientists announced the discovery of a finger bone fragment of a juvenile female that lived about 41,000 years ago, found in Denisova Cave in Altai Krai, Russia. The full genome of this Denisova hominin was recently made available. The Unified Genotyper was used to add dbSNP 132 rs#s to the genome. The resulting VCF file can be read by Promethease.


Through the 5519 SNPedia annotated snps we can learn more about this distant relative.
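In a VCF produced this way, the rs#s added from dbSNP sit in the ID column (the third field). A minimal Python sketch of finding which of them are annotated; `annotated_rsids` stands in for a list of SNPedia rs#s that we assume you already have, the VCF lines below are toy data, and this is not how Promethease works internally.

```python
def rsids_from_vcf(lines):
    """Yield rs identifiers from the ID column (third field) of VCF data lines."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        # VCF allows several IDs separated by ';', and '.' when there is none.
        for identifier in fields[2].split(";"):
            if identifier.startswith("rs"):
                yield identifier

def annotated_overlap(vcf_lines, annotated_rsids):
    """Return the rs#s in a VCF that also appear in an annotated set."""
    return set(rsids_from_vcf(vcf_lines)) & set(annotated_rsids)

# Toy VCF lines; a real run would pass an open file handle instead, e.g.
#   with open("snps.raw.vcf") as fh: overlap = annotated_overlap(fh, annotated_rsids)
toy_vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "19\t45411941\trs429358\tT\tC\t50\tPASS\t.\n",
    "1\t1000\t.\tA\tG\t50\tPASS\t.\n",
]
print(annotated_overlap(toy_vcf, {"rs429358", "rs7412"}))  # {'rs429358'}
```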


As first noted by John Hawks, the snp rs7412 couldn't be reliably called, but rs429358 and rs4420638 were. These are consistent with an E4/E4.
At present, the frequency of APOE*4 within all the major human groups remains higher in those populations…where an economy of foraging still exists, or food supply is now or has until recently been scarce, sporadically available or qualitatively poor. Under these environmental conditions, carrying the APOE*4 could be still useful.
--source PMID 10738542.
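The APOE alleles are defined by the combination of rs429358 and rs7412: e2 carries T at both, e3 carries T at rs429358 and C at rs7412, and e4 carries C at both. A minimal Python sketch of the lookup from unphased genotypes, which also shows why a missing rs7412 call matters less when rs429358 is (C;C): among the common haplotypes only e4/e4 fits. The rare e1 haplotype (C at rs429358, T at rs7412) is deliberately ignored, and the nearby marker rs4420638 isn't used.

```python
# Common APOE haplotypes keyed by (allele at rs429358, allele at rs7412).
# The rare e1 haplotype ("C", "T") is left out of this sketch.
HAPLOTYPES = {("T", "T"): "e2", ("T", "C"): "e3", ("C", "C"): "e4"}

def apoe_genotype(rs429358, rs7412):
    """Infer the APOE genotype from unphased genotypes such as "C;C" and "C;T".

    Returns a string such as "e3/e4", or None if no pairing of the alleles
    yields two common haplotypes.
    """
    a = sorted(rs429358.split(";"))
    b = sorted(rs7412.split(";"))
    candidates = set()
    # Try both ways of pairing the alleles into two haplotypes.
    for h1_key, h2_key in (((a[0], b[0]), (a[1], b[1])), ((a[0], b[1]), (a[1], b[0]))):
        h1, h2 = HAPLOTYPES.get(h1_key), HAPLOTYPES.get(h2_key)
        if h1 and h2:
            candidates.add("/".join(sorted((h1, h2))))
    return candidates.pop() if len(candidates) == 1 else None

def apoe_candidates_without_rs7412(rs429358):
    """APOE genotypes consistent with rs429358 alone, over every rs7412 genotype."""
    results = {apoe_genotype(rs429358, g) for g in ("C;C", "C;T", "T;T")}
    results.discard(None)
    return sorted(results)

print(apoe_genotype("T;T", "C;T"))             # e2/e3
print(apoe_candidates_without_rs7412("C;C"))   # ['e4/e4']
```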

Hair Morphology

rs3124314(C;C) suggests some curliness to the hair, while rs261360(A;G) is notable for heterozygosity. 5 other snps (rs12623288(A;A), rs1268789(G;G), rs1454292(T;T), rs6732426(T;T), rs908922(A;A)) are all consistent with straighter hair.

Skin color


This sample should be female, but 23,508 rs#s on the Y chromosome were found, with only rs9786465 being in SNPedia. The Unified Genotyper provides a partial explanation of the challenges of calling sex chromosomes.

Caveat lector

The traits below are notoriously difficult to define and phenotype even in modern humans, and in a single sample of 40,000-year-old non-human DNA they should be covered in NaCl. However, deeper analysis needs to begin somewhere, so ...


rs53576(G;G) in the Oxytocin receptor (OXTR). People with this genotype appear to be significantly better at accurately reading the emotions of others by observing their faces than the remaining three-quarters of subjects, who had (A;A) or (A;G). (G;G) individuals were also less likely to startle when blasted by a loud noise, or to become stressed at the prospect of such a noise.


Quite a few rare genos for intelligence with a possible emphasis on spatial working memory.


rs2710102 in CNTNAP2 has been associated with impaired speech development.


Denisova data lives in the eu-west-1b region of the Amazon cloud as snap-3cc2de54.

Processed with

dbsnp acquired from

Generated via

java -jar GenomeAnalysisTK.jar -R /mnt/mydata/human_g1k_v37.fasta -T UnifiedGenotyper -I /mnt/den/denisova_genome/T_hg19_1000g.bam --dbsnp /mnt/mydata/dbsnp_132.b37.vcf -o /mnt/mydata/snps.raw.vcf > /mnt/mydata/alog.txt

Log finishes with

INFO 10:11:25,697 UnifiedGenotyper - Visited bases 3101804739
INFO 10:11:25,697 UnifiedGenotyper - Callable bases 2862033547
INFO 10:11:25,698 UnifiedGenotyper - Confidently called bases 112644898
INFO 10:11:25,698 UnifiedGenotyper - % callable bases of all loci 92.270
INFO 10:11:25,698 UnifiedGenotyper - % confidently called bases of all loci 3.632
INFO 10:11:25,698 UnifiedGenotyper - % confidently called bases of callable loci 3.936
INFO 10:11:25,699 UnifiedGenotyper - Actual calls made 4989617
INFO 10:11:25,714 TraversalEngine - Total runtime 66799.27 secs, 1113.32 min, 18.56 hours
INFO 10:11:25,823 TraversalEngine - 0 reads were filtered out during traversal out of 1424486071 total (0.00%)
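The percentages in the log follow directly from its base counts, which makes the striking figure easy to check: only about 3.6% of the visited bases were confidently called. A quick Python check that reproduces the log's arithmetic, using only the numbers copied from the log above:

```python
# Counts copied from the UnifiedGenotyper log above.
visited = 3_101_804_739
callable_bases = 2_862_033_547
confident = 112_644_898
runtime_secs = 66_799.27

print(f"% callable bases of all loci:                {100 * callable_bases / visited:.3f}")
print(f"% confidently called bases of all loci:      {100 * confident / visited:.3f}")
print(f"% confidently called bases of callable loci: {100 * confident / callable_bases:.3f}")
print(f"runtime: {runtime_secs / 60:.2f} min, {runtime_secs / 3600:.2f} hours")
```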

It was run on an m1.large, but never managed to use both cpus, instead maxing out at 50% cpu usage.

[ec2-user@ip-10-234-51-252 den]$ df -H
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 8.5G 2.0G 6.5G 24% /
tmpfs 4.0G 0 4.0G 0% /dev/shm
/dev/xvdf 212G 170G 31G 85% /mnt/den
/dev/xvdg 159G 5.8G 145G 4% /mnt/mydata

[ec2-user@ip-10-234-51-252 mydata]$ ls -tral
total 5416212
drwxr-xr-x 4 root root 4096 Feb 9 15:03 ..
drwx------ 2 root root 16384 Feb 9 15:06 lost+found
-rw-rw-r-- 1 ec2-user ec2-user 4578627636 Feb 9 15:35 dbsnp_132.b37.vcf
-rw-rw-r-- 1 ec2-user ec2-user 12379076 Feb 9 15:35 dbsnp_132.b37.vcf.idx
-rw-rw-r-- 1 ec2-user ec2-user 942611555 Feb 10 10:11 snps.raw.vcf
-rw-rw-r-- 1 ec2-user ec2-user 12394492 Feb 10 10:13 snps.raw.vcf.idx
-rw-rw-r-- 1 ec2-user ec2-user 142784 Feb 10 10:13 alog.txt
drwxrwxrwx 3 root root 4096 Feb 10 12:16 .

Half of /mnt/den is the chimpanzee data, so the alignment to human was 85 GB for T_hg19_1000g.bam plus 9 MB for the index.

Saturday, January 21, 2012

Promethease 0.1.126 UI2

Promethease version 0.1.126 is downloadable. The most recent improvements have been to UI2, which is only available in the $2 paid runs. The improvements make it easier to sort, filter and explore your genome. You can try them out by clicking on this Lilly Mendel UI2 report.

Notable features

  • Sort by Magnitude, Frequency or # of References

  • Green/red highlighting of good/bad news

  • Turn on/off good, bad, not set, SNPs or genosets

  • Filter out genos based on the Magnitude, # of References or both using AND/OR logic

  • Type a question mark ? to bring up a help menu

  • Ball & Spring graph is now in its own window, and can be zoomed with the mouse wheel

  • A chooser for Medicines, Medical conditions and Topics, with progressive text search

  • Each geno has a footer showing what categories it belongs to, and allowing you to select all genos belonging to that category

  • At the bottom of the page press '2x more' or just type the number of records you want

  • Editor mode to link directly to the edit pages

That last feature is intended to encourage more edits to SNPedia. We welcome your edits big or small.

The reports may be too large to view on iPads, and there are still some problems with the graph under IE, but more improvements will certainly follow. Your bug reports and feature requests to info@promethease.com can help it to grow in the right direction.

Give it a try! Lilly Mendel UI2