Saturday, February 11, 2012

Denisova

In March 2010, scientists announced the discovery of a finger bone fragment of a juvenile female that lived about 41,000 years ago, found in Denisova Cave in Altai Krai, Russia. The full genome of this Denisova hominin was recently made available. The GATK UnifiedGenotyper was used to add dbSNP 132 rs#s to the genome. The resulting VCF file can be read by Promethease to produce:


http://files.snpedia.com/reports/promethease_data/genome_DenisovaPinky_ui2.html



Through the 5,519 SNPedia-annotated SNPs we can learn more about this distant relative.

APOE


As first noted by John Hawks, the SNP rs7412 couldn't be reliably called, but rs429358 and rs4420638 were. These are consistent with an E4/E4 genotype.
At present, the frequency of APOE*4 within all the major human groups remains higher in those populations…where an economy of foraging still exists, or food supply is now or has until recently been scarce, sporadically available or qualitatively poor. Under these environmental conditions, carrying the APOE*4 could be still useful.
--source PMID 10738542.
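To make the E4/E4 reasoning concrete, here is a minimal Python sketch of how the possible APOE genotypes narrow down when rs7412 is missing. It assumes the standard haplotype definitions (E2 = T at rs429358 with T at rs7412, E3 = T with C, E4 = C with C), plus-strand alleles, and it ignores the very rare C/T haplotype. It also assumes the Denisova call at rs429358 was (C;C), which the post implies but does not state. The function names are hypothetical and are not part of Promethease.

```python
from itertools import product

# Standard APOE haplotype definitions on the plus strand:
# (allele at rs429358, allele at rs7412) -> isoform
APOE_HAPLOTYPES = {
    ("T", "T"): "E2",
    ("T", "C"): "E3",
    ("C", "C"): "E4",
}


def possible_apoe_alleles(rs429358_allele):
    """Isoforms compatible with one chromosome when rs7412 is uncalled."""
    return sorted({
        APOE_HAPLOTYPES[(rs429358_allele, rs7412_allele)]
        for rs7412_allele in ("C", "T")
        if (rs429358_allele, rs7412_allele) in APOE_HAPLOTYPES
    })


def possible_apoe_genotypes(rs429358_genotype):
    """Possible diploid APOE genotypes given only an rs429358 call.

    rs429358_genotype is a two-letter string such as "CC".
    """
    per_chromosome = [possible_apoe_alleles(a) for a in rs429358_genotype]
    return sorted({"/".join(sorted(pair)) for pair in product(*per_chromosome)})


# Assumed Denisova call at rs429358 (hypothetical input, see the note above).
denisova_apoe = possible_apoe_genotypes("CC")
```

Under these assumptions a (C;C) call leaves E4/E4 as the only candidate, while a (C;T) call would leave both E2/E4 and E3/E4 open until rs7412 is resolved.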


Hair Morphology


rs3124314(C;C) suggests some curliness to the hair, while rs261360(A;G) is notable for heterozygosity. Five other SNPs (rs12623288(A;A), rs1268789(G;G), rs1454292(T;T), rs6732426(T;T), rs908922(A;A)) are all consistent with straighter hair.

Skin color




Sex


This should be female, but 23,508 rs#s on the Y chromosome were found, with only rs9786465 being in SNPedia. The Unified Genotyper provides a partial explanation of the challenges of calling sex chromosomes.
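A count like this can be reproduced from the VCF with a few lines of Python. This is only a sketch: it assumes the Y chromosome contig is named "Y" (as in the b37 reference) and that dbSNP identifiers sit in the third (ID) column, separated by semicolons when there are several. The function name is hypothetical.

```python
def count_rs_ids(vcf_lines, chrom="Y"):
    """Count VCF records on `chrom` whose ID column holds an rs number."""
    count = 0
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and the column header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a complete record
        ids = fields[2].split(";")
        if fields[0] == chrom and any(i.startswith("rs") for i in ids):
            count += 1
    return count


# Usage against the file from the Methods section (path as given there):
#     with open("/mnt/mydata/snps.raw.vcf") as handle:
#         print(count_rs_ids(handle))
```

Records with "." in the ID column (no dbSNP match) are not counted.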

Caveat lector


The traits below are notoriously difficult to define and phenotype even in modern humans, and in a single sample of 40k-year-old non-human DNA they should be covered in NaCl. However, deeper analysis needs to begin somewhere so ...

Optimism


rs53576(G;G) in the oxytocin receptor (OXTR). Carriers of this genotype appear to be significantly better at accurately reading the emotions of others by observing their faces than the remaining three-quarters of subjects, who were (A;A) or (A;G). (G;G) individuals were also less likely to startle when blasted by a loud noise, or to become stressed at the prospect of such a noise.

Intelligence


Quite a few rare genos for intelligence with a possible emphasis on spatial working memory.


Speech


rs2710102 in CNTNAP2 has been associated with impaired speech development.



Methods


Denisova data lives in the eu-west-1b region of the Amazon cloud as snap-3cc2de54.

Processed with
GenomeAnalysisTK-1.4-25-g23e7f1b

dbsnp acquired from
ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/1.2/b37/dbsnp_132.b37.vcf.gz


Generated via

java -jar GenomeAnalysisTK.jar -R /mnt/mydata/human_g1k_v37.fasta -T UnifiedGenotyper -I /mnt/den/denisova_genome/T_hg19_1000g.bam --dbsnp /mnt/mydata/dbsnp_132.b37.vcf -o /mnt/mydata/snps.raw.vcf > /mnt/mydata/alog.txt



Log finishes with

INFO 10:11:25,697 UnifiedGenotyper - Visited bases 3101804739
INFO 10:11:25,697 UnifiedGenotyper - Callable bases 2862033547
INFO 10:11:25,698 UnifiedGenotyper - Confidently called bases 112644898
INFO 10:11:25,698 UnifiedGenotyper - % callable bases of all loci 92.270
INFO 10:11:25,698 UnifiedGenotyper - % confidently called bases of all loci 3.632
INFO 10:11:25,698 UnifiedGenotyper - % confidently called bases of callable loci 3.936
INFO 10:11:25,699 UnifiedGenotyper - Actual calls made 4989617
INFO 10:11:25,714 TraversalEngine - Total runtime 66799.27 secs, 1113.32 min, 18.56 hours
INFO 10:11:25,823 TraversalEngine - 0 reads were filtered out during traversal out of 1424486071 total (0.00%)
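The percentages and runtime in the log follow directly from the raw counts. As a quick sanity check, this Python sketch recomputes them from the numbers printed above; the variable names are mine, not GATK's.

```python
# Raw counts copied from the UnifiedGenotyper log above.
visited_bases = 3101804739
callable_bases = 2862033547
confidently_called_bases = 112644898
runtime_secs = 66799.27


def percent(part, whole):
    """Percentage of `whole` represented by `part`, to three decimals."""
    return round(100.0 * part / whole, 3)


pct_callable_of_all = percent(callable_bases, visited_bases)
pct_confident_of_all = percent(confidently_called_bases, visited_bases)
pct_confident_of_callable = percent(confidently_called_bases, callable_bases)

runtime_minutes = round(runtime_secs / 60, 2)
runtime_hours = round(runtime_secs / 3600, 2)
```

The results match the log: roughly 92.27% of loci were callable, but only about 3.6% of all loci (3.9% of callable loci) were called confidently.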


It was run on an m1.large, but never managed to use both CPUs, instead maxing out at 50% CPU usage.


[ec2-user@ip-10-234-51-252 den]$ df -H
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 8.5G 2.0G 6.5G 24% /
tmpfs 4.0G 0 4.0G 0% /dev/shm
/dev/xvdf 212G 170G 31G 85% /mnt/den
/dev/xvdg 159G 5.8G 145G 4% /mnt/mydata

[ec2-user@ip-10-234-51-252 mydata]$ ls -tral
total 5416212
drwxr-xr-x 4 root root 4096 Feb 9 15:03 ..
drwx------ 2 root root 16384 Feb 9 15:06 lost+found
-rw-rw-r-- 1 ec2-user ec2-user 4578627636 Feb 9 15:35 dbsnp_132.b37.vcf
-rw-rw-r-- 1 ec2-user ec2-user 12379076 Feb 9 15:35 dbsnp_132.b37.vcf.idx
-rw-rw-r-- 1 ec2-user ec2-user 942611555 Feb 10 10:11 snps.raw.vcf
-rw-rw-r-- 1 ec2-user ec2-user 12394492 Feb 10 10:13 snps.raw.vcf.idx
-rw-rw-r-- 1 ec2-user ec2-user 142784 Feb 10 10:13 alog.txt
drwxrwxrwx 3 root root 4096 Feb 10 12:16 .



Half of /mnt/den is the chimpanzee data; the alignment to human was 85 GB for T_hg19_1000g.bam plus 9 MB for the index.

Saturday, January 21, 2012

Promethease 0.1.126 UI2

Promethease version 0.1.126 is downloadable. The most recent improvements have been to UI2, which is only available in the $2 paid runs. The improvements make it easier to sort, filter and explore your genome. You can try them out by clicking on this Lilly Mendel UI2 report or just watch them in the video below.



Notable features

  • Sort by Magnitude, Frequency or # of References

  • Green/red highlighting of good/bad news

  • Turn on/off good, bad, not set, SNPs or genosets

  • Filter out genos based on the Magnitude, # of References or both using AND/OR logic

  • Type a question mark ? to bring up a help menu

  • Ball & Spring graph is now in its own window, and can be zoomed with the mouse wheel

  • A chooser for Medicines, Medical conditions and Topics, with progressive text search

  • Each geno has a footer showing what categories it belongs to, and allowing you to select all genos belonging to that category

  • At the bottom of the page press '2x more' or just type the number of records you want

  • Editor mode to link directly to the edit pages
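For readers wondering how the AND/OR filter in the list above behaves, here is a minimal Python sketch of that kind of logic. It is not Promethease's actual code: the Geno class, the filter_genos function and the "at least this value" threshold semantics are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Geno:
    """Hypothetical stand-in for one genotype entry in a report."""
    name: str
    magnitude: float
    references: int


def filter_genos(genos, min_magnitude=None, min_references=None, mode="AND"):
    """Keep genos that meet the thresholds.

    With both thresholds set, mode="AND" requires both to pass and
    mode="OR" requires either one. Unset thresholds are ignored.
    """
    kept = []
    for geno in genos:
        checks = []
        if min_magnitude is not None:
            checks.append(geno.magnitude >= min_magnitude)
        if min_references is not None:
            checks.append(geno.references >= min_references)
        if not checks:
            passed = True
        elif mode == "AND":
            passed = all(checks)
        else:
            passed = any(checks)
        if passed:
            kept.append(geno)
    return kept
```

With AND, a geno must be both high in Magnitude and well referenced to stay visible; with OR, either property is enough, so the OR view is always at least as large as the AND view.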



That last feature is intended to encourage more edits to SNPedia. We welcome your edits big or small.

The reports may be too large to view on iPads, and there are still some problems with the graph under IE, but more improvements will certainly follow. Your bug reports and feature requests to info@promethease.com can help it to grow in the right direction.

Give it a try! Lilly Mendel UI2

Friday, December 9, 2011

The SNPedia Paper

SNPedia: a wiki supporting personal genome annotation, interpretation and analysis
Michael Cariaso; Greg Lennon
Nucleic Acids Research 2011; doi: 10.1093/nar/gkr798

Friday, November 18, 2011

Using Promethease

Last year 23andMe introduced a pricing model with a one year minimum subscription. I think it's a great service and will continue to pay my monthly fee to continue participating in the online discussions and get their updated analysis, but I'm sure a few people will decide updated genetic information is not in their budget this year. This post will show you how to keep getting new information about your genome for many years to come, with zero further expense.

To do this, you will need to download your raw data while your account is still in good standing. To begin, visit https://www.23andme.com/you/download/

After logging in, you will be presented with security questions such as these

23andMe security questions.

After a few moments your download will begin and you will receive a file with a name similar to genome_John_Smith_Full_201112034567.zip. Remember where you save it; once your subscription runs out you won't be able to get it again!

Now visit http://www.snpedia.com/index.php/Promethease and download the latest version from the bottom of the page.

Run Promethease.

Here you can see the Windows and Mac versions side by side.

Screen shot 2011-11-19 at 6.46.14 PM

Click Next to move to the Genotype Files page.

Screen shot 2011-11-19 at 6.40.35 PM

Click on the Load button and then find your genome_John_Smith_Full_201112034567.zip file from the beginning of this walk-through.

Screen shot 2011-11-19 at 6.42.40 PM

Then click Open to select your file. The filename will appear in the box.

Screen shot 2011-11-19 at 6.42.57 PM

Press Next to validate your file and move to the next screen. Along the way it will show how many genotypes are in your file. The number is probably just a bit below 1 million.
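If you are curious, you can check the genotype count yourself. The Python sketch below assumes the zip holds a single tab-separated text file in which comment lines start with "#" and every other non-blank line is one genotype (rsid, chromosome, position, genotype). The function name is hypothetical, and this is not how Promethease itself validates the file.

```python
import io
import zipfile


def count_genotypes(zip_path_or_file):
    """Count genotype rows in the first file inside a raw-data zip."""
    with zipfile.ZipFile(zip_path_or_file) as archive:
        first_name = archive.namelist()[0]
        with archive.open(first_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8")
            return sum(
                1
                for line in text
                if line.strip() and not line.startswith("#")
            )


# Usage with the example filename from this walk-through:
#     print(count_genotypes("genome_John_Smith_Full_201112034567.zip"))
```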

Screen shot 2011-11-19 at 6.43.09 PM

The next screen asks you to choose your Ethnicity. For many of us there is no perfect match, and that's fine. It has only a minor effect on the report that will be produced. This just shows some reference values for comparison to highlight how rare or common your genotypes are. Pick whatever seems closest and don't worry if it is rather distant from your true origins.

Screen shot 2011-11-19 at 6.43.26 PM

There are several more screens to go. You can just click Next on all of them until you get to the last one.



Screen shot 2011-11-19 at 4.42.39 PM Output Folder -- On this screen you choose where to store your completed analysis. You are specifying both the directory for all of the supporting files as well as the name for the top level report. The default values are based on your usual 'My Documents' folder and the name of your genotype file from the earlier step.



Screen shot 2011-11-19 at 7.23.14 PM Optional -- This screen allows registered commercial users of Promethease to identify themselves after their initial email to info@promethease.com. But if you're just running Promethease for yourself or family members and not charging, you should just leave it blank and click Next.



Screen shot 2011-11-19 at 4.42.59 PM Payment -- Click this button to pay $2 and unlock extra features such as running much faster or predicting the genetics of your children based on your partner's genotype file. That also introduces extra screens which are fairly self explanatory, but not further discussed in this blog post.



Screen shot 2011-11-19 at 4.43.11 PM Promethease Wizard -- The last screen before the system begins your analysis. Just click Next to begin.



Screen shot 2011-11-19 at 4.43.21 PM Status -- Promethease has begun your analysis and will contact the central server to figure out what SNPedia knows about your genotypes. This will need approximately 4 hours to run.



Screen shot 2011-11-19 at 6.26.09 PM



After approximately 4 hours you will see text similar to the above. It should have launched your web browser and shown you the report, which looks similar to this one. In the report click on '...show more...' to drill deep into everything SNPedia knows about your data. Click on hyperlinks to be taken into SNPedia to see the full text, and find links to primary sources. As you learn more about your genome we hope you'll make edits to SNPedia and help teach all of us more. It's a big genome and we can't understand it without your help. We hope what we've learned so far is helpful to you. Rerun your report every few months to watch our growth and improved understanding.

Sunday, July 17, 2011

0.1.116 - It's a Family Affair

Russell: I just ran the newly released Promethease 0.1.116, it's getting better and better. It has me nailed pretty well.

me: I notice your report isn't clear on your ancestry and we now do a better job of that for some reports so in time I'd like to see what I can do to polish up that side of your data. SNPedia has you and your mother publicly, but since you have so many family members privately, take a look at these images from another family which shows where the chromosomes agree between family members. promethease_corpas_family_comparison_newfamily.html

me: Looking at you and your mother genome_Russell_Letkeman_20081023161035_newfamily.html I notice a region of similarity on chromosome 2 with a run of similarity, and a little red spot on chromosome 11 where you and your mother are quite distinct.





Thursday, May 26, 2011

ESHG2011

Today was the end of the Galaxy 2011 conference and on Saturday, May 28 the European Society of Human Genetics (ESHG) will begin in Amsterdam. Ramunas of http://cancergenetics.wordpress.com and I would like to invite other netizens for a meetup. Bloggers, wiki editors, and lurkers are invited to join us.


Sunday, August 8, 2010

Party like it's 0.1.99

Promethease 0.1.99 is out and there is a new video tour of its features. The most important is an interactive x-y scatter plot which you will see at the end.


If you have more questions see Promethease/Features