Wrap-up exercise
I’m interested in a set of genomic variants that have been identified on human chromosome 16: a C to T substitution at position 89919709, a G to C substitution at position 89920908, and an A insertion between positions 89919344 and 89919345, all on the forward strand.
(a) Using the VEP with input in Ensembl default format, identify any genes or transcripts associated with this set of variants. What are the variant consequences? Have these variants previously been identified? Have SIFT or PolyPhen scores been computed for any of them?
Refer to the documentation for more help with formatting your variants for the VEP.
(b) Export your VEP results to BioMart (Variation mart). What phenotypes have been linked to the missense variant(s), if any?
(c) For the gene(s) identified in part (b) (in other words, those associated with the missense variant[s]), retrieve the sequence(s) in FASTA format using the REST API (rest.ensembl.org). Note that you will need to know the gene symbol and/or the Ensembl stable ID (ENSG ID) in order to proceed.
(a) Start by formatting your variants for input into the VEP. Ensembl default format describes variants by:
Chromosome • StartCoordinate • EndCoordinate • Ref / Alt Alleles • Strand (optional) • Name (optional)
Input the variants as follows:
16 89919709 89919709 C/T + var1
16 89920908 89920908 G/C + var2
16 89919345 89919344 -/A + var3
Note that for an insertion the start coordinate is one greater than the end coordinate. More details on formatting insertions can be found in the documentation.
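The coordinate rule for insertions is easy to get wrong by hand, so the formatting step can be sketched in Python. This is only an illustration: the function name ensembl_default_line and its arguments are made up for this example and are not part of any Ensembl tool.

```python
def ensembl_default_line(chrom, pos, ref, alt, strand="+", name=""):
    """Format one variant as a line of Ensembl default VEP input.

    For substitutions and deletions, pos is the first reference base.
    For an insertion (ref == "-"), pos is the base immediately before
    the inserted sequence.
    """
    if ref == "-":
        # Insertion: start is one greater than end.
        start, end = pos + 1, pos
    else:
        # Substitution or deletion: the span covers the reference allele.
        start, end = pos, pos + len(ref) - 1
    fields = [str(chrom), str(start), str(end), f"{ref}/{alt}", strand]
    if name:
        fields.append(name)
    return " ".join(fields)


# The three variants from the exercise (names var1..var3 as used above).
variants = [
    ("16", 89919709, "C", "T", "var1"),
    ("16", 89920908, "G", "C", "var2"),
    ("16", 89919344, "-", "A", "var3"),  # A inserted after 89919344
]
lines = [ensembl_default_line(c, p, r, a, name=n) for c, p, r, a, n in variants]
print("\n".join(lines))
```

The printed lines can be pasted straight into the VEP input box.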
The genes, transcripts and variant consequences are shown in the results table; all three variants have been previously identified. One variant is missense, and SIFT and PolyPhen scores are available for this variant.
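The exercise uses the VEP web interface, but the same lookup can also be attempted programmatically. The sketch below assumes the Ensembl REST endpoint GET vep/:species/region/:region/:allele/, which takes the region as chromosome:start-end:strand followed by the alternate allele. The helper names vep_region_url and fetch_consequences are invented for this example, and the structure of the returned JSON should be checked against the REST documentation rather than taken from here.

```python
import json
import urllib.request

REST_SERVER = "https://rest.ensembl.org"


def vep_region_url(chrom, start, end, alt, strand=1, species="human"):
    """Build a VEP region query URL (assumed endpoint layout, see lead-in)."""
    region = f"{chrom}:{start}-{end}:{strand}"
    return (
        f"{REST_SERVER}/vep/{species}/region/{region}/{alt}"
        "?content-type=application/json"
    )


def fetch_consequences(url):
    """Send the query and return the decoded JSON response (needs network)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


# Same coordinates as the Ensembl default input above; for the insertion,
# start is again one greater than end.
urls = [
    vep_region_url("16", 89919709, 89919709, "T"),
    vep_region_url("16", 89920908, 89920908, "C"),
    vep_region_url("16", 89919345, 89919344, "A"),
]
for url in urls:
    print(url)
```

Calling fetch_consequences(urls[0]) would send the request; it is left out of the example so that the block runs without network access.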
(b) Now click the link to BioMart: Variants in the Download box. This will export your results to BioMart and let you select Attributes to return more information.
Click Attributes, then ensure that Variant is selected. Under Phenotype annotation, choose Associated gene with phenotype and Phenotype name. Now click Results to generate your Results table; you can download the table or view all results in the browser by selecting the appropriate options.
(c) To retrieve the sequence for the MC1R gene (ENSG00000258839) via the REST API, go to rest.ensembl.org and find the GET sequence/id/:id endpoint. Your query can take either of the following forms:
http://rest.ensembl.org/sequence/id/ENSG00000258839?content-type=fasta
or
http://rest.ensembl.org/sequence/id/ENSG00000258839?content-type=text/x-fasta;type=genomic
Both return the genomic sequence for MC1R.
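The same request can be made from a script instead of a browser. The following is a minimal sketch using only the Python standard library. It follows the second URL form above; the function names sequence_url and fetch_fasta are hypothetical, and error handling is omitted.

```python
import urllib.request

REST_SERVER = "https://rest.ensembl.org"


def sequence_url(stable_id, seq_type="genomic"):
    """Build a GET sequence/id/:id query that asks for FASTA output."""
    return (
        f"{REST_SERVER}/sequence/id/{stable_id}"
        f"?content-type=text/x-fasta;type={seq_type}"
    )


def fetch_fasta(stable_id, seq_type="genomic"):
    """Download the sequence as FASTA text (needs network access)."""
    with urllib.request.urlopen(sequence_url(stable_id, seq_type), timeout=30) as response:
        return response.read().decode("utf-8")


# Build the query for the gene identified in part (b).
url = sequence_url("ENSG00000258839")
print(url)
```

Calling fetch_fasta("ENSG00000258839") would return the FASTA record as a string, which can then be written to a file.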