Exploring sequence variant annotation in human

The NPC1 (Niemann-Pick disease, type C1) gene encodes a protein that is thought to be involved in the intracellular transport of cholesterol. One of the variants in this gene, with HGVS descriptions ‘H215R’ or ‘c.644A>G’, has been identified as a genetic risk factor for obesity.

(a) Find the NPC1 gene for human and go to the Variation Table page. What is the dbSNP accession number (rs number) for the H215R variant?

(b) Is this variant a missense variant (a Sequence Ontology term) for all transcripts that have been annotated for the NPC1 gene?

(c) What are the SIFT and Polyphen predictions for this variant?

(d) Why does Ensembl put the T allele first (T/C)?

(e) What is the ancestral allele predicted for this locus?

(f) Which allele is associated with obesity and what is the significance of the association?

(g) How many publications mention this variant?

(a) Go to the Ensembl species page for human and search for NPC1.

In the Variation Table, type, for example, 215 and/or h/r in the Filter text box.

The dbSNP accession number for the H215R variant is rs1805081.
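Once the rs number is known, the same record can also be retrieved programmatically. The following is a minimal sketch, assuming the public Ensembl REST server at rest.ensembl.org and its variation endpoint; the helper names (variation_url, fetch_variation, summarise) are hypothetical, and the field names read in summarise are assumptions about the JSON response that may differ between releases.

```python
import json
import urllib.request

REST_SERVER = "https://rest.ensembl.org"  # public Ensembl REST server


def variation_url(rs_id, species="human"):
    """Build the URL of the REST variation endpoint for one variant."""
    return f"{REST_SERVER}/variation/{species}/{rs_id}?content-type=application/json"


def fetch_variation(rs_id, species="human"):
    """Download the variant record as a dict (requires network access)."""
    with urllib.request.urlopen(variation_url(rs_id, species)) as response:
        return json.load(response)


def summarise(record):
    """Pick out a few fields of interest.

    The field names are assumptions about the response; .get() returns
    None instead of failing if a field is absent.
    """
    return {
        "name": record.get("name"),
        "most_severe_consequence": record.get("most_severe_consequence"),
        "minor_allele": record.get("minor_allele"),
        "MAF": record.get("MAF"),
    }
```

With network access, summarise(fetch_variation("rs1805081")) would return the selected fields for this variant. The values should be checked against the browser pages used in this exercise.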

(b) Click on rs1805081.

No, rs1805081 is a missense variant for only one NPC1 transcript (ENST00000269228.10). It is a non-coding exon variant (NC transcript variant) for one other transcript (ENST00000540608.5). Note that 14 transcripts in total have been annotated for the NPC1 gene: http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000141458.

(c) Both SIFT and Polyphen predict that this change does not damage protein function.

SIFT describes the change as tolerated and Polyphen describes it as benign.
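The qualitative labels are derived from numerical scores between 0 and 1. The sketch below shows how such labels can be assigned, assuming the commonly cited cutoffs (SIFT below 0.05 is deleterious; Polyphen above 0.908 is probably damaging and above 0.446 possibly damaging). The function names are hypothetical and the cutoffs should be checked against the current Ensembl documentation.

```python
def sift_label(score):
    """SIFT: low scores indicate a damaging change (cutoff 0.05 is assumed)."""
    return "deleterious" if score < 0.05 else "tolerated"


def polyphen_label(score):
    """Polyphen: high scores indicate a damaging change (cutoffs are assumed)."""
    if score > 0.908:
        return "probably damaging"
    if score > 0.446:
        return "possibly damaging"
    return "benign"


# Hypothetical scores, for illustration only (not the real values for rs1805081).
print(sift_label(0.30))      # tolerated
print(polyphen_label(0.01))  # benign
```

Note that the two scales run in opposite directions: a high SIFT score and a low Polyphen score both point to a change that is unlikely to affect protein function.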

(d) In Ensembl, the allele that is present in the reference genome assembly is put first, i.e. T. In the literature, the major allele (in the population of interest) is normally put first. In the case of rs1805081, the allele in the reference genome is the major allele in almost all populations studied. However, as the reference genome is a mosaic of the genomes of just a few individuals, this is by no means the case for all variants.

(e) The ancestral allele is reported as T in the Alleles line. The same line reports a minor allele frequency (MAF) of 0.22 for allele C, so the ancestral allele T is also the more common allele.
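To make the MAF concrete, the sketch below computes allele frequencies from allele counts and takes the minor allele to be the second most frequent one. The counts are hypothetical, chosen only to reproduce a frequency of 0.22 for C, and the function names are illustrative.

```python
def allele_frequencies(counts):
    """Convert allele counts into frequencies that sum to 1."""
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def minor_allele(counts):
    """Return (allele, frequency) for the second most frequent allele."""
    ranked = sorted(allele_frequencies(counts).items(),
                    key=lambda item: item[1], reverse=True)
    return ranked[1]


# Hypothetical counts: 78 T alleles and 22 C alleles among 100 chromosomes.
counts = {"T": 78, "C": 22}
allele, maf = minor_allele(counts)
print(allele, round(maf, 2))  # C 0.22
```

For a variant with two alleles, the frequency of the major allele is simply 1 minus the MAF, here 0.78 for T.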

(f) Click on Phenotype Data in the side menu.

The A allele was reported to be associated with obesity with a p-value of 3.00e-07 in the paper ‘Genome-wide association study for early-onset and morbid adult obesity identifies three new risk loci in European populations’ (Meyre et al. Nat Genet 2009 Feb;41(2):157-9). Curators extracted this information from the GWAS Catalog.
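The reported allele A is not one of the two alleles (T/C) that Ensembl shows for this variant. A likely explanation, stated here as an assumption, is that the allele is given on the opposite strand: the HGVS description c.644A>G is written relative to the transcript, and A and G are the complements of T and C. The minimal sketch below converts single-base alleles between strands; the names are illustrative.

```python
# Watson-Crick base pairing, used to translate an allele to the other strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_other_strand(alleles):
    """Complement each single-base allele; order of the alleles is kept."""
    return [COMPLEMENT[allele] for allele in alleles]


# c.644A>G on the transcript strand corresponds to T/C on the other strand.
print(to_other_strand(["A", "G"]))  # ['T', 'C']
```

Under this assumption, the obesity-associated A allele corresponds to the T allele in the Ensembl display, which is also the reference allele.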

(g) Click on Citations in the side menu.

64 publications mention this variant.