Human population genetics and phenotype data
The SNP rs1738074 in the 5’ UTR of the human TAGAP gene has been identified as a genetic risk factor for several diseases. Use Ensembl to answer the following questions:
- In which transcripts is this SNP found?
- What is the least frequent genotype for this SNP in the Yoruba (YRI) population from 1000 Genomes phase 3?
- What is the ancestral allele? Is it conserved in the 91 eutherian mammals EPO-Extended alignment?
- With which diseases is this SNP associated? Are there any known risk (or associated) alleles?
- There is more than one way to get this answer. Either go to the Variation table of the human TAGAP gene and use the Consequence filter to include only 5’ UTR variants, or search Ensembl for rs1738074 directly. Once you’re in the Variant tab, click on Genes and regulation in the left-hand panel.
This SNP is found in four transcripts of TAGAP. It is also intronic in eleven non-coding transcripts of TAGAP-AS1 and in one non-coding transcript of ENSG00000226032.
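The same transcript list can also be retrieved programmatically. The sketch below is a minimal example, assuming the public Ensembl REST server at https://rest.ensembl.org and its `GET vep/:species/id/:id` endpoint, whose JSON response is assumed to be a list of results, each carrying a `transcript_consequences` array with `transcript_id`, `gene_id`, `gene_symbol` and `consequence_terms` fields; check these names against the REST documentation before relying on them. The helper names are illustrative. Only `fetch_vep` needs network access, while `transcripts_by_gene` just groups an already parsed response. VEP also reports upstream and downstream consequences for nearby transcripts, so its list may be longer than the one on the Genes and regulation page.

```python
import json
import urllib.request

# Public Ensembl REST server (assumed endpoint: GET vep/:species/id/:id).
SERVER = "https://rest.ensembl.org"


def fetch_vep(variant_id: str, species: str = "human") -> list:
    """Run VEP on a known variant ID via the REST API and return the parsed JSON (needs network)."""
    url = f"{SERVER}/vep/{species}/id/{variant_id}"
    request = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def transcripts_by_gene(vep_results: list) -> dict:
    """Group overlapping transcripts by gene symbol, falling back to the gene ID.

    Returns {gene: [(transcript_id, consequence_terms), ...]}.
    """
    grouped = {}
    for result in vep_results:
        for tc in result.get("transcript_consequences", []):
            # Some genes (e.g. ENSG00000226032) may have no symbol, so use the ID instead.
            gene = tc.get("gene_symbol", tc.get("gene_id"))
            grouped.setdefault(gene, []).append(
                (tc["transcript_id"], tc.get("consequence_terms", []))
            )
    return grouped
```

For example, `transcripts_by_gene(fetch_vep("rs1738074"))` returns a dictionary keyed by gene symbol (or gene ID when no symbol is given), listing each transcript with its consequence terms.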
- Click on Population genetics in the left-hand panel, or click on Explore this variant in the left-hand panel and click the Population genetics icon.
In the Yoruba (YRI) population, the least frequent genotype is CC, at a frequency of 5.6%.
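The population genotype table can be summarised the same way outside the browser. This is a sketch under stated assumptions: that `GET variation/human/rs1738074?pops=1` on the Ensembl REST API returns a `population_genotypes` list whose entries have `population`, `genotype` and `frequency` keys, and that the 1000 Genomes phase 3 Yoruba sample is labelled `1000GENOMES:phase_3:YRI`. The function name is hypothetical, and it works on the already parsed JSON rather than fetching it.

```python
def least_frequent_genotype(variation: dict, population: str = "1000GENOMES:phase_3:YRI") -> tuple:
    """Return (genotype, frequency) with the lowest frequency in one population.

    Assumes `variation` is the parsed response of GET variation/human/<id>?pops=1
    with a "population_genotypes" list of {"population", "genotype", "frequency"} dicts.
    """
    rows = [
        row for row in variation.get("population_genotypes", [])
        if row.get("population") == population
    ]
    if not rows:
        raise ValueError(f"no genotype data for {population}")
    # min() picks the first row with the smallest frequency if there are ties.
    least = min(rows, key=lambda row: float(row["frequency"]))
    return least["genotype"], float(least["frequency"])
```

Choosing the population label as a parameter lets the same function compare YRI with any other 1000 Genomes population listed in the response.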
- Click on Phylogenetic context in the left-hand panel.
The ancestral allele is T; it is inferred from the primate alignment.
Click on Select an alignment, which opens a pop-up menu. Open Multiple alignments, select 91 eutherian mammals EPO-Extended and click Apply at the bottom of the menu to save your settings.
The SNP (highlighted in red and placed in the centre) is displayed together with its flanking sequence. The T allele is conserved in all but two of the eutherian mammals displayed.
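Counting conservation across an alignment column is simple bookkeeping. The sketch below assumes you have already copied the base at the SNP position for each species from the displayed alignment into a dictionary; the function name and any species/base values used with it are hypothetical, not taken from the 91 eutherian mammals EPO-Extended alignment. Gaps (`-`) are excluded, which is a design choice: a species with no aligned base neither supports nor contradicts conservation.

```python
def conservation_summary(column: dict, allele: str) -> tuple:
    """Summarise how many aligned species carry `allele` at the SNP position.

    `column` maps species name -> base at the aligned SNP position, with gaps
    written as "-". Returns (n_matching, n_aligned, sorted_mismatching_species).
    """
    # Drop gapped species and normalise case (lower case often marks soft-masked sequence).
    aligned = {species: base.upper() for species, base in column.items() if base != "-"}
    mismatches = sorted(sp for sp, base in aligned.items() if base != allele.upper())
    return len(aligned) - len(mismatches), len(aligned), mismatches
```

For instance, with `{"human": "T", "mouse": "C", "dog": "-"}` and allele `"T"`, the function returns `(1, 2, ["mouse"])`.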
- Click Phenotype data in the left-hand panel.
This variant is associated with multiple sclerosis, celiac disease and white blood cell count. Risk (or associated) alleles are known for all three phenotypes, and the corresponding P values are provided. The allele A is associated with celiac disease, yet the alleles reported by Ensembl are T/C. Because Ensembl reports alleles on the forward strand, this suggests that the original paper reported the allele on the reverse strand, where A is the complement of T. Similarly, one of the alleles reported for multiple sclerosis is G, the complement of C.
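The strand reasoning above amounts to complementing the reported allele and checking the result against Ensembl's forward-strand alleles. A minimal sketch, with hypothetical helper names:

```python
# Complement table for unambiguous DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement; for a single-base allele this is just its complement."""
    return seq.translate(COMPLEMENT)[::-1]


def to_forward_strand(reported: str, forward_alleles: set) -> str:
    """Express a reported allele in terms of the forward-strand alleles Ensembl shows.

    If the allele already matches, it is returned unchanged; if its reverse
    complement matches, the allele was presumably reported on the reverse strand.
    """
    if reported in forward_alleles:
        return reported
    flipped = reverse_complement(reported)
    if flipped in forward_alleles:
        return flipped
    raise ValueError(f"{reported!r} matches neither strand of {sorted(forward_alleles)}")
```

Here `to_forward_strand("A", {"T", "C"})` returns `"T"` and `to_forward_strand("G", {"T", "C"})` returns `"C"`. This check is only unambiguous when the two alleles are not complements of each other. For rs1738074 (T/C) the reverse-strand alleles are A/G, so A and G map cleanly to T and C; for an A/T or C/G SNP the same test cannot tell the strands apart, and other information such as allele frequencies would be needed.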