Human population genetics and phenotype data

The SNP rs1738074 in the 5’ UTR of the human TAGAP gene has been identified as a genetic risk factor for a few diseases.

(a) In which transcripts is this SNP found?

(b) What is the least frequent genotype for this SNP in the Yoruba (YRI) population from the 1000 Genomes phase 3?

(c) What is the ancestral allele? Is it conserved in the 90 eutherian mammals?

(d) With which diseases is this SNP associated? Are there any known risk (or associated) alleles?

(a) There is more than one way to get this answer. Either go to the Variation Table for the human TAGAP gene and filter the variants to the 5’ UTR, or search Ensembl for rs1738074 directly.

Once you’re in the Variation tab, click on the Genes and regulation link or icon.

This SNP is found in four transcripts of TAGAP. It is also intronic to five non-coding transcripts.
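The same lookup can also be approached programmatically. The following is a minimal sketch, assuming the public Ensembl REST server at https://rest.ensembl.org and its VEP endpoint (/vep/:species/id/:id). The helper names fetch_vep and transcripts_by_consequence are hypothetical, and the response fields used here (transcript_consequences, transcript_id, gene_symbol, consequence_terms) should be checked against the current REST documentation before relying on them.

```python
import json
import urllib.request
from collections import defaultdict

# Assumption: the public Ensembl REST server.
REST_SERVER = "https://rest.ensembl.org"


def fetch_vep(rsid, species="human"):
    """Fetch VEP predictions for a variant ID (needs network access)."""
    url = f"{REST_SERVER}/vep/{species}/id/{rsid}"
    request = urllib.request.Request(
        url, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def transcripts_by_consequence(vep_records, gene_symbol=None):
    """Group transcript IDs by consequence term.

    vep_records is the decoded JSON list returned by fetch_vep.
    If gene_symbol is given, transcripts of other genes are skipped.
    """
    grouped = defaultdict(set)
    for record in vep_records:
        for consequence in record.get("transcript_consequences", []):
            if gene_symbol and consequence.get("gene_symbol") != gene_symbol:
                continue
            for term in consequence.get("consequence_terms", []):
                grouped[term].add(consequence["transcript_id"])
    # Sort the IDs so the output is stable between runs.
    return {term: sorted(ids) for term, ids in grouped.items()}
```

Calling transcripts_by_consequence(fetch_vep("rs1738074"), gene_symbol="TAGAP") would list the TAGAP transcripts under each consequence term, which can then be compared with the Genes and regulation page.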

(b) Click on Population genetics at the left of the variation tab. (Or, click on Explore this variation at the left and click the Population genetics icon.)

In Yoruba (YRI), the least frequent genotype is CC, at a frequency of 5.6%.
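Picking the least frequent genotype can be sketched in code as well. This is a minimal sketch, assuming the Ensembl REST variation endpoint (/variation/:species/:id) with the population_genotypes=1 option, and assuming each returned record carries population, genotype and frequency fields. The function names and the population label "1000GENOMES:phase_3:YRI" are assumptions for illustration and should be verified against the live response.

```python
import json
import urllib.request

# Assumption: the public Ensembl REST server.
REST_SERVER = "https://rest.ensembl.org"


def fetch_population_genotypes(rsid, species="human"):
    """Fetch per-population genotype frequencies (needs network access)."""
    url = f"{REST_SERVER}/variation/{species}/{rsid}?population_genotypes=1"
    request = urllib.request.Request(
        url, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    # Assumed key name; an empty list is returned if it is absent.
    return data.get("population_genotypes", [])


def least_frequent_genotype(genotype_records, population):
    """Return (genotype, frequency) of the rarest genotype in one population."""
    in_population = [
        record for record in genotype_records
        if record["population"] == population
    ]
    if not in_population:
        raise ValueError(f"no genotype records for population {population!r}")
    rarest = min(in_population, key=lambda record: float(record["frequency"]))
    return rarest["genotype"], float(rarest["frequency"])
```

A call such as least_frequent_genotype(fetch_population_genotypes("rs1738074"), "1000GENOMES:phase_3:YRI") should agree with the Population genetics page if the assumed field names and population label are right.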

(c) Click on Phylogenetic context.

The ancestral allele is T and it’s inferred from the alignment in primates.

Select the 90 eutherian mammals EPO-Extended alignment and click on Apply.

A region containing the SNP (highlighted in red and placed in the centre) and its flanking sequence are displayed. The T allele is conserved in all but nine of the eutherian mammals displayed.

(d) Click Phenotype Data at the left of the Variation page.

This variation is associated with multiple sclerosis, celiac disease and white blood cell count. There are known risk alleles for both multiple sclerosis and celiac disease, and the corresponding P values are provided. The allele A is associated with celiac disease, although the alleles reported by Ensembl are T/C. Ensembl reports alleles on the forward strand, which suggests that A (the complement of T) was reported on the reverse strand in the original paper. Similarly, one of the alleles reported for multiple sclerosis is G, the complement of C.
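The strand reasoning can be illustrated with a short sketch. It assumes single-base alleles and a SNP whose forward-strand alleles are not complementary to each other (true for T/C, but not for A/T or C/G SNPs, where the strand cannot be inferred this way). The function names are hypothetical.

```python
# Map each base to its complement on the opposite strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(allele):
    """Return the complementary base of a single-nucleotide allele."""
    return allele.upper().translate(COMPLEMENT)


def reconcile_strand(reported_allele, forward_alleles):
    """Map a reported allele onto the forward-strand alleles.

    Returns (forward_allele, strand), where strand is "forward" or
    "reverse", or None if the allele matches on neither strand.
    """
    allele = reported_allele.upper()
    if allele in forward_alleles:
        return allele, "forward"
    flipped = complement(allele)
    if flipped in forward_alleles:
        return flipped, "reverse"
    return None


ensembl_alleles = {"T", "C"}  # forward-strand alleles of rs1738074
print(reconcile_strand("A", ensembl_alleles))  # ('T', 'reverse')
print(reconcile_strand("G", ensembl_alleles))  # ('C', 'reverse')
```

Under these assumptions, the reported A corresponds to T and the reported G corresponds to C on the forward strand.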