Exploring a VNTR in human
Variable number tandem repeats (VNTRs) vary widely in the number of repeat copies between individuals in a population, and are commonly used in forensics (DNA fingerprinting) and to study genetic diversity.
(a) Go to the region from 3074666 to 3075100 bp on human chromosome 4. Which gene does it overlap? Which exon of this gene falls in this region?
(b) Configure this page to turn on the Repeats (low), Simple repeats (Repeats (low)) and Tandem repeats (TRF) tracks. Can you see any repeats in this exon? According to the track information, which tools were used to annotate the repeats?
(c) Zoom in on the polyglutamine (PolyQ) tract, (CAG)n, to see its sequence. How many CAG repeats are there in the human reference assembly? Does this tract overlap any phenotype-associated variants? If so, what is the variant's identifier?
(d) Go to the Variant tab for this phenotype-associated variant. What is its consequence type (Sequence Ontology term)? Does the reference allele have the number of repeats you have just counted? Which alleles are the shortest and the longest?
(e) Are there any phenotypes associated with this variant?
(a) Select Search: Human and type 4:3074666-3075100 in the text box (alternatively, type human 4:3074666-3075100). Click Go.
Click on the gold transcript in this region. The pop-up menu shows that the region contains exon 1 of 67 of the huntingtin gene (HTT).
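The location typed into the search box uses the chromosome:start-end notation, in which Ensembl coordinates are 1-based and both ends are included. The minimal Python sketch below parses such a string and shows that the region in (a) spans 435 bp; `Region` and `parse_region` are hypothetical names for illustration, not part of any Ensembl tool.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    """A location in chromosome:start-end form (1-based, both ends included)."""
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Both ends are inside the region, hence the +1.
        return self.end - self.start + 1


# Hypothetical pattern for strings such as "4:3074666-3075100".
_REGION_RE = re.compile(r"(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)")


def parse_region(text: str) -> Region:
    """Parse a chromosome:start-end string into a Region."""
    m = _REGION_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a chromosome:start-end location: {text!r}")
    start, end = int(m["start"]), int(m["end"])
    if start > end:
        raise ValueError(f"start ({start}) is after end ({end})")
    return Region(m["chrom"], start, end)


region = parse_region("4:3074666-3075100")
print(region.chrom, region.start, region.end, region.length)  # 4 3074666 3075100 435
```

Forgetting the +1 is a classic off-by-one error when converting between 1-based inclusive locations and 0-based, half-open ones.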
(b) Click Configure this page in the side menu, then turn on Repeats (low), Simple repeats (Repeats (low)) and Tandem repeats (TRF).
This exon contains two tandem repeats: the polyglutamine (PolyQ) tract, (CAG)n, and the polyproline (PolyP) tract, (CCG)n. They are annotated by two different methods; click on the track names to find out more about the annotation tools, RepeatMasker and Tandem Repeats Finder.
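To make the idea of a tandem repeat concrete, here is a deliberately naive Python sketch that scans a sequence for perfect, back-to-back copies of a 3 bp motif and counts them. It is not the algorithm used by RepeatMasker or Tandem Repeats Finder (Tandem Repeats Finder, for example, also reports approximate repeats); the function name, the `min_copies` cut-off and the toy sequence are all hypothetical.

```python
from typing import Iterator, NamedTuple


class TandemRun(NamedTuple):
    motif: str
    start: int   # 1-based position of the first base of the run
    copies: int  # number of back-to-back copies of the motif


def perfect_tandem_runs(seq: str, period: int = 3, min_copies: int = 5) -> Iterator[TandemRun]:
    """Yield runs of one exact motif of length `period` repeated back to back.

    Naive left-to-right scan: only perfect (uninterrupted) copies are counted,
    and a run is reported in the phase in which the scan first enters it.
    """
    if period < 1:
        raise ValueError("period must be at least 1")
    seq = seq.upper()
    i = 0
    while i + period <= len(seq):
        motif = seq[i:i + period]
        copies = 1
        # Extend while the next window is an identical copy of the motif.
        while seq[i + copies * period:i + (copies + 1) * period] == motif:
            copies += 1
        if copies >= min_copies:
            yield TandemRun(motif, i + 1, copies)
            i += copies * period  # jump past the reported run
        else:
            i += 1


# Hypothetical toy sequence: a (CAG)6 tract, one CAA codon, then a (CCG)5 tract.
toy = "TTC" + "CAG" * 6 + "CAA" + "CCG" * 5 + "TGA"
for run in perfect_tandem_runs(toy):
    print(run)
# TandemRun(motif='CAG', start=4, copies=6)
# TandemRun(motif='CCG', start=25, copies=5)
```

Because a tract such as (CAG)n can equally be read as (AGC)n or (GCA)n, the motif this sketch reports depends on where the scan first enters the run; a single CAA codon is enough to end the CAG run.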
(c) Use your mouse to draw a box around the polyglutamine (PolyQ) tract, (CAG)n, then click Jump to region in the pop-up menu.
The human reference sequence contains 19 CAG repeats. The tract overlaps rs71180116, shown as a pink bar in the All phenotype-associated - short variants (SNPs and indels) track.
(d) Click on the rs71180116 ID to go to the Variant tab. The summary page shows that this variant is classified as an inframe insertion. To see all of the alleles, either click + on the summary page or go to the Genes and regulation table. This variant has many alternative alleles, which differ in the number of repeats. The first allele in the expanded Alleles section of the summary page, or the first allele in the Codons column of the Genes and regulation table, is the reference allele. It contains 19 CAG repeats, just as in the Region in detail view. The shortest allele has 7 repeats and the longest has 55.
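The alleles of rs71180116 differ only in how many CAG copies they carry, so each allele's effect on the coding sequence follows from simple arithmetic: every extra or missing copy adds or removes 3 bp, a whole codon, which leaves the reading frame intact. The Python sketch below applies this to the repeat counts quoted above. The function name and the per-allele labelling are assumptions for illustration; this is not how Ensembl or VEP compute the variant's consequence.

```python
from typing import Tuple


def describe_allele(ref_copies: int, alt_copies: int, unit: str = "CAG") -> Tuple[str, int]:
    """Label an alternative repeat allele against the reference.

    Returns a Sequence Ontology-style label and the net length change in bp.
    Simplified, per-allele rule used only for illustration.
    """
    delta_bp = (alt_copies - ref_copies) * len(unit)
    if delta_bp == 0:
        return ("reference", 0)
    if delta_bp % 3 != 0:
        # A length change that is not a whole number of codons shifts the frame.
        return ("frameshift_variant", delta_bp)
    return ("inframe_insertion" if delta_bp > 0 else "inframe_deletion", delta_bp)


REF_COPIES = 19  # CAG copies in the reference allele (from the answer above)
for copies in (7, 19, 55):  # shortest, reference and longest alleles
    print(copies, describe_allele(REF_COPIES, copies))
# 7 ('inframe_deletion', -36)
# 19 ('reference', 0)
# 55 ('inframe_insertion', 108)
```

Under this per-allele view the shorter alleles are in-frame deletions relative to the 19-copy reference, whereas the summary page reports a single label for the variant as a whole.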
(e) Click on Phenotype data in the side menu. This variant is associated with Huntington disease, a trinucleotide repeat disorder (polyQ disease) caused by expansion of the CAG repeat to a pathogenic length (36 or more copies) in exon 1 of HTT.
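The answer to (e) comes down to count-and-compare: how many uninterrupted CAG copies an allele carries, and whether that number reaches the pathogenic range. A minimal Python sketch follows, assuming the commonly cited cut-off of 36 or more copies and assuming each allele string starts at the first base of the repeat tract; `cag_copies`, `is_pathogenic_length` and the example alleles are hypothetical.

```python
PATHOGENIC_MIN_CAG = 36  # assumed cut-off: 36 or more CAG copies


def cag_copies(allele: str, unit: str = "CAG") -> int:
    """Count back-to-back copies of `unit` from the start of `allele`.

    Assumes the allele sequence begins at the first base of the repeat tract.
    """
    if not unit:
        raise ValueError("unit must be non-empty")
    allele = allele.upper()
    n = 0
    while allele.startswith(unit, n * len(unit)):
        n += 1
    return n


def is_pathogenic_length(copies: int, threshold: int = PATHOGENIC_MIN_CAG) -> bool:
    """True if the repeat count reaches the assumed pathogenic cut-off."""
    return copies >= threshold


for copies in (7, 19, 55):
    allele = "CAG" * copies + "CAA"  # hypothetical allele: tract plus a flanking codon
    n = cag_copies(allele)
    print(n, is_pathogenic_length(n))
# 7 False
# 19 False
# 55 True
```

Counting stops at the first codon that is not CAG, so the flanking CAA codon in these toy alleles is not included in the count.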