Exploring the MYH9 gene in human
- In Ensembl, find the human MYH9 (myosin, heavy chain 9, non-muscle) gene and open the Gene tab.
- On which chromosome and which strand of the genome is this gene located?
- How many transcripts (splice variants) are there and how many are protein coding?
- What is the longest protein-coding transcript, and how long is the protein it encodes?
- Which transcript would you take forward for further study?
- Click on Phenotypes at the left side of the page. Are there any diseases associated with this gene, according to Mendelian Inheritance in Man (MIM)?
- What are some functions of MYH9 according to the Gene Ontology (GO) consortium? Have a look at the GO: Biological process pages for this gene.
- In the transcript table, click on the transcript ID for MYH9-201, and go to the Transcript tab.
- How many exons does it have?
- Are any of the exons completely or partially untranslated?
- Is there an associated sequence in UniProtKB/Swiss-Prot? Have a look at the General identifiers for this transcript.
- Are there microarray (oligo) probes that can be used to monitor ENST00000216181 expression?
- Select Human from the Species drop-down list and type MYH9. Click Go. Click on MYH9 (Human Gene) in the search results, which will send you to the Gene tab.
- The gene is located on chromosome 22 on the reverse strand.
- Ensembl has 23 transcripts annotated for this gene, of which 6 are protein-coding.
- The longest protein-coding transcript is MYH9-215 and it codes for a protein that is 1,981 amino acids long.
- MYH9-201 is the transcript I would take forward for further study, as it is the MANE Select transcript (for a description, mouse-over the MANE Select flag in the transcript table).
- Click on Phenotypes in the left-hand panel to see the associated phenotypes. There is a large table of phenotypes. To see only the ones from MIM, type MIM into the filter box at the top right-hand corner of the table.
These are some of the phenotypes associated with MYH9 according to MIM: "Deafness, autosomal dominant 17" and "Macrothrombocytopenia and granulocyte inclusions with or without nephritis or sensorineural hearing loss". You can click on the records for more information.
- The Gene Ontology project maps terms to a protein in three classes: biological process, cellular component, and molecular function. Click on GO: Biological process on the left-hand panel. Angiogenesis, cell adhesion, and protein transport are some of the roles associated with MYH9. All GO terms are associated with a single transcript: ENST00000216181.
- Click on ENST00000216181.11 in the transcript table. You should now be on the Transcript tab.
- It has 41 exons, shown in the Transcript summary.
Click on the Exons link in the left-hand panel.
- Exon 1 is completely untranslated, and exons 2 and 41 are partially untranslated (UTR sequence is shown in orange). You can also see this in the cDNA view if you click on the cDNA link in the left side menu.
Click on General identifiers in the left-hand panel.
- P35579.254 from UniProt/Swiss-Prot matches the translation of the Ensembl transcript. Click on P35579.254 to go to UniProtKB, or click align for the alignment.
- Click on Oligo probes in the left-hand panel.
Probesets from Affymetrix, Agilent, Codelink, Illumina, and Phalanx OneArray match this transcript sequence. Expression analysis with any of these probesets would reveal information about the transcript. Hint: this information can sometimes be found in the ArrayExpress Atlas (https://www.ebi.ac.uk/biostudies/arrayexpress).
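The first few answers above (chromosome, strand, transcript counts) can also be retrieved programmatically. The following is a minimal sketch, assuming the public Ensembl REST server at https://rest.ensembl.org and its lookup-by-symbol endpoint with expand=1. The function names fetch_gene and summarise_gene are illustrative, not part of any Ensembl library, and the sketch ranks transcripts by encoded protein length rather than by transcript length in bp.

```python
import json
import urllib.request

# Assumption: the public Ensembl REST server; change this if you use a mirror.
REST_SERVER = "https://rest.ensembl.org"


def fetch_gene(symbol, species="homo_sapiens"):
    """Fetch a gene record with its transcripts expanded (needs network access)."""
    url = f"{REST_SERVER}/lookup/symbol/{species}/{symbol}?expand=1"
    request = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def protein_length(transcript):
    """Length in amino acids of the translation, or 0 if there is none."""
    return (transcript.get("Translation") or {}).get("length", 0)


def summarise_gene(gene):
    """Summarise location and transcript counts from a lookup record."""
    transcripts = gene.get("Transcript", [])
    coding = [t for t in transcripts if t.get("biotype") == "protein_coding"]
    # Simplification: pick the transcript that encodes the longest protein.
    longest = max(coding, key=protein_length, default=None)
    return {
        "chromosome": gene.get("seq_region_name"),
        # The REST API reports strand as 1 (forward) or -1 (reverse).
        "strand": "forward" if gene.get("strand") == 1 else "reverse",
        "n_transcripts": len(transcripts),
        "n_protein_coding": len(coding),
        "longest_protein": (longest["id"], protein_length(longest)) if longest else None,
    }
```

With network access, summarise_gene(fetch_gene("MYH9")) should give values you can compare with the browser answers above. The counts depend on the Ensembl release, so they may differ from the numbers quoted here.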
Finding a gene associated with a phenotype
Phenylketonuria is a genetic disorder caused by an inability to metabolise phenylalanine in any body tissue. This results in an accumulation of phenylalanine causing seizures and intellectual disability.
(a) Search for phenylketonuria from the Ensembl homepage and narrow down your search to only genes. What gene is associated with this disorder?
(b) How many protein coding transcripts does this gene have? View all of these in the transcript comparison view.
(c) What is the MIM gene identifier for this gene?
(d) Go to the MANE Select transcript and look at its 3D structure. In the model 2pah, how many protein molecules can you see?
(a) Start at the Ensembl homepage (http://www.ensembl.org).
Type phenylketonuria into the search box then click Go. Choose Gene from the left hand menu.
The gene associated with this disorder is PAH, phenylalanine hydroxylase, ENSG00000171759.
(b) If the transcript table is hidden, click on Show transcript table to see it.
There are six protein coding transcripts.
Click on Transcript comparison in the left hand menu. Click on Select transcripts. Either select all the transcripts labelled protein coding one-by-one, or click on the drop down and select Protein coding. Close the menu.
(c) Click on External references.
The MIM gene ID is 612349.
(d) Open the transcript table and click on the ID for the MANE Select: ENST00000553106.6. Go to PDB 3D protein model in the left-hand menu.
The model 2pah is shown by default. It has two protein molecules in it. You may need to rotate the model to see this clearly.
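The external reference lookup in step (c) can be sketched in code as well. This example assumes the Ensembl REST endpoint /xrefs/id/ and assumes that MIM gene records carry the database name "MIM_GENE" in its response. Both function names are hypothetical helpers written for this sketch.

```python
import json
import urllib.request

# Assumption: the public Ensembl REST server.
REST_SERVER = "https://rest.ensembl.org"


def fetch_xrefs(stable_id):
    """Fetch all external references for an Ensembl stable ID (needs network access)."""
    url = f"{REST_SERVER}/xrefs/id/{stable_id}"
    request = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def ids_from_database(xrefs, dbname):
    """Return the sorted, de-duplicated primary IDs that come from one external database."""
    return sorted({x["primary_id"] for x in xrefs if x.get("dbname") == dbname})
```

For example, ids_from_database(fetch_xrefs("ENSG00000171759"), "MIM_GENE") would be expected to include the MIM gene ID found in step (c), provided the database name assumption holds.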
Exploring the Dpp6 gene in mouse
Genetic variation in the dipeptidylpeptidase 6 gene (DPP6) in humans has previously been strongly associated with amyotrophic lateral sclerosis (ALS), a lethal disorder caused by progressive degeneration of motor neurons in the brain.
- Go to the Ensembl homepage, search for the Dpp6 gene in mouse and click on the transcript ID ENSMUST00000071500 to open the Transcript tab. How many exons make up this transcript?
- Click on Exons to display the exon sequences of the transcript. Which exon contains the translation start? What is the exon ID of the largest exon? What are the start and end phases of exon 2?
- Go to the Protein summary. How many protein domains or features fall within the second exon? What is the Pfam protein domain at the C-terminus of the protein and how many exons does it fall into? Which amino acid positions does this domain cover?
- Go to Domains and features. Which domains are associated with Pfam? How many genes in the mouse genome have the IPR002469 domain? What chromosomes are these genes found on?
- Select Mouse from the Species search drop-down, type Dpp6 and click Go. Click on Dpp6-201 (Mouse Transcript, Strain: reference (CL57BL6)) in the results.
ENSMUST00000071500.13 consists of 26 exons.
- Click on Exons in the left-hand panel. The translation start is found in the first exon (ENSMUSE00000725552), shown in dark blue text.
The largest exon is the final exon (856 bp), which has the exon ID ENSMUSE00000773588. Exon 2 has a start and end phase of 0 and 1 respectively, which means that the exon begins at the first nucleotide of a codon and ends after the first nucleotide of a codon, so the next exon continues that codon from its second nucleotide. Notice that the end phase of each exon is the same as the start phase of the next exon.
- Click on Protein summary in the menu on the left hand side of the page. Alternating exons are shown on the protein as different shades of purple.
There are two predicted protein domains that fall within the second exon: a transmembrane helix and low complexity peptide sequence (Seg). You can click on the track names to find a description.
Click on a domain or feature to view further information.
The C-terminal Pfam domain is Peptidase_S9 (PF00326), which spans or partially spans seven exons, covering amino acid positions 582-787.
- Click on Domains & features.
Looking at the domains table you should notice that there are two domains associated with Pfam: PF00326 and PF00930.
Click on Display all genes with this domain next to IPR002469. This should now display the genes that have the IPR002469 domain located on the karyotype and as a table.
6 genes have this domain and they are found on chromosomes 1, 2, 5, 9 and 17.
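The exon phase rule noted in the Exons answer above (the end phase of one exon equals the start phase of the next) follows from simple modular arithmetic. The sketch below assumes the 0/1/2 phase convention, in which the phase is the number of bases of a split codon that lie before the exon boundary. The function names and the coding lengths in the usage note are made up for illustration and are not taken from Dpp6.

```python
def end_phase(start_phase, coding_length):
    """End phase of an exon, given its start phase and its coding length in bp."""
    if start_phase not in (0, 1, 2):
        raise ValueError("start_phase must be 0, 1 or 2")
    # Bases carried over from the previous exon plus this exon's coding bases,
    # counted modulo the codon size.
    return (start_phase + coding_length) % 3


def phases_along_transcript(coding_lengths, first_phase=0):
    """Return (start_phase, end_phase) for each exon, chaining end phase to next start phase."""
    phases = []
    phase = first_phase
    for length in coding_lengths:
        next_phase = end_phase(phase, length)
        phases.append((phase, next_phase))
        phase = next_phase
    return phases
```

For three hypothetical exons with 100, 50 and 33 coding bases, phases_along_transcript([100, 50, 33]) gives [(0, 1), (1, 0), (0, 0)]. The first pair matches the pattern reported for exon 2 above: start phase 0, end phase 1.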
Exploring the CCD7 gene in Arabidopsis thaliana
- Find the Arabidopsis thaliana CCD7 gene on Ensembl Plants. On which chromosome and which strand of the genome is this gene located?
- Where in the cell is the CCD7 protein located?
- What is the source of the assigned gene name?
- How many transcripts does it have? How long is its longest transcript (in bp)? How long is the protein it encodes? How many exons does it have? Are any of the exons completely or partially untranslated?
- Go to the Ensembl Plants homepage (http://plants.ensembl.org/). Select A. thaliana from the species list and type CCD7 in the search box. Click Go and click on the gene ID AT2G44990. You can find the strand orientation and the location under Summary in the Gene tab.
The A. thaliana CCD7 gene is located on chromosome 2 on the forward strand.
- Click on GO: Cellular component in the left-hand panel.
The protein is located in the chloroplast and plastid.
- Click on Summary in the side menu.
The gene name is assigned and imported from NCBI gene (formerly Entrezgene).
- Click on Show transcript table.
There are 3 transcripts. The longest one is 2005 bp and the length of the encoded protein is 622 amino acids.
Click on the transcript ID AT2G44990.3 in the transcript table. You can find the number of exons in the summary information at the top of the page.
It has 6 exons.
Click on Sequence: Exons in the left-hand panel.
The first and last exons are partially untranslated (sequence shown in orange). This can also be seen in the transcript diagrams on the Gene Summary and Transcript Summary pages, where the boxes representing the first and last exons are partially unfilled.
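Whether an exon is completely or partially untranslated depends only on how it overlaps the coding region. The following sketch makes that comparison explicit. It assumes a forward-strand transcript with inclusive coordinates, and the function name and the coordinates in the usage note are invented for illustration rather than taken from CCD7.

```python
def classify_exons(exons, cds_start, cds_end):
    """Label each exon by its overlap with the coding region.

    exons: list of (start, end) pairs, inclusive, on the forward strand.
    cds_start, cds_end: inclusive genomic coordinates of the coding region.
    """
    labels = []
    for start, end in exons:
        if end < cds_start or start > cds_end:
            # No overlap with the coding region: the whole exon is UTR.
            labels.append("untranslated")
        elif start >= cds_start and end <= cds_end:
            # Exon lies entirely inside the coding region.
            labels.append("coding")
        else:
            # Exon straddles the start or the end of the coding region.
            labels.append("partially untranslated")
    return labels
```

For three made-up exons, classify_exons([(1, 100), (201, 300), (401, 500)], 50, 450) returns ["partially untranslated", "coding", "partially untranslated"], the same pattern as the first and last exons described above.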
Exploring a bacterial gene in Clostridium sporogenes
Start in Ensembl Bacteria and select the Clostridium sporogenes (GCA_001444695) genome.
- What is the gene name for the Glutamine synthetase gene?
- Go to the Transcript tab. How long is the transcript? How long is the protein?
- What domains can be found in the protein product of this transcript? How many different domain prediction methods agree with each of these domains?
- From the Ensembl Bacteria homepage, select Clostridium sporogenes by beginning to type the species name and selecting the species from the auto-complete list. Type Glutamine synthetase and click on the gene ID ENSB:yZtlLO8Ti90y75J, which will open the Summary display on the Gene tab.
The gene name is glnA.
- Switch to the Transcript tab and go to the Summary display. You can find the length under Statistics underneath the transcript image.
The glnA transcript is 1,899 bp and the protein is 632 aa in length.
- Click on either Protein Summary or Domains & features in the left hand menu to see the domains graphically or as a table, respectively.
6 protein domains were found. All of them predict a glutamine synthetase domain.
Exploring a gene in Escherichia coli
Start in Ensembl Bacteria and search for the Escherichia coli str. K-12 substr. MG1655 (GCA_000005845) genome.
- What GO: biological process terms are associated with the Era gene?
- How many different InterPro domains are found in the protein product of this gene?
- What is the associated UniProt ID of the transcript?
Enter part of the name into the genome search box (e.g. MG1655) and then select the correct genome to go to the species information page.
- Enter Era into the search box and hit Go. Click the link in the first hit to go to the era gene page. From here, click GO: Biological process in the left-hand menu.
There are four GO IDs: GO:0000028, GO:0006468, GO:0042274 and GO:0046777.
- Switch to the Transcript tab and go to Domains & features in the left-hand panel. Count the number of unique InterPro IDs in the table.
8 different InterPro domains are found in the protein product of Era.
- You can find the UniProt ID in the transcript table or under General identifiers in the left-hand panel.
The UniProtKB/Swiss-Prot ID is P06616.
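Counting unique InterPro IDs, and seeing how many prediction methods agree on each, is a small grouping task once the Domains & features table has been copied out. The sketch below assumes the table has been reduced to (source, InterPro ID) pairs by hand. The function name is hypothetical and the IDs in the usage note are placeholders, not real InterPro accessions.

```python
from collections import defaultdict


def methods_per_interpro(rows):
    """Group prediction sources by InterPro ID.

    rows: iterable of (source, interpro_id) pairs. Rows with no InterPro ID
    (for example low-complexity features) are skipped.
    """
    grouped = defaultdict(set)
    for source, interpro_id in rows:
        if interpro_id:
            grouped[interpro_id].add(source)
    return dict(grouped)
```

With placeholder rows such as [("Pfam", "IPR_A"), ("SMART", "IPR_A"), ("PANTHER", "IPR_B")], the result has two keys, so there are two unique InterPro IDs, and IPR_A is supported by two different methods.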