Genes and transcripts in Ensembl Bacteria, Demo
Demo: Viewing genes and transcripts
You can find out lots of information about Ensembl genes and transcripts using the browser. If you’re already looking at a Region in detail view, you can click on any transcript and a pop-up menu will appear, allowing you to jump directly to that gene or transcript.
Alternatively, you can find a gene by searching for it. You can search for gene names, identifiers, or functions that might be associated with the genes.
We’re going to look at the lacZ gene in Escherichia coli str. K-12 substr. MG1655 (GCA_000005845). From bacteria.ensembl.org, search for the Escherichia coli str. K-12 substr. MG1655 (GCA_000005845) genome. Type lacZ into the species-specific search bar and click the Go button.
The gene tab
Click on the gene ID b0344 from the search hits. The Gene tab should open:
This page summarises the gene, including its location, name and equivalents in other databases. At the bottom of the page, a graphic shows a Region in detail view with the transcripts. We can also see the overlapping and neighbouring genes.
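The same summary can also be requested programmatically. The sketch below assumes the public Ensembl REST server at rest.ensembl.org and its lookup/id endpoint; whether the ID b0344 resolves there for this genome is an assumption you should check, and the helper names are ours, not part of Ensembl.

```python
import json
import urllib.request

# Assumption: the public Ensembl REST server also serves this bacterial genome.
REST_SERVER = "https://rest.ensembl.org"


def lookup_url(stable_id: str) -> str:
    """Build the URL for the REST lookup/id endpoint, asking for JSON."""
    return f"{REST_SERVER}/lookup/id/{stable_id}?content-type=application/json"


def fetch_gene(stable_id: str) -> dict:
    """Fetch the gene record as a dictionary (requires network access)."""
    with urllib.request.urlopen(lookup_url(stable_id)) as response:
        return json.load(response)


def format_location(record: dict) -> str:
    """Render a location string similar to the one shown on the Gene tab."""
    strand = "forward" if record["strand"] == 1 else "reverse"
    return (
        f"{record['seq_region_name']}:{record['start']}-{record['end']} "
        f"({strand} strand)"
    )
```

With network access, format_location(fetch_gene("b0344")) should give a one-line location for the gene, provided the server recognises the ID.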
There are different tabs for different types of features, such as genes and transcripts. These appear side-by-side underneath the species name at the top of the page, allowing you to jump back and forth between features of interest. Each tab has its own navigation column down the left-hand side of the page, listing all the things you can see for this feature.
Gene sequence
Let’s walk through the menu for the Gene tab. Click Sequence in the left-hand panel to view the genomic sequence of the gene.
The sequence is shown in FASTA format. The FASTA header contains the genome assembly, chromosome, coordinates and strand (1 or -1). This gene is on the positive strand.
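If you want to pull these fields out of the header in a script, a minimal sketch follows. It assumes the location is written as six colon-separated fields (coordinate system, assembly, region name, start, end, strand) in the last whitespace-separated token of the header; check this against the header you actually see. The example header in the usage note is made up.

```python
from typing import NamedTuple


class FastaLocation(NamedTuple):
    coord_system: str
    assembly: str
    region: str
    start: int
    end: int
    strand: int  # 1 for the positive strand, -1 for the negative strand


def parse_header(header: str) -> FastaLocation:
    """Parse a header whose last token is assumed to be a colon-separated location."""
    tokens = header.lstrip(">").split()
    if not tokens:
        raise ValueError("empty FASTA header")
    fields = tokens[-1].split(":")
    if len(fields) != 6:
        raise ValueError(f"unexpected header layout: {header!r}")
    coord_system, assembly, region, start, end, strand = fields
    if strand not in ("1", "-1"):
        raise ValueError(f"strand must be 1 or -1, got {strand!r}")
    return FastaLocation(
        coord_system, assembly, region, int(start), int(end), int(strand)
    )
```

For a made-up header such as ">chromosome:EXAMPLE_ASM:Chromosome:1001:4075:1", parse_header returns a record whose strand field is 1 and whose length in bases is end - start + 1.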
Exons are highlighted within the genomic sequence: the exon of our gene of interest and those of any neighbouring or overlapping genes. By default, 600 bases are shown upstream and downstream of the gene. We can change how this sequence appears with the Configure this page button found at the left. This allows us to change the flanking regions, add line numbering and more. Click on it now.
In this example, we have changed Flanking sequences to 200 and added Line numbering relative to the coordinate system. Save your settings by clicking the check icon at the top right-hand corner.
You can download this sequence by clicking the Download sequence button above the sequence. This opens a dialogue box that lets you choose between plain FASTA sequence and rich-text format (RTF), which includes all the coloured annotations and can be opened in a word processor. If you want to run a sequence analysis tool, download the sequence as FASTA; if you want to inspect the sequence visually, RTF is the better choice. This button is available for all sequence views.
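Once you have a FASTA download, a few lines of code are enough to load it for further analysis. The reader below is a minimal sketch that works on the text of a FASTA file; the function name is ours, and for real projects an established library is usually a safer choice.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Return a mapping of header (without '>') to its concatenated sequence."""
    records: dict[str, list[str]] = {}
    header = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}
```

To use it on a downloaded file, read the file into a string first, for example with pathlib.Path("my_download.fa").read_text(), where the file name is a placeholder.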
Gene function
To find out the protein function, have a look at Gene Ontology (GO) terms from the Gene Ontology consortium. There are three pages of GO terms, representing the three divisions:
GO: Biological process (what the protein does)
GO: Cellular component (where the protein is)
GO: Molecular function (how it does it)
Click on GO: Biological process to see an example of the GO pages.
Here you can see the functions that have been associated with the gene. There are three-letter codes that indicate how the association was made, as well as links to the specific transcript they are linked to.
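The evidence codes are defined by the Gene Ontology consortium. As a quick reference, the sketch below maps a selection of the commonly seen codes to their meanings; it is not the complete list, and the helper names are ours.

```python
# A selection of GO evidence codes; see the Gene Ontology site for the full list.
EVIDENCE_CODES = {
    "EXP": "Inferred from Experiment",
    "IDA": "Inferred from Direct Assay",
    "IPI": "Inferred from Physical Interaction",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IEP": "Inferred from Expression Pattern",
    "ISS": "Inferred from Sequence or structural Similarity",
    "IBA": "Inferred from Biological aspect of Ancestor",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IEA": "Inferred from Electronic Annotation",
}


def describe(code: str) -> str:
    """Return the meaning of an evidence code, ignoring case."""
    return EVIDENCE_CODES.get(code.upper(), "unknown evidence code")


def is_electronic(code: str) -> bool:
    """IEA annotations are assigned automatically, without curator review."""
    return code.upper() == "IEA"
```

Separating IEA from the other codes is a common first filter when you only want annotations that a curator has reviewed.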
Gene information in external databases
We also have links out to other databases which have information about our genes and may focus on other topics that we don’t cover, like the European Nucleotide Archive (ENA) or the UniProt Knowledgebase (UniProtKB). Go up the left-hand menu to External references.
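If you collect these cross-references in a script, it helps to group them by source database. The sketch below assumes each record is a dictionary with dbname and primary_id keys, which is the shape we expect from the Ensembl REST xrefs/id endpoint; the database labels and identifiers in the usage note are placeholders, not real entries.

```python
from collections import defaultdict


def group_by_database(xrefs: list[dict]) -> dict[str, list[str]]:
    """Group cross-reference identifiers by the database they come from."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for xref in xrefs:
        grouped[xref["dbname"]].append(xref["primary_id"])
    # Sort databases and identifiers so the output is stable and easy to scan.
    return {db: sorted(ids) for db, ids in sorted(grouped.items())}
```

For example, three placeholder records from "ENA" and "UniProtKB" would come back as a dictionary with one sorted list of identifiers per database.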
The transcript tab
We’re now going to explore the transcript of lacZ. Click on Show transcript table underneath the gene summary at the top of the page.
Here we can see a list of all the transcripts of lacZ with their identifiers, lengths and biotypes. The lacZ gene only has one transcript. Click on the transcript ID AAC73447.
You are now in the Transcript tab for AAC73447. We can still see the Gene tab, so we can easily jump back. The left-hand navigation column provides several options for the transcript AAC73447. Many of these are similar to the options you see in the Gene tab, but not all of them. If you can’t find the thing you’re looking for, the solution is often to switch tabs.
Transcript sequences
Click on the Exons link in the left-hand panel. This page is useful as it will give you the length of the coding sequence.
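The coding sequence length tells you the protein length directly, since each amino acid is encoded by three bases. The sketch below assumes the reported length includes the stop codon, which you should confirm for your transcript; the function name is ours and the number in the usage note is illustrative.

```python
def protein_length(cds_length: int, includes_stop: bool = True) -> int:
    """Convert a coding sequence length in bases to a protein length in residues."""
    if cds_length <= 0 or cds_length % 3 != 0:
        raise ValueError("coding sequence length must be a positive multiple of 3")
    codons = cds_length // 3
    # The stop codon is part of the coding sequence but encodes no amino acid.
    return codons - 1 if includes_stop else codons
```

An illustrative coding sequence of 3,075 bases gives 1,025 codons, so protein_length(3075) returns 1024.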
You may want to change the display (for example, to show more flanking sequences). In order to do so, click on Configure this page and change the display options accordingly.
Transcript information in external databases
Next, follow the General identifiers link at the left. Just like the External references page in the Gene tab, this page shows links out to other databases, such as InterPro, PDB and UniProtKB, this time linked to the transcript or protein product rather than the gene.
Protein domain information
If you’re interested in protein domains, you could click on Protein summary to view domains from different sources, such as SMART and PROSITE. These are all plotted against the transcript sequence.
Alternatively, you can go to Domains & features to see the same information in tabular format.