Ensembl Browser Workshop - EuroFAANG AQUA-FAANG workshop: methods to use and reuse the AQUA-FAANG data and Ensembl resources to advance science

Course Details

Lead Trainer
Aleena Mushtaq
Event Dates
2023-04-17 until 2023-04-18
  Virtual: EMBL-EBI, Hinxton
Work with the Ensembl Outreach team to get to grips with the Ensembl browser, accessing and analysing genomic data from fish species represented in the AQUA-FAANG project.

Demos and exercises

Ensembl species

The front page of Ensembl is found at ensembl.org. It contains lots of information and links to help you navigate Ensembl:

At the top left you can see the current release number and what has come out in this release. To access old releases, scroll to the bottom of the page and click on View in archive site.

Click on the links to go to the archives. Alternatively, you can jump quickly to the correct release by putting it into the URL, for example e105.ensembl.org jumps to release 105.
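The release-in-URL shortcut above can be sketched as a tiny helper. This is an illustration only: the function name `archive_url` is hypothetical, and the `https://` prefix is an assumption added to the `e105.ensembl.org` pattern given in the text.

```python
def archive_url(release):
    """Build the archive address for an Ensembl release, e.g. e105.ensembl.org."""
    release = int(release)
    if release < 1:
        raise ValueError("release must be a positive integer")
    # Pattern taken from the text: 'e' + release number + '.ensembl.org'.
    # The https:// scheme is an assumption.
    return f"https://e{release}.ensembl.org"


print(archive_url(105))
```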

Click on View full list of all species.

Click on the common name of your species of interest to go to the species homepage. We’ll click on Atlantic Salmon.

Here you can see links to example pages and to download flatfiles. To find out more about the genome assembly and genebuild, click on More information and statistics.

Here you’ll find a detailed description of how the genome was produced and links to the original source. You will also see details of how the genes were annotated.

Turbot genome assembly

  1. Go to the species homepage for Turbot. What is the name of the genome assembly for Turbot?

  2. How long is the Turbot genome (in bp)? How many genes have been annotated?

  1. Select Turbot from the drop down species list, or click on View full list of all Ensembl species, then choose Turbot from the list.
    The assembly is ASM1334776v1 or GCA_013347765.1.
  2. Click on More information and statistics. Statistics are shown in the tables on the left.
    The length of the genome is 556,696,898 bp.
    There are 21,263 coding genes.

Available European seabass assemblies

What previous assemblies are available for European seabass?

Navigate to the Ensembl homepage (www.ensembl.org). Go to the European seabass species homepage by clicking on ‘View full list of all species’, filtering the table of species by searching for European seabass, and clicking on the species name within the table. Under Other assemblies, one previous assembly and the release you can find it in are listed.

Assembly seabass_V1.0 is available in the archived Ensembl 105 release.

Exploring Genomic Regions

Start at the Ensembl front page, ensembl.org. You can search for a region by selecting your species of interest from the drop-down menu and typing the coordinates into the search box.

To search for a genomic region, you need to input your region coordinates in the correct format, which is chromosome, colon, start coordinate, dash, end coordinate, with no spaces, for example: 25:4274500-4302000. Type (or copy and paste) these coordinates into the search box.
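If you prepare many regions in a script before pasting them into the search box, the format described above can be checked with a small helper. This is a minimal sketch: `parse_region` and `format_region` are hypothetical names, and stripping thousands separators (as in "49,000,000") is a convenience assumed here, not something the browser is documented to do.

```python
import re

# chromosome, colon, start, dash, end, with no spaces
REGION_RE = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(text):
    """Split 'chrom:start-end' into (chrom, start, end), validating the format."""
    cleaned = text.strip().replace(",", "")  # tolerate 49,000,000-style numbers
    match = REGION_RE.match(cleaned)
    if match is None:
        raise ValueError(f"not in chrom:start-end format: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError("start coordinate is larger than end coordinate")
    return match["chrom"], start, end


def format_region(chrom, start, end):
    """Build the search-box string from its three parts."""
    return f"{chrom}:{start}-{end}"


print(parse_region("25:4274500-4302000"))
print(format_region("3", 49_000_000, 49_400_000))
```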

Press Enter or click Go to jump directly to the Region in detail Page.

Click on the button to view page-specific help. The help pages provide text, labelled images and, in some cases, help videos to describe what you can see on the page and how to interact with it.

The Region in detail page is made up of three images. Let’s look at each one in detail.

The first image shows the chromosome:

The region we’re looking at is highlighted on the chromosome. You can jump to a different region by dragging out a box in this image. Drag out a box on the chromosome and a pop-up menu will appear.

If you wanted to move to the region, you could click on Jump to region (### bp). If you wanted to highlight it, click on Mark region (###bp). For now, we’ll close the pop-up by clicking on the X on the corner.

The second image shows a 1Mb region around our selected region. This is always 1Mb in human, but the fixed size of this view varies between species. This view allows you to scroll back and forth along the chromosome.

You can also drag out and jump to or mark a region.

Click on the X to close the pop-up menu.

Click on the Drag/Select button to change the action of your mouse click. Now you can scroll along the chromosome by clicking and dragging within the image. As you do this you’ll see the image below grey out and two blue buttons appear. Clicking on Update this image would jump the lower image to the region central to the scrollable image. We want to go back to where we started, so we’ll click on Reset scrollable image.

The third image is a detailed, configurable view of the region.

Here you can see various tracks, which is what we call a data type that you can plot against the genome. Some tracks, such as the transcripts, can be on the forward or reverse strand. Forward stranded features are shown above the blue contig track that runs across the middle of the image, with reverse stranded features below the contig. Other tracks, such as variants, regulatory features or conserved regions, refer to both strands of the genome, and these are shown by default at the very top or very bottom of the view.

You can use click and drag to either navigate around the region or highlight regions of interest. Click on the Drag/Select option at the top or bottom right to switch mouse action. On Drag, you can click and drag left or right to move along the genome; the page will reload when you release the mouse button. On Select, you can drag out a box to highlight or zoom in on a region of interest.

With the tool set to Select, drag out a box around an exon and choose Mark region.

The highlight will remain in place if you zoom in and out or move around the region. This allows you to keep track of regions or features of interest.

We can edit what we see on this page by clicking on the blue Configure this page menu at the left.

This will open a menu that allows you to change the image.

There are thousands of possible tracks that you can add. When you launch the view, you will see all the tracks that are currently turned on with their names on the left and an info icon on the right, which you can click on to expand the description of the track. Turn them on or off, or change the track style by clicking on the box next to the name. More details about the different track styles are in this FAQ: http://www.ensembl.org/Help/Faq?id=335.

You can find more tracks to add by either exploring the categories on the left, or using the Find a track option at the top left. Type in a word or phrase to find tracks with it in the track name or description.

Let’s add some tracks to this image. Add:

  • Repeats (Teleost)
  • RNASeq models - Heart - BAM files (Unlimited) and Gene models (Expanded with labels)

Now click on the tick in the top left hand to save and close the menu. Alternatively, click anywhere outside of the menu. We can now see the tracks in the image. The proteins track is stranded, so you will see two tracks, one above and one below the contig, representing the proteins mapped to the forward and reverse strands respectively. The variants track is not stranded, so is found near the bottom of the image.

If a track is not giving you the information you need, you can easily change the way it appears by hovering over the track name and then the cog wheel to open a menu. To make it easier to compare information between tracks, such as spotting overlaps, you can move tracks around by clicking and dragging on the bar to the left of the track name.

Now that you’ve got the view how you want it, you might like to show something you’ve found to a colleague or collaborator. Click on the Share this page button to generate a link. Email the link to someone else, so that they can see the same view as you, including all the tracks you’ve added. These links contain the Ensembl release number, so if a new release or even assembly comes out, your link will just take you to the archive site for the release it was made on.

To return this to the default view, go to Configure this page and select Reset configuration at the bottom of the menu.

Exploring a genomic region in Rainbow trout

  1. Go to the region from 49,000,000 to 49,400,000 bp on Rainbow trout chromosome 3.

  2. Zoom in on the myo3b gene.

  3. Configure this page to turn on the CpG islands track in this view. What tool was used to annotate the CpG islands according to the track information? How many CpG islands can you see within the myo3b gene?

  4. Create a Share link for this display. Email it to your neighbour. Open the link they sent you and compare. If there are differences, can you work out why?

  5. Export the genomic sequence of the region you are looking at in FASTA format.

  6. Turn off all tracks you added to the Region in detail page.

  1. Go to the Ensembl homepage. Select Rainbow trout from the Species drop-down list and type 3:49000000-49400000 in the text box. Click Go.

  2. Draw with your mouse a box encompassing the myo3b transcripts. Click on Jump to region in the pop-up menu.

  3. Click Configure this page in the side menu (or on the cog wheel icon in the top left hand side of the bottom image). Go into Simple features in the left-hand menu then select CpG islands. Click on the (i) button to find out more information.

    The CpG islands are determined from the genomic sequence using a program written by G. Micklem, similar to newcpgreport in the EMBOSS package. Save and close the new configuration by clicking on ✓ (or anywhere outside the pop-up window). There is one CpG island overlapping myo3b.

  4. Click Share this page in the side menu. Copy the URL. Get your neighbour’s email address, compose an email to them, paste the link in and send the message. When you receive the link from them, open the email and click on the link. You should be able to view the page with the new configuration and the data tracks they have added in the Location tab. You might see differences where they specified a slightly different region to you, or where they have added different tracks.

  5. Click Export data in the side menu. Leave the default parameters as they are (FASTA sequence should already be selected). Click Next>. Click on Text. Note that the sequence has a header that provides information about the genome assembly (USDA_OmykA_1.1), the chromosome, the start and end coordinates and the strand. For example:
    >3 dna:primary_assembly primary_assembly:USDA_OmykA_1.1:3:49195318:49199101:1

  6. Click Configure this page in the side menu. Click Reset configuration. Click ✓.
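The FASTA header shown in step 5 packs the assembly, chromosome, coordinates and strand into one colon-separated field. As a rough sketch, it can be unpacked like this; `parse_fasta_header` is a hypothetical helper, and the field order is inferred from the example header and the description in step 5.

```python
def parse_fasta_header(header):
    """Unpack an exported FASTA header line into its named parts."""
    fields = header.lstrip(">").split()
    seq_id = fields[0]
    # The last field looks like: coord_system:assembly:name:start:end:strand
    coord_system, assembly, name, start, end, strand = fields[-1].split(":")
    return {
        "id": seq_id,
        "coord_system": coord_system,
        "assembly": assembly,
        "chromosome": name,
        "start": int(start),
        "end": int(end),
        "strand": int(strand),
    }


header = ">3 dna:primary_assembly primary_assembly:USDA_OmykA_1.1:3:49195318:49199101:1"
info = parse_fasta_header(header)
print(info["assembly"], info["chromosome"], info["start"], info["end"], info["strand"])
# Coordinates are inclusive, so the exported length is end - start + 1.
print(info["end"] - info["start"] + 1)
```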

Exploring a genomic region in Gilthead seabream

  1. Go to the region from 9,205,000 to 9,554,000 bp on Gilthead seabream chromosome 16.

  2. Zoom in on the slc25a21 gene.

  3. Configure this page to turn on the RefSeq GFF3 annotation import track in this view. What are the differences between the slc25a21 transcripts annotated by Ensembl and NCBI RefSeq?

  1. Go to the Ensembl homepage. Select Gilthead seabream from the Species drop-down list and type 16:9205000-9554000 in the text box. Click Go.

  2. Draw with your mouse a box encompassing the slc25a21 transcripts. Click on Jump to region in the pop-up menu.

  3. Click Configure this page in the side menu (or on the cog wheel icon in the top left hand side of the bottom image). Go into Genes in the left-hand menu then select RefSeq GFF3 annotation import. Save and close the new configuration by clicking on ✓ (or anywhere outside the pop-up window).

    There are two slc25a21 transcripts annotated by NCBI RefSeq, but three transcripts annotated by Ensembl. The coding sequences of the two transcripts annotated by NCBI RefSeq are the same as the slc25a21-202 transcript (ENSSAUT00010071831.1), with differences in the length of the 5’ and 3’ UTRs. slc25a21-201 (ENSSAUT00010071830.1) has a different exon structure for its final exon, with an extended coding sequence and 3’ UTR.

Genes and Transcripts

Demo: The gene tab

If you click on any one of the transcripts in the Region in detail image, a pop-up menu will appear, allowing you to jump directly to that gene or transcript.

Another way to go to a gene of interest is to search directly for it.

We’re going to look at the Atlantic salmon espn gene. This gene encodes a multifunctional actin-bundling protein with a major role in mediating sensory transduction in various mechanosensory and chemosensory cells.

From ensembl.org, type espn into the search bar and click the Go button. You will get a list of hits with the human gene at the top.

Where you search for something without specifying the species, or where the ID is not restricted to a single species, the most popular species will appear first; in this case, human, mouse and zebrafish. You can restrict your query to species or features of interest using the options on the left.

Click on the gene name of the Atlantic salmon. The Gene tab should open:

Let’s walk through some of the links in the left hand navigation column. How can we view the genomic sequence? Click Sequence at the left of the page.

The sequence is shown in FASTA format. Take a look at the FASTA header:

Exons are highlighted within the genomic sequence. Variants can be added with the Configure this page link found at the left. Click on it now.

Once you have selected changes (in this example, Show variants and Line numbering: Relative to this sequence) click on ✓ at the top right.

You can download this sequence by clicking on the Download sequence button above the sequence:

This will open a dialogue box that allows you to pick between plain FASTA sequence, or sequence in RTF, which includes all the coloured annotations and can be opened in a word processor. This button is available for all sequence views.

Can our gene be found in other databases? Go up the left-hand menu to External references:

This contains links to the gene in other projects, such as NCBI Gene, and papers where this sequence is published.

To find out what the protein does, click on GO:biological process to see GO terms from the Gene Ontology consortium.

Demo: The transcript tab

Let’s now explore one splice isoform. Click on Show transcript table at the top.

Click on the ID for espn-202, ENSSSAT00000016439.2.

You are now in the Transcript tab for espn-202. The left hand navigation column provides several options for the transcript espn-202.

For detailed information on the support for this transcript, click on Supporting evidence.

Click on the identifiers of the evidence to get a pop-up. This links out to the original records of these data in, for example, RefSeq, Uniprot or ENA.

Click on the Exons link.

You may want to change the display (for example, to show more flanking sequence, or to show full introns). In order to do so click on Configure this page and change the display options accordingly.

Now click on the cdna link to see the spliced transcript sequence.

Untranslated regions (UTRs) are highlighted in dark yellow, codons are highlighted in light yellow, and exon sequence is shown in alternating black and blue letters to mark exon boundaries.

Next, follow the General identifiers link at the left.

This page shows information from other databases such as RefSeq, UniProtKB and others, that match to the Ensembl transcript and protein.

Now click on Protein summary to view domains from Pfam, PROSITE, Superfamily, Panther, and more.

Clicking on Domains & features shows a table of this information.

Exploring the Common carp prkci gene

Search for the Common carp gene prkci.

(a) What GO: biological process terms are associated with the prkci gene?

(b) Go to the transcript tab for the prkci-207 (ENSCCRT00000153471.1) protein coding transcript. How many exons does it have? Which one is the longest? How much of that is coding?

(c) How many different domain prediction methods predict a PB1 domain? Where in the protein is this domain?

(a) Go to the Ensembl homepage (http://www.ensembl.org).

Select Search: Common carp and type prkci. Click Go. Click on the gene link ENSCCRG00000007495.2.

Click on GO: biological process in the side menu. Protein phosphorylation and establishment or maintenance of cell polarity are listed as GO: Biological processes associated with this gene.

(b) Click on the transcript prkci-207 (ENSCCRT00000153471.1). Click on Exons in the left hand menu. There are eighteen exons. Exon 18 is longest with 1,572bp, of which around 88 are coding.

(c) Click on either Protein Summary or Domains & features in the left hand menu to see graphically or as a table respectively. The PB1 domain is predicted by SuperFamily, SMART, Pfam and PROSITE. It is towards the N-terminus of the protein. Clicking on the predicted PB1 domain will open a pop-up window that shows you the amino acid coordinates for the predicted domain.

Exploring the Turbot pcsk2 gene

(a) Find the Turbot pcsk2 (proprotein convertase subtilisin/kexin type 2) gene, and go to the Gene tab.

  • On which chromosome and which strand of the genome is this gene located?
  • How many transcripts (splice variants) are there and how many are protein coding?
  • How long is the protein encoded by ENSSMAT00000034168.2?
  • What are some functions of pcsk2 according to the Gene Ontology consortium?

(b) In the transcript table, click on the transcript ID for pcsk2-201, and go to the Transcript tab.

  • How many exons does it have?
  • Are any of the exons completely or partially untranslated?

(c) Is there any supporting evidence present for ENSSMAT00000034168.2?

(a) Go to the Ensembl homepage.

Select Search: Turbot and type pcsk2. Click Go.

Click on the Ensembl ID ENSSMAG00000020650.

  • Chromosome 15 on the reverse strand.
  • Ensembl has five protein coding transcripts annotated for this gene.
  • It codes for a protein of 629 amino acids.

Click on GO:molecular function to see that serine-type endopeptidase activity and serine-type peptidase activity are associated with pcsk2.

(b) Click on ENSSMAT00000034168.2.

It has thirteen exons. This is shown in the Transcript summary or in the left hand side menu Exons.

Click on the Exons link in this side menu.

Exons 1 and 13 are partially untranslated (UTR sequence is shown in orange). You can also see this in the cDNA view if you click on the cDNA link in the left side menu.

(c) Click on Supporting evidence in the left menu.

The image shows Transcript supporting evidence which comes from alignments used to build the transcript model and Exon supporting evidence which is alignments from the Ensembl pipeline that support the exons.


In any of the sequence views shown in the Gene and Transcript tabs, you can view variants on the sequence. You can do this by clicking on Configure this page from any of these views.

Let’s take a look at the Gene sequence view for smc3 in Atlantic Salmon. Search for smc3 and go to the Sequence view.

If you can’t see variants marked on this view, click on Configure this page and select Show variants: Yes and show links.

Find out more about a variant by clicking on it.

You can go to the Variation tab by clicking on the variant ID. For now, we’ll explore more ways of finding variants.

To view all the sequence variations in table form, click the Variation table link at the left of the gene tab.

You can filter the table to only show the variants you’re interested in. For example, click on Type: All, then select the variant consequences you’re interested in.

The table contains lots of information about the variants. You can click on the IDs here to go to the Variation tab too.

Let’s have a look at variants in the Location tab. Click on the Location tab in the top bar.

Click on Configure this page and open Variation from the left-hand menu.

There is one track turned on by default, Atlantic_salmon_EVA_PRJEB34225.

Click on a variant to find out more information. It may be easier to see the individual variants if you zoom in.

Let’s have a look at a specific variant. If we zoomed in we could see the variant 12:22221527:T_C:PRJEB34225 in this region, however it’s easier to find if we put 12:22221527:T_C:PRJEB34225 into the search box. Click through to open the Variation tab.

The icons show you what information is available for this variant. Click on Genes and regulation, or follow the link at the left.

This variant is found in six transcripts of the smc3 (ENSSSAG00000007113) gene. It has not been associated with any regulatory features or motifs.

Let’s look at population genetics. Either click on Explore this variant in the left hand menu then click on the Population genetics icon, or click on Population genetics in the left-hand menu. We can see information here on Populations from the EVA study PRJEB34225.

Exploring a SNP in Atlantic salmon

The missense variant 25:3426821:C_A:PRJEB34225 is found in the Atlantic salmon hs6st2 gene.

(a) Find the page with information for 25:3426821:C_A:PRJEB34225.

(b) Is 25:3426821:C_A:PRJEB34225 a missense variation in all transcripts of the hs6st2 gene?

(c) What is the major allele in 25:3426821:C_A:PRJEB34225?

(a) Please note there is more than one way to get this answer. Either go to the Variant Table for the Atlantic salmon hs6st2 gene, and filter variants to the missense variants, or search Ensembl for 25:3426821:C_A:PRJEB34225 directly.

(b) Once you’re in the Variation tab, click on the Genes and regulation link or icon.

This SNP is found in four transcripts from two genes. It is a missense variant in the hs6st2 gene, and an intron variant and downstream gene variant in ENSSSAG00000118991.

(c) Select Population genetics from the side menu.

From the Frequency data table, the PRJEB34225 allele frequencies show that C is the major allele (95% of the total population) compared to A (5% of the total population).

Exploring sequence variant annotation in Atlantic salmon

(a) Find the ox2g gene for Atlantic salmon and go to the Variation table page. What is the variation name for D374H variant from EVA study PRJEB34225?

(b) Is this variant a missense variant (a Sequence Ontology term) for all transcripts that have been annotated for the ox2g gene?

(c) Why does Ensembl put the G allele first (G/C)?

(d) How many sample genotypes are available for this variant? Do all samples have the same genotype?

(a) Go to the Ensembl species page for Atlantic salmon and search for ox2g.

In the Variation Table, type e.g. 374 and/or D/H in the Filter text box.

The variation name for D374H is 21:28064636:G_C:PRJEB34225.

(b) Click on 21:28064636:G_C:PRJEB34225.

No, 21:28064636:G_C:PRJEB34225 is missense for three ox2g transcripts (ENSSSAT00000055888.2, ENSSSAT00000055982.2, ENSSSAT00000214738.1). It has the intron variant consequence for two ox2g transcripts (ENSSSAT00000195834.1 and ENSSSAT00000055943.2). Note: ox2g has five transcripts (https://feb2023.archive.ensembl.org/Salmo_salar/Gene/Variation_Gene/Table?db=core;g=ENSSSAG00000039791;r=21:27997093-28104998;source=PRJEB34225;v=21:28064636:G_C:PRJEB34225;vdb=variation;vf=21:28064636:G_C:PRJEB34225).

(c) In Ensembl, the allele that is present in the reference genome assembly is put first, i.e. G. In the literature normally the major allele (in the population of interest) is put first. In the case of 21:28064636:G_C:PRJEB34225 the allele in the reference genome is the major allele in all populations studied (Gaspe of New Brunswick, Penobscot River, St. John River).

(d) Click on Sample genotypes on the left menu.

This variant has 80 sample genotypes. All 16 samples from Gaspe of New Brunswick and all 11 samples from Penobscot River are homozygous G/G. In the St. John River population, two of the 53 samples are heterozygous G/C and the rest are homozygous G/G. This information comes from EVA study PRJEB34225.
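The genotype counts in answer (d) are enough to work out sample allele frequencies by hand. The sketch below does that arithmetic. The count of 51 G/G samples for St. John River is derived from the answer (53 samples minus the two heterozygotes), and `allele_frequencies` is a hypothetical helper. The figures shown on the Population genetics page may be calculated differently, so treat this as an illustration of the counting only.

```python
from collections import Counter

# Genotype counts per population, taken from answer (d).
genotype_counts = {
    "Gaspe of New Brunswick": {"G/G": 16},
    "Penobscot River": {"G/G": 11},
    "St. John River": {"G/G": 51, "G/C": 2},  # 53 samples, two heterozygous
}


def allele_frequencies(counts_by_population):
    """Count each allele once per chromosome copy and return its frequency."""
    alleles = Counter()
    for genotypes in counts_by_population.values():
        for genotype, n_samples in genotypes.items():
            for allele in genotype.split("/"):
                alleles[allele] += n_samples
    total = sum(alleles.values())
    return {allele: count / total for allele, count in alleles.items()}


n_samples = sum(sum(g.values()) for g in genotype_counts.values())
freqs = allele_frequencies(genotype_counts)
print(n_samples)  # total number of sample genotypes
print(freqs)      # G is the reference allele and also the major allele
```

With 80 diploid samples there are 160 alleles in total, 158 of them G, which is consistent with G being the major allele in every population.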


We have identified four variants in salmon:
12:22217409:G_C:PRJEB34225, 12:22215848:G_C:PRJEB34225, 12:22216033:C_G:PRJEB34225, 12:22215848:G_C:PRJEB34225

We will use the Ensembl VEP to determine:

  • If the variants have been annotated in Ensembl already
  • If genes are affected by the variants

Go to the front page of Ensembl and click on Variant Effect Predictor in the Tools section or click on VEP in the top header.

This page contains information about the VEP, including a link for downloading the script version of the tool. Click on the Launch VEP button to open the input form.

Let’s input the variant data in VCF format:
Chromosome Position Name Reference Alternative

Put the following into the Input data box:

12  22217409  .  G  C
12  22215848  .  G  C
12  22216033  .  C  G
12  22215848  .  G  C

The VEP will detect automatically that the data is in VCF format.
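The input lines above follow directly from the variant names, which appear to be built as chromosome:position:reference_alternative:study. A minimal sketch of that conversion is shown below. `name_to_vcf_line` is a hypothetical helper, the name layout is inferred from the examples in this workshop, and the two-space separator simply mirrors the lines above (VCF files themselves are tab-delimited).

```python
def name_to_vcf_line(variant_name, sep="  "):
    """Turn 'chrom:pos:REF_ALT:study' into 'chrom pos . REF ALT'."""
    chrom, pos, alleles, _study = variant_name.split(":")
    ref, alt = alleles.split("_")
    int(pos)  # raises ValueError if the position is not a number
    # '.' fills the Name column, meaning no identifier is supplied.
    return sep.join([chrom, pos, ".", ref, alt])


names = [
    "12:22217409:G_C:PRJEB34225",
    "12:22215848:G_C:PRJEB34225",
    "12:22216033:C_G:PRJEB34225",
]
for name in names:
    print(name_to_vcf_line(name))
```

Passing `sep="\t"` gives tab-delimited lines if you want to save the input as a file instead of pasting it.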

There are further options that you can choose for your output. These are categorised as Identifiers, Variants and frequency data, Additional annotation, Predictions, Filtering options and Advanced options. Let’s open all menus and take a look.

Hover over the options to see definitions.

When you have selected everything you need, scroll right to the bottom and click Run.

The display will show you the status of your job. It will say Queued, then automatically switch to Done when the job is done; you do not need to refresh the page. You can save, edit, share or delete your job at this time. If you have submitted multiple jobs, they will all appear here.

Click on View Results once your job is done.

In your results you will see a graphical and table summary of the data as well as a table with the detailed results.

VEP cdk5r1b Atlantic salmon

We have identified a few variants in Atlantic salmon (Salmo salar):

  • chr 28, genomic coordinate 1777645, alleles C/T
  • chr 28, genomic coordinate 1777906, alleles G/A
  • chr 28, genomic coordinate 1786995, alleles T/G

(a) Which genes and transcripts do these variants map to?

(b) Do these variants result in a change in the proteins encoded by any of the Ensembl genes? Which genes?

Go to www.ensembl.org and click on the Variant Effect Predictor link on the homepage. Click Launch VEP.

Choose Atlantic salmon as the species and copy the following into the Paste data text box:

28 1777645 1777645 C/T var1
28 1777906 1777906 G/A var2
28 1786995 1786995 T/G var3

Note: Variation data input can be done in a variety of formats. See more details here http://www.ensembl.org/info/docs/variation/vep/vep_formats.html

Click Run.

When your job is listed as Done, click View Results.

You will get a table with the consequence terms from the Sequence Ontology project (http://www.sequenceontology.org/) (e.g. synonymous, missense, downstream, intronic, 5’ UTR, 3’ UTR) provided by VEP for the listed SNPs. You can also upload the VEP results as a track and view them on Location pages in Ensembl.

The variants overlap three genes (six transcripts of psmd11b, four transcripts of cdk5r1b and one transcript of the ENSSSAG00000096896 gene).

Variant 28_1777906_G/A overlaps the cdk5r1b gene and results in an amino acid change at positions 109 and 116 (Ser to Leu). Variant 28_1777645_C/T also overlaps the cdk5r1b gene and results in an amino acid change at positions 96 and 203 (Arg to His).

VEP analysis of variant in Atlantic salmon

You have performed sequencing and variant-calling experiments for Atlantic salmon. You have a few variants in the VCF format from this experiment:

25 4297825 . G A
25 4293985 . C G
25 4294047 . G T
25 4294047 . G T
25 4270019 . G A

(a) How many variants were analysed? How many are novel?

(b) How many genes and transcripts are affected by these variants?

(c) Do any of the variants have different consequences for different transcripts?

(d) Can you export all the results to a VCF file?

Go to www.ensembl.org and click on the Variant Effect Predictor link on the homepage. Click Launch VEP.

Choose Atlantic salmon as the species and enter the five variants from the exercise.

Note: Variation data input can be done in a variety of formats. See more details here http://www.ensembl.org/info/docs/variation/vep/vep_formats.html

Click Run.

When your job is listed as Done, click View Results.

(a) Five variants were analysed. None of them are novel.

(b) Only one gene (cdc16) is affected by these variants. It has ten transcripts, all of which are affected.

(c) Yes. These variants have the missense_variant, intron_variant and downstream_gene_variant consequences for the different transcripts of the cdc16 gene.

(d) At the top right of the table there is an option to download data. Click on VCF for the All option. Open the VCF file you have downloaded in a text editor. You can see that VEP adds annotation in the INFO column of the VCF file.
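To read the annotation in the downloaded file with a script, you can split the INFO column. VEP writes its annotations under a CSQ key, with the field order described in a header line of the file. The sketch below assumes that layout. The header and INFO strings are made-up, shortened examples for illustration, not real output from this exercise (a real header lists many more fields), and `csq_fields` and `parse_csq` are hypothetical helper names.

```python
def csq_fields(header_line):
    """Read the pipe-separated field names from the CSQ header line."""
    fmt = header_line.split("Format: ", 1)[1]
    return fmt.rstrip('">').split("|")


def parse_csq(info_column, fields):
    """Return one dict per annotation found under CSQ= in the INFO column."""
    for entry in info_column.split(";"):
        if entry.startswith("CSQ="):
            # Annotations are comma-separated, values within each are pipe-separated.
            return [dict(zip(fields, ann.split("|"))) for ann in entry[4:].split(",")]
    return []


# Illustrative strings only: shortened header, invented annotation values.
example_header = (
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
    'annotations from Ensembl VEP. Format: Allele|Consequence|SYMBOL">'
)
example_info = "CSQ=A|missense_variant|cdc16,A|intron_variant|cdc16"

fields = csq_fields(example_header)
annotations = parse_csq(example_info, fields)
for annotation in annotations:
    print(annotation["SYMBOL"], annotation["Consequence"])
```

Reading the field names from the header, instead of hard-coding them, keeps the script working when you change the output options in the VEP form.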