Convert IDs using BioMart

BioMart is a handy tool for converting IDs from one database to another. The following is a list of 30 gene IDs from the UniProtKB database:

Q4QY71,P52727,P40148,P29971,O42457,B9TQX2,A7YD35,A0A023I9E0,A0A0B4KJH3,A0A0B4KJK0,
A0A0B4KJW0,A0A0E3EKP4,A0A0F6MWY9,A0A0F6MX02,A0A0F6MX04,A0A0F6MX09,A0A0F6MX23,A0A0F6MX34,A0A0F6MX41,A0A0F6MX46,
A0A0F6MX65,A0A0F6MX78,A0A1D6X7G2,A0A2D1CGZ1,A0A2D1CGZ5,A0A2D1CH18,A0A2R2YUP1,S4SIT9,X2CQB0,A0A0F6MX77

Use BioMart in Ensembl to generate a list showing which Ensembl gene stable IDs and gene names these UniProtKB Gene Name IDs correspond to. How many different transcripts do the 30 genes correspond to? Why can multiple transcripts correspond to a single UniProtKB gene ID?

  1. Go to BioMart. You can find a shortcut to the tool on any Ensembl page in the navigation bar at the top of the page. Click New in the top left-hand menu if you need to start a new query. Select the Ensembl Genes database. Choose the Gilthead seabream genes dataset.

  2. Click Filters in the left panel and expand the GENE section. Select Input external references ID list and enter the list of IDs in the text box. Don’t forget to select the correct ID format from the drop-down menu: UniProtKB Gene Name ID (HINT: you may have to scroll down the drop-down menu to find it). Check the number of genes your filter applies to. This should be 30 / 27,714.

  3. Click Attributes in the left panel, select the Features attributes page and expand the GENE section. Gene stable ID and Transcript stable ID are already selected by default. Keep those selections and also add Gene name. Next, expand the EXTERNAL section and select UniProtKB Gene Name ID. Adding this attribute lets you match each row of the output to an ID in your input list.

  4. Click the Results button on the toolbar. Select View: All rows as HTML to view your results in a new tab, or export all results to a file on your local machine. You can open the exported table in Excel and count the number of unique transcript IDs in the Transcript stable ID column.
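The same query can, in principle, be sent to BioMart without the web form. The XML button in the BioMart toolbar shows the XML query behind the choices made in steps 1 to 3, and that XML can be submitted to the martservice endpoint. The Python sketch below builds such a query with the standard library. The internal names it uses are assumptions, not taken from this exercise: `saurata_gene_ensembl` for the Gilthead seabream genes dataset, `uniprot_gn_id` for UniProtKB Gene Name ID, and the attribute names listed in `ATTRIBUTES`. Compare them with the output of the XML button before relying on them. The helper names `build_query`, `build_url` and `fetch` are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# ASSUMED internal BioMart names; confirm them with the XML button in BioMart.
DATASET = "saurata_gene_ensembl"  # Gilthead seabream genes (assumption)
FILTER = "uniprot_gn_id"          # UniProtKB Gene Name ID filter (assumption)
ATTRIBUTES = [
    "ensembl_gene_id",        # Gene stable ID
    "ensembl_transcript_id",  # Transcript stable ID
    "external_gene_name",     # Gene name
    "uniprot_gn_id",          # UniProtKB Gene Name ID (assumption)
]
MARTSERVICE = "https://www.ensembl.org/biomart/martservice"


def build_query(ids):
    """Return a BioMart XML query string mirroring steps 1 to 3."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",   # tab-separated output
        "header": "1",        # include column names in the first row
        "uniqueRows": "1",    # drop fully duplicated rows
    })
    dataset = ET.SubElement(query, "Dataset", {"name": DATASET, "interface": "default"})
    # The ID list goes into a single comma-separated filter value.
    ET.SubElement(dataset, "Filter", {"name": FILTER, "value": ",".join(ids)})
    for name in ATTRIBUTES:
        ET.SubElement(dataset, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def build_url(ids):
    """Return a GET URL that submits the query to martservice."""
    return MARTSERVICE + "?" + urlencode({"query": build_query(ids)})


def fetch(ids):
    """Download the results as TSV text (requires network access)."""
    with urllib.request.urlopen(build_url(ids)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(build_query(["Q4QY71", "P52727"]))
```

Calling `fetch` with the full list of 30 IDs would return the results table as text, equivalent in principle to exporting it from the Results page.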

There are 46 unique transcript IDs.
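As an alternative to Excel, the unique transcript count can be taken directly from the exported file. The sketch below assumes a tab-separated export with a header row whose columns are named as in the web interface (for example `Transcript stable ID`); the function name and the placeholder rows are hypothetical and are not real Ensembl results. A set is used because the same transcript can appear on more than one row, for instance when it matches more than one input ID.

```python
import csv
import io


def count_unique_transcripts(tsv_text, column="Transcript stable ID"):
    """Count distinct non-empty values in one column of a BioMart TSV export."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return len({row[column] for row in reader if row[column]})


# Placeholder rows (NOT real Ensembl IDs) in the assumed export layout.
SAMPLE = (
    "Gene stable ID\tTranscript stable ID\tGene name\tUniProtKB Gene Name ID\n"
    "GENE_A\tTX_A1\tgeneA\tID1\n"
    "GENE_A\tTX_A2\tgeneA\tID1\n"
    "GENE_B\tTX_B1\tgeneB\tID2\n"
    "GENE_B\tTX_B1\tgeneB\tID3\n"
)

print(count_unique_transcripts(SAMPLE))  # prints 3
```

For a real export, read the file first, e.g. `count_unique_transcripts(open("mart_export.txt", encoding="utf-8").read())`, where the file name is only an example.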

Multiple transcripts can correspond to a single UniProtKB gene ID because a gene is made up of a set of transcripts: a single gene can give rise to several transcripts, for example through alternative splicing.
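To see which input IDs map to more than one transcript in your own results, the rows can be grouped by the UniProtKB column. This is a minimal sketch that assumes the same tab-separated export layout and column names as in the web interface; `transcripts_by_id` is a hypothetical helper and the sample rows are placeholders, not real data.

```python
import csv
import io
from collections import defaultdict


def transcripts_by_id(tsv_text, key="UniProtKB Gene Name ID",
                      value="Transcript stable ID"):
    """Group the distinct transcript IDs found for each value of `key`."""
    groups = defaultdict(set)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row[key] and row[value]:
            groups[row[key]].add(row[value])
    # Sorted lists give a stable, readable result.
    return {k: sorted(v) for k, v in groups.items()}


# Placeholder rows (NOT real Ensembl IDs) in the assumed export layout.
SAMPLE = (
    "Gene stable ID\tTranscript stable ID\tGene name\tUniProtKB Gene Name ID\n"
    "GENE_A\tTX_A1\tgeneA\tID1\n"
    "GENE_A\tTX_A2\tgeneA\tID1\n"
    "GENE_B\tTX_B1\tgeneB\tID2\n"
    "GENE_B\tTX_B1\tgeneB\tID3\n"
)

groups = transcripts_by_id(SAMPLE)
# Keep only the IDs linked to more than one transcript.
multi = {k: v for k, v in groups.items() if len(v) > 1}
print(multi)  # prints {'ID1': ['TX_A1', 'TX_A2']}
```

Passing `key="Gene stable ID"` groups the same rows by Ensembl gene instead, which shows the set of transcripts belonging to each gene.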