Get genes by protein domain

Retrieve the protein sequences (in FASTA format) of all wheat genes that have an NCBI Gene ID, are protein coding, and contain Transmembrane helices.

Do a count after selection of each filter to check the number of genes remaining in your dataset.

Export the sequences, selecting Gene description and Source of gene name as header information.

Click the New button on the toolbar, choose the Ensembl Plants Genes database and Triticum aestivum genes dataset.

Now, filter for the genes with NCBI Gene ID only:

Click on Filters in the left panel, expand the GENE section by clicking on the + box. Select with NCBI Gene ID under Limit to genes …. Make sure the box in front of the filter is ticked, otherwise the filter won’t work.

Now click the Count button on the toolbar.

This will give you 92 Genes.

Now filter further for genes that are protein-coding by selecting Gene type – protein_coding and click again on Count.

This now gives you 92 Genes.

Finally, filter for genes that have transmembrane helices. Expand the PROTEIN DOMAINS section by clicking on the + box. Select Transmembrane helices – Only, then click Count again.

There are 79 genes in the bread wheat genome that have an NCBI Gene ID, are protein coding and contain transmembrane helices.
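The same three filters can also be written as a BioMart XML query instead of being clicked together in the interface. The following is a minimal sketch in Python that only builds the query text and does not contact any server. The dataset name (taestivum_eg_gene), the filter names (with_entrezgene, biotype, with_tmhmm), the schema name and the count flag are assumptions made for illustration. The exact XML for your own selection can be copied from the XML button on the BioMart toolbar.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes, count=False,
                        schema="default"):
    """Build a BioMart-style XML query string (nothing is sent anywhere)."""
    query = ET.Element("Query", {
        "virtualSchemaName": schema,        # assumed; may differ per mart
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
        "count": "1" if count else "",      # assumed: "1" asks for a count only
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        if isinstance(value, bool):
            # Boolean ("Only"/"Excluded") filters use the excluded flag.
            ET.SubElement(ds, "Filter",
                          {"name": name, "excluded": "0" if value else "1"})
        else:
            ET.SubElement(ds, "Filter", {"name": name, "value": str(value)})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# All dataset and filter names below are assumed, not taken from the exercise.
query_xml = build_biomart_query(
    dataset="taestivum_eg_gene",
    filters={
        "with_entrezgene": True,        # with NCBI Gene ID: Only
        "biotype": "protein_coding",    # Gene type
        "with_tmhmm": True,             # Transmembrane helices: Only
    },
    attributes=["ensembl_gene_id"],
    count=True,
)
print(query_xml)
```

Adding the filters to the dictionary one at a time mirrors the advice above to run Count after each filter.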

Click on Sequences, then Protein. Select the appropriate header information: Gene description and Source of gene name.

Click on Results and the sequences will be shown in FASTA format.
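Once the FASTA output has been saved, a quick sanity check is to count the records and look at the header fields. The following is a minimal sketch that assumes the header fields are separated by a | character. The example records are made up for illustration and are not real wheat sequences. Note that the number of sequences may be higher than the gene count, since a gene can have more than one protein.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header_fields, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].split("|"), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical export with two header fields per record:
# Gene description | Source of gene name
example = """>hypothetical description one|SourceA
MKTLLVA
GGH*
>hypothetical description two|SourceB
MSS
"""

records = read_fasta(example)
print(len(records), "sequences")
for fields, sequence in records:
    print(fields, len(sequence))
```

To check a real export, read the saved file into a string and pass it to read_fasta in place of the example text.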