Mapping microarray probes to genes

The following 101 probesets were upregulated in Arabidopsis thaliana after short-term phosphate deprivation. The microarray used was the Affymetrix Arabidopsis thaliana whole-genome gene chip (ATH1) (Misson et al. A genome-wide transcriptional analysis using Arabidopsis thaliana Affymetrix gene chips determined plant responses to phosphate deprivation. Proc Natl Acad Sci U S A. 2005 Aug 16; 102(33): 11934–11939).

259842_at, 251193_at, 259303_at, 252534_at, 266957_at, 257891_at, 263593_at, 266372_at, 265342_at, 254011_at, 260623_at, 262238_at, 264118_at, 256910_at, 263846_at, 249996_at, 248094_at, 267361_at, 246275_at, 258034_at, 248622_at, 263483_at, 254250_at, 257964_at, 248566_s_at, 245263_at, 264636_at, 264342_at, 254125_at, 262369_at, 259399_at, 251770_at, 266132_at, 246001_at, 246075_at, 258887_at, 258856_at, 263391_at, 256376_s_at, 266766_at, 258277_at, 266142_at, 246071_at, 261021_at, 251143_at, 252730_at, 249337_at, 258158_at, 245882_at, 250054_at, 263539_at, 263851_at, 247949_at, 262229_at, 246777_at, 258975_at, 247026_at, 252265_at, 256100_at, 246099_at, 246302_at, 254111_at, 256017_at, 259750_at, 254215_at, 253271_s_at, 247314_at, 267567_at, 250435_at, 255543_at, 259479_at, 264783_at, 245193_at, 260561_at, 263948_at, 258682_at, 253386_at, 263847_at, 266017_at, 252414_at, 255360_at, 251176_at, 266743_at, 253829_at, 267497_at, 258613_at, 253163_at, 261648_at, 258100_at, 249983_at, 266413_at, 264261_at, 256627_at, 249640_at, 248164_at, 266184_s_at, 247047_at, 263083_at, 251961_at, 252011_at, 260101_at

(a) Generate a list of the genes to which these probesets map. Include the Ensembl Gene ID, name, description and probe ID attributes.

(b) As a first step towards analysing these genes for shared regulatory features, retrieve the 250 bp upstream of each of their transcripts. Include the Ensembl Gene ID, name and description attributes in the sequence header.

(a) Click the New button on the toolbar. Choose the Ensembl Plants Genes database. Choose the Arabidopsis thaliana genes dataset.

Click on Filters in the left panel. Expand the GENE section. Select Input microarray probes/probesets ID list – Affymetrix array Arabidopsis ATH1 121501 ID(s). Enter the list of probeset IDs in the text box (either comma-separated or one ID per line). Click the Count button on the toolbar.
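Before pasting the IDs into the text box, it can help to tidy the list, since a pasted list may contain stray spacing or repeated IDs. The following is a minimal Python sketch of that clean-up, not part of the BioMart interface. The function names are hypothetical, and the ID pattern is an assumption based only on the IDs shown in this exercise (for example 259842_at and 248566_s_at).

```python
import re


def parse_probe_ids(text):
    """Split a pasted list on commas and whitespace; drop blanks and repeats, keeping order."""
    seen = set()
    ids = []
    for token in re.split(r"[,\s]+", text.strip()):
        if token and token not in seen:
            seen.add(token)
            ids.append(token)
    return ids


def looks_like_ath1_id(probe_id):
    # Assumption: IDs follow the shapes seen in this exercise,
    # i.e. digits, an optional one-letter infix such as "_s", then "_at".
    return re.fullmatch(r"\d+(_[a-z])?_at", probe_id) is not None


# A short excerpt with inconsistent separators and one repeated ID.
raw = "259842_at, 251193_at,259303_at\n248566_s_at, 259842_at"
ids = parse_probe_ids(raw)
unexpected = [p for p in ids if not looks_like_ath1_id(p)]

print(len(ids), "unique IDs;", len(unexpected), "with an unexpected format")
print("\n".join(ids))  # one ID per line, ready to paste into the text box
```

Running the full list above through the same function should leave it with one entry per probeset, which makes the later comparison against the gene count easier to interpret.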

Click on Attributes in the left panel. Expand the GENE section. Deselect Transcript Stable ID. Select Gene name and Gene description. Expand the EXTERNAL section. Select Affymetrix array Arabidopsis ATH1 121501.

Click the Results button on the toolbar. Select View All rows as HTML or export all results to a file. Tick the box Unique results only.

Your results should show 104 / 32833 genes. That is more genes than the 101 probesets entered, so some probesets must map to more than one gene.
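To see which probesets map to more than one gene, the exported results can be grouped by probe ID. The sketch below assumes a tab-separated export with a header row. The column names "Gene stable ID" and "AFFY ATH1 121501 probe" are assumptions and should be replaced with the headers in your own file. The gene and probe IDs in the example are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def genes_per_probe(tsv_text,
                    gene_col="Gene stable ID",            # assumed header name
                    probe_col="AFFY ATH1 121501 probe"):  # assumed header name
    """Map each probe ID to the set of gene IDs it appears with."""
    probe_to_genes = defaultdict(set)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        probe = row[probe_col].strip()
        if probe:  # skip rows without a probe ID
            probe_to_genes[probe].add(row[gene_col].strip())
    return probe_to_genes


def multi_gene_probes(probe_to_genes):
    """Keep only the probes associated with more than one gene."""
    return {p: sorted(g) for p, g in probe_to_genes.items() if len(g) > 1}


# Made-up example rows; real exports would be read from a file instead.
example = (
    "Gene stable ID\tGene name\tAFFY ATH1 121501 probe\n"
    "GENE_A\tnameA\tprobe1_at\n"
    "GENE_B\tnameB\tprobe2_s_at\n"
    "GENE_C\tnameC\tprobe2_s_at\n"
)
mapping = genes_per_probe(example)
print(multi_gene_probes(mapping))
```

For a real export, read the file contents with `open(path, encoding="utf-8").read()` and pass the text to `genes_per_probe`.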

(b) Leave the dataset and filters unchanged and go straight to the attributes:

Click on Attributes in the left panel. Select the Sequences attributes page. Expand the SEQUENCES section. Select Flank (Transcript). Enter 250 in the Upstream flank text box. Expand the HEADER INFORMATION section. Select, in addition to the default selected attributes, Gene name and Gene description.

Note: Flank (Transcript) will give the flanks for all the transcripts of a gene with multiple transcripts. Flank (Gene) will give the flank for the transcript with the outermost 5’ (or 3’) end.
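The difference between the two options can be illustrated with a small coordinate calculation. This is a sketch under stated assumptions, not a description of how BioMart computes flanks internally: coordinates are taken to be 1-based and inclusive, strand is 1 or -1, and the function names and example positions are hypothetical.

```python
def upstream_flank(start, end, strand, length=250):
    """Return (flank_start, flank_end) for the region upstream of one transcript.

    Assumes 1-based inclusive coordinates. The forward-strand flank is clamped
    at position 1; the reverse-strand flank is not clamped at the chromosome end.
    """
    if strand == 1:
        return max(1, start - length), start - 1
    if strand == -1:
        return end + 1, end + length
    raise ValueError("strand must be 1 or -1")


def gene_upstream_flank(transcripts, strand, length=250):
    """Flank measured from the outermost 5' end among a gene's transcripts."""
    if strand == 1:
        five_prime = min(start for start, _ in transcripts)
        return upstream_flank(five_prime, five_prime, 1, length)
    five_prime = max(end for _, end in transcripts)
    return upstream_flank(five_prime, five_prime, -1, length)


# Hypothetical forward-strand gene with two transcripts (start, end).
transcripts = [(1000, 2000), (1100, 2000)]

per_transcript = [upstream_flank(s, e, 1) for s, e in transcripts]
per_gene = gene_upstream_flank(transcripts, 1)

print("Flank (Transcript) analogue:", per_transcript)
print("Flank (Gene) analogue:", per_gene)
```

In this example the transcript-level calculation yields one 250 bp interval per transcript, while the gene-level calculation yields a single interval upstream of the transcript that starts furthest 5'.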

Click the Results button on the toolbar. Select View All rows as FASTA or export all results to a file.
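Once the FASTA file has been exported, a quick check that every record contains a sequence of the expected length can catch problems before any motif analysis. The sketch below is a plain FASTA reader with hypothetical function names. The header layout in the example is made up, and the idea that some records may hold a placeholder instead of nucleotides is an assumption to verify against your own export.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_flanks(records, expected_length=250):
    """Return (header, length) for records that are not clean nucleotide
    sequences of the expected length."""
    problems = []
    for header, seq in records:
        non_nucleotide = set(seq.upper()) - set("ACGTN")
        if len(seq) != expected_length or non_nucleotide:
            problems.append((header, len(seq)))
    return problems


# Made-up records, using 8 bp instead of 250 bp to keep the example short.
example = """>GENE_A|nameA|description of A
ACGTAC
GT
>GENE_B|nameB|description of B
ACGT
"""
records = read_fasta(example)
print(len(records), "records")
print(check_flanks(records, expected_length=8))
```

For the exercise itself, call `check_flanks(records)` with the default length of 250 after reading the exported file.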