The Scenario
A researcher studies the physiological response of H. salinarum
to changes in oxygen level. She runs a series
of 61 microarray experiments under varying oxygen concentrations.
What can we learn from this data?
Part 1: (Optional) Finding and clustering differentially expressed genes
The first step of this analysis is to find genes whose expression changes
under conditions where the level of oxygen has been perturbed relative to a
reference condition. Then we use a clustering algorithm to group together genes
whose expression changes in similar ways.
This part of the analysis exercises the Gaggle and a few desktop tools, namely R,
MeV, and the DMV. The list of genes computed in Part 1 has been encoded in the
Gaggle microformat and embedded in this page, so the reader can skip directly to
Part 2 to analyze these genes using the Firegoose.
Start Gaggle tools
- Start the Gaggle Boss.
- Start the DMV.
- Start R. Connect to the Gaggle by typing the following at the R command prompt:
library(gaggle)
source("http://gaggle.systemsbiology.net/R/gaggleUtil.R")
gaggleInit()
(or just cut-n-paste!)
Load the microarray data into the DMV
- In the DMV, open the environmental folder on the left and select oxygen. A button
labeled with a red 61 should appear in the top of the window. Click the button to
load microarray data for the 61 oxygen conditions.
- The DMV should open two tabs: lambdas and log10 ratios.
For each condition we have two measurements per gene:
- log10 ratio: perturbed oxygen condition vs. a standard reference condition
- lambda statistic: a measure of significance
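To make the log10 ratio concrete, here is a small worked example in R. The intensity values are made up for illustration and are not taken from the oxygen data set.

```r
# Hypothetical signal intensities for one gene in three conditions.
perturbed <- c(200, 100, 50)   # perturbed oxygen condition
reference <- c(100, 100, 100)  # standard reference condition

# log10 ratio of perturbed vs. reference for each condition
log10_ratio <- log10(perturbed / reference)
print(round(log10_ratio, 3))
```

A twofold increase gives a log10 ratio of about 0.301, no change gives 0, and a twofold decrease gives about -0.301.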
Broadcast ratios to R
- In the DMV, make sure the log10 ratios tab is selected.
- Click the All button to select all rows in the log10 matrix.
- Click Update and select R in the drop-down list of geese.
- Broadcast the matrix to R by clicking the button marked M.
Normalize microarray data
- In R, assign the name ratios to the matrix we just broadcast.
- Note that on some platforms (Windows) the matrix ready message fails to appear.
Type "dim(ratios)" to verify the size of the matrix.
- In one step, we will normalize ratios to a mean of 0 and a standard deviation of 1 and
broadcast the normalized matrix back to the DMV. This is done to make comparing expression
profiles during clustering easier.
matrix ready, dimension 2400 x 61
> ratios <- getMatrix()
> dim(ratios)
[1] 2400 61
> broadcast(normalize(ratios), "log10_ratios_normalized")
- A new tab labeled "log10_ratios_normalized" should appear in the DMV.
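The source of the normalize function in gaggleUtil.R is not shown here. As a rough sketch of the idea, the following base R code standardizes each row (gene) to a mean of 0 and a standard deviation of 1. Both the function name normalize_rows_sketch and the choice to normalize per row are assumptions made for illustration; the toy matrix is random data.

```r
# Sketch only: row-wise standardization, assumed to resemble what normalize() does.
normalize_rows_sketch <- function(m) {
  row_means <- rowMeans(m, na.rm = TRUE)
  row_sds <- apply(m, 1, sd, na.rm = TRUE)
  # Vectors of length nrow(m) are recycled down the columns,
  # so each row is shifted and scaled by its own mean and sd.
  # A row with zero variance would come out as NaN.
  (m - row_means) / row_sds
}

# Toy matrix of random values: 5 "genes" by 8 "conditions".
set.seed(1)
toy <- matrix(rnorm(5 * 8, mean = 2, sd = 3), nrow = 5,
              dimnames = list(paste0("gene", 1:5), paste0("cond", 1:8)))
z <- normalize_rows_sketch(toy)

print(round(rowMeans(z), 10))       # row means after normalization
print(round(apply(z, 1, sd), 10))   # row standard deviations after normalization
```

After this step every profile is on the same scale, so clustering compares the shape of the response rather than its magnitude.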
Broadcast lambdas to R
- Select the lambdas tab in the DMV.
- Press All to select all rows.
- R should still be selected in the drop-down list of geese.
- Press M to broadcast the matrix to R.
Find differentially expressed genes
Out of the ~2400 unique genes in H. salinarum, we want to find those significantly
differentially expressed under our experimental conditions. To accomplish this, a
threshold can be applied in R to the lambda values.
- Assign the matrix just broadcast to a variable as before. This time call it lambdas.
- Again, verify the dimensions of the matrix.
- The filterMatrix function returns a list of row names for the rows whose data pass a threshold.
Here we require that at least 6 conditions have a lambda value of at least 50.
The FALSE parameter indicates
that we don't care whether the 6 conditions are consecutive. The choice of these parameters is
somewhat arbitrary, and some judgement must be applied.
- Broadcast these genes back to the DMV.
matrix ready, dimension 2400 x 61
> lambdas <- getMatrix()
> dim(lambdas)
[1] 2400 61
> sig_genes <- filterMatrix(lambdas, 50, 6, FALSE)
> sig_genes
[1] "VNG0013C" "VNG0014C" "VNG0017H" "VNG0018H" "VNG0022H" "VNG0027H"
[7] "VNG0028C" "VNG0029H" "VNG0033H" "VNG0043H" "VNG0049H" "VNG0053H"
...
[445] "VNG6432H" "VNG6439H" "VNG6441H"
> broadcast(sig_genes)
- In the log10_ratios_normalized tab of the DMV, the 447 rows that passed the filter
should be selected.
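The filtering step above can be sketched in base R. This is not the source of filterMatrix, which lives in gaggleUtil.R and is not shown here; the function filter_rows_sketch, its argument names, and the use of "at least" for both the threshold and the count are assumptions based on the description above. The toy matrix is made up.

```r
# Sketch only: return the names of rows in which at least n_conditions
# values reach the threshold, optionally requiring them to be consecutive.
filter_rows_sketch <- function(m, threshold, n_conditions, consecutive = FALSE) {
  passes <- m >= threshold            # logical matrix, same shape as m
  passes[is.na(passes)] <- FALSE      # treat missing values as failing
  if (consecutive) {
    # Length of the longest run of passing conditions in each row.
    longest_run <- apply(passes, 1, function(row) {
      runs <- rle(row)
      max(c(0, runs$lengths[runs$values]))
    })
    keep <- longest_run >= n_conditions
  } else {
    keep <- rowSums(passes) >= n_conditions
  }
  rownames(m)[keep]
}

# Toy lambda matrix: 3 "genes" by 5 "conditions".
toy <- rbind(
  geneA = c(60, 10, 70, 10, 80),  # three passing values, none adjacent
  geneB = c(60, 70, 80, 10, 10),  # three passing values in a row
  geneC = c(10, 10, 90, 10, 10)   # one passing value
)
hits_any <- filter_rows_sketch(toy, 50, 3, FALSE)
hits_run <- filter_rows_sketch(toy, 50, 3, TRUE)
print(hits_any)
print(hits_run)
```

With the consecutive flag off, geneA and geneB pass; with it on, only geneB does.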
Cluster submatrix using MeV
We use clustering to group together genes with similar expression profiles.
- Start MeV (MultiExperiment Viewer).
- In the DMV, make sure the log10_ratios_normalized tab is selected and the 447 rows from the previous step
are selected. Click M to broadcast the submatrix to MeV.
- A green and red heat-map should appear in MeV.
- Press the PCA button and click "OK" to group the expression profiles using principal component analysis. PCA
is a technique that projects data of high dimensionality onto fewer axes.
- Open the PCA genes folder (left of screen) and open the subfolders Projections on the PC axes,
Components 1,2,3, and 2D Views. The axis that corresponds best to response to oxygen is axis 1. Click
on "1,2" to plot the genes against these axes.
- Use the mouse to draw an ellipse around the genes to the left of the Y-axis. Select as many as possible
without crossing the Y-axis.
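For readers who want to see the same idea outside MeV, here is a sketch using prcomp from base R on synthetic data. It is not MeV's implementation, and the profiles below are simulated rather than taken from the oxygen data set. Splitting genes by the sign of their score on the first axis plays the role of drawing an ellipse on one side of the Y-axis.

```r
# Synthetic profiles: 20 "genes" that follow a made-up oxygen trend
# and 20 that move against it, across 10 "conditions".
set.seed(42)
n_cond <- 10
oxygen <- seq(-1, 1, length.out = n_cond)
up   <- t(replicate(20,  oxygen + rnorm(n_cond, sd = 0.2)))
down <- t(replicate(20, -oxygen + rnorm(n_cond, sd = 0.2)))
profiles <- rbind(up, down)
rownames(profiles) <- c(paste0("up", 1:20), paste0("down", 1:20))

# Project the genes onto the principal axes.
pca <- prcomp(profiles, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]

# Split genes by which side of the second axis they fall on (sign of PC1).
left  <- rownames(scores)[scores[, "PC1"] < 0]
right <- rownames(scores)[scores[, "PC1"] >= 0]
print(left)
print(right)

# plot(scores) would draw the 2D view corresponding to "1,2" in MeV.
```

The sign of a principal axis is arbitrary, so which side holds which group has to be checked against the expression data rather than assumed.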

Highlight the selected cluster in the DMV
- Return momentarily to the DMV. Press "Clear" to clear the current selections.
- Broadcasting the clustered genes is slightly awkward. Back in MeV, right-click on "1,2" (on the left) and select
Launch new session. A heat-map will appear containing just the genes in the selected cluster.
- Open the Gaggle menu and click "Broadcast" to broadcast the gene names.
- In the DMV, 222 genes should be selected (give or take a few depending on how generous an ellipse you drew).
A similar procedure can be used to select the genes activated by oxygen, which lie on the right side of the Y-axis.
Resulting gene clusters
The results of this part of the analysis are a pair of gene clusters broadly classified into those
whose expression is correlated with oxygen levels and those whose expression is anticorrelated with
oxygen. Cluster 1 holds the genes induced by the absence of oxygen. Cluster 2 holds the genes induced
by the presence of oxygen.
Cluster 1: Anaerobic genes
These 222 genes were found to be activated under anaerobic conditions.
name=cluster 1 anaerobic genes
size=222
species=Halobacterium sp. NRC-1
Cluster 2: Aerobic genes
These 223 genes were found to be repressed under anaerobic conditions and active under aerobic conditions.
name=cluster 2 aerobic genes
size=223
species=Halobacterium sp. NRC-1
Anaerobic genes as GI numbers
name=Anaerobic genes as GI numbers
size=219
species=Halobacterium sp. NRC-1
