Analyzing gene expression time-courses
Alexander Schliep, Ivan G. Costa, Christine Steinhoff, Alexander Schönhuth
Software
All experiments involving HMM estimation were run with GQL. The GQL software, licensed under the GPL, is available at http://ghmm.org/gql. It requires several other packages, most importantly Python and the GHMM. The Splines and CAGED implementations were obtained from their authors, and an implementation of k-means in Python was obtained from Open Source clustering. The Python scripts used to generate all experiments are also available for download (see readme.txt for file descriptions).
Data
All data sets used in the experiments can be downloaded from https://schlieplab.org/Static/Supplements/ExpAna/data/ (see http://ghmm.org/gql for a description of the format).
Simulated Data 1
The simulated data set SIM 1 consists of a total of 3500 time-courses of length 30 (equal step-width in [2,2*PI]) in six classes. The time-courses were obtained by sampling from the respective class models. The normal distribution is denoted as .
Simulated Data 2
We selected eight HMMs encoding the possible three-segment regulation behaviors (e.g., down-down-down, up-down-down) and used a Monte-Carlo algorithm (variances of emission probabilities were set to 0.05) to generate 100 time-courses from each of the eight HMMs.
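The generation procedure can be illustrated with a minimal sketch. It is not the script used for SIM 2: it assumes each three-segment pattern is a left-to-right HMM with one state per segment and Gaussian emissions with mean +1 (up) or -1 (down). The series length, the self-transition probability and the emission means are hypothetical values chosen for illustration; only the eight patterns, the variance of 0.05 and the 100 time-courses per model come from the description above.

```python
import itertools
import random


def sample_time_course(pattern, length=12, stay_prob=0.7, variance=0.05, rng=random):
    """Sample one time-course from a left-to-right HMM with one state per segment.

    length, stay_prob and the +/-1 emission means are illustrative assumptions.
    """
    means = [1.0 if segment == "up" else -1.0 for segment in pattern]
    sigma = variance ** 0.5  # the text specifies the variance, gauss() needs the std. dev.
    state = 0
    course = []
    for _ in range(length):
        course.append(rng.gauss(means[state], sigma))
        # Move on to the next segment with probability 1 - stay_prob.
        if state < len(means) - 1 and rng.random() > stay_prob:
            state += 1
    return course


def generate_sim2(n_per_class=100, seed=0):
    """Return the eight up/down patterns and (class label, time-course) pairs."""
    rng = random.Random(seed)
    patterns = list(itertools.product(("down", "up"), repeat=3))
    data = []
    for label, pattern in enumerate(patterns):
        for _ in range(n_per_class):
            data.append((label, sample_time_course(pattern, rng=rng)))
    return patterns, data


patterns, data = generate_sim2()
```

Enumerating the patterns with `itertools.product` makes it explicit that three segments with two directions each give exactly eight class models.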
Yeast 5 Data
The Y5 data set was originally downloaded from the Yeung Sup. Material. The five classes correspond to genes listed in Cho et al. 1998 as belonging to the phases Early G1, Late G1, S, G2 and M.
Results
Simulated Data 2
Method | CR | Specificity | Sensitivity |
Caged | 0.077 | 0.160 | 0.834 |
Splines | 0.166 | 0.308 | 0.255 |
k-means | 0.326 | 0.383 | 0.460 |
HMM Mix.& RMC | 0.500 | 0.518 | 0.631 |
HMM Clu.& RMC | 0.520 | 0.553 | 0.617 |
HMM Mix.& KMI | 0.563 | 0.595 | 0.648 |
HMM Clu.& KMI | 0.579 | 0.620 | 0.646 |
HMM Mix.& BMC | 0.586 | 0.634 | 0.641 |
HMM Clu.& BMC | 0.587 | 0.635 | 0.641 |
HMM Mix. 2.0% labeled | 0.589 | 0.704 | 0.593 |
Again, partially supervised learning obtained the best result with only 2% of the labels present (CR of 0.589). Estimating HMMs with BMC initialization also gave good results (CR around 0.58), while k-means, Splines and Caged performed very poorly (CR below 0.4). No large differences were observed between mixture and clustering estimation. The main reason is the low standard deviation used to generate SIM2, which raised no robustness issues.
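For readers who want to compute comparable scores on their own clusterings, the following is a minimal sketch based on pair counting. It assumes that CR is the corrected (adjusted) Rand index and that specificity and sensitivity are defined over pairs of time-courses as TP/(TP+FP) and TP/(TP+FN) respectively; these definitions are assumptions of the sketch, and the function names are hypothetical rather than part of GQL.

```python
from itertools import combinations


def pair_counts(true_labels, cluster_labels):
    """Count pairs of objects by agreement in class and in cluster."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_class = true_labels[i] == true_labels[j]
        same_cluster = cluster_labels[i] == cluster_labels[j]
        if same_class and same_cluster:
            tp += 1
        elif same_cluster:
            fp += 1
        elif same_class:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def clustering_scores(true_labels, cluster_labels):
    """Return (corrected Rand, specificity, sensitivity) under the stated assumptions."""
    tp, fp, fn, tn = pair_counts(true_labels, cluster_labels)
    n_pairs = tp + fp + fn + tn
    # Expected number of co-clustered same-class pairs under random labelling.
    expected = (tp + fp) * (tp + fn) / n_pairs
    maximum = 0.5 * ((tp + fp) + (tp + fn))
    cr = 1.0 if maximum == expected else (tp - expected) / (maximum - expected)
    specificity = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return cr, specificity, sensitivity
```

Under these definitions, a method that merges most time-courses into a few large clusters scores a high sensitivity but a low specificity, which is the pattern seen for Caged in the table above.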
Go Enrichment
We list here the GO enrichment of clusters from Hela and YSPOR that are cited in the paper but were not included in the manuscript. For the complete tables at all thresholds, see the files at: http://algorithmics.molgen.mpg.de/ExpAna/go/ (see the documentation of GOStat for file descriptions).
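As background for reading the tables, an enrichment p-value for a single GO term can be sketched as a one-sided hypergeometric tail probability. This is only an illustration of the general idea: it does not reproduce the exact test or the multiple-testing correction applied by GOStat, and the cluster size and number of annotated genes in the example call are hypothetical, since they are not given on this page.

```python
from math import comb


def enrichment_p_value(hits, cluster_size, term_size, universe_size):
    """P(X >= hits) when cluster_size genes are drawn without replacement
    from universe_size genes, of which term_size carry the GO term."""
    upper = min(cluster_size, term_size)
    total = comb(universe_size, cluster_size)
    tail = sum(
        comb(term_size, i) * comb(universe_size - term_size, cluster_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# "4 of 268" is the form of a Counts entry; the cluster size (30) and the
# universe size (15000) are made-up values for illustration only.
p = enrichment_p_value(hits=4, cluster_size=30, term_size=268, universe_size=15000)
```

The value computed this way is uncorrected, so it is not directly comparable to the entries in the tables.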
GO number | Counts | p-value | GO Term |
GO:0043169 | 7 of 3442 | 0.0387 | cation binding |
GO:0005509 | 4 of 914 | 0.0387 | calcium ion binding |
GO:0046872 | 7 of 3681 | 0.0403 | metal ion binding |
GO:0043167 | 7 of 3681 | 0.0403 | ion binding |
GO number | Counts | p-value | GO Term |
No enrichment |
GO number | Counts | p-value | GO Term |
GO:0007243 | 4 of 268 | 0.00276 | protein kinase cascade |
GO:0007242 | 5 of 1268 | 0.0332 | intracellular signaling cascade |
GO:0050896 | 6 of 2354 | 0.0332 | response to stimulus |
GO:0006468 | 4 of 923 | 0.0332 | protein amino acid phosphorylation |
GO:0006950 | 4 of 935 | 0.0332 | response to stress |
GO:0000185 | 1 of 3 | 0.0332 | activation of MAPKKK |
GO:0006915 | 3 of 456 | 0.0332 | apoptosis |
GO:0016773 | 4 of 1007 | 0.0332 | phosphotransferase activity, alcohol group as acceptor |
GO:0012501 | 3 of 459 | 0.0332 | programmed cell death |
GO:0007165 | 7 of 3681 | 0.0332 | signal transduction |
GO number | Counts | p-value | GO Term |
GO:0000279 | 24 of 259 | 2.47e-10 | M phase |
GO:0000280 | 21 of 243 | 6.84e-08 | nuclear division |
GO:0007017 | 14 of 100 | 1.43e-05 | microtubule-based process |
GO:0000072 | 11 of 58 | 1.43e-05 | M phase specific microtubule process |
GO:0000070 | 8 of 33 | 8.4e-05 | mitotic sister chromatid segregation |
GO:0000819 | 8 of 33 | 8.4e-05 | sister chromatid segregation |
GO:0000226 | 12 of 88 | 8.96e-05 | microtubule cytoskeleton organization and biogenesis |
GO:0007067 | 14 of 133 | 0.000229 | mitosis |
GO:0000087 | 14 of 135 | 0.000244 | M phase of mitotic cell cycle |
GO:0000090 | 8 of 41 | 0.000296 | mitotic anaphase |
GO number | Counts | p-value | GO Term |
GO:0000279 | 18 of 259 | 5.64e-05 | M phase |
GO:0000280 | 16 of 243 | 0.000333 | nuclear division |
GO:0000072 | 8 of 58 | 0.000632 | M phase specific microtubule process |
GO:0007067 | 11 of 133 | 0.00101 | mitosis |
GO:0000087 | 11 of 135 | 0.00101 | M phase of mitotic cell cycle |
GO:0007017 | 9 of 100 | 0.00267 | microtubule-based process |
GO:0007049 | 20 of 516 | 0.00269 | cell cycle |
GO:0000226 | 8 of 88 | 0.00538 | microtubule cytoskeleton organization and biogenesis |
GO:0008283 | 21 of 589 | 0.00739 | cell proliferation |
GO:0000070 | 5 of 33 | 0.00875 | mitotic sister chromatid segregation |
GO number | Counts | p-value | GO Term |
No enrichment |
GO number | Counts | p-value | GO Term |
GO:0008186 | 3 of 25 | 0.0689 | RNA-dependent ATPase activity |
GO:0004004 | 3 of 25 | 0.0689 | ATP-dependent RNA helicase activity |
GO:0005730 | 7 of 266 | 0.0907 | nucleolus |