Analyzing gene expression time-courses

Alexander Schliep, Ivan G. Costa, Christine Steinhoff, Alexander Schönhuth

Software

All experiments involving HMM estimation were run with GQL. The GQL software, licensed under the GPL, is available at http://ghmm.org/gql. It requires a few other packages, most importantly Python and the GHMM. The Splines and CAGED implementations were obtained from their authors, and an implementation of k-means in Python was obtained from Open Source clustering. The Python scripts used to run all experiments are also available for download (see readme.txt for file descriptions).

Data

All data sets used in the experiments can be downloaded from https://schlieplab.org/Static/Supplements/ExpAna/data/ (see http://ghmm.org/gql for a description of the format).

Simulated Data 1

The simulated data set SIM1 consists of 3500 time-courses of length 30 (equal step width in $[2, 2\pi]$) in six classes. The time-courses were obtained by sampling from the respective class models. The normal distribution is denoted by $N(\mu,\sigma)$.

Class	Description	Size	Function
C1	up-regulation	500	$0.15 \cdot x - 0.7 + N(1,0.3)$
C2	noise	1000
C3	down-regulation	500	$-0.3 \cdot x - 0.3 + N(1,0.3)$
C4	cyclic 1	500	$N(1,0.1)\cdot \sin\left(1.2\cdot N(1,0.05)\cdot x +0.8 \cdot 2\pi\right) + N(0,0.4)$
C5	cyclic 1	100	$N(1,0.0075)\cdot \sin\left(1.2 \cdot N(1,0.1)\cdot x +0.6 \cdot 2\pi\right) + N(0,0.5)$
C6	cyclic 1	900	$N(1,0.9)\cdot \sin\left(1.5 \cdot N(1,0.025)\cdot x +0.5\cdot 2\pi\right) + N(0,0.5)$
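
The class models in the table can be sketched in Python as follows. This is only an illustration, not the script used for the experiments. It assumes that the second argument of $N(\mu,\sigma)$ is a standard deviation, that the amplitude and frequency factors of the cyclic classes are drawn once per time-course while the additive noise is drawn per time point, and that the grid starts at 2 as stated above. C2 is omitted because the table gives no function for it. The names `sim1_time_course` and `sim1_dataset` are hypothetical.

```python
import math
import random

# Class sizes from the table; C2 (noise) is left out because no function is given.
SIZES = {"C1": 500, "C3": 500, "C4": 500, "C5": 100, "C6": 900}

# Cyclic classes: (amplitude sd, base frequency, frequency sd, phase, noise sd).
CYCLIC = {
    "C4": (0.1, 1.2, 0.05, 0.8, 0.4),
    "C5": (0.0075, 1.2, 0.1, 0.6, 0.5),
    "C6": (0.9, 1.5, 0.025, 0.5, 0.5),
}


def sim1_time_course(cls, rng, length=30, x_start=2.0, x_end=2 * math.pi):
    """Sample one time-course of the given class on an equally spaced grid."""
    step = (x_end - x_start) / (length - 1)
    xs = [x_start + i * step for i in range(length)]
    if cls == "C1":  # up-regulation
        return [0.15 * x - 0.7 + rng.gauss(1, 0.3) for x in xs]
    if cls == "C3":  # down-regulation
        return [-0.3 * x - 0.3 + rng.gauss(1, 0.3) for x in xs]
    amp_sd, freq, freq_sd, phase, noise_sd = CYCLIC[cls]
    amplitude = rng.gauss(1, amp_sd)          # assumption: once per time-course
    frequency = freq * rng.gauss(1, freq_sd)  # assumption: once per time-course
    return [
        amplitude * math.sin(frequency * x + phase * 2 * math.pi)
        + rng.gauss(0, noise_sd)              # additive noise per time point
        for x in xs
    ]


def sim1_dataset(rng):
    """Return (class label, time-course) pairs for all classes except C2."""
    return [
        (cls, sim1_time_course(cls, rng))
        for cls, size in SIZES.items()
        for _ in range(size)
    ]
```

Passing a seeded generator, for example `sim1_dataset(random.Random(0))`, makes the sample reproducible.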

Simulated Data 2

We selected eight HMMs encoding the possible three-segment regulation behaviors (e.g., down-down-down, up-down-down) and used a Monte-Carlo algorithm (variances of emission probabilities were set to 0.05) to generate 100 time-courses from each of the eight HMMs.
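
A rough sketch of this kind of generation is given below. It is not the GQL or GHMM sampling code. Only the eight three-segment patterns, the emission variance of 0.05 and the 100 time-courses per model come from the description above. The emission means of +1 and -1, the self-transition probability, the time-course length and all function names are hypothetical choices made for illustration.

```python
import itertools
import math
import random

# Hypothetical emission means for the two regulation behaviors.
SEGMENT_MEAN = {"down": -1.0, "up": 1.0}


def sample_time_course(pattern, rng, length=12, stay=0.7, variance=0.05):
    """Sample from a left-to-right chain with one Gaussian state per segment.

    `length` and `stay` are illustrative assumptions; `variance` is the
    emission variance stated in the text.
    """
    sd = math.sqrt(variance)
    state = 0
    course = []
    for _ in range(length):
        course.append(rng.gauss(SEGMENT_MEAN[pattern[state]], sd))
        # Move to the next segment with probability 1 - stay, never past the last.
        if state < len(pattern) - 1 and rng.random() >= stay:
            state += 1
    return course


def sim2_like_dataset(rng, per_model=100):
    """Return (pattern label, time-course) pairs for all eight patterns."""
    patterns = list(itertools.product(("down", "up"), repeat=3))
    return [
        ("-".join(p), sample_time_course(p, rng))
        for p in patterns
        for _ in range(per_model)
    ]
```

With the defaults this yields 800 labeled time-courses, 100 for each pattern such as `down-down-down` or `up-down-down`.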

Yeast 5 Data

The Y5 data set was originally downloaded from the Yeung supplementary material. The five classes correspond to genes listed in Cho et al. 1998 as belonging to the phases Early G1, Late G1, S, G2 and M.

Results

Simulated Data 2

Results on SIM2.
Method	CR	Specificity	Sensitivity
Caged	0.077	0.160	0.834
Splines	0.166	0.308	0.255
k-means	0.326	0.383	0.460
HMM Mix. (`RMC`)	0.500	0.518	0.631
HMM Clu. (`RMC`)	0.520	0.553	0.617
HMM Mix. (`KMI`)	0.563	0.595	0.648
HMM Clu. (`KMI`)	0.579	0.620	0.646
HMM Mix. (`BMC`)	0.586	0.634	0.641
HMM Clu. (`BMC`)	0.587	0.635	0.641
HMM Mix. 2.0% labeled	0.589	0.704	0.593

Again, partially supervised learning obtained the best result with only 2% of the labels present (CR of 0.589). HMM estimation with BMC initialization also obtained good results (CR of 0.586 and 0.587), while k-means, Splines and Caged had very poor results (CR below 0.4). No large differences were observed between mixture and clustering estimation. The main reason is the low standard deviation used in the generation of SIM2, which raised no robustness issues.
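
For readers who want to compute comparable scores on their own clusterings, the following sketch shows one common pair-counting formulation. It assumes that CR denotes the corrected (adjusted) Rand index and that specificity and sensitivity are defined over pairs of time-courses as TP/(TP+FP) and TP/(TP+FN), where a true positive is a pair that shares both class and cluster. These definitions are assumptions of the sketch; the function name `pair_counting_scores` is hypothetical.

```python
from collections import Counter
from math import comb  # requires Python 3.8 or later


def pair_counting_scores(classes, clusters):
    """Compute CR, specificity and sensitivity from two label sequences."""
    total = comb(len(classes), 2)
    # Pairs that share both the class and the cluster (true positives).
    tp = sum(comb(c, 2) for c in Counter(zip(classes, clusters)).values())
    # Pairs in the same class (TP + FN) and in the same cluster (TP + FP).
    same_class = sum(comb(c, 2) for c in Counter(classes).values())
    same_cluster = sum(comb(c, 2) for c in Counter(clusters).values())
    # Expected number of true positives under random labeling.
    expected = same_class * same_cluster / total
    cr = (tp - expected) / (0.5 * (same_class + same_cluster) - expected)
    return {
        "CR": cr,
        "specificity": tp / same_cluster,
        "sensitivity": tp / same_class,
    }
```

Placing all objects in a single cluster gives a sensitivity of 1 but a low specificity and a CR of 0, which illustrates why a high sensitivity alone does not indicate a good clustering. The sketch does not guard against degenerate inputs where a denominator is zero.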

Go Enrichment

We list here the GO enrichment of the clusters from HeLa and YSPOR that are cited in the paper but were not included in the manuscript. For the complete tables for all thresholds, see the files at http://algorithmics.molgen.mpg.de/ExpAna/go/ (see the documentation of GOStat for file descriptions).

GO Enrichment for cluster three from `HeLa` for threshold 1.69
GO number	Counts	p-value	GO Term
No enrichment

GO Enrichment for cluster three from `HeLa` for threshold 0.31
GO number	Counts	p-value	GO Term
GO:0043169	7 of 3442	0.0387	cation binding
GO:0005509	4 of 914	0.0387	calcium ion binding
GO:0046872	7 of 3681	0.0403	metal ion binding
GO:0043167	7 of 3681	0.0403	ion binding

GO Enrichment for cluster six from `HeLa` for threshold 1.69
GO number	Counts	p-value	GO Term
No enrichment

GO Enrichment for cluster six from `HeLa` for threshold 0.31
GO number	Counts	p-value	GO Term
GO:0007243	4 of 268	0.00276	protein kinase cascade
GO:0007242	5 of 1268	0.0332	intracellular signaling cascade
GO:0050896	6 of 2354	0.0332	response to stimulus
GO:0006468	4 of 923	0.0332	protein amino acid phosphorylation
GO:0006950	4 of 935	0.0332	response to stress
GO:0000185	1 of 3	0.0332	activation of MAPKKK
GO:0006915	3 of 456	0.0332	apoptosis
GO:0016773	4 of 1007	0.0332	phosphotransferase activity, alcohol group as acceptor
GO:0012501	3 of 459	0.0332	programmed cell death
GO:0007165	7 of 3681	0.0332	signal transduction

GO Enrichment for cluster one from `YSPR` for threshold 1.3
GO number	Counts	p-value	GO Term
GO:0000279	24 of 259	2.47e-10	M phase
GO:0000280	21 of 243	6.84e-08	nuclear division
GO:0007017	14 of 100	1.43e-05	microtubule-based process
GO:0000072	11 of 58	1.43e-05	M phase specific microtubule process
GO:0000070	8 of 33	8.4e-05	mitotic sister chromatid segregation
GO:0000819	8 of 33	8.4e-05	sister chromatid segregation
GO:0000226	12 of 88	8.96e-05	microtubule cytoskeleton organization and biogenesis
GO:0007067	14 of 133	0.000229	mitosis
GO:0000087	14 of 135	0.000244	M phase of mitotic cell cycle
GO:0000090	8 of 41	0.000296	mitotic anaphase

GO Enrichment for cluster one from `YSPR` for threshold 0.41
GO number	Counts	p-value	GO Term
GO:0000279	18 of 259	5.64e-05	M phase
GO:0000280	16 of 243	0.000333	nuclear division
GO:0000072	8 of 58	0.000632	M phase specific microtubule process
GO:0007067	11 of 133	0.00101	mitosis
GO:0000087	11 of 135	0.00101	M phase of mitotic cell cycle
GO:0007017	9 of 100	0.00267	microtubule-based process
GO:0007049	20 of 516	0.00269	cell cycle
GO:0000226	8 of 88	0.00538	microtubule cytoskeleton organization and biogenesis
GO:0008283	21 of 589	0.00739	cell proliferation
GO:0000070	5 of 33	0.00875	mitotic sister chromatid segregation

GO Enrichment for cluster four from `YSPR` for threshold 1.3
GO number	Counts	p-value	GO Term
No enrichment

GO Enrichment for cluster four from `YSPR` for threshold 0.41
GO number	Counts	p-value	GO Term
GO:0008186	3 of 25	0.0689	RNA-dependent ATPase activity
GO:0004004	3 of 25	0.0689	ATP-dependent RNA helicase activity
GO:0005730	7 of 266	0.0907	nucleolus