Semi-supervised learning for the identification of syn-expressed genes from fused microarray and in situ image data

Software

The implementation of the semi-supervised clustering with pairwise constraints can be found at Pymix. The method is being integrated in the graphical tool GQL.

Data

The final data set of gene expression profiles, after gene filtering and data transformation, can be found here. In this tab-separated file, each line represents a gene: the first column holds the gene name, the second the probe id, and the subsequent columns the expression values for the 11 time points.
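As an illustration of the layout described above, the following Python sketch reads such a file into memory. It is a minimal sketch and not part of Pymix: the function name `read_expression` and the sample rows are made up for this example, and it assumes the file has no header line.

```python
import csv
import io

N_TIME_POINTS = 11  # number of time points stated in the file description


def read_expression(handle):
    """Parse lines of: gene name, probe id, then 11 expression values.

    Assumption: the file has no header line.
    """
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        gene, probe = row[0], row[1]
        values = [float(v) for v in row[2:]]
        if len(values) != N_TIME_POINTS:
            raise ValueError(
                "expected %d values for %s, got %d"
                % (N_TIME_POINTS, gene, len(values))
            )
        records.append((gene, probe, values))
    return records


# Made-up rows for illustration only; these are not real data.
sample = (
    "geneA\tprobe1\t" + "\t".join(["0.5"] * N_TIME_POINTS) + "\n"
    "geneB\tprobe2\t" + "\t".join(["-1.25"] * N_TIME_POINTS) + "\n"
)
records = read_expression(io.StringIO(sample))
```

To read the downloaded file instead of the sample, pass an open file object, for example `read_expression(open(path, newline=""))`, where `path` is wherever the file was saved.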

The positive and negative constraints derived from the in situ images can be found at: positives shared at 3 stages, positives shared at 4 stages, negatives shared at 3 stages and negatives shared at 4 stages. In each of these tab-separated files, each line represents a gene, and the genes are in the same order as in the gene expression file. The entry in the jth column of the ith line gives the constraint value between the ith gene and the (j-1)th gene.
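The column offset in this description can be made concrete with a short Python sketch. It assumes, since column j refers to gene j-1, that the first column of each line holds a row label such as the gene name; this is an inference from the indexing, not something stated explicitly. The function name `read_constraints` and the sample values are made up for illustration.

```python
import csv
import io


def read_constraints(handle):
    """Read a constraint file into (labels, matrix).

    Assumption: column 1 is a row label, so the value in column j
    (1-based) belongs to gene j-1, and matrix[i][k] (0-based) is the
    constraint value between gene i and gene k.
    """
    labels, matrix = [], []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        labels.append(row[0])
        matrix.append([float(v) for v in row[1:]])
    n = len(matrix)
    for i, line in enumerate(matrix):
        if len(line) != n:
            raise ValueError(
                "line %d has %d values, expected %d" % (i + 1, len(line), n)
            )
    return labels, matrix


# Made-up 3-gene example; the values are placeholders, not real constraints.
sample = "geneA\t0\t1\t0\ngeneB\t1\t0\t0\ngeneC\t0\t0\t0\n"
labels, matrix = read_constraints(io.StringIO(sample))
```

Because the genes appear in the same order as in the expression file, row index `i` here can be matched to the ith record of the expression data.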

Results

Follow the links below to access the result reports of the Constrained and Unconstrained Clustering discussed in the manuscript. For each individual cluster, the report provides the gene list, a time series plot, the in situ images, and an enrichment analysis for Gene Ontology and ImaGO terms.

We also make available a script-friendly version of the results with FlyBase identifiers (or Affymetrix probe ids if no FlyBase id is found) for Unconstrained Clustering and Constrained Clustering. These files have the same format as the gene expression data file, except that the second column indicates the cluster number.
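A script could group the genes by cluster along the following lines. This is a minimal Python sketch under the layout described above (identifier in the first column, cluster number in the second); the function name `read_clusters` and the sample identifiers are hypothetical, and any further columns are simply ignored.

```python
import csv
import io
from collections import defaultdict


def read_clusters(handle):
    """Map each cluster label to the list of gene identifiers assigned to it.

    Column 1 is the identifier and column 2 the cluster number; the
    cluster label is kept as a string to avoid assuming its formatting.
    """
    clusters = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        clusters[row[1]].append(row[0])
    return dict(clusters)


# Made-up identifiers and cluster numbers for illustration only.
sample = "id_1\t3\t0.1\t0.2\nid_2\t1\t0.4\t0.3\nid_3\t3\t0.0\t0.9\n"
clusters = read_clusters(io.StringIO(sample))
```

With the sample above, `clusters["3"]` holds the two identifiers assigned to cluster 3.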