On external indices for mixtures: validating mixtures of genes

I.G. Costa and A. Schliep

In From Data and Information Analysis to Knowledge Engineering, Springer, 662–669, 2005.

Mixture models represent the results of gene expression cluster analysis more naturally than "hard" partitions. The same holds for gene labels such as functional annotations, where one gene is often assigned to more than one annotation term. Another important characteristic of functional annotations is that they are more finely detailed than groups of co-expressed genes. In other words, genes with similar function should be grouped together, but the converse does not hold. Both facts, however, have been neglected by the validation studies presented so far in the context of gene expression analysis. To overcome the first problem, we propose an external index that extends the corrected Rand index to the comparison of two mixtures. To address the second and more challenging problem, the difference in coarseness between the two mixtures being compared, we cluster the terms of the functional annotation. We use simulated and biological data to show the usefulness of our proposals. The results show that distinct solutions can only be differentiated after applying the component clustering.
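To make the idea of a corrected Rand index for mixtures concrete, the following is a minimal sketch, not necessarily the definition used in the paper. It assumes each mixture is given as a matrix of posterior probabilities (one row per gene, one column per component, rows summing to 1), builds a "soft" contingency table from the two matrices, and plugs it into the usual corrected Rand formula with x(x-1)/2 in place of the binomial coefficient. The function names `comb2` and `soft_corrected_rand` are hypothetical.

```python
def comb2(x):
    """Number of pairs, x choose 2, extended to non-integer x."""
    return x * (x - 1.0) / 2.0


def soft_corrected_rand(U, V):
    """Corrected Rand index computed from two posterior matrices.

    U and V are lists of rows; U[g][i] is the (assumed) probability that
    gene g belongs to component i of the first mixture, likewise V[g][j]
    for the second. Requires at least two genes.
    """
    n = len(U)
    ku, kv = len(U[0]), len(V[0])
    # Soft contingency table: expected co-membership mass of each
    # (component i, component j) pair, summed over all genes.
    table = [[sum(U[g][i] * V[g][j] for g in range(n)) for j in range(kv)]
             for i in range(ku)]
    rows = [sum(r) for r in table]
    cols = [sum(table[i][j] for i in range(ku)) for j in range(kv)]

    sum_cells = sum(comb2(x) for r in table for x in r)
    sum_rows = sum(comb2(x) for x in rows)
    sum_cols = sum(comb2(x) for x in cols)

    expected = sum_rows * sum_cols / comb2(n)   # chance agreement
    max_index = 0.5 * (sum_rows + sum_cols)     # best possible agreement
    if max_index == expected:                   # degenerate case, e.g. one component each
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hard partitions are the special case of 0/1 posteriors.
hard_a = [[1, 0], [1, 0], [0, 1], [0, 1]]
hard_b = [[1, 0], [0, 1], [1, 0], [0, 1]]
print(soft_corrected_rand(hard_a, hard_a))
print(soft_corrected_rand(hard_a, hard_b))
```

With 0/1 posteriors the table holds ordinary counts, so the sketch reduces to the standard corrected Rand index for hard partitions; with genuinely soft posteriors the values are best read as illustrative only.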

A reprint is available as PDF.
