MASCAAT: Meta-Learning for Selection and Combination of Clustering Algorithms Applied to Gene Expression Analysis

Whether to cluster at all, which clustering method to use, and how many clusters to choose are pressing questions in bioinformatics. Mostly, these decisions are made by users of clustering software based on experience, guided by benchmarks or by indicators of solution reliability or model fit. However, because clustering algorithms always produce a solution, inappropriate methods or parameters are often used and invalid results are produced.

In this context, meta-learning approaches have emerged as effective solutions that can automatically predict the performance of algorithms on a given problem. Such approaches could therefore support non-expert users in the algorithm selection task. As pointed out in, there are different interpretations of the term meta-learning. In our work, we use meta-learning to mean the automatic process of generating knowledge that relates the performance of machine learning algorithms to the characteristics of the problem (i.e., the characteristics of its datasets).

So far, meta-learning has been used in the literature only for selecting or ranking supervised learning algorithms; up to now, there has been no such approach for clustering algorithms (i.e., unsupervised learning). Motivated by this, we extend meta-learning approaches to clustering algorithms. We develop our case study in the context of clustering algorithms applied to cancer gene expression data generated by microarrays.
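To make the idea of relating algorithm performance to dataset characteristics concrete, here is a minimal sketch of meta-learning for ranking clustering algorithms. It assumes hypothetical meta-features (log of the number of samples, log of the number of genes, overall standard deviation) and a simple k-nearest-neighbour ranker over a "meta-base" of previously evaluated datasets. This is an illustration under stated assumptions, not the method used in the project papers; the function names are hypothetical and the meta-base scores are invented.

```python
import math
from statistics import mean


def meta_features(data):
    """Hypothetical meta-features of a samples x genes matrix (illustrative only)."""
    n_samples = len(data)
    n_genes = len(data[0])
    values = [v for row in data for v in row]
    mu = mean(values)
    sd = math.sqrt(mean((v - mu) ** 2 for v in values))
    return [math.log10(n_samples), math.log10(n_genes), sd]


def rank_algorithms(new_data, meta_base, k=2):
    """Rank algorithms by their average score on the k most similar past datasets."""
    query = meta_features(new_data)
    # Nearest past datasets by Euclidean distance in meta-feature space.
    neighbours = sorted(meta_base, key=lambda entry: math.dist(query, entry[0]))[:k]
    algorithms = neighbours[0][1].keys()
    avg = {a: mean(scores[a] for _, scores in neighbours) for a in algorithms}
    # Higher average score = better predicted rank.
    return sorted(avg, key=avg.get, reverse=True)


# Invented meta-base: (meta-features, score of each algorithm) for past datasets.
meta_base = [
    ([1.5, 3.0, 1.0], {"k-means": 0.60, "average linkage": 0.30, "finite mixture": 0.55}),
    ([1.6, 3.1, 1.2], {"k-means": 0.50, "average linkage": 0.20, "finite mixture": 0.65}),
    ([2.5, 2.0, 0.2], {"k-means": 0.10, "average linkage": 0.70, "finite mixture": 0.20}),
]

# Toy "new" dataset: 30 samples x 1000 genes of alternating +1/-1 values.
data = [[(-1) ** (i + j) for j in range(1000)] for i in range(30)]
print(rank_algorithms(data, meta_base, k=2))
```

With `k=2`, the toy dataset is closest to the first two meta-base entries, so the ranking reflects their averaged scores. A real meta-learning study would use richer meta-features and a learned model, but the structure (describe the dataset, then map the description to predicted algorithm performance) is the same.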

More information is available at the project page. Joint work funded by CAPES (Brazil) and DAAD (Germany) under the Probral program.

For further information, contact Ivan G Costa.


Members: Ivan G Costa, Alexander Schliep, Benjamin Georgi, Marcilio C. Pereira de Souto. Collaborators: Teresa Bernarda Ludermir (Centro de Informática da Universidade Federal de Pernambuco), Francisco de Assis Tenório de Carvalho (Centro de Informática da Universidade Federal de Pernambuco).


de Souto et al. Comparative Study on Normalization Procedures for Cluster Analysis of Gene Expression Datasets. In Proceedings of the International Joint Conference on Neural Networks, IEEE Computer Society, 2008.

de Souto et al. Ranking and Selecting Clustering Algorithms Using a Meta-learning Approach. In Proceedings of the International Joint Conference on Neural Networks, IEEE Computer Society, 2008.

de Souto et al. Clustering cancer gene expression data: a comparative study. BMC Bioinformatics 2008, 9:1, 497.

Costa et al. Validating Gene Clusterings by Selecting Informative Gene Ontology Terms with Mutual Information. In Advances in Bioinformatics and Computational Biology, Proceedings of the Brazilian Symposium on Bioinformatics, Springer Verlag, 81–92, 2007.