Mixture model based group inference in fused genotype and phenotype data

B. Georgi, M.A. Spence, P. Flodman and A. Schliep

In Studies in Classification, Data Analysis, and Knowledge Organization, Springer, 2007.

The analysis of genetic diseases has classically been directed towards establishing direct links between cause, a genetic variation, and effect, the observable deviation of phenotype. For complex diseases, which are caused by multiple factors and show a wide spread of phenotypic variation, this approach is unlikely to succeed. One example is Attention Deficit Hyperactivity Disorder (ADHD), where phenotypic variation is expected to be caused by the overlapping effects of several distinct genetic mechanisms. The classical statistical models for coping with overlapping subgroups are mixture models, essentially convex combinations of density functions, which allow the inference of descriptive models from data as well as the deduction of groups. An extension of conventional mixtures with attractive properties for clustering is the context-specific independence (CSI) framework. CSI allows for an automatic adaptation of model complexity to avoid overfitting and yields a highly descriptive model.
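
To make the phrase "convex combinations of density functions" concrete, the following is a minimal sketch of a two-component mixture and of how group membership can be deduced from it. It is not the PyMix API and not the model from the paper: the univariate Gaussian components, the function names, and all parameter values are assumptions chosen purely for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, components):
    """Mixture density: a convex combination of the component densities.

    weights must be non-negative and sum to 1.
    """
    return sum(w * gaussian_pdf(x, m, s) for w, (m, s) in zip(weights, components))


def posteriors(x, weights, components):
    """Posterior probability of each component given x, via Bayes' rule."""
    joint = [w * gaussian_pdf(x, m, s) for w, (m, s) in zip(weights, components)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical parameters (assumed for illustration, not taken from the paper).
weights = [0.6, 0.4]                    # mixing proportions, sum to 1
components = [(0.0, 1.0), (3.0, 1.0)]   # (mean, standard deviation) per component

# Deduce a group for one observation: pick the component with the largest posterior.
post = posteriors(1.0, weights, components)
group = max(range(len(post)), key=post.__getitem__)
```

Assigning each sample to the component with the highest posterior is one common way to turn a fitted mixture into a grouping. Fitting the parameters themselves, for example with the EM algorithm, is outside the scope of this sketch.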

Supplementary information is available at https://schlieplab.org/Static/Supplements/ADHD/geno_7_clust.PNG. A reprint is available as PDF.

The publication includes results from the following projects or software tools: PyMix, ComplexDiseases, CSIMixtures.
