Composite large margin classifiers with latent subclasses for heterogeneous biomedical data.

Title: Composite large margin classifiers with latent subclasses for heterogeneous biomedical data.
Publication Type: Journal Article
Year of Publication: 2016
Authors: Chen, Guanhua, Yufeng Liu, Dinggang Shen, and Michael R. Kosorok
Journal: Stat Anal Data Min
Volume: 9
Issue: 2
Pagination: 75-88
Date Published: 2016 Apr
ISSN: 1932-1864
Abstract

High-dimensional classification problems are prevalent in a wide range of modern scientific applications. Although many candidate classification techniques are available, practitioners often face a dilemma in choosing between linear and general nonlinear classifiers. Specifically, simple linear classifiers are interpretable but may be limited in handling data with complex structures. In contrast, general nonlinear classifiers are more flexible but may lose interpretability and are more prone to overfitting. In this paper, we consider data with potential latent subgroups in the classes of interest. We propose a new method, the Composite Large Margin Classifier (CLM), to address classification with latent subclasses. The CLM aims to find three linear functions simultaneously: one linear function splits the data into two parts, and each part is then classified by a different linear classifier. Our method has prediction accuracy comparable to that of a general nonlinear classifier while maintaining the interpretability of traditional linear classifiers. We demonstrate the competitive performance of the CLM through comparisons with several existing linear and nonlinear classifiers in Monte Carlo experiments. Analysis of the Alzheimer's disease classification problem using the CLM not only yields a lower classification error in discriminating cases and controls, but also identifies subclasses in controls that are more likely to develop the disease in the future.
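
The three-function structure described in the abstract can be illustrated with a minimal sketch of the prediction step only. This is one reading of the abstract, not the authors' implementation: the function names (`linear_score`, `clm_predict`), the convention that a positive gate score selects the first classifier, and the weights below are all hypothetical, and the paper's joint large margin training procedure is not reproduced here.

```python
from typing import Sequence, Tuple

# A linear function is represented as (weights, intercept). Hypothetical layout.
Linear = Tuple[Sequence[float], float]


def linear_score(x: Sequence[float], model: Linear) -> float:
    """Evaluate w . x + b for one linear function."""
    weights, intercept = model
    return sum(w * xi for w, xi in zip(weights, x)) + intercept


def clm_predict(x: Sequence[float], gate: Linear,
                clf_a: Linear, clf_b: Linear) -> Tuple[int, int]:
    """Return (subgroup, label) for one observation.

    The gate function splits the data into two parts; each part is then
    labelled by its own linear classifier. Assumption: a positive gate
    score selects clf_a (subgroup 1), otherwise clf_b (subgroup 2).
    """
    if linear_score(x, gate) > 0:
        subgroup, clf = 1, clf_a
    else:
        subgroup, clf = 2, clf_b
    label = 1 if linear_score(x, clf) > 0 else -1
    return subgroup, label


# Hypothetical weights for an XOR-like pattern that no single linear
# classifier can separate, but a composite of three linear functions can.
gate = ([1.0, 0.0], 0.0)    # split on the sign of x[0]
clf_a = ([0.0, 1.0], 0.0)   # when x[0] > 0: label follows the sign of x[1]
clf_b = ([0.0, -1.0], 0.0)  # otherwise: label is the opposite sign of x[1]

for point in ([1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]):
    print(point, clm_predict(point, gate, clf_a, clf_b))
```

Because every piece is linear, each prediction can be traced to a subgroup assignment plus one set of linear coefficients, which is the interpretability argument the abstract makes.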

DOI: 10.1002/sam.11300
Alternate Journal: Stat Anal Data Min
Original Publication: Composite large margin classifiers with latent subclasses for heterogeneous biomedical data.
PubMed ID: 27326311
PubMed Central ID: PMC4912001
Grant List:
K01 AG030514 / AG / NIA NIH HHS / United States
P30 CA016086 / CA / NCI NIH HHS / United States
R01 CA149569 / CA / NCI NIH HHS / United States
P30 AG010129 / AG / NIA NIH HHS / United States
P01 CA142538 / CA / NCI NIH HHS / United States