CNVstat: Statistical association analysis of copy number variants (C).

Title: CNVstat: Statistical association analysis of copy number variants (C).
Publication Type: Software
Year of Publication: 2012
Authors: Hu, Yi-Juan, Dan-Yu Lin, Wei Sun, and Donglin Zeng

Copy number variants (CNVs) and single nucleotide polymorphisms (SNPs) co-exist throughout the human genome and jointly contribute to phenotypic variation. It is therefore desirable to consider both types of variants, as characterized by allele-specific copy numbers (ASCNs), in association studies of complex human diseases. Current SNP genotyping technologies capture CNV and SNP information simultaneously via fluorescent intensity measurements. The common practice of calling ASCNs from the intensity measurements and then using the ASCN calls in downstream association analysis has two important limitations. First, the association tests are prone to false-positive findings when differential measurement errors between cases and controls arise from differences in DNA quality or handling. Second, the uncertainty in the ASCN calls is ignored.

CNVstat is a command-line program written in C for the statistical association analysis of CNVs and SNPs. CNVstat allows the user to estimate or test the effects of CNVs and SNPs by maximizing the observed-data likelihood, which properly accounts for differential measurement errors and calling uncertainties. It is versatile in several respects: (1) it provides the integrated analysis of CNVs and SNPs as well as the analysis of total CNVs; (2) it can accommodate both Affymetrix and Illumina data, as well as all platforms that assay CNVs quantitatively, such as array CGH; (3) it accounts for case-control sampling, differential measurement errors, and calling uncertainties; (4) it can be readily extended to other study designs and traits; (5) it formulates the effects of CNVs and SNPs on the phenotype through flexible regression models, which can accommodate various genetic mechanisms and gene-environment interactions; and (6) it allows genetic and environmental variables to be correlated. The program is fast and scalable to genome-wide association scans. For example, it took about 2 hours on a 64-bit, 3.0-GHz Intel Xeon machine to perform the analysis on chromosome 1 of the schizophrenia data (Hu et al., submitted for publication). We are actively working to improve the capabilities of CNVstat, so please check back frequently for updates.

Original Publication: CNVstat: Statistical association analysis of copy number variants (C).