Tximeta: Reference sequence checksums for provenance identification in RNA-seq.

Title: Tximeta: Reference sequence checksums for provenance identification in RNA-seq.
Publication Type: Journal Article
Year of Publication: 2020
Authors: Love, Michael I., Charlotte Soneson, Peter F. Hickey, Lisa K. Johnson, Tessa N. Pierce, Lori Shepherd, Martin Morgan, and Rob Patro
Journal: PLoS Comput Biol
Date Published: 2020 Feb
Keywords: Algorithms; Animals; Computational Biology; Drosophila melanogaster; Gene Expression Profiling; Genomics; Humans; Mice; Models, Statistical; Pattern Recognition, Automated; Programming Languages; Reproducibility of Results; RNA-Seq; Software; Transcriptome

Correct annotation metadata is critical for reproducible and accurate RNA-seq analysis. When files are shared publicly or among collaborators with incorrect or missing annotation metadata, it becomes difficult or impossible to reproduce bioinformatic analyses from raw data. It also makes it more difficult to locate the transcriptomic features, such as transcripts or genes, in their proper genomic context, which is necessary for overlapping expression data with other datasets. We provide a solution in the form of an R/Bioconductor package tximeta that performs numerous annotation and metadata gathering tasks automatically on behalf of users during the import of transcript quantification files. The correct reference transcriptome is identified via a hashed checksum stored in the quantification output, and key transcript databases are downloaded and cached locally. The computational paradigm of automatically adding annotation metadata based on reference sequence checksums can greatly facilitate genomic workflows, by helping to reduce overhead during bioinformatic analyses, preventing costly bioinformatic mistakes, and promoting computational reproducibility. The tximeta package is available at https://bioconductor.org/packages/tximeta.
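The checksum-based identification described in the abstract can be illustrated with a minimal sketch. This is not the tximeta implementation (tximeta is an R/Bioconductor package); it is a Python illustration of the general idea under stated assumptions: the reference transcript sequences are hashed with SHA-256, the digest is stored alongside the quantification output, and the digest is later looked up in a table of known transcriptomes. The function names, the toy sequences, and the lookup table entries are all hypothetical.

```python
import hashlib


def reference_checksum(sequences):
    """Return a SHA-256 hex digest computed over the transcript sequences, in order.

    Assumption: sequences are hashed as uppercase ASCII with no separators.
    A real tool may normalize or concatenate sequences differently.
    """
    digest = hashlib.sha256()
    for seq in sequences:
        digest.update(seq.upper().encode("ascii"))
    return digest.hexdigest()


# Hypothetical toy transcriptome standing in for a real reference FASTA.
toy_transcripts = ["ACGTACGT", "GGGTTTAA"]

# Hypothetical lookup table mapping checksum -> annotation metadata.
KNOWN_TRANSCRIPTOMES = {
    reference_checksum(toy_transcripts): {
        "source": "ExampleSource",
        "organism": "Homo sapiens",
        "release": "example-1",
    }
}


def identify_reference(checksum):
    """Return the metadata for a known checksum, or None if it is unrecognized."""
    return KNOWN_TRANSCRIPTOMES.get(checksum)


# The checksum that would be stored with the quantification output.
stored = reference_checksum(toy_transcripts)
match = identify_reference(stored)

# A reference that differs by a single base produces a different checksum,
# so it is not mistaken for the known transcriptome.
altered = identify_reference(reference_checksum(["ACGTACGA", "GGGTTTAA"]))
```

Because the digest depends only on the sequence content, files that are renamed or shared without their annotation can still be matched to their reference, while any change to the sequences yields an unrecognized checksum instead of a silent mismatch.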

Alternate Journal: PLoS Comput Biol
Original Publication: Tximeta: Reference sequence checksums for provenance identification in RNA-seq.
PubMed ID: 32097405
PubMed Central ID: PMC7059966
Grant List: R01 MH118349 / MH / NIMH NIH HHS / United States
R01 HG009937 / HG / NHGRI NIH HHS / United States
P01 CA142538 / CA / NCI NIH HHS / United States
U41 HG004059 / HG / NHGRI NIH HHS / United States
P30 ES010126 / ES / NIEHS NIH HHS / United States