SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification

Tardaguila, Manuel; de la Fuente, Lorena; Marti, Cristina; Pereira, Cecile; Pardo-Palacios, Francisco Jose; del Risco, Hector; Ferrell, Marc; Mellado, Maravillas; Macchietto, Marissa; Verheggen, Kenneth; Edelmann, Mariola; Ezkurdia, Iakes; Vazquez, Jesus; Tress, Michael; Mortazavi, Ali; Martens, Lennart; Rodriguez-Navarro, Susana; Moreno-Manzano, Victoria; Conesa, Ana

doi:10.1101/gr.222976.117

dc.contributor.author	Tardaguila, Manuel
dc.contributor.author	de la Fuente, Lorena
dc.contributor.author	Marti, Cristina
dc.contributor.author	Pereira, Cecile
dc.contributor.author	Pardo-Palacios, Francisco Jose
dc.contributor.author	del Risco, Hector
dc.contributor.author	Ferrell, Marc
dc.contributor.author	Mellado, Maravillas
dc.contributor.author	Macchietto, Marissa
dc.contributor.author	Verheggen, Kenneth
dc.contributor.author	Edelmann, Mariola
dc.contributor.author	Ezkurdia, Iakes
dc.contributor.author	Vazquez, Jesus
dc.contributor.author	Tress, Michael
dc.contributor.author	Mortazavi, Ali
dc.contributor.author	Martens, Lennart
dc.contributor.author	Rodriguez-Navarro, Susana
dc.contributor.author	Moreno-Manzano, Victoria
dc.contributor.author	Conesa, Ana
dc.date.accessioned	2018-11-22T08:10:53Z
dc.date.available	2018-11-22T08:10:53Z
dc.date.issued	2018
dc.identifier	ISI:000426355600012
dc.identifier.citation	Genome Res. 2018; 28(3):396-411
dc.identifier.issn	1088-9051
dc.identifier.uri	http://hdl.handle.net/20.500.12105/6686
dc.description.abstract	High-throughput sequencing of full-length transcripts using long reads has paved the way for the discovery of thousands of novel transcripts, even in well-annotated mammalian species. The advances in sequencing technology have created a need for studies and tools that can characterize these novel variants. Here, we present SQANTI, an automated pipeline for the classification of long-read transcripts that can assess the quality of data and the preprocessing pipeline using 47 unique descriptors. We apply SQANTI to a neuronal mouse transcriptome using Pacific Biosciences (PacBio) long reads and illustrate how the tool is effective in characterizing and describing the composition of the full-length transcriptome. We perform extensive evaluation of ToFU PacBio transcripts by PCR to reveal that an important number of the novel transcripts are technical artifacts of the sequencing approach and that SQANTI quality descriptors can be used to engineer a filtering strategy to remove them. Most novel transcripts in this curated transcriptome are novel combinations of existing splice sites, resulting more frequently in novel ORFs than novel UTRs, and are enriched in both general metabolic and neural-specific functions. We show that these new transcripts have a major impact in the correct quantification of transcript levels by state-of-the-art short-read-based quantification algorithms. By comparing our iso-transcriptome with public proteomics databases, we find that alternative isoforms are elusive to proteogenomics detection. SQANTI allows the user to maximize the analytical outcome of long-read technologies by providing the tools to deliver quality-evaluated and curated full-length transcriptomes.
dc.description.sponsorship	We thank Eric Triplett (University of Florida) for support in sequencing experiments and Elizabeth Tseng (PacBio) for helping in running the ToFU pipeline and critically reading this manuscript. This work has been partially funded by the University of Florida Preeminence hires program, the Spanish Ministry of Economy and Competitiveness grants BIO2015-71658-R, BFU2014-57636-P, Spanish Ministry of Education grant FPU2013/02348, and GENCODE NIH grant 2U41 HG007234.
dc.language.iso	eng
dc.publisher	Cold Spring Harbor Laboratory Press
dc.type.hasVersion	VoR
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	*
dc.title	SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification
dc.type	journal article
dc.rights.license	Attribution 4.0 International	*
dc.identifier.pubmedID	29440222
dc.format.volume	28
dc.format.page	396-411
dc.identifier.doi	10.1101/gr.222976.117
dc.contributor.funder	National Institutes of Health (United States)
dc.contributor.funder	University of Florida (United States)
dc.contributor.funder	Ministerio de Economía y Competitividad (Spain)
dc.contributor.funder	Ministerio de Educación (Spain)
dc.description.peerreviewed	Yes
dc.identifier.e-issn	1549-5469
dc.relation.publisherversion	https://doi.org/10.1101/gr.222976.117
dc.identifier.journal	Genome Research
dc.repisalud.orgCNIC	CNIC::Research groups::Cardiovascular proteomics
dc.repisalud.orgCNIC	CNIC::Technical units::Proteomics / Metabolomics
dc.repisalud.institucion	CNIC
dc.relation.projectID	info:eu-repo/grantAgreement/ES/BIO2015-71658-R	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/ES/BFU2014-57636-P	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/ES/FPU2013/02348	es_ES
dc.rights.accessRights	open access	es_ES