Publication:
Multiple evidence strands suggest that there may be as few as 19 000 human protein-coding genes

dc.contributor.authorEzkurdia, Iakes
dc.contributor.authorJuan, David
dc.contributor.authorManuel Rodriguez, Jose
dc.contributor.authorFrankish, Adam
dc.contributor.authorDiekhans, Mark
dc.contributor.authorHarrow, Jennifer
dc.contributor.authorVazquez, Jesus
dc.contributor.authorValencia, Alfonso
dc.contributor.authorTress, Michael L.
dc.contributor.funderNational Institutes of Health (United States)
dc.contributor.funderMinisterio de Ciencia e Innovación (Spain)
dc.date.accessioned2017-12-01T07:37:29Z
dc.date.available2017-12-01T07:37:29Z
dc.date.issued2014
dc.description.abstractDetermining the full complement of protein-coding genes is a key goal of genome annotation. The most powerful approach for confirming protein-coding potential is the detection of cellular protein expression through peptide mass spectrometry (MS) experiments. Here, we mapped peptides detected in seven large-scale proteomics studies to almost 60% of the protein-coding genes in the GENCODE annotation of the human genome. We found a strong relationship between detection in proteomics experiments and both gene family age and cross-species conservation. Most of the genes for which we detected peptides were highly conserved. We found peptides for >96% of genes that evolved before bilateria. At the opposite end of the scale, we identified almost no peptides for genes that have appeared since primates, for genes that did not have any protein-like features or for genes with poor cross-species conservation. These results motivated us to describe a set of 2001 potential non-coding genes based on features such as weak conservation, a lack of protein features, or ambiguous annotations from major databases, all of which correlated with low peptide detection across the seven experiments. We identified peptides for just 3% of these genes. We show that many of these genes behave more like non-coding genes than protein-coding genes and suggest that most are unlikely to code for proteins under normal circumstances. We believe that their inclusion in the human protein-coding gene catalogue should be revised as part of the ongoing human genome annotation effort.
dc.description.peerreviewed
dc.description.sponsorshipThis work was supported by the National Institutes of Health (NIH, grant number U41 HG007234) and by the Spanish Ministry of Science and Innovation (grant numbers BIO2007-666855, RD07-0067-0014, COMBIOMED). J.M.R. is supported by the Spanish National Institute of Bioinformatics (www.inab.org), a platform of the 'Instituto de Salud Carlos III'. Funding to pay the Open Access publication charges for this article was provided by the National Institutes of Health (NIH, grant number U41 HG007234).
dc.format.page5866-5878
dc.format.volume23
dc.identifierISI:000344671900002
dc.identifier.citationHum Mol Genet. 2014; 23(22):5866-78
dc.identifier.doi10.1093/hmg/ddu309
dc.identifier.e-issn1460-2083
dc.identifier.issn0964-6906
dc.identifier.journalHuman Molecular Genetics
dc.identifier.pubmedID24939910
dc.identifier.urihttp://hdl.handle.net/20.500.12105/5536
dc.language.isoeng
dc.publisherOxford University Press
dc.relation.publisherversionhttps://doi.org/10.1093/hmg/ddu309
dc.repisalud.institucionCNIC
dc.repisalud.orgCNICCNIC::Research groups::Cardiovascular proteomics
dc.repisalud.orgCNICCNIC::Technical units::Proteomics / Metabolomics
dc.rights.accessRightsopen access
dc.rights.licenseAttribution-NonCommercial 4.0 International
dc.rights.urihttp://creativecommons.org/licenses/by-nc/4.0/
dc.subjectHUMAN GENOME
dc.subjectEVOLUTIONARY INFORMATION
dc.subjectMASS-SPECTROMETRY
dc.subjectCELL-LINE
dc.subjectPROTEOMICS
dc.subjectDATABASE
dc.subjectANNOTATION
dc.subjectPREDICTION
dc.subjectPROJECT
dc.subjectSEQUENCES
dc.titleMultiple evidence strands suggest that there may be as few as 19 000 human protein-coding genes
dc.typejournal article
dc.type.hasVersionVoR
dspace.entity.typePublication
relation.isAuthorOfPublicationbd96f60d-98c7-45d3-b247-22b4b53c78b6
relation.isAuthorOfPublication9743763b-919c-4fa9-a53c-57c41be5e0ac
relation.isAuthorOfPublicationd691c3d3-9e05-4217-a923-08e68ba16baa
relation.isAuthorOfPublication.latestForDiscoverybd96f60d-98c7-45d3-b247-22b4b53c78b6

Files

Original bundle

Name:
MultipleEvidenceStrandsSuggest _2014.pdf
Size:
406.31 KB
Format:
Adobe Portable Document Format
Name:
MultipleEvidenceStrandsSuggest _2014supp_data.docx
Size:
745.73 KB
Format:
Microsoft Word XML
Name:
MultipleEvidenceStrandsSuggest _2014supp_tables1.xlsx
Size:
3.43 MB
Format:
Microsoft Excel XML