Publication: Multiple evidence strands suggest that there may be as few as 19 000 human protein-coding genes
| dc.contributor.author | Ezkurdia, Iakes | |
| dc.contributor.author | Juan, David | |
| dc.contributor.author | Manuel Rodriguez, Jose | |
| dc.contributor.author | Frankish, Adam | |
| dc.contributor.author | Diekhans, Mark | |
| dc.contributor.author | Harrow, Jennifer | |
| dc.contributor.author | Vazquez, Jesus | |
| dc.contributor.author | Valencia, Alfonso | |
| dc.contributor.author | Tress, Michael L. | |
| dc.contributor.funder | National Institutes of Health (United States) | |
| dc.contributor.funder | Ministerio de Ciencia e Innovación (Spain) | |
| dc.date.accessioned | 2017-12-01T07:37:29Z | |
| dc.date.available | 2017-12-01T07:37:29Z | |
| dc.date.issued | 2014 | |
| dc.description.abstract | Determining the full complement of protein-coding genes is a key goal of genome annotation. The most powerful approach for confirming protein-coding potential is the detection of cellular protein expression through peptide mass spectrometry (MS) experiments. Here, we mapped peptides detected in seven large-scale proteomics studies to almost 60% of the protein-coding genes in the GENCODE annotation of the human genome. We found a strong relationship between detection in proteomics experiments and both gene family age and cross-species conservation. Most of the genes for which we detected peptides were highly conserved. We found peptides for >96% of genes that evolved before bilateria. At the opposite end of the scale, we identified almost no peptides for genes that have appeared since primates, for genes that did not have any protein-like features, or for genes with poor cross-species conservation. These results motivated us to describe a set of 2001 potential non-coding genes based on features such as weak conservation, a lack of protein features, or ambiguous annotations from major databases, all of which correlated with low peptide detection across the seven experiments. We identified peptides for just 3% of these genes. We show that many of these genes behave more like non-coding genes than protein-coding genes and suggest that most are unlikely to code for proteins under normal circumstances. We believe that their inclusion in the human protein-coding gene catalogue should be revised as part of the ongoing human genome annotation effort. | |
| dc.description.peerreviewed | Yes | |
| dc.description.sponsorship | This work was supported by the National Institutes of Health (NIH, grant number U41 HG007234) and by the Spanish Ministry of Science and Innovation (grant numbers BIO2007-666855, RD07-0067-0014, COMBIOMED). J.M.R. is supported by the Spanish National Institute of Bioinformatics (www.inab.org), a platform of the 'Instituto de Salud Carlos III'. Funding to pay the Open Access publication charges for this article was provided by the National Institutes of Health (NIH, grant number U41 HG007234). | |
| dc.format.page | 5866-5878 | |
| dc.format.volume | 23 | |
| dc.identifier | ISI:000344671900002 | |
| dc.identifier.citation | Hum Mol Genet. 2014; 23(22):5866-78 | |
| dc.identifier.doi | 10.1093/hmg/ddu309 | |
| dc.identifier.e-issn | 1460-2083 | |
| dc.identifier.issn | 0964-6906 | |
| dc.identifier.journal | Human Molecular Genetics | |
| dc.identifier.pubmedID | 24939910 | |
| dc.identifier.uri | http://hdl.handle.net/20.500.12105/5536 | |
| dc.language.iso | eng | |
| dc.publisher | Oxford University Press | |
| dc.relation.publisherversion | https://doi.org/10.1093/hmg/ddu309 | |
| dc.repisalud.institucion | CNIC | |
| dc.repisalud.orgCNIC | CNIC::Research groups::Cardiovascular proteomics | |
| dc.repisalud.orgCNIC | CNIC::Technical units::Proteomics / Metabolomics | |
| dc.rights.accessRights | open access | es_ES |
| dc.rights.license | Attribution-NonCommercial 4.0 International | * |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc/4.0/ | * |
| dc.subject | HUMAN GENOME | |
| dc.subject | EVOLUTIONARY INFORMATION | |
| dc.subject | MASS-SPECTROMETRY | |
| dc.subject | CELL-LINE | |
| dc.subject | PROTEOMICS | |
| dc.subject | DATABASE | |
| dc.subject | ANNOTATION | |
| dc.subject | PREDICTION | |
| dc.subject | PROJECT | |
| dc.subject | SEQUENCES | |
| dc.title | Multiple evidence strands suggest that there may be as few as 19 000 human protein-coding genes | |
| dc.type | journal article | |
| dc.type.hasVersion | VoR | |
| dspace.entity.type | Publication | |
| relation.isAuthorOfPublication | bd96f60d-98c7-45d3-b247-22b4b53c78b6 | |
| relation.isAuthorOfPublication | 9743763b-919c-4fa9-a53c-57c41be5e0ac | |
| relation.isAuthorOfPublication | d691c3d3-9e05-4217-a923-08e68ba16baa | |
| relation.isAuthorOfPublication.latestForDiscovery | bd96f60d-98c7-45d3-b247-22b4b53c78b6 |
Files
Original bundle
- Name: MultipleEvidenceStrandsSuggest _2014.pdf
- Size: 406.31 KB
- Format: Adobe Portable Document Format
- Name: MultipleEvidenceStrandsSuggest _2014supp_data.docx
- Size: 745.73 KB
- Format: Microsoft Word XML
- Name: MultipleEvidenceStrandsSuggest _2014supp_tables1.xlsx
- Size: 3.43 MB
- Format: Microsoft Excel XML


