Loose ends: almost one in five human genes still have unresolved coding status

dc.contributor.author: Abascal, Federico
dc.contributor.author: Juan, David
dc.contributor.author: Jungreis, Irwin
dc.contributor.author: Martinez, Laura
dc.contributor.author: Rigau, Maria
dc.contributor.author: Rodriguez, Jose Manuel
dc.contributor.author: Vazquez, Jesus
dc.contributor.author: Tress, Michael L.
dc.contributor.funder: National Institutes of Health (United States)
dc.date.accessioned: 2018-10-26T07:59:26Z
dc.date.available: 2018-10-26T07:59:26Z
dc.date.issued: 2018
dc.description.abstract: Seventeen years after the sequencing of the human genome, the human proteome is still under revision. One in eight of the 22 210 coding genes listed by the Ensembl/GENCODE, RefSeq and UniProtKB reference databases are annotated differently across the three sets. We have carried out an in-depth investigation of the 2764 genes classified as coding by one or more sets of manual curators and not coding by others. Data from large-scale genetic variation analyses suggest that most are not under protein-like purifying selection and so are unlikely to code for functional proteins. A further 1470 genes annotated as coding in all three reference sets have characteristics that are typical of non-coding genes or pseudogenes. These potential non-coding genes also appear to be undergoing neutral evolution and have considerably less supporting transcript and protein evidence than other coding genes. We believe that the three reference databases currently overestimate the number of human coding genes by at least 2000, complicating and adding noise to large-scale biomedical experiments. Determining which potential non-coding genes do not code for proteins is a difficult but vitally important task, since the human reference proteome is a fundamental pillar of most basic research and supports almost all large-scale biomedical projects.
dc.description.peerreviewed
dc.description.sponsorship: National Institutes of Health [2 U41 HG007234 to I.J., L.M., J.M.R. and M.L.T., R01 HG004037 to I.J.]. Funding for open access charge: NIH [2 U41 HG007234].
dc.format.page: 7070-7084
dc.format.volume: 46
dc.identifier: ISI:000444131400017
dc.identifier.citation: Nucleic Acids Res. 2018; 46(14):7070-7084
dc.identifier.doi: 10.1093/nar/gky587
dc.identifier.e-issn: 1362-4962
dc.identifier.issn: 0305-1048
dc.identifier.journal: Nucleic Acids Research
dc.identifier.pubmedID: 29982784
dc.identifier.uri: http://hdl.handle.net/20.500.12105/6538
dc.language.iso: eng
dc.publisher: Oxford University Press
dc.relation.publisherversion: https://doi.org/10.1093/nar/gky587
dc.repisalud.institucion: CNIC
dc.repisalud.orgCNIC: CNIC::Research groups::Cardiovascular proteomics
dc.rights.accessRights: open access
dc.rights.license: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: HUMAN GENOME
dc.subject: EVOLUTIONARY INFORMATION
dc.subject: FUNCTIONALLY IMPORTANT
dc.subject: INTEGRATED MAP
dc.subject: PROTEOME
dc.subject: PREDICTION
dc.subject: TOPOLOGY
dc.subject: DATABASE
dc.subject: PROJECT
dc.subject: NUMBER
dc.title: Loose ends: almost one in five human genes still have unresolved coding status
dc.type: journal article
dc.type.hasVersion: VoR
dspace.entity.type: Publication
relation.isAuthorOfPublication: 63e55d34-c1c9-439c-bc46-f5b9830e538a
relation.isAuthorOfPublication: 9743763b-919c-4fa9-a53c-57c41be5e0ac
relation.isAuthorOfPublication.latestForDiscovery: 63e55d34-c1c9-439c-bc46-f5b9830e538a

Files

Original bundle

LooseEndsAlmostOne_2018.pdf (6.27 MB, Adobe Portable Document Format)
LooseEndsAlmostOne_2018_AdditionalFile2.xlsx (5.15 MB, Microsoft Excel XML)
LooseEndsAlmostOne_2018_suppl.pdf (2.97 MB, Adobe Portable Document Format)
LooseEndsAlmostOne_2018_corrigendum.pdf (63.14 KB, Adobe Portable Document Format)