Publication:
An audit of the PeptideAtlas database uncovers evidence for repurposed pseudogenes and co-opted retroviral ORFs.

dc.contributor.author: Rodriguez, Jose Manuel
dc.contributor.author: Maquedano, Miguel
dc.contributor.author: Cerdán-Vélez, Daniel
dc.contributor.author: Laguillo-Gómez, Andrea
dc.contributor.author: Calvo, Enrique
dc.contributor.author: Abascal, Federico
dc.contributor.author: Vázquez, Jesús
dc.contributor.author: Tress, Michael L
dc.contributor.funder: Ministerio de Ciencia e Innovación (España)
dc.contributor.funder: Unión Europea. Comisión Europea. NextGenerationEU
dc.contributor.funder: Comunidad de Madrid (España)
dc.contributor.funder: Fundación La Caixa
dc.contributor.funder: Ministerio de Ciencia e Innovación. Centro de Excelencia Severo Ochoa (España)
dc.date.accessioned: 2025-12-15T13:56:58Z
dc.date.available: 2025-12-15T13:56:58Z
dc.date.issued: 2025-11-21
dc.description.abstract: The human genome has been the subject of scrutiny for more than two decades, yet new protein coding genes are still being uncovered. Recently, ribosome profiling experiments have provided evidence for the translation of thousands of novel open reading frames (ORFs). To determine how many of these novel ORFs have peptide support, we carried out an in-depth investigation of an entire mass spectrometry proteomics database. We analysed the peptides housed in the human build of the PeptideAtlas database and identified reliable evidence for 35 potential coding genes not annotated in the Ensembl/GENCODE reference gene set. Evidence from complementary sources confirmed that 16 were almost certainly coding genes, but we believe that at least 14 are most likely to be undergoing aberrant translation. These 14 genes had reading frames that were not preserved beyond human and their peptides were restricted to cancers or cell lines. Remarkably, three of the sixteen likely coding genes were derived from endogenous retroviral ORFs and were expressed only in placenta. All three had evidence of purifying selection. Retroviral ORFs (syncytins) with distinct origins are expressed in almost all mammalian placentae and these results suggest that co-opted ORFs may also play an important role in placental development. Our analysis shows that proteomics data can be used in conjunction with evolutionary evidence to confirm the existence of new coding genes. The evidence suggests that testis and placenta are the tissues most likely to express still-to-be-identified coding genes, and that there may be other transposon-derived ORFs that have been co-opted as coding genes. The strong evidence for the translation of regions under dysregulated conditions has important implications for the annotation of coding genes and for the analysis of cancer and other degenerative diseases. The online version contains supplementary material available at 10.1186/s12864-025-12238-w.
dc.description.peerreviewed
dc.description.tableofcontents: This study was supported by competitive grants PID2021-122348NB-I00 and PID2024-155650NB-I00 funded by MICIU/AEI/10.13039/501100011033 and by “ERDF/EU”, PLEC2022-009298, PLEC2022-009235 and EQC2021-007053-P funded by MICIU/AEI/10.13039/501100011033 and by “European Union NextGenerationEU/PRTR”, and S2022/BMD-7333-CM (INMUNOVAR-CM) funded by Comunidad de Madrid. The project leading to these results has received funding from “la Caixa” Foundation under the project code LCF/PR/HR22/52420019. The CNIC is supported by the Instituto de Salud Carlos III (ISCIII), the Ministerio de Ciencia, Innovación y Universidades (MICIU) and the Pro CNIC Foundation, and is a Severo Ochoa Center of Excellence (grant CEX2020-001041-S funded by MICIU/AEI/10.13039/501100011033).
dc.identifier.citation: BMC Genomics. 2025 Nov 21;26(1):1087.
dc.identifier.journal: BMC GENOMICS
dc.identifier.pubmedID: 41272456
dc.identifier.uri: https://hdl.handle.net/20.500.12105/27024
dc.language.iso: eng
dc.publisher: BMC
dc.relation.isreferencedby: PubMed
dc.relation.projectID: info:eu-repo/grantAgreement/ES/PID2021-122348NB-I00
dc.relation.projectID: info:eu-repo/grantAgreement/ES/PID2024-155650NB-I00
dc.relation.projectID: info:eu-repo/grantAgreement/ES/MICIU/AEI/10.13039/501100011033
dc.relation.projectID: info:eu-repo/grantAgreement/ES/PLEC2022-009298
dc.relation.projectID: info:eu-repo/grantAgreement/ES/PLEC2022-009235
dc.relation.projectID: info:eu-repo/grantAgreement/ES/EQC2021-007053-P
dc.relation.projectID: info:eu-repo/grantAgreement/ES/S2022/BMD-7333-CM
dc.relation.projectID: info:eu-repo/grantAgreement/ES/LCF/PR/HR22/52420019
dc.relation.projectID: info:eu-repo/grantAgreement/ES/CEX2020-001041-S
dc.relation.publisherversion: https://doi.org/10.1186/s12864-025-12238-w
dc.repisalud.institucion: CNIC
dc.repisalud.orgCNIC: CNIC::Grupos de investigación::Proteómica cardiovascular
dc.rights.accessRights: open access
dc.rights.license: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Co-option
dc.subject: Coding genes
dc.subject: Endogenous retrovirus
dc.subject: Proteomics
dc.subject: Pseudogenes
dc.title: An audit of the PeptideAtlas database uncovers evidence for repurposed pseudogenes and co-opted retroviral ORFs.
dc.type: research article
dc.type.hasVersion: VoR
dspace.entity.type: Publication

Files

Original bundle

Name: An audit of the PeptideAtlas database_BMC Genomics_2025.pdf
Size: 4.35 MB
Format: Adobe Portable Document Format