Authors: Rodriguez, Jose Manuel; Abascal, Federico; Cerdán-Vélez, Daniel; Gómez, Laura Martínez; Vázquez, Jesús; Tress, Michael L
Dates: 2024-12-04; 2024-12-04; 2024-08-12
Citation: Nucleic Acids Res. 2024 Aug 12;52(14):8112-8126.
URI: https://hdl.handle.net/20.500.12105/25855
Funding: National Human Genome Research Institute of the National Institutes of Health [U41 HG007234]; Spanish Ministry of Science, Innovation and Universities [PGC2018-097019-B-I00]; Carlos III Institute of Health-Fondo de Investigación Sanitaria [IPT17/0019]; ‘la Caixa’ Foundation [HR17-00247]. Funding for open access charge: National Human Genome Research Institute of the National Institutes of Health [U41 HG007234].
Abstract: Ribosome profiling experiments support the translation of a range of novel human open reading frames. By contrast, most peptides from large-scale proteomics experiments derive from just one source, 5' untranslated regions. Across the human genome we find evidence for 192 translated upstream regions, most of which would produce protein isoforms with extended N-terminal ends. Almost all of these N-terminal extensions are from highly abundant genes, which suggests that the novel regions we detect are just the tip of the iceberg. These upstream regions have characteristics that are not typical of coding exons. Their GC-content is remarkably high, even higher than 5' regions in other genes, and a large majority have non-canonical start codons. Although some novel upstream regions have cross-species conservation (five have orthologues in invertebrates, for example), the reading frames of two thirds are not conserved beyond simians. These non-conserved regions also have no evidence of purifying selection, which suggests that much of this translation is not functional. In addition, non-conserved upstream regions have significantly more peptides in cancer cell lines than would be expected, a strong indication that an aberrant or noisy translation initiation process may play an important role in translation from upstream regions.
Language: eng
Version: VoR
License: http://creativecommons.org/licenses/by/4.0/ (Attribution 4.0 International)
Title: Evidence for widespread translation of 5' untranslated regions.
Identifier: 38953162
Volume: 52
Issue: 14
Pages: 8112-8126
Journal: Nucleic Acids Research
Access: open access