Publication: Classification of SARS-CoV-2 sequences as recombinants via a pre-trained CNN and identification of a mathematical signature relative to recombinant feature at Spike, via interpretability

| Field | Value | Language |
|---|---|---|
| dc.contributor.author | Guerrero-Tamayo, Ana | |
| dc.contributor.author | Sanz Urquijo, Borja | |
| dc.contributor.author | Olivares, Isabel | |
| dc.contributor.author | Moragues Tosantos, María-Dolores | |
| dc.contributor.author | Casado, Concepcion | |
| dc.contributor.author | Pastor-López, Iker | |
| dc.contributor.funder | Universidad de Deusto (España) | |
| dc.date.accessioned | 2025-01-15T10:25:52Z | |
| dc.date.available | 2025-01-15T10:25:52Z | |
| dc.date.issued | 2024 | |
| dc.description.abstract | The global impact of the SARS-CoV-2 pandemic has underscored the need for a deeper understanding of viral evolution to anticipate new viruses or variants. Genetic recombination is a fundamental mechanism in viral evolution, yet it remains poorly understood. In this study, we conducted comprehensive research into the genetic regions associated with recombination features in SARS-CoV-2. To this end, we implemented a two-phase transfer learning approach using genomic spectrograms of complete SARS-CoV-2 sequences. In the first phase, we used a pre-trained VGG-16 model with genomic spectrograms of HIV-1; in the second phase, we applied this HIV-1 VGG-16 model to SARS-CoV-2 spectrograms. Key recombination hot zones were identified with the Grad-CAM interpretability tool, and the results were analyzed using mathematical and image processing techniques. Our findings unequivocally identify the SARS-CoV-2 Spike protein (S protein) as the pivotal region for the recombination feature. For non-recombinant sequences, the relevant frequencies clustered around 1/6 and 1/12 (periods of 6 and 12 nucleotides). In recombinant sequences, the sharp, prominent main hot zone in the Spike protein clearly indicated a frequency of 1/6. These findings suggest that, along an arithmetic series with a step of 6 nucleotides (two triplets), S may encode crucial information, potentially concealing essential details about viral characteristics, in this case the recombinant feature of a SARS-CoV-2 genetic sequence. This insight further underscores the potential presence of multifaceted information within the genome, including mathematical signatures that define an organism's unique attributes. | |
| dc.description.peerreviewed | Yes | |
| dc.description.sponsorship | This work was supported by the Research Training Grants Program - University of Deusto: Ref. FPI UD_2021_10. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. | |
| dc.format.number | 8 | |
| dc.format.page | e0309391 | |
| dc.format.volume | 19 | |
| dc.identifier.citation | Guerrero-Tamayo A, Sanz Urquijo B, Olivares I, Moragues Tosantos MD, Casado C, Pastor-López I. Classification of SARS-CoV-2 sequences as recombinants via a pre-trained CNN and identification of a mathematical signature relative to recombinant feature at Spike, via interpretability. PLoS One. 2024 Aug 26;19(8):e0309391. | |
| dc.identifier.doi | 10.1371/journal.pone.0309391 | |
| dc.identifier.e-issn | 1932-6203 | |
| dc.identifier.journal | PLoS ONE | |
| dc.identifier.pubmedID | 39186542 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.12105/26022 | |
| dc.language.iso | eng | |
| dc.publisher | Public Library of Science (PLOS) | |
| dc.relation.publisherversion | https://doi.org/10.1371/journal.pone.0309391 | |
| dc.repisalud.centro | ISCIII::Centro Nacional de Microbiología (CNM) | |
| dc.repisalud.institucion | ISCIII | |
| dc.rights.accessRights | open access | |
| dc.rights.license | Attribution 4.0 International | |
| dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | |
| dc.subject.mesh | COVID-19 | |
| dc.subject.mesh | Genome, Viral | |
| dc.subject.mesh | HIV-1 | |
| dc.subject.mesh | Humans | |
| dc.subject.mesh | Neural Networks, Computer | |
| dc.subject.mesh | Recombination, Genetic | |
| dc.subject.mesh | SARS-CoV-2 | |
| dc.subject.mesh | Spike Glycoprotein, Coronavirus | |
| dc.title | Classification of SARS-CoV-2 sequences as recombinants via a pre-trained CNN and identification of a mathematical signature relative to recombinant feature at Spike, via interpretability | |
| dc.type | research article | |
| dc.type.hasVersion | VoR | |
| dspace.entity.type | Publication | |
| relation.isAuthorOfPublication | 891fcdfd-85a1-46ac-944b-ac6de54d3cc9 | |
| relation.isAuthorOfPublication | 7e48a263-54c3-4de0-81e7-f7446c918f2d | |
| relation.isAuthorOfPublication.latestForDiscovery | 891fcdfd-85a1-46ac-944b-ac6de54d3cc9 | |
| relation.isFunderOfPublication | c44e5f49-fdaf-4013-a3c5-dff820964030 | |
| relation.isFunderOfPublication.latestForDiscovery | c44e5f49-fdaf-4013-a3c5-dff820964030 | |
| relation.isPublisherOfPublication | a2759e3d-0d58-4e8a-9fcd-c6130ee333d1 | |
| relation.isPublisherOfPublication.latestForDiscovery | a2759e3d-0d58-4e8a-9fcd-c6130ee333d1 | |
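
The abstract states its signature as frequencies (1/6 and 1/12). In genomic signal processing these are usually read as cycles per nucleotide, so 1/6 corresponds to a period of 6 nt (two codons). This record does not say how the genomic spectrograms were computed, so the following is only a minimal sketch under stated assumptions: it uses the common Voss binary-indicator mapping and a sliding-window DFT, which may differ from the authors' pipeline. The function names (`voss_indicators`, `power_spectrum`, `genomic_spectrogram`, `dominant_frequency`) and the toy sequence are hypothetical.

```python
import cmath
from typing import Dict, List

# Assumption: Voss mapping + windowed DFT; not necessarily the study's pipeline.
BASES = "ACGT"


def voss_indicators(seq: str) -> Dict[str, List[int]]:
    """Voss mapping: one binary indicator sequence per base (1 where the base occurs)."""
    seq = seq.upper()
    return {b: [1 if ch == b else 0 for ch in seq] for b in BASES}


def power_spectrum(seq: str) -> List[float]:
    """Sum over A, C, G, T of |DFT|^2 of each indicator sequence.

    Bin k (0 <= k < N) corresponds to a frequency of k/N cycles per nucleotide,
    i.e. a period of N/k nucleotides.
    """
    n = len(seq)
    spectrum = [0.0] * n
    for x in voss_indicators(seq).values():
        for k in range(n):
            # Plain O(N^2) DFT; only positions where the base occurs contribute.
            coeff = sum(cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n) if x[t])
            spectrum[k] += abs(coeff) ** 2
    return spectrum


def genomic_spectrogram(seq: str, window: int, step: int) -> List[List[float]]:
    """One power spectrum per sliding window (rows = windows, columns = frequency bins)."""
    return [power_spectrum(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


def dominant_frequency(spectrum: List[float]) -> float:
    """Frequency (cycles/nt) of the strongest bin in (0, 1/2], skipping the DC bin k = 0.

    Requires len(spectrum) >= 2.
    """
    n = len(spectrum)
    k = max(range(1, n // 2 + 1), key=lambda i: spectrum[i])
    return k / n


if __name__ == "__main__":
    # Hypothetical toy sequence with a 6-nt repeat (not data from the study).
    toy = "ACGTTA" * 20
    for column in genomic_spectrogram(toy, window=60, step=30):
        print(dominant_frequency(column))  # 0.16666666666666666, i.e. 1/6
```

Because each 60-nt window holds exactly ten copies of the 6-nt repeat, all non-DC energy falls in bins k = 10, 20 and 30, and the strongest is k = 10, i.e. 10/60 = 1/6 cycles per nucleotide. A real pipeline would replace the O(N²) loop with an FFT (for example `numpy.fft.rfft`).
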
Files
Original bundle
- Name: ClassificationSARS-CoV-2_Sequences_2024.pdf
- Size: 3.26 MB
- Format: Adobe Portable Document Format
- Name: Suplementary1_ClassificationSARS-CoV-2_Sequences_2024.zip
- Size: 70.08 KB
- Description: S1 Appendix. Complete results per subsampling. https://doi.org/10.1371/journal.pone.0309391.s001
- Name: Suplementary2_ClassificationSARS-CoV-2_Sequences_2024.zip
- Size: 65.74 KB
- Description: S2 Appendix. Complete results per number of epochs. https://doi.org/10.1371/journal.pone.0309391.s002


