Publication:
Classification of SARS-CoV-2 sequences as recombinants via a pre-trained CNN and identification of a mathematical signature relative to recombinant feature at Spike, via interpretability

dc.contributor.author: Guerrero-Tamayo, Ana
dc.contributor.author: Sanz Urquijo, Borja
dc.contributor.author: Olivares, Isabel
dc.contributor.author: Moragues Tosantos, María-Dolores
dc.contributor.author: Casado, Concepcion
dc.contributor.author: Pastor-López, Iker
dc.contributor.funder: Universidad de Deusto (España)
dc.date.accessioned: 2025-01-15T10:25:52Z
dc.date.available: 2025-01-15T10:25:52Z
dc.date.issued: 2024
dc.description.abstract: The global impact of the SARS-CoV-2 pandemic has underscored the need for a deeper understanding of viral evolution to anticipate new viruses or variants. Genetic recombination is a fundamental mechanism in viral evolution, yet it remains poorly understood. In this study, we conducted a comprehensive analysis of the genetic regions associated with genetic recombination features in SARS-CoV-2. To this end, we implemented a two-phase transfer learning approach using genomic spectrograms of complete SARS-CoV-2 sequences. In the first phase, we used a pre-trained VGG-16 model with genomic spectrograms of HIV-1, and in the second phase, we applied the resulting HIV-1 VGG-16 model to SARS-CoV-2 spectrograms. The key recombination hot zones were identified using the Grad-CAM interpretability tool, and the results were analyzed with mathematical and image processing techniques. Our findings unequivocally identify the SARS-CoV-2 Spike protein (S protein) as the pivotal region for the genetic recombination feature. For non-recombinant sequences, the relevant frequencies clustered around 1/6 and 1/12. In recombinant sequences, the sharp prominence of the main hot zone in the Spike protein clearly indicated a frequency of 1/6. These findings suggest that, in the arithmetic series, every 6 nucleotides (two triplets) in S may encode crucial information, potentially concealing essential details about viral characteristics, in this case the recombinant feature of a SARS-CoV-2 genetic sequence. This insight further underscores the potential presence of multifaceted information within the genome, including mathematical signatures that define an organism's unique attributes.
dc.description.peerreviewed:
dc.description.sponsorship: This work was supported by the Research Training Grants Program - University of Deusto: Ref. FPI UD_2021_10. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
dc.format.number: 8
dc.format.page: e0309391
dc.format.volume: 19
dc.identifier.citation: Guerrero-Tamayo A, Sanz Urquijo B, Olivares I, Moragues Tosantos MD, Casado C, Pastor-López I. Classification of SARS-CoV-2 sequences as recombinants via a pre-trained CNN and identification of a mathematical signature relative to recombinant feature at Spike, via interpretability. PLoS One. 2024 Aug 26;19(8):e0309391.
dc.identifier.doi: 10.1371/journal.pone.0309391
dc.identifier.e-issn: 1932-6203
dc.identifier.journal: PloS one
dc.identifier.pubmedID: 39186542
dc.identifier.uri: https://hdl.handle.net/20.500.12105/26022
dc.language.iso: eng
dc.publisher: Public Library of Science (PLOS)
dc.relation.publisherversion: https://doi.org/10.1371/journal.pone.0309391
dc.repisalud.centro: ISCIII::Centro Nacional de Microbiología (CNM)
dc.repisalud.institucion: ISCIII
dc.rights.accessRights: open access
dc.rights.license: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject.mesh: COVID-19
dc.subject.mesh: Genome, Viral
dc.subject.mesh: HIV-1
dc.subject.mesh: Humans
dc.subject.mesh: Neural Networks, Computer
dc.subject.mesh: Recombination, Genetic
dc.subject.mesh: SARS-CoV-2
dc.subject.mesh: Spike Glycoprotein, Coronavirus
dc.title: Classification of SARS-CoV-2 sequences as recombinants via a pre-trained CNN and identification of a mathematical signature relative to recombinant feature at Spike, via interpretability
dc.type: research article
dc.type.hasVersion: VoR
dspace.entity.type: Publication
relation.isAuthorOfPublication: 891fcdfd-85a1-46ac-944b-ac6de54d3cc9
relation.isAuthorOfPublication: 7e48a263-54c3-4de0-81e7-f7446c918f2d
relation.isAuthorOfPublication.latestForDiscovery: 891fcdfd-85a1-46ac-944b-ac6de54d3cc9
relation.isFunderOfPublication: c44e5f49-fdaf-4013-a3c5-dff820964030
relation.isFunderOfPublication.latestForDiscovery: c44e5f49-fdaf-4013-a3c5-dff820964030
relation.isPublisherOfPublication: a2759e3d-0d58-4e8a-9fcd-c6130ee333d1
relation.isPublisherOfPublication.latestForDiscovery: a2759e3d-0d58-4e8a-9fcd-c6130ee333d1
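
The abstract reports relevant frequencies of 1/6 and 1/12, where a frequency of 1/6 cycles per nucleotide corresponds to a pattern repeating every 6 nucleotides. The minimal sketch below illustrates how such a frequency can show up in a genomic spectrogram. It is not the authors' pipeline: this record does not describe how their spectrograms were built, so the per-base 0/1 indicator mapping, the rectangular sliding window, and the names `indicator_power`, `spectral_power`, and `spectrogram` are all assumptions made for illustration, and the sequence is a synthetic toy motif rather than SARS-CoV-2 data.

```python
import cmath

BASES = "ACGT"


def indicator_power(seq, freq, base):
    """Power at `freq` (cycles per nucleotide) of the 0/1 indicator track of `base`."""
    # Single-frequency DFT: only positions that hold `base` contribute a term.
    total = sum(
        cmath.exp(-2j * cmath.pi * freq * n)
        for n, ch in enumerate(seq)
        if ch == base
    )
    return abs(total) ** 2 / len(seq)


def spectral_power(seq, freq):
    """Sum of the four per-base powers at `freq` (assumed mapping, see lead-in)."""
    return sum(indicator_power(seq, freq, b) for b in BASES)


def spectrogram(seq, window, step, freqs):
    """Rows follow `freqs`; columns follow window start positions along `seq`."""
    starts = range(0, len(seq) - window + 1, step)
    return [
        [spectral_power(seq[s:s + window], f) for s in starts]
        for f in freqs
    ]


# Synthetic toy sequence with an exact period of 6 nucleotides (two triplets).
toy = "ATGCCA" * 20
freqs = [1 / 12, 1 / 6, 1 / 3]
spec = spectrogram(toy, window=60, step=30, freqs=freqs)

for f, row in zip(freqs, spec):
    print(f"freq={f:.4f}", [round(p, 2) for p in row])
```

On this toy input the row for 1/6 carries clear power while the row for 1/12 is essentially zero, because the motif repeats every 6 positions and not every 12. Real genomes are far noisier, so a practical analysis would typically use a tapered window and an FFT-based routine instead of this direct sum.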

Files

Original bundle

Name:
ClassificationSARS-CoV-2_Sequences_2024.pdf
Size:
3.26 MB
Format:
Adobe Portable Document Format
Name:
Suplementary1_ClassificationSARS-CoV-2_Sequences_2024.zip
Size:
70.08 KB
Format:
Description:
S1 Appendix. Complete results per subsampling. https://doi.org/10.1371/journal.pone.0309391.s001
Name:
Suplementary2_ClassificationSARS-CoV-2_Sequences_2024.zip
Size:
65.74 KB
Format:
Description:
S2 Appendix. Complete results per number of epochs. https://doi.org/10.1371/journal.pone.0309391.s002