Publication: Classification of SARS-CoV-2 sequences as recombinants via a pre-trained CNN and identification of a mathematical signature relative to recombinant feature at Spike, via interpretability
Abstract
The global impact of the SARS-CoV-2 pandemic has underscored the need for a deeper understanding of viral evolution in order to anticipate new viruses and variants. Genetic recombination is a fundamental mechanism of viral evolution, yet it remains poorly understood. In this study, we conducted a comprehensive investigation of the genomic regions associated with recombination features in SARS-CoV-2. To this end, we implemented a two-phase transfer learning approach using genomic spectrograms of complete SARS-CoV-2 sequences. In the first phase, we used a pre-trained VGG-16 model with genomic spectrograms of HIV-1; in the second phase, we applied the resulting HIV-1 VGG-16 model to SARS-CoV-2 spectrograms. Key recombination hot zones were identified using the Grad-CAM interpretability tool, and the results were analyzed with mathematical and image processing techniques. Our findings unequivocally identify the SARS-CoV-2 Spike protein (S protein) as the pivotal region for the genetic recombination feature. For non-recombinant sequences, the relevant frequencies clustered around 1/6 and 1/12. In recombinant sequences, the sharp main hot zone in the Spike protein clearly indicated a frequency of 1/6. These findings suggest that, in the arithmetic series, every 6 nucleotides (two triplets) in S may encode crucial information, potentially concealing essential details about viral characteristics, in this case the recombinant feature of a SARS-CoV-2 genetic sequence. This insight further underscores the potential presence of multifaceted information within the genome, including mathematical signatures that define an organism's unique attributes.
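As a rough illustration of what a frequency of 1/6 means for a nucleotide sequence, the sketch below measures spectral power at a chosen frequency (in cycles per nucleotide). The abstract does not state how the genomic spectrograms were built, so the Voss-style binary indicator mapping, the function names `indicator` and `power_at`, and the toy sequence are all assumptions made for illustration, not the authors' pipeline.

```python
import cmath
import math

BASES = "ACGT"


def indicator(seq: str, base: str) -> list[float]:
    """Voss-style binary indicator (an assumed mapping): 1.0 where `base` occurs."""
    return [1.0 if ch == base else 0.0 for ch in seq.upper()]


def power_at(seq: str, freq: float) -> float:
    """Spectral power at `freq` (cycles per nucleotide), summed over the four bases.

    Each indicator sequence is mean-centered, projected onto a complex
    exponential at `freq` (a single DFT coefficient), and the squared
    magnitudes are summed and normalized by the sequence length.
    """
    total = 0.0
    for base in BASES:
        x = indicator(seq, base)
        mean = sum(x) / len(x)
        coeff = sum(
            (v - mean) * cmath.exp(-2j * math.pi * freq * n)
            for n, v in enumerate(x)
        )
        total += abs(coeff) ** 2
    return total / len(seq)


# Hypothetical toy sequence with an exact 6-nucleotide period (not viral data).
toy = "ATGCCA" * 20

p6 = power_at(toy, 1 / 6)     # power at a period of 6 nucleotides
p_off = power_at(toy, 1 / 7)  # power at a nearby, non-matching frequency

print(f"power at 1/6: {p6:.3f}")
print(f"power at 1/7: {p_off:.3f}")
```

On a sequence whose composition repeats every 6 nucleotides, the power at 1/6 dominates the power at a neighboring frequency. A spectrogram applies the same idea in sliding windows along the genome, which is what allows a frequency to be associated with a region such as S.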
Bibliographic citation
Guerrero-Tamayo A, Sanz Urquijo B, Olivares I, Moragues Tosantos MD, Casado C, Pastor-López I. Classification of SARS-CoV-2 sequences as recombinants via a pre-trained CNN and identification of a mathematical signature relative to recombinant feature at Spike, via interpretability. PLoS One. 2024 Aug 26;19(8):e0309391.








