Publication:
A method for automatically extracting infectious disease-related primers and probes from the literature

dc.contributor.author: García-Remesal, Miguel
dc.contributor.author: Cuevas, Alejandro
dc.contributor.author: López-Alonso, Victoria
dc.contributor.author: Lopez-Campos, Guillermo
dc.contributor.author: de la Calle, Guillermo
dc.contributor.author: de la Iglesia, Diana
dc.contributor.author: Pérez-Rey, David
dc.contributor.author: Crespo, José
dc.contributor.author: Martin-Sanchez, Fernando
dc.contributor.author: Maojo, Víctor
dc.contributor.funder: Unión Europea. Comisión Europea
dc.contributor.funder: Ministerio de Ciencia e Innovación (España)
dc.contributor.funder: Comunidad de Madrid (España)
dc.date.accessioned: 2018-12-17T14:13:25Z
dc.date.available: 2018-12-17T14:13:25Z
dc.date.issued: 2010-08-03
dc.description.abstract: BACKGROUND: Primer and probe sequences are the main components of nucleic acid-based detection systems. Biologists use primers and probes for different tasks, some related to the diagnosis and prescription of infectious diseases. The biological literature is the main information source for empirically validated primer and probe sequences. Therefore, it is becoming increasingly important for researchers to navigate this important information. In this paper, we present a four-phase method for extracting and annotating primer/probe sequences from the literature. These phases are: (1) convert each document into a tree of paper sections, (2) detect the candidate sequences using a set of finite state machine-based recognizers, (3) refine problem sequences using a rule-based expert system, and (4) annotate the extracted sequences with their related organism/gene information. RESULTS: We tested our approach using a test set composed of 297 manuscripts. The extracted sequences and their organism/gene annotations were manually evaluated by a panel of molecular biologists. The results of the evaluation show that our approach is suitable for automatically extracting DNA sequences, achieving precision/recall rates of 97.98% and 95.77%, respectively. In addition, 76.66% of the detected sequences were correctly annotated with their organism name. The system also provided correct gene-related information for 46.18% of the sequences assigned a correct organism name. CONCLUSIONS: We believe that the proposed method can facilitate routine tasks for biomedical researchers using molecular methods to diagnose and prescribe different infectious diseases. In addition, the proposed method can be expanded to detect and extract other biological sequences from the literature. The extracted information can also be used to readily update available primer/probe databases or to create new databases from scratch.
dc.description.peerreviewed:
dc.description.sponsorship: The present work has been funded, in part, by the European Commission through the ACGT integrated project (FP6-2005-IST-026996) and the ACTION-Grid support action (FP7-ICT-2007-2-224176), the Spanish Ministry of Science and Innovation through the OntoMineBase project (ref. TSI2006-13021-C02-01), the ImGraSec project (ref. TIN2007-61768), FIS/AES PS09/00069 and COMBIOMED-RETICS, and the Comunidad de Madrid, Spain.
dc.format.number: 1
dc.format.page: 410
dc.format.volume: 11
dc.identifier.citation: BMC Bioinformatics. 2010 Aug 3;11:410.
dc.identifier.doi: 10.1186/1471-2105-11-410
dc.identifier.e-issn: 1471-2105
dc.identifier.issn: 1471-2105
dc.identifier.journal: BMC Bioinformatics
dc.identifier.pubmedID: 20682041
dc.identifier.uri: http://hdl.handle.net/20.500.12105/6877
dc.language.iso: eng
dc.publisher: BioMed Central (BMC)
dc.relation.projectID: info:eu-repo/grantAgreement/EC/FP6/2005-IST-026996
dc.relation.projectID: info:eu-repo/grantAgreement/EC/FP7/ICT-2007-2-224176
dc.relation.publisherversion: https://doi.org/10.1186/1471-2105-11-410
dc.repisalud.centro: ISCIII::Unidad Funcional de Investigación de Enfermedades Crónicas (UFIEC)
dc.repisalud.institucion: ISCIII
dc.rights.accessRights: open access
dc.rights.license: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject.mesh: Base Sequence
dc.subject.mesh: DNA Primers
dc.subject.mesh: DNA Probes
dc.subject.mesh: Periodicals as Topic
dc.subject.mesh: Data Mining
dc.subject.mesh: Databases, Genetic
dc.title: A method for automatically extracting infectious disease-related primers and probes from the literature
dc.type: research article
dc.type.hasVersion: VoR
dspace.entity.type: Publication
relation.isAuthorOfPublication: b280441c-10a0-4dbd-a8ca-8274086ba2a5
relation.isAuthorOfPublication: 65be59ac-0fdd-4b4f-8e58-bcf6fc57d227
relation.isAuthorOfPublication: a40cb0cc-0d78-4079-8fbc-b24ee1798529
relation.isAuthorOfPublication.latestForDiscovery: b280441c-10a0-4dbd-a8ca-8274086ba2a5
relation.isFunderOfPublication: 639cecfa-9455-4e7f-9f65-9ece17a878f0
relation.isFunderOfPublication: 289dce42-6a28-4892-b0a8-c70c46cbb185
relation.isFunderOfPublication: c87c70a3-e023-4b6b-ac25-1b2d1b483786
relation.isFunderOfPublication.latestForDiscovery: 639cecfa-9455-4e7f-9f65-9ece17a878f0
relation.isPublisherOfPublication: 4fe896aa-347b-437b-a45b-95f4b60d9fd3
relation.isPublisherOfPublication.latestForDiscovery: 4fe896aa-347b-437b-a45b-95f4b60d9fd3

Files

Original bundle

Name:
AMethodForAutomatically_2010.pdf
Size:
1.37 MB
Format:
Adobe Portable Document Format