A method for automatically extracting infectious disease-related primers and probes from the literature

García-Remesal, Miguel; Cuevas, Alejandro; López-Alonso, Victoria; Lopez-Campos, Guillermo; de la Calle, Guillermo; de la Iglesia, Diana; Pérez-Rey, David; Crespo, José; Martin-Sanchez, Fernando; Maojo, Víctor


dc.contributor.author	García-Remesal, Miguel
dc.contributor.author	Cuevas, Alejandro
dc.contributor.author	López-Alonso, Victoria
dc.contributor.author	Lopez-Campos, Guillermo
dc.contributor.author	de la Calle, Guillermo
dc.contributor.author	de la Iglesia, Diana
dc.contributor.author	Pérez-Rey, David
dc.contributor.author	Crespo, José
dc.contributor.author	Martin-Sanchez, Fernando
dc.contributor.author	Maojo, Víctor
dc.contributor.funder	European Union. European Commission
dc.contributor.funder	Ministry of Science and Innovation (Spain)
dc.contributor.funder	Comunidad de Madrid (Spain)
dc.date.accessioned	2018-12-17T14:13:25Z
dc.date.available	2018-12-17T14:13:25Z
dc.date.issued	2010-08-03
dc.description.abstract	BACKGROUND: Primer and probe sequences are the main components of nucleic acid-based detection systems. Biologists use primers and probes for different tasks, some of them related to the diagnosis and treatment of infectious diseases. The biological literature is the main information source for empirically validated primer and probe sequences, so researchers increasingly need efficient ways to navigate it. In this paper, we present a four-phase method for extracting and annotating primer/probe sequences from the literature. These phases are: (1) convert each document into a tree of paper sections, (2) detect candidate sequences using a set of finite state machine-based recognizers, (3) refine problematic sequences using a rule-based expert system, and (4) annotate the extracted sequences with their related organism/gene information. RESULTS: We tested our approach on a test set of 297 manuscripts. The extracted sequences and their organism/gene annotations were manually evaluated by a panel of molecular biologists. The evaluation shows that our approach is suitable for automatically extracting DNA sequences, achieving precision and recall of 97.98% and 95.77%, respectively. In addition, 76.66% of the detected sequences were correctly annotated with their organism name, and the system provided correct gene-related information for 46.18% of the sequences that were assigned a correct organism name. CONCLUSIONS: We believe that the proposed method can facilitate routine tasks for biomedical researchers who use molecular methods to diagnose infectious diseases and guide their treatment. The method can also be extended to detect and extract other biological sequences from the literature, and the extracted information can be used to update existing primer/probe databases or to build new ones from scratch.	es_ES
dc.description.peerreviewed	Yes	es_ES
dc.description.sponsorship	The present work has been funded, in part, by the European Commission through the ACGT integrated project (FP6-2005-IST-026996) and the ACTION-Grid support action (FP7-ICT-2007-2-224176), the Spanish Ministry of Science and Innovation through the OntoMineBase project (ref. TSI2006-13021-C02-01), the ImGraSec project (ref. TIN2007-61768), FIS/AES PS09/00069 and COMBIOMED-RETICS, and the Comunidad de Madrid, Spain.	es_ES
dc.format.number	1	es_ES
dc.format.page	410	es_ES
dc.format.volume	11	es_ES
dc.identifier.citation	BMC Bioinformatics. 2010 Aug 3;11:410.	es_ES
dc.identifier.doi	10.1186/1471-2105-11-410	es_ES
dc.identifier.e-issn	1471-2105	es_ES
dc.identifier.issn	1471-2105	es_ES
dc.identifier.journal	BMC Bioinformatics	es_ES
dc.identifier.pubmedID	20682041	es_ES
dc.identifier.uri	http://hdl.handle.net/20.500.12105/6877
dc.language.iso	eng	es_ES
dc.publisher	BioMed Central (BMC)
dc.relation.projectID	info:eu-repo/grantAgreement/EC/FP6/2005-IST-026996	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/EC/FP7/ICT-2007-2-224176	es_ES
dc.relation.publisherversion	https://doi.org/10.1186/1471-2105-11-410	es_ES
dc.repisalud.centro	ISCIII::Unidad Funcional de Investigación de Enfermedades Crónicas (UFIEC)	es_ES
dc.repisalud.institucion	ISCIII	es_ES
dc.rights.accessRights	open access	es_ES
dc.rights.license	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	*
dc.subject.mesh	Base Sequence	es_ES
dc.subject.mesh	DNA Primers	es_ES
dc.subject.mesh	DNA Probes	es_ES
dc.subject.mesh	Periodicals as Topic	es_ES
dc.subject.mesh	Data Mining	es_ES
dc.subject.mesh	Databases, Genetic	es_ES
dc.title	A method for automatically extracting infectious disease-related primers and probes from the literature	es_ES
dc.type	research article	es_ES
dc.type.hasVersion	VoR	es_ES
dspace.entity.type	Publication
relation.isAuthorOfPublication	b280441c-10a0-4dbd-a8ca-8274086ba2a5
relation.isAuthorOfPublication	65be59ac-0fdd-4b4f-8e58-bcf6fc57d227
relation.isAuthorOfPublication	a40cb0cc-0d78-4079-8fbc-b24ee1798529
relation.isAuthorOfPublication.latestForDiscovery	b280441c-10a0-4dbd-a8ca-8274086ba2a5
relation.isFunderOfPublication	639cecfa-9455-4e7f-9f65-9ece17a878f0
relation.isFunderOfPublication	289dce42-6a28-4892-b0a8-c70c46cbb185
relation.isFunderOfPublication	c87c70a3-e023-4b6b-ac25-1b2d1b483786
relation.isFunderOfPublication.latestForDiscovery	639cecfa-9455-4e7f-9f65-9ece17a878f0
relation.isPublisherOfPublication	4fe896aa-347b-437b-a45b-95f4b60d9fd3
relation.isPublisherOfPublication.latestForDiscovery	4fe896aa-347b-437b-a45b-95f4b60d9fd3
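
The abstract's second phase detects candidate sequences with finite state machine-based recognizers, but this record does not describe those machines. The following is a minimal Python sketch of one such recognizer, assuming sequences are written in uppercase IUPAC nucleotide codes, that a single space or hyphen may appear between bases, and that candidates shorter than a hypothetical threshold of 15 bases are discarded. The names `find_candidates`, `Candidate`, and `min_len` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass

# Assumption: sequences use uppercase IUPAC nucleotide codes.
IUPAC = set("ACGTUNRYKMSWBDHV")
# Hypothetical: a single space or hyphen is tolerated between bases.
SEPARATORS = set(" -")


@dataclass
class Candidate:
    start: int   # offset of the first base in the text
    end: int     # offset one past the last base
    bases: str   # the sequence with separators removed


def find_candidates(text: str, min_len: int = 15) -> list[Candidate]:
    """Scan text with a three-state FSM: OUTSIDE, IN_SEQ and GAP."""
    results: list[Candidate] = []
    state = "OUTSIDE"
    start = end = 0
    bases: list[str] = []

    def emit() -> None:
        # Accept the run only if it reaches the (assumed) minimum length.
        if len(bases) >= min_len:
            results.append(Candidate(start, end, "".join(bases)))

    for i, ch in enumerate(text):
        if state == "OUTSIDE":
            if ch in IUPAC:
                state, start, end, bases = "IN_SEQ", i, i + 1, [ch]
        elif state == "IN_SEQ":
            if ch in IUPAC:
                bases.append(ch)
                end = i + 1
            elif ch in SEPARATORS:
                state = "GAP"
            else:
                emit()
                state = "OUTSIDE"
        else:  # GAP: one separator seen; a base resumes the sequence
            if ch in IUPAC:
                bases.append(ch)
                end = i + 1
                state = "IN_SEQ"
            else:
                emit()
                state = "OUTSIDE"

    if state != "OUTSIDE":
        emit()  # flush a sequence that runs to the end of the text
    return results
```

For example, in the sentence "Forward primer 5'-GGA TCC ATG CAA GTT CGA-3' and probe TTGACAGTCAGGTTAACCGA were used", this sketch reports two candidates, while short uppercase tokens such as "DNA" fall below the length threshold. A lone uppercase word made of IUPAC letters (such as the article "A") directly before a sequence would still be merged into it; in the described method, problematic candidates are handled by a separate rule-based refinement phase, which this sketch does not attempt.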

Files

Original bundle


Name: AMethodForAutomatically_2010.pdf
Size: 1.37 MB
Format: Adobe Portable Document Format

Collections

Unidad Funcional de Investigación de Enfermedades Crónicas (UFIEC)
