Publication:
TransformEHRs: a flexible methodology for building transparent ETL processes for EHR reuse

dc.contributor.author: Pedrera-Jiménez, Miguel
dc.contributor.author: García-Barrio, Noelia
dc.contributor.author: Rubio-Mayo, Paula
dc.contributor.author: Tato-Gómez, Alberto
dc.contributor.author: Cruz-Bermúdez, Juan Luis
dc.contributor.author: Bernal-Sobrino, José Luis
dc.contributor.author: Muñoz Carrero, Adolfo
dc.contributor.author: Serrano-Balazote, Pablo
dc.contributor.funder: Ministerio de Economía y Competitividad (España)
dc.contributor.funder: Instituto de Salud Carlos III
dc.date.accessioned: 2023-04-27T11:02:48Z
dc.date.available: 2023-04-27T11:02:48Z
dc.date.issued: 2022-12
dc.description.abstract: Background: During the COVID-19 pandemic, several methodologies were designed for obtaining electronic health record (EHR)-derived datasets for research. These processes are often based on black boxes, in which clinical researchers are unaware of how the data were recorded, extracted, and transformed. In order to solve this, it is essential that extract, transform, and load (ETL) processes are based on transparent, homogeneous, and formal methodologies, making them understandable, reproducible, and auditable. Objectives: This study aims to design and implement a methodology, in accordance with the FAIR Principles, for building ETL processes (focused on data extraction, selection, and transformation) for EHR reuse in a transparent and flexible manner, applicable to any clinical condition and health care organization. Methods: The proposed methodology comprises four stages: (1) analysis of secondary use models and identification of data operations, based on internationally used clinical repositories, case report forms, and aggregated datasets; (2) modeling and formalization of data operations, through the paradigm of the Detailed Clinical Models; (3) agnostic development of data operations, selecting SQL and R as programming languages; and (4) automation of the ETL instantiation, building a formal configuration file with XML. Results: First, four international projects were analyzed to identify 17 operations, necessary to obtain datasets according to the specifications of these projects from the EHR. With this, each of the data operations was formalized, using the ISO 13606 reference model, specifying the valid data types as arguments, inputs and outputs, and their cardinality. Then, an agnostic catalog of data was developed through data-oriented programming languages previously selected. Finally, an automated ETL instantiation process was built from an ETL configuration file formally defined. Conclusions: This study has provided a transparent and flexible solution to the difficulty of making the processes for obtaining EHR-derived data for secondary use understandable, auditable, and reproducible. Moreover, the abstraction carried out in this study means that any previous EHR reuse methodology can incorporate these results. [es_ES]
dc.description.peerreviewed: [es_ES]
dc.description.sponsorship: Ministerio de Economía y Competitividad Instituto de Salud Carlos III PI18/00981, PI18/01047, PI18CIII/00019. [es_ES]
dc.format.number: S02 [es_ES]
dc.format.page: e89-e102 [es_ES]
dc.format.volume: 61 [es_ES]
dc.identifier.citation: Methods Inf Med. 2022 Dec;61(S02):e89-e102. [es_ES]
dc.identifier.doi: 10.1055/s-0042-1757763 [es_ES]
dc.identifier.e-issn: 2511-705X [es_ES]
dc.identifier.journal: Methods of information in medicine [es_ES]
dc.identifier.pubmedID: 36220109 [es_ES]
dc.identifier.uri: http://hdl.handle.net/20.500.12105/15919
dc.language.iso: eng [es_ES]
dc.publisher: Thieme Medical Publishers
dc.relation.projectFIS: info:fis/Instituto de Salud Carlos III/Programa Estatal de Fomento de la Investigación Científica y Técnica de Excelencia/Subprograma Estatal de Generación de Conocimiento/PI18 - Proyectos de investigacion en salud (AES 2018). Modalidad proyectos en salud. (2018)/PI18/00981 [es_ES]
dc.relation.projectFIS: info:fis/Instituto de Salud Carlos III/Programa Estatal de Fomento de la Investigación Científica y Técnica de Excelencia/Subprograma Estatal de Generación de Conocimiento/PI18 - Proyectos de investigacion en salud (AES 2018). Modalidad proyectos en salud. (2018)/PI18/01047 [es_ES]
dc.relation.projectFIS: info:fis/Instituto de Salud Carlos III/Programa Estatal de Fomento de la Investigación Científica y Técnica de Excelencia/Subprograma Estatal de Generación de Conocimiento/PI18-ISCIII Modalidad Proyectos de Investigacion en Salud Intramurales. (2018)/PI18CIII/00019 [es_ES]
dc.relation.publisherversion: https://doi.org/10.1055/s-0042-1757763 [es_ES]
dc.repisalud.centro: ISCIII::Unidad de Investigación en Telemedicina y eSalud [es_ES]
dc.repisalud.institucion: ISCIII [es_ES]
dc.rights.accessRights: open access [es_ES]
dc.rights.license: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: Electronic health record [es_ES]
dc.subject: FAIR Principles [es_ES]
dc.subject: Data reusability [es_ES]
dc.subject: Real-world data [es_ES]
dc.subject: Standards [es_ES]
dc.subject.mesh: Electronic Health Records [es_ES]
dc.subject.mesh: COVID-19 [es_ES]
dc.subject.mesh: Humans [es_ES]
dc.subject.mesh: Pandemics [es_ES]
dc.title: TransformEHRs: a flexible methodology for building transparent ETL processes for EHR reuse [es_ES]
dc.type: research article [es_ES]
dc.type.hasVersion: VoR [es_ES]
dspace.entity.type: Publication
relation.isAuthorOfPublication: c62651ac-034c-4271-b51e-d82a428af13e
relation.isAuthorOfPublication.latestForDiscovery: c62651ac-034c-4271-b51e-d82a428af13e
relation.isFunderOfPublication: 77b2fc20-6311-4e46-98a7-83e46257b93b
relation.isFunderOfPublication: 7d739953-4b68-4675-b5bb-387a9ab74b66
relation.isFunderOfPublication.latestForDiscovery: 77b2fc20-6311-4e46-98a7-83e46257b93b
relation.isPublisherOfPublication: 4e9125cc-4680-4c62-a61d-b8051bb93403
relation.isPublisherOfPublication.latestForDiscovery: 4e9125cc-4680-4c62-a61d-b8051bb93403

Files

Original bundle

Name:
TransformEHRsFlexibleMethodology_2022.pdf
Size:
2.67 MB
Format:
Adobe Portable Document Format