Publication:
Transformers for Clinical Coding in Spanish

dc.contributor.authorLópez-García, Guillermo
dc.contributor.authorJerez, José M.
dc.contributor.authorRibelles, Nuria
dc.contributor.authorAlba, Emilio
dc.contributor.authorVeredas, Francisco J.
dc.contributor.authoraffiliation[López-García,G; Jerez,JM; Veredas,FJ] Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain. [Ribelles,N; Alba,E] Unidad de Gestión Clínica Intercentros de Oncología, Instituto de Investigación Biomédica de Málaga (IBIMA), Hospitales Universitarios Regional y Virgen de la Victoria, Málaga, Spain.
dc.date.accessioned2024-02-19T15:28:18Z
dc.date.available2024-02-19T15:28:18Z
dc.date.issued2021-05-13
dc.description.abstractAutomatic clinical coding is an essential task in the process of extracting relevant information from unstructured documents contained in electronic health records (EHRs). However, most research in the development of computer-based methods for clinical coding focuses on texts written in English due to the limited availability of medical linguistic resources in languages other than English. With nearly 500 million native speakers, there is a worldwide interest in processing healthcare texts in Spanish. In this study, we systematically analyzed transformer-based models for automatic clinical coding in Spanish. Using a transfer learning-based approach, the three existing transformer architectures that support the Spanish language, namely, multilingual BERT (mBERT), BETO and XLM-RoBERTa (XLM-R), were first pretrained on a corpus of real-world oncology clinical cases with the goal of adapting transformers to the particularities of Spanish medical texts. The resulting models were fine-tuned on three distinct clinical coding tasks, following a multilabel sentence classification strategy. For each analyzed transformer, the domain-specific version outperformed the original general domain model across those tasks. Moreover, the combination of the developed strategy with an ensemble approach leveraging the predictive capacities of the three distinct transformers yielded the best obtained results, with MAP scores of 0.662, 0.544 and 0.884 on CodiEsp-D, CodiEsp-P and Cantemist-Coding shared tasks, which remarkably improved the previous state-of-the-art performance by 11.6%, 10.3% and 4.4%, respectively. We publicly release the mBERT, BETO and XLM-R transformers adapted to the Spanish clinical domain at https://github.com/guilopgar/ClinicalCodingTransformerES, providing the clinical natural language processing community with advanced deep learning methods for performing medical coding and other tasks in the Spanish clinical domain.
dc.description.sponsorshipThis work was supported in part by the Ministerio de Economía y Empresa (MINECO), Plan Nacional de I+D+I, under Project TIN2017-88728-C2-1-R, in part by the Andalucía TECH, under Project UMA-CEIATECH-01, in part by the Universidad de Málaga and the Consorcio de Bibliotecas Universitarias de Andalucía (CBUA), and in part by the Plan Andaluz de Investigación, Desarrollo e Innovación (PAIDI), Junta de Andalucía.
dc.identifier.doi10.1109/ACCESS.2021.3080085
dc.identifier.e-issn2169-3536
dc.identifier.journalIEEE Access
dc.identifier.otherhttp://hdl.handle.net/10668/3836
dc.identifier.urihttp://hdl.handle.net/20.500.12105/18344
dc.language.isoeng
dc.publisherInstitute of Electrical and Electronics Engineers (IEEE)
dc.relation.publisherversionhttps://ieeexplore.ieee.org/document/9430499
dc.rights.accessRightsopen access
dc.rights.licenseAttribution 4.0 International
dc.rights.urihttp://creativecommons.org/licenses/by/4.0/
dc.subjectClinical coding
dc.subjectDeep learning
dc.subjectNatural language processing
dc.subjectText classification
dc.subjectTransformers
dc.subjectCodificación clínica
dc.subjectAprendizaje profundo
dc.subjectProcesamiento de lenguaje natural
dc.subjectModelos epidemiológicos
dc.subjectAnálisis y desempeño de tareas
dc.subject.meshNatural Language Processing
dc.subject.meshElectronic Health Records
dc.subject.meshClinical Coding
dc.subject.meshLinguistics
dc.subject.meshDelivery of Health Care
dc.subject.meshComputers
dc.subject.meshGoals
dc.titleTransformers for Clinical Coding in Spanish
dc.typeresearch article
dc.type.hasVersionVoR
dspace.entity.typePublication
relation.isPublisherOfPublicatione659049e-3838-4139-986e-5d94c40668b3
relation.isPublisherOfPublication.latestForDiscoverye659049e-3838-4139-986e-5d94c40668b3

Files