<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-04-29T19:34:53Z</responseDate><request verb="GetRecord" identifier="oai:repisalud.isciii.es:20.500.12105/26783" metadataPrefix="mets">https://repisalud.isciii.es/rest/oai/request</request><GetRecord><record><header><identifier>oai:repisalud.isciii.es:20.500.12105/26783</identifier><datestamp>2025-12-18T13:01:53Z</datestamp><setSpec>com_20.500.12105_2052</setSpec><setSpec>com_20.500.12105_2051</setSpec><setSpec>col_20.500.12105_19619</setSpec></header><metadata><mets xmlns="http://www.loc.gov/METS/" xmlns:doc="http://www.lyncode.com/xoai" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ID="&#xa;&#x9;&#x9;&#x9;&#x9;DSpace_ITEM_20.500.12105-26783" TYPE="DSpace ITEM" PROFILE="DSpace METS SIP Profile 1.0" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd" OBJID="&#xa;&#x9;&#x9;&#x9;&#x9;hdl:20.500.12105/26783">
   <metsHdr CREATEDATE="2026-04-29T21:34:53Z">
      <agent ROLE="CUSTODIAN" TYPE="ORGANIZATION">
         <name>Repisalud</name>
      </agent>
   </metsHdr>
   <dmdSec ID="DMD_20.500.12105_26783">
      <mdWrap MDTYPE="MODS">
         <xmlData xmlns:mods="http://www.loc.gov/mods/v3" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-1.xsd">
            <mods:mods xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-1.xsd">
               <mods:name>
                  <mods:role>
                     <mods:roleTerm type="text">author</mods:roleTerm>
                  </mods:role>
                  <mods:namePart>Sánchez-de-Madariaga, Ricardo</mods:namePart>
               </mods:name>
               <mods:name>
                  <mods:role>
                     <mods:roleTerm type="text">author</mods:roleTerm>
                  </mods:role>
                  <mods:namePart>Pascual-Carrasco, Mario</mods:namePart>
               </mods:name>
               <mods:name>
                  <mods:role>
                     <mods:roleTerm type="text">author</mods:roleTerm>
                  </mods:role>
                  <mods:namePart>Muñoz Carrero, Adolfo</mods:namePart>
               </mods:name>
               <mods:extension>
                  <mods:dateAccessioned encoding="iso8601">2025-07-01T06:32:56Z</mods:dateAccessioned>
               </mods:extension>
               <mods:extension>
                  <mods:dateAvailable encoding="iso8601">2025-07-01T06:32:56Z</mods:dateAvailable>
               </mods:extension>
               <mods:originInfo>
                  <mods:dateIssued encoding="iso8601">2025-05-28</mods:dateIssued>
               </mods:originInfo>
               <mods:identifier type="citation">Sánchez-de-Madariaga, R.; Pascual Carrasco, M.; Muñoz Carrero, A. A Methodology to Extract Knowledge from Datasets Using ML. Mathematics. 2025. 13(11):1807.</mods:identifier>
               <mods:identifier type="doi">10.3390/math13111807</mods:identifier>
               <mods:identifier type="issn">2227-7390</mods:identifier>
               <mods:identifier type="journal">Mathematics</mods:identifier>
               <mods:identifier type="uri">https://hdl.handle.net/20.500.12105/26783</mods:identifier>
               <mods:abstract>This study aims to verify whether there is any relationship between the different classification outputs produced by distinct ML algorithms and the relevance of the data they classify, to address the problem of knowledge extraction (KE) from datasets. If such a relationship exists, the main objective of this research is to use it to improve performance in the important task of KE from datasets. A new dataset generation methodology and a new ML classification measurement methodology were developed to determine whether the feature subsets (FSs) best classified by a specific ML algorithm corresponded to the most KE-relevant combinations of features. Medical expertise was extracted from two LLMs, namely ChatGPT-4o and Google Gemini 2.5, to determine knowledge relevance. Some specific ML algorithms fit much better than others for a working dataset extracted from a given probability distribution. They best classify FSs that contain combinations of features that are particularly knowledge-relevant. This implies that, by using a specific ML algorithm, we can indeed extract useful scientific knowledge. The best-fitting ML algorithm is not known a priori. However, we can bootstrap its identity using a small amount of medical expertise, which gives us a powerful tool for extracting (medical) knowledge from datasets using ML.</mods:abstract>
               <mods:language>
                  <mods:languageTerm authority="iso639-2b">eng</mods:languageTerm>
               </mods:language>
               <mods:accessCondition type="useAndReproduction"/>
               <mods:subject>
                  <mods:topic>Knowledge relevance</mods:topic>
               </mods:subject>
               <mods:subject>
                  <mods:topic>Knowledge extraction</mods:topic>
               </mods:subject>
               <mods:subject>
                  <mods:topic>Feature subset</mods:topic>
               </mods:subject>
               <mods:subject>
                  <mods:topic>Large language models</mods:topic>
               </mods:subject>
               <mods:subject>
                  <mods:topic>Machine learning algorithms</mods:topic>
               </mods:subject>
               <mods:subject>
                  <mods:topic>Statistics</mods:topic>
               </mods:subject>
               <mods:titleInfo>
                  <mods:title>A Methodology to Extract Knowledge from Datasets Using ML</mods:title>
               </mods:titleInfo>
               <mods:genre>research article</mods:genre>
            </mods:mods>
         </xmlData>
      </mdWrap>
   </dmdSec>
   <amdSec ID="TMD_20.500.12105_26783">
      <rightsMD ID="RIG_20.500.12105_26783">
         <mdWrap MIMETYPE="text/plain" MDTYPE="OTHER" OTHERMDTYPE="DSpaceDepositLicense">
            <binData>QWNlcHRhbmRvIGVzdGEgbGljZW5jaWEsIFVzdGVkIChlbCBhdXRvci9lcyBvIGVsIHByb3BpZXRhcmlvL3MgZGUgbG9zIGRlcmVjaG9zIGRlIGF1dG9yKSBjb25jZWRlIGEgUkVQSVNBTFVEIGVsIGRlcmVjaG8gbm8gZXhjbHVzaXZvIGRlIHJlcHJvZHVjaXIsIGNvbnZlcnRpciwgeS9vIGRpc3RyaWJ1aXIgc3UgZG9jdW1lbnRvIChpbmNsdXllbmRvIHN1IHJlc3VtZW4pIGEgbml2ZWwgbXVuZGlhbCBlbiBmb3JtYXRvIGRpZ2l0YWwsIGluY2x1eWVuZG8sIGF1ZGlvIHkgdsOtZGVvLCBhIHRyYXbDqXMgZGUgc3UgcmVwb3NpdG9yaW8gaW5zdGl0dWNpb25hbC4KClVzdGVkIGFjZXB0YSBxdWUgUkVQSVNBTFVEIHB1ZWRlLCBzaW4gYWx0ZXJhciBzdSBjb250ZW5pZG8sIGNvbnZlcnRpciBzdSBkb2N1bWVudG8gYSBjdWFscXVpZXIgb3RybyBmb3JtYXRvIGRpZ2l0YWwgZGUgZGF0b3MsIGF1ZGlvIHkgdmlkZW8sIGNvbiBlbCBwcm9ww7NzaXRvIGRlIHF1ZSBwdWVkYSBzZXIgYWxvamFkbyBlbiBlbCByZXBvc2l0b3Jpby4KClVzdGVkIGVzdMOhIGRlIGFjdWVyZG8gY29uIHF1ZSBSRVBJU0FMVUQgcHVlZGEgY29uc2VydmFyIG3DoXMgZGUgdW5hIGNvcGlhIGRlIGVzdGUgZG9jdW1lbnRvIHBhcmEgYXNlZ3VyYXIgc3Ugc2VndXJpZGFkLCBwcmVzZXJ2YWNpw7NuIHkgYWNjZXNvLgoKVXN0ZWQgZGVjbGFyYSBxdWUgZWwgZG9jdW1lbnRvIGVzIHVuIHRyYWJham8gb3JpZ2luYWwsIHkgcXVlIHRpZW5lIGVsIGRlcmVjaG8gZGUgb3RvcmdhciBsb3MgZGVyZWNob3MgY29udGVuaWRvcyBlbiBlc3RhIGxpY2VuY2lhLiBUYW1iacOpbiBkZWNsYXJhIHF1ZSBzdSBwZXRpY2nDs24gbm8gaW5mcmluZ2UgbG9zIGRlcmVjaG9zIGRlIGF1dG9yIGRlIG5hZGllLgoKU2kgZWwgZG9jdW1lbnRvIGNvbnRpZW5lIG1hdGVyaWFsZXMgcGFyYSBsb3MgcXVlIG5vIHNlIHRpZW5lbiBsb3MgZGVyZWNob3MgZGUgYXV0b3IsIHVzdGVkIGRlY2xhcmEgcXVlIGhhIG9idGVuaWRvIGVsIHBlcm1pc28gc2luIHJlc3RyaWNjacOzbiBkZWwgcHJvcGlldGFyaW8gZGUgbG9zIGRlcmVjaG9zIHkgcXVlIGVuIGRpY2hvIG1hdGVyaWFsLCBlc3TDoSBjbGFyYW1lbnRlIGlkZW50aWZpY2FkYSB5IHJlY29ub2NpZGEgc3UgYXV0b3LDrWEgZGVudHJvIGVsIHRleHRvIG8gZGVsIGNvbnRlbmlkbyBkZSBkaWNobyBkb2N1bWVudG8uCgpTaSBlbCBlbnbDrW8gc2UgYmFzYSBlbiB1biB0cmFiYWpvIHF1ZSBoYSBzaWRvIHBhdHJvY2luYWRvIG8gYXBveWFkbyBwb3IgdW5hIGFnZW5jaWEgdSBvcmdhbml6YWNpw7NuIGRpc3RpbnRhIGEgUkVQSVNBTFVELCB1c3RlZCBhY2VwdGEgcXVlIGhhIGN1bXBsaWRvIGNvbiBlbCBkZXJlY2hvIGRlIHJldmlzacOzbiB5IG90cmFzIG9ibGlnYWNpb25lcyByZXF1ZXJpZGFzIHBvciBjb250cmF0byBvIGFjdWVyZG8uCgpSRVBJU0FMVUQgaWRlbnRpZmljYXLDoSBjbGFyYW1lbnRlIHN1KHMpIG5vbWJyZShzKSBjb21vIGF1dG9
yKHMpIG8gcHJvcGlldGFyaW8ocykgZGVsIGRvY3VtZW50bywgeSBubyBoYXLDoSBuaW5ndW5hIGFsdGVyYWNpw7NuLCBleGNlcHRvIHNlZ8O6biBsbyBwZXJtaXRpZG8gcG9yIGVzdGEgbGljZW5jaWEuCg==</binData>
         </mdWrap>
      </rightsMD>
   </amdSec>
   <amdSec ID="FO_20.500.12105_26783_1">
      <techMD ID="TECH_O_20.500.12105_26783_1">
         <mdWrap MDTYPE="PREMIS">
            <xmlData xmlns:premis="http://www.loc.gov/standards/premis" xsi:schemaLocation="http://www.loc.gov/standards/premis http://www.loc.gov/standards/premis/PREMIS-v1-0.xsd">
               <premis:premis>
                  <premis:object>
                     <premis:objectIdentifier>
                        <premis:objectIdentifierType>URL</premis:objectIdentifierType>
                        <premis:objectIdentifierValue>https://repisalud.isciii.es/bitstreams/b40e35bc-e96f-40de-a0bb-b31bfd4d28d6/download</premis:objectIdentifierValue>
                     </premis:objectIdentifier>
                     <premis:objectCategory>File</premis:objectCategory>
                     <premis:objectCharacteristics>
                        <premis:fixity>
                           <premis:messageDigestAlgorithm>MD5</premis:messageDigestAlgorithm>
                           <premis:messageDigest>3c2a57208587ec60fbf717588602b54f</premis:messageDigest>
                        </premis:fixity>
                        <premis:size>1067668</premis:size>
                        <premis:format>
                           <premis:formatDesignation>
                              <premis:formatName>application/pdf</premis:formatName>
                           </premis:formatDesignation>
                        </premis:format>
                     </premis:objectCharacteristics>
                     <premis:originalName>MethodologyExtractKnowledgeDatasets_2025.pdf</premis:originalName>
                  </premis:object>
               </premis:premis>
            </xmlData>
         </mdWrap>
      </techMD>
   </amdSec>
   <amdSec ID="FO_20.500.12105_26783_2">
      <techMD ID="TECH_O_20.500.12105_26783_2">
         <mdWrap MDTYPE="PREMIS">
            <xmlData xmlns:premis="http://www.loc.gov/standards/premis" xsi:schemaLocation="http://www.loc.gov/standards/premis http://www.loc.gov/standards/premis/PREMIS-v1-0.xsd">
               <premis:premis>
                  <premis:object>
                     <premis:objectIdentifier>
                        <premis:objectIdentifierType>URL</premis:objectIdentifierType>
                        <premis:objectIdentifierValue>https://repisalud.isciii.es/bitstreams/a72206b7-a7da-4cd2-99f4-637e3a2d7f7c/download</premis:objectIdentifierValue>
                     </premis:objectIdentifier>
                     <premis:objectCategory>File</premis:objectCategory>
                     <premis:objectCharacteristics>
                        <premis:fixity>
                           <premis:messageDigestAlgorithm>MD5</premis:messageDigestAlgorithm>
                           <premis:messageDigest>ba8f9182bac1f80d8035bf6ea5e0bdbf</premis:messageDigest>
                        </premis:fixity>
                        <premis:size>254598</premis:size>
                        <premis:format>
                           <premis:formatDesignation>
                              <premis:formatName>application/pdf</premis:formatName>
                           </premis:formatDesignation>
                        </premis:format>
                     </premis:objectCharacteristics>
                     <premis:originalName>Supplementary1_MethodologyExtractKnowledgeDatasets_2025.pdf</premis:originalName>
                  </premis:object>
               </premis:premis>
            </xmlData>
         </mdWrap>
      </techMD>
   </amdSec>
   <amdSec ID="FO_20.500.12105_26783_3">
      <techMD ID="TECH_O_20.500.12105_26783_3">
         <mdWrap MDTYPE="PREMIS">
            <xmlData xmlns:premis="http://www.loc.gov/standards/premis" xsi:schemaLocation="http://www.loc.gov/standards/premis http://www.loc.gov/standards/premis/PREMIS-v1-0.xsd">
               <premis:premis>
                  <premis:object>
                     <premis:objectIdentifier>
                        <premis:objectIdentifierType>URL</premis:objectIdentifierType>
                        <premis:objectIdentifierValue>https://repisalud.isciii.es/bitstreams/8a3e1376-c534-4b22-b7b1-de28a658a52f/download</premis:objectIdentifierValue>
                     </premis:objectIdentifier>
                     <premis:objectCategory>File</premis:objectCategory>
                     <premis:objectCharacteristics>
                        <premis:fixity>
                           <premis:messageDigestAlgorithm>MD5</premis:messageDigestAlgorithm>
                           <premis:messageDigest>f70121fee91f633ba85deb975d8cf180</premis:messageDigest>
                        </premis:fixity>
                        <premis:size>123356</premis:size>
                        <premis:format>
                           <premis:formatDesignation>
                              <premis:formatName>application/pdf</premis:formatName>
                           </premis:formatDesignation>
                        </premis:format>
                     </premis:objectCharacteristics>
                     <premis:originalName>Supplementary2_MethodologyExtractKnowledgeDatasets_2025.pdf</premis:originalName>
                  </premis:object>
               </premis:premis>
            </xmlData>
         </mdWrap>
      </techMD>
   </amdSec>
   <fileSec>
      <fileGrp USE="ORIGINAL">
         <file ID="BITSTREAM_ORIGINAL_20.500.12105_26783_1" MIMETYPE="application/pdf" SEQ="1" SIZE="1067668" CHECKSUM="3c2a57208587ec60fbf717588602b54f" CHECKSUMTYPE="MD5" ADMID="FO_20.500.12105_26783_1" GROUPID="GROUP_BITSTREAM_20.500.12105_26783_1">
            <FLocat LOCTYPE="URL" xlink:type="simple" xlink:href="https://repisalud.isciii.es/bitstreams/b40e35bc-e96f-40de-a0bb-b31bfd4d28d6/download"/>
         </file>
         <file ID="BITSTREAM_ORIGINAL_20.500.12105_26783_2" MIMETYPE="application/pdf" SEQ="2" SIZE="254598" CHECKSUM="ba8f9182bac1f80d8035bf6ea5e0bdbf" CHECKSUMTYPE="MD5" ADMID="FO_20.500.12105_26783_2" GROUPID="GROUP_BITSTREAM_20.500.12105_26783_2">
            <FLocat LOCTYPE="URL" xlink:type="simple" xlink:href="https://repisalud.isciii.es/bitstreams/a72206b7-a7da-4cd2-99f4-637e3a2d7f7c/download"/>
         </file>
         <file ID="BITSTREAM_ORIGINAL_20.500.12105_26783_3" MIMETYPE="application/pdf" SEQ="3" SIZE="123356" CHECKSUM="f70121fee91f633ba85deb975d8cf180" CHECKSUMTYPE="MD5" ADMID="FO_20.500.12105_26783_3" GROUPID="GROUP_BITSTREAM_20.500.12105_26783_3">
            <FLocat LOCTYPE="URL" xlink:type="simple" xlink:href="https://repisalud.isciii.es/bitstreams/8a3e1376-c534-4b22-b7b1-de28a658a52f/download"/>
         </file>
      </fileGrp>
   </fileSec>
   <structMap LABEL="DSpace Object" TYPE="LOGICAL">
      <div TYPE="DSpace Object Contents" ADMID="DMD_20.500.12105_26783">
         <div TYPE="DSpace BITSTREAM">
            <fptr FILEID="BITSTREAM_ORIGINAL_20.500.12105_26783_1"/>
         </div>
         <div TYPE="DSpace BITSTREAM">
            <fptr FILEID="BITSTREAM_ORIGINAL_20.500.12105_26783_2"/>
         </div>
         <div TYPE="DSpace BITSTREAM">
            <fptr FILEID="BITSTREAM_ORIGINAL_20.500.12105_26783_3"/>
         </div>
      </div>
   </structMap>
</mets></metadata></record></GetRecord></OAI-PMH>