Please use this identifier to cite or link to this resource: doi:10.22028/D291-39832
Title: Enabling ad-hoc reuse of private data repositories through schema extraction
Author(s): Gleim, Lars Christoph
Karim, Md Rezaul
Zimmermann, Lukas
Kohlbacher, Oliver
Stenzhorn, Holger
Decker, Stefan
Beyan, Oya
Language: English
Journal title: Journal of biomedical semantics
Volume: 11
Issue: 1
Publisher/Platform: BioMed Central
Year of publication: 2020
Keywords: Semantic web
Linked data
RDF
SPARQL
Schema extraction
Privacy
Data access
Distributed systems
Query design
Personal health train
FAIR data
DDC subject group: 610 Medicine and health
Document type: Journal article
Abstract: Sharing sensitive data across organizational boundaries is often significantly limited by legal and ethical restrictions. Regulations such as the EU General Data Protection Regulation (GDPR) impose strict requirements concerning the protection of personal and privacy-sensitive data. Therefore, new approaches, such as the Personal Health Train initiative, are emerging to utilize data right in their original repositories, circumventing the need to transfer data. Circumventing limitations of previous systems, this paper proposes a configurable and automated schema extraction and publishing approach, which enables ad-hoc SPARQL query formulation against RDF triple stores without requiring direct access to the private data. The approach is compatible with existing Semantic Web-based technologies and allows for the subsequent execution of such queries in a safe setting under the data provider’s control. Evaluation with four distinct datasets shows that a configurable amount of concise and task-relevant schema, closely describing the structure of the underlying data, was derived, enabling the schema introspection-assisted authoring of SPARQL queries. Automatically extracting and publishing data schema can enable the introspection-assisted creation of data selection and integration queries. In conjunction with the presented system architecture, this approach can enable reuse of data from private repositories and in settings where agreeing upon a shared schema and encoding a priori is infeasible. As such, it could provide an important step towards reuse of data from previously inaccessible sources and thus towards the proliferation of data-driven methods in the biomedical domain.
DOI of first publication: 10.1186/s13326-020-00223-z
URL of first publication: https://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-020-00223-z
Link to this record: urn:nbn:de:bsz:291--ds-398321
hdl:20.500.11880/35874
http://dx.doi.org/10.22028/D291-39832
ISSN: 2041-1480
Date of entry: 23-May-2023
Faculty: M - Faculty of Medicine
Department: M - Medical Biometry, Epidemiology and Medical Informatics
Professorship: M - Prof. Dr. Stefan Wagenpfeil
Collection: SciDok - Der Wissenschaftsserver der Universität des Saarlandes

Files in this record:
File: s13326-020-00223-z.pdf
Size: 1.38 MB
Format: Adobe PDF


This resource is published under the following copyright terms: Creative Commons license