gms | German Medical Science

66th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS) e. V., 12th Annual Congress of the Technology and Methods Platform for Networked Medical Research (TMF) e. V.

26-30 September 2021, online

Multivariate regression modelling with global and cohort-specific effects in a federated setting with data protection constraints

Meeting Abstract

  • Max Behrens - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg im Breisgau, Germany
  • Daniela Zöller - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg im Breisgau, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 26.-30.09.2021. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 172

doi: 10.3205/21gmds090, urn:nbn:de:0183-21gmds0905

Published: September 24, 2021

© 2021 Behrens et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.

Text

Multi-cohort studies are an important tool for studying effects in large samples and for identifying cohort-specific effects. To this end, researchers need to share information between cohorts and research institutes. However, data protection constraints sometimes forbid the exchange of individual-level data between institutes. To circumvent this problem, only non-disclosive aggregated data are exchanged, which is often done manually and requires explicit permission before each transfer. The DataSHIELD framework automates this exchange across iterative calls and thus makes more complex tasks, such as federated optimisation, feasible.

We propose a federated method for multivariate regression models that aims to improve the model for a specific cohort of interest by including information from other cohorts, even in the presence of cohort-specific effects. The approach relies solely on non-disclosive aggregated data from the different institutions and is intended to be applicable to high-dimensional data with complex correlation structures. At the same time, the amount of transferred data is kept small enough for data protection compliance to be confirmed manually.

Our approach alternates between a cohort-specific model and a global model, which is fitted on the data from the other cohorts in addition to those of the cohort of interest. In each iteration, the linear predictor of the global model first enters the estimation of the cohort-specific model as a covariate. The linear predictor of the updated cohort-specific model is then included in the estimation of the global model. These steps are repeated until the cohort-specific model estimates converge.
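The alternating procedure can be sketched in Python/NumPy as follows. This is not the authors' implementation; it rests on assumptions the abstract does not state: a Gaussian linear model, ridge penalties on the covariate effects with the coefficient of the linear-predictor covariate left unpenalised, and entering the cohort-specific linear predictor in the global model only for observations of the cohort of interest (zero elsewhere). Each site returns only the aggregated cross-products Z'Z and Z'y, which stand in for whatever non-disclosive quantities a real DataSHIELD implementation would exchange. The names `fit_combined`, `local_stats`, `pooled` and `ridge_fit` are hypothetical.

```python
import numpy as np


def ridge_fit(ztz, zty, penalty_mask, lam):
    """Solve (Z'Z + lam * diag(mask)) theta = Z'y; mask entries of 0 are unpenalised."""
    return np.linalg.solve(ztz + lam * np.diag(penalty_mask), zty)


def local_stats(Z, y):
    """Aggregated cross-products a site returns instead of individual-level data."""
    return Z.T @ Z, Z.T @ y


def pooled(stats):
    """Sum the per-site cross-products, as a coordinating server would."""
    return sum(s[0] for s in stats), sum(s[1] for s in stats)


def fit_combined(X_c, y_c, others, lam=1.0, tol=1e-6, max_iter=100):
    """Hypothetical sketch of the alternating cohort-specific / global fit.

    X_c, y_c: data of the cohort of interest; others: list of (X_k, y_k).
    Returns (beta_c, gamma, beta_g, n_iter) from the last iteration.
    """
    p = X_c.shape[1]
    mask = np.r_[np.ones(p), 0.0]  # penalise covariates, not the linear-predictor column

    # Start with a global ridge model on all cohorts, without any extra column.
    stats = [local_stats(X, y) for X, y in [(X_c, y_c), *others]]
    beta_g = ridge_fit(*pooled(stats), np.ones(p), lam)

    theta_c = np.zeros(p + 1)
    for it in range(1, max_iter + 1):
        # Cohort-specific step: the global linear predictor is an extra covariate.
        Z_c = np.column_stack([X_c, X_c @ beta_g])
        theta_new = ridge_fit(*local_stats(Z_c, y_c), mask, lam)
        eta_c = Z_c @ theta_new

        # Global step: the cohort-specific linear predictor is an extra covariate,
        # non-zero only for observations of the cohort of interest.
        stats = [local_stats(np.column_stack([X_c, eta_c]), y_c)]
        stats += [local_stats(np.column_stack([X, np.zeros(len(y))]), y) for X, y in others]
        beta_g = ridge_fit(*pooled(stats), mask, lam)[:p]

        # Stop once the cohort-specific estimates no longer change.
        converged = np.max(np.abs(theta_new - theta_c)) < tol
        theta_c = theta_new
        if converged:
            break
    return theta_c[:p], theta_c[p], beta_g, it


# Usage on simulated data (hypothetical): a small cohort of interest with an
# extra effect of the first covariate, plus three larger cohorts without it.
rng = np.random.default_rng(1)
beta_shared = rng.normal(size=10)


def simulate(n, extra=0.0):
    X = rng.normal(size=(n, 10))
    return X, X @ beta_shared + extra * X[:, 0] + rng.normal(size=n)


X_c, y_c = simulate(50, extra=1.0)
others = [simulate(500) for _ in range(3)]
beta_c, gamma, beta_g, n_iter = fit_combined(X_c, y_c, others, lam=5.0)
```

Leaving the linear-predictor coefficient unpenalised lets the cohort-specific model borrow the global effect structure at no penalty cost, while the ridge penalty shrinks purely cohort-specific deviations. This is one plausible way to reduce overfitting in a small cohort, not necessarily the choice made by the authors, and convergence of the alternation is not guaranteed in general, hence the `max_iter` cap.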

Using different simulation settings, we aim to show that our approach improves cohort-specific predictions by reducing overfitting while preserving the effect structure found globally. A more complex simulation setting tests the approach under more realistic conditions, which allows the results to be generalised further. In this setting, three roles of cohort-specific effects are studied: no cohort-specific effect, an independent cohort-specific effect, and a confounding cohort-specific effect. The method can thus be evaluated under different assumptions about the underlying effect structure.
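The three roles of a cohort-specific effect can be illustrated with a hypothetical data-generating mechanism. The abstract does not specify the simulation design, so `simulate_cohort`, the extra cohort-specific variable `z`, and the parameters `alpha` (its effect size) and `rho` (its correlation with the first covariate) are assumptions made purely for illustration.

```python
import numpy as np


def simulate_cohort(n, beta, role, alpha=1.0, rho=0.7, rng=None):
    """Hypothetical data-generating mechanism for one cohort.

    role = "none":        y depends only on the shared effects beta
    role = "independent": an extra variable z, independent of X, affects y
    role = "confounding": z is correlated with the first covariate and affects y
    """
    rng = np.random.default_rng() if rng is None else rng
    X = rng.normal(size=(n, len(beta)))
    if role == "none":
        z = np.zeros(n)
    elif role == "independent":
        z = rng.normal(size=n)
    elif role == "confounding":
        # Unit-variance z with correlation rho to the first covariate.
        z = rho * X[:, 0] + np.sqrt(1 - rho**2) * rng.normal(size=n)
    else:
        raise ValueError(f"unknown role: {role!r}")
    y = X @ beta + alpha * z + rng.normal(size=n)
    return X, y, z


# Usage: fit a naive model that ignores z in each scenario.
rng = np.random.default_rng(0)
beta = np.array([1.0, -0.5, 0.25])
for role in ("none", "independent", "confounding"):
    X, y, _ = simulate_cohort(100_000, beta, role, rng=rng)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    print(role, np.round(beta_hat, 2))
```

Under these assumptions, a model that ignores `z` recovers the shared effects in the first two scenarios, whereas in the confounding scenario it attributes part of the effect of `z` to the first covariate, biasing that coefficient by roughly `alpha * rho`.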

In general, gradient-based methods can easily be adapted to a federated setting under data protection constraints, because the gradient of a loss that sums over observations is the sum of the gradients computed at each site, which are themselves aggregated data. The method presented here can be used in this setting to obtain better predictions and can thus aid in understanding cohort-specific estimates.
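As a minimal illustration of why gradient-based methods federate easily, the sketch below runs gradient descent for a plain linear model in which each site returns only its gradient (a p-vector) and its sample size. The names `local_gradient` and `federated_gradient_descent`, the fixed step size and the number of steps are assumptions for this example; they are not DataSHIELD functions.

```python
import numpy as np


def local_gradient(X, y, beta):
    """Run at one site: gradient of 0.5 * sum of squared residuals, plus n.

    Only these aggregated quantities leave the site, never X or y.
    """
    return X.T @ (X @ beta - y), len(y)


def federated_gradient_descent(sites, p, lr=0.1, n_steps=500):
    """Coordinator loop: sum per-site gradients and step on the mean loss."""
    beta = np.zeros(p)
    for _ in range(n_steps):
        results = [local_gradient(X, y, beta) for X, y in sites]
        grad = sum(g for g, _ in results)
        n_total = sum(n for _, n in results)
        beta -= lr * grad / n_total
    return beta


# Usage: three simulated sites sharing the same effects.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
sites = []
for n in (200, 300, 500):
    X = rng.normal(size=(n, 5))
    sites.append((X, X @ beta_true + rng.normal(size=n)))
beta_fed = federated_gradient_descent(sites, p=5)
```

With enough steps, `beta_fed` matches ordinary least squares on the pooled data, even though no individual-level data were ever combined.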

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.