open access publication

Article, 2024

Federated singular value decomposition for high-dimensional data

Data Mining and Knowledge Discovery, ISSN 1384-5810, Volume 38, Issue 3, Pages 938-975, DOI 10.1007/s10618-023-00983-z

Contributors

  • Hartebrodt A. (ORCID 0000-0002-9172-3137), corresponding author [1] [2]
  • Röttger R. (ORCID 0000-0003-4490-5947) [1]
  • Blumenthal D.B. (ORCID 0000-0001-8651-750X) [2]

Affiliations

  1. [1] University of Southern Denmark, Denmark
  2. [2] Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany

Abstract

Federated learning (FL) is emerging as a privacy-aware alternative to classical cloud-based machine learning. In FL, the sensitive data remains in data silos and only aggregated parameters are exchanged. Hospitals and research institutions that are unwilling to share their data can therefore join a federated study without breaching confidentiality. In addition to the extreme sensitivity of biomedical data, its high dimensionality poses a challenge for federated genome-wide association studies (GWAS). In this article, we present a federated singular value decomposition algorithm that meets the privacy-related and computational requirements of GWAS. Notably, the algorithm has a transmission cost that is independent of the number of samples and only weakly dependent on the number of features, because the singular vectors corresponding to the samples are never exchanged and the vectors associated with the features are transmitted to an aggregator only for a fixed number of iterations. Although motivated by GWAS, the algorithm is generically applicable to both horizontally and vertically partitioned data.
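To illustrate the communication pattern described in the abstract, here is a minimal sketch of plain federated subspace (power) iteration in Python with NumPy. It is not the authors' algorithm: it assumes the data matrix is stored as features x samples with each site holding a disjoint block of samples (columns), and it omits the paper's refinements and any additional privacy safeguards. The function name `federated_subspace_iteration` and the `iters` parameter are hypothetical. Each site keeps its sample-side factor local and sends only an m x k matrix per iteration (plus one k x k matrix at the end), so its traffic does not grow with its number of samples.

```python
import numpy as np


def federated_subspace_iteration(site_blocks, k, iters=200, seed=0):
    """Approximate the top-k SVD of A = [A_1 | ... | A_S] (features x samples).

    Hypothetical illustration: site_blocks is a list of local matrices A_s,
    each of shape (m, n_s). Only m x k and k x k matrices leave a site.
    """
    m = site_blocks[0].shape[0]
    rng = np.random.default_rng(seed)
    # Aggregator: random orthonormal start for the feature-side subspace (m x k).
    G, _ = np.linalg.qr(rng.standard_normal((m, k)))
    for _ in range(iters):
        # Each site projects its samples onto G; H_s (n_s x k) never leaves the site.
        local_H = [A_s.T @ G for A_s in site_blocks]
        # Each site sends only A_s @ H_s, an m x k matrix independent of n_s.
        contributions = [A_s @ H_s for A_s, H_s in zip(site_blocks, local_H)]
        # Aggregator: the sum equals A @ A.T @ G; re-orthonormalize with QR.
        G, _ = np.linalg.qr(sum(contributions))

    # Rayleigh-Ritz step: each site sends only the k x k matrix H_s.T @ H_s.
    local_H = [A_s.T @ G for A_s in site_blocks]
    B = sum(H_s.T @ H_s for H_s in local_H)  # equals G.T @ A @ A.T @ G
    eigvals, W = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]  # eigh returns ascending eigenvalues
    eigvals, W = eigvals[order], W[:, order]
    sigma = np.sqrt(np.clip(eigvals, 0.0, None))
    U = G @ W  # feature-side singular vectors, known to the aggregator
    # Each site computes its own rows of V locally; these are never transmitted.
    V_local = [(A_s.T @ U) / sigma for A_s in site_blocks]
    return U, sigma, V_local
```

For a quick check with simulated sites, split a matrix column-wise with `np.split(A, 3, axis=1)`, pass the blocks in, and compare `sigma` with `np.linalg.svd(A, compute_uv=False)[:k]`. The QR step keeps the aggregated iterate orthonormal, and the final Rayleigh-Ritz step rotates it into individual singular vectors without any site revealing its sample-side factor.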

Keywords

Federated learning, Genome-wide association studies, Principal component analysis, Singular value decomposition

Funders

  • Horizon 2020 Framework Programme
  • Bundesministerium für Bildung und Forschung
