Perspective: application
Application components for data operations in layered, decentralised networks
This document elaborates a hybrid secure processing environment (SPE) based on the principles of data visiting. The proposed architecture has many similarities with what TEHDAS2 calls a federated SPE. The European Health Data Space (EHDS) is essentially designed as a federation: we ultimately want to be able to use health data from across Europe for secondary use. At the same time, we want health data to be copied as little as possible, because every copy weakens control over the data and its security. Being able to perform computations at the location of the data is therefore an important functional requirement. At a minimum, this requires an architecture for decentralised data processing between countries, in which national nodes jointly perform analyses under the coordination of a central node at European level. This approach is elaborated in TEHDAS2 M7.4, chapter 5 (SPE federation, p. 42) and chapter 6 (Implementing federated computing, p. 50). In this document we apply the same design principles to enable a network of data stations for decentralised information processing within a country.
Decentralised data processing is based on a network of interconnected data stations. The way these data stations are connected (the network topology) determines the architecture of the federated SPE. We broadly distinguish three network topologies [1]: centralised, decentralised and distributed.
Types of networks: a) centralised, b) decentralised, c) distributed.
Federated SPEs have two archetypes [2]:
- Data stations are connected to a single central server; in other words, a federated SPE with a centralised network (also known as a hub-and-spoke network).
- Data stations are interconnected via a distributed network (also known as a peer-to-peer network).
Federated data processing with a) a central aggregation server, and b) a peer-to-peer network.
The most common form of federated data processing relies on a central server that coordinates the data stations. The concept of a Federated Database System (FDBS) was described in 1985 and has been used for years to perform federated analysis (queries) across multiple databases [3]. The concept of federated learning, as introduced by Google in 2017 [4], also uses a central server.
In the description of data stations, we therefore assume a central server to which the data user logs in to submit computing jobs to the network of data stations. Federated SPEs with a peer-to-peer network are explicitly out of scope for the architecture described here.
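The hub-and-spoke pattern can be illustrated with a minimal Python sketch, assuming a simple federated-analysis job that computes a mean. The classes `DataStation` and `ProcessingHub`, the methods `run_job` and `federated_mean`, and the `MIN_CELL_SIZE` threshold of 10 are hypothetical and do not describe any existing implementation: each station computes a count and a sum locally, declines to release anything when its group is too small, and the central server combines only the released aggregates.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical disclosure-control threshold; a real value is a policy decision.
MIN_CELL_SIZE = 10


@dataclass
class LocalAggregate:
    """The only information a data station releases: a count and a sum."""
    count: int
    total: float


class DataStation:
    """Hypothetical data station: keeps records local, exports aggregates only."""

    def __init__(self, name: str, values: list[float]) -> None:
        self.name = name
        self._values = values  # raw records never leave the station object

    def run_job(self) -> LocalAggregate | None:
        # Decline to release anything if the local group is too small.
        if len(self._values) < MIN_CELL_SIZE:
            return None
        return LocalAggregate(count=len(self._values), total=sum(self._values))


class ProcessingHub:
    """Hypothetical central server: receives the user's job and fans it out."""

    def __init__(self, stations: list[DataStation]) -> None:
        self.stations = stations

    def federated_mean(self) -> float:
        # Send the job to every station and keep only the released aggregates.
        released = [r for r in (s.run_job() for s in self.stations) if r is not None]
        n = sum(r.count for r in released)
        if n == 0:
            raise ValueError("no station released an aggregate")
        # Combine the aggregates into a mean over all released records.
        return sum(r.total for r in released) / n


hub = ProcessingHub([
    DataStation("A", [1.0] * 12),
    DataStation("B", [3.0] * 20),
    DataStation("C", [100.0] * 3),  # too small: will be suppressed
])
print(hub.federated_mean())  # 2.25
```

Station C holds only three records and releases nothing, so the hub computes (12 × 1.0 + 20 × 3.0) / 32 = 2.25 without ever receiving an individual record. In practice, the threshold and any further safeguards would be set by policy rather than by a single constant in code.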
In addition, in the context of the EHDS it must be possible to work with federations of federations. The hybrid SPE we have in mind therefore has a layered structure of nodes. Think, for example, of a healthcare institution participating in a regional collaboration, with several regional nodes in turn forming part of a national federated network. On top of that, national nodes may form part of a European federation. In elaborating the architecture, we therefore assume a decentralised network with a layered structure of multiple networks of SPEs (network type b in the illustration above).
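A layered federation can be modelled as a tree in which every node either holds data (a data station) or only combines the results of the nodes beneath it. The sketch below assumes a purely additive statistic (a count and a sum), for which aggregating level by level gives the same answer as aggregating in one step. The `Node` and `Aggregate` classes and the example hierarchy are hypothetical, and disclosure control is left out for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Aggregate:
    """Additive summary statistics passed from one layer to the next."""
    count: int
    total: float

    def __add__(self, other: Aggregate) -> Aggregate:
        return Aggregate(self.count + other.count, self.total + other.total)


@dataclass
class Node:
    """Hypothetical node: a leaf data station or a federating node."""
    name: str
    local_values: list[float] = field(default_factory=list)  # used by leaves only
    children: list[Node] = field(default_factory=list)

    def aggregate(self) -> Aggregate:
        if not self.children:
            # Leaf: a data station computes its summary statistics locally.
            return Aggregate(len(self.local_values), sum(self.local_values))
        # Federating node: combine the children's aggregates; records never move up.
        result = Aggregate(0, 0.0)
        for child in self.children:
            result = result + child.aggregate()
        return result


# Institution -> regional node -> national node -> European node.
hospital_a = Node("hospital A", local_values=[2.0, 4.0])
hospital_b = Node("hospital B", local_values=[6.0])
region = Node("region North", children=[hospital_a, hospital_b])
clinic = Node("clinic C", local_values=[8.0, 10.0])
national = Node("national node", children=[region, clinic])
europe = Node("European node", children=[national])

total = europe.aggregate()
print(total.total / total.count)  # 6.0
```

Only `Aggregate` objects cross a layer boundary, so the European node learns the overall mean without receiving any individual record. Non-additive statistics, such as medians, cannot be combined this way and would need a different protocol.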
The components of a decentralised network of SPEs
The data station and the processing hub are central to the architecture for a decentralised network of SPEs. Together, these two application components realise the functionality needed in a decentralised SPE. In terms of the FAIR hourglass model, the data station belongs to layer 3 and the processing hub to layer 4. Conceptually, we place the different forms of federated data processing in layer 5. Building on TEHDAS2, we distinguish three (arche)types:
- Federated analysis: statistics are computed locally in a network of data stations. Only aggregated results or summary statistics are exported from the data stations, with corresponding safeguards to ensure that no personal data is extracted. Federated analysis is in principle the same as a Federated Database System. Federated analysis is particularly suitable for executing data requests in the sense of EHDS article 69. We regard KIK-V as a reference implementation for federated analysis.
- Federated learning: models are trained and validated on the data stations without sharing the raw data between the data stations. Instead, only the model updates are shared with the processing hub to achieve better data privacy and security. We regard PLUGIN as a reference implementation for federated learning.
- Data pooling: the data stations can be used to (temporarily) send data to another SPE or an authorised system, which is also known as data exchange. The EOSC-ENTRUST Blueprint provides a detailed architecture of how data stations can integrate with such Trusted Research Environments. The data pooling mechanism can also be used to deliver data to quality registries. Strictly speaking, data pooling is not a form of federated processing but rather a characteristic of a hybrid SPE. Because of the many overlaps and possible applications, it is nevertheless included in the scope of this document. It is also one of the reasons why we prefer to speak of a hybrid architecture/SPE rather than a pure federated SPE.
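Federated learning can be sketched with federated averaging (FedAvg), the algorithm described by McMahan et al. (2017): in every round the processing hub sends the current model to the data stations, each station trains locally, and the hub replaces the global model with the average of the returned parameters, weighted by each station's number of records. The example below is a deliberately minimal illustration, assuming a one-parameter linear model `y ≈ w * x` trained with full-batch gradient descent instead of the stochastic updates used in the paper; the function names, learning rate and toy data are hypothetical.

```python
from __future__ import annotations


def local_update(w: float, data: list[tuple[float, float]],
                 lr: float = 0.01, epochs: int = 5) -> float:
    """Train y ≈ w * x on one station's data with gradient descent on squared error."""
    for _ in range(epochs):
        # Gradient of mean((w * x - y) ** 2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_averaging(stations: list[list[tuple[float, float]]],
                        rounds: int = 20) -> float:
    """Hub side of FedAvg: broadcast, train locally, take a weighted average."""
    w_global = 0.0
    for _ in range(rounds):
        # Each station starts from the current global model and trains locally.
        updates = [(local_update(w_global, data), len(data)) for data in stations]
        # The hub only sees model parameters and record counts, never (x, y) pairs.
        n_total = sum(n for _, n in updates)
        w_global = sum(w * n for w, n in updates) / n_total
    return w_global


station_a = [(1.0, 3.0), (2.0, 6.0)]
station_b = [(3.0, 9.0), (4.0, 12.0), (0.5, 1.5)]
w = federated_averaging([station_a, station_b])
print(round(w, 3))  # 3.0
```

Both toy stations hold data generated from `y = 3x`, so the averaged model converges to `w ≈ 3`, while the hub only ever handles one parameter value and one record count per station per round.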
In this chapter we describe the application components of a hybrid SPE, namely the three types of applications in layer 5, the Processing Hub and the data station. We also explicitly address the various TEHDAS2 requirements that have been formulated. For the other, more generic components, we follow the description in TEHDAS2 and conduct a shorter fit-gap analysis of the extent to which these components fit within a hybrid SPE. The table below provides an overview of the key application components within the FAIR hourglass five-layer model.
| Layer | Systems |
|---|---|
| 5 | **Federated analysis**, **Federated learning**, **Data pooling** |
| 4 | Data Access Application Management System, Health data catalogue, **Processing hub** |
| 3 | **Data station** |
| 2 | Data exposure system |
| 1 | Source systems |
Overview of core components in the architecture of a federated SPE. The components that are central to this architecture are in bold.
1. P. Baran. On Distributed Communications Networks. IEEE Transactions on Communications Systems, 12(1):1–9, March 1964. URL: https://ieeexplore.ieee.org/abstract/document/1088883 (visited on 2024-08-12), doi:10.1109/TCOM.1964.1088883.
2. Nicola Rieke, Jonny Hancox, Wenqi Li, Fausto Milletarì, Holger R. Roth, Shadi Albarqouni, Spyridon Bakas, Mathieu N. Galtier, Bennett A. Landman, Klaus Maier-Hein, Sébastien Ourselin, Micah Sheller, Ronald M. Summers, Andrew Trask, Daguang Xu, Maximilian Baust, and M. Jorge Cardoso. The future of digital health with federated learning. npj Digital Medicine, 3(1):1–7, September 2020. URL: https://www.nature.com/articles/s41746-020-00323-1 (visited on 2023-04-23), doi:10.1038/s41746-020-00323-1.
3. Dennis Heimbigner and Dennis McLeod. A federated architecture for information management. ACM Transactions on Information Systems, 3(3):253–278, July 1985. URL: https://dl.acm.org/doi/10.1145/4229.4233 (visited on 2023-04-23), doi:10.1145/4229.4233.
4. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 1273–1282. PMLR, April 2017. URL: https://proceedings.mlr.press/v54/mcmahan17a.html (visited on 2024-08-25).

