
Data stations as a foundational building block for secondary health data sharing

What's in a name: data visiting, compute to the data, personal health train, federated analytics and PETs

The ambition for a seamlessly connected digital healthcare ecosystem, capable of leveraging vast quantities of patient data for research and innovation, remains elusive. Designing and implementing health data platforms is notoriously difficult, given the heterogeneity and complexity of such systems. Inspired by previous calls to action to move towards open architectures for health data systems [1, 2], convergence of health information standards [3] and the notion of the hourglass model [1, 4, 5], we hypothesize that the concept of a 'data station' can be used as a foundational building block for nationwide health data sharing ecosystems.

A data station is envisaged as the site where data visiting can take place in a secure and privacy-enhancing manner by enabling the use of the data where they are [6, 7, 8]. Data visiting is also known as compute to the data or the personal health train (PHT) [9, 10, 11, 12, 13]. The concept of federated analytics [14] encompasses data visiting; federated learning (FL) is a specialization of this concept in which machine learning models are trained collaboratively by sharing model parameters [15, 16, 17]. All these techniques and methods are often referred to as privacy-enhancing technologies (PETs), which are now sufficiently mature to be used on an industrial scale, enabling computations to be done under encryption (in-the-blind) and thereby significantly improving security across a network of participants [18, 19].

The sine qua non of this plethora of distributed technologies is the existence of a data station.
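The data-visiting pattern described above can be illustrated with a minimal sketch. It assumes a hypothetical `DataStation` class, a made-up `systolic_bp` field and toy records; none of these names come from a specification. The analysis is sent to each station, runs where the data live, and only aggregates travel back to the coordinator. Encryption, access control and disclosure checks, which real PETs would add, are deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, float]
Aggregate = Dict[str, float]


@dataclass
class DataStation:
    """Hypothetical data station: holds records locally and never exports them."""
    name: str
    _records: List[Record]

    def visit(self, analysis: Callable[[List[Record]], Aggregate]) -> Aggregate:
        # The analysis runs where the data live; only its result leaves the station.
        return analysis(self._records)


def local_sum_and_count(records: List[Record]) -> Aggregate:
    """Visiting computation: returns aggregates, not row-level data."""
    values = [r["systolic_bp"] for r in records]
    return {"sum": sum(values), "count": float(len(values))}


def federated_mean(stations: List[DataStation]) -> float:
    """Coordinator: combines per-station aggregates into a global mean."""
    partials = [s.visit(local_sum_and_count) for s in stations]
    total = sum(p["sum"] for p in partials)
    count = sum(p["count"] for p in partials)
    return total / count


# Toy data for illustration only.
stations = [
    DataStation("hospital_a", [{"systolic_bp": 120.0}, {"systolic_bp": 130.0}]),
    DataStation("hospital_b", [{"systolic_bp": 110.0}]),
]
print(federated_mean(stations))  # 120.0
```

Federated learning follows the same shape, with the per-station result being model parameters rather than a sum and a count.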

Why a specification for data stations now?

Recent technological advances in the data engineering community offer important new enablers for implementing data stations. The composable data stack as a solution design allows hitherto monolithic data platforms to be unbundled into loosely coupled components [20, 21]. Key components in this stack, most notably DuckDB and Polars, have significantly increased single-node computing capabilities: it is now possible to process up to 1 TB of tabular data on a single machine node, that is, on a single data station [22, 23].

This specification of data stations is motivated not only by technological enablers, but also by other developments:

  • data stations extend the notion of the FAIR Hourglass, which aims to establish the use of widely agreed-upon, open, minimal standards for machine-actionable data sharing [24, 5].
  • data stations are essential in implementing contemporary data governance frameworks, including the Data Governance Act (DGA), the European Health Data Space (EHDS) and the concept of data solidarity [25].
  • data stations can also contribute to the shift towards a more equitable, open digital infrastructure [26].

FAIR Hourglass


  1. Deborah Estrin and Ida Sim. Health care delivery. Open mHealth architecture: an engine for health care innovation. Science (New York, N.Y.), 330(6005):759–760, November 2010. doi:10.1126/science.1196187

  2. Garrett L Mehl, Martin G Seneviratne, Matt L Berg, Suhel Bidani, Rebecca L Distler, Marelize Gorgens, Karin E Kallander, Alain B Labrique, Mark S Landry, Carl Leitner, Peter B Lubell-Doughtie, Alvin D Marcelo, Yossi Matias, Jennifer Nelson, Von Nguyen, Jean Philbert Nsengimana, Maeghan Orton, Daniel R Otzoy Garcia, Daniel R Oyaole, Natschja Ratanaprayul, Susann Roth, Merrick P Schaefer, Dykki Settle, Jing Tang, Barakissa Tien-Wahser, Steven Wanyee, and Fred Hersch. A full-STAC remedy for global digital health transformation: open standards, technologies, architectures and content. Oxford Open Digital Health, 1:oqad018, January 2023. URL: https://doi.org/10.1093/oodh/oqad018 (visited on 2024-02-09), doi:10.1093/oodh/oqad018

  3. Guy Tsafnat, Rachel Dunscombe, Davera Gabriel, Grahame Grieve, and Christian Reich. Converge or Collide? Making Sense of a Plethora of Open Data Standards in Health Care. Journal of Medical Internet Research, 26(1):e55779, April 2024. URL: https://www.jmir.org/2024/1/e55779 (visited on 2024-09-21), doi:10.2196/55779

  4. Micah Beck. On the hourglass model. Communications of the ACM, 62(7):48–57, June 2019. doi:10.1145/3274770

  5. Erik Schultes. The FAIR hourglass: A framework for FAIR implementation. FAIR Connect, 1(1):13–17, January 2023. URL: https://www.medra.org/servlet/aliasResolver?alias=iospress&doi=10.3233/FC-221514 (visited on 2023-04-04), doi:10.3233/FC-221514

  6. Martin Weise, Filip Kovacevic, Nikolas Popper, and Andreas Rauber. OSSDIP: Open Source Secure Data Infrastructure and Processes Supporting Data Visiting. Data Science Journal, February 2022. URL: https://datascience.codata.org/articles/dsj-2022-004 (visited on 2025-09-29), doi:10.5334/dsj-2022-004

  7. Juan González-García, Javier González-Galindo, Francisco Estupiñán-Romero, Martin Thißen, Ronan A Lyons, Carlos Telleria-Orriols, Enrique Bernal-Delgado, and Population Health Information Research Infrastructure. PHIRI: lessons for an extensive reuse of sensitive data in federated health research. European Journal of Public Health, 34(Supplement_1):i43–i49, July 2024. URL: https://doi.org/10.1093/eurpub/ckae036 (visited on 2025-09-29), doi:10.1093/eurpub/ckae036

  8. Manlio Bacco, Margherita Di Leo, Albana Kona, Mattia Santoro, and Paolo Mazzetti. Federated Learning for Data Spaces: a Privacy-Enhancing Strategy Based on Data Visiting. In 2024 IEEE International Workshop on Metrology for Agriculture and Forestry (MetroAgriFor), 592–597. October 2024. URL: https://ieeexplore.ieee.org/abstract/document/10948754 (visited on 2025-09-29), doi:10.1109/MetroAgriFor63043.2024.10948754

  9. Oya Beyan, Ananya Choudhury, Johan van Soest, Oliver Kohlbacher, Lukas Zimmermann, Holger Stenzhorn, Md. Rezaul Karim, Michel Dumontier, Stefan Decker, Luiz Olavo Bonino da Silva Santos, and Andre Dekker. Distributed Analytics on Sensitive Medical Data: The Personal Health Train. Data Intelligence, 2(1-2):96–107, January 2020. URL: https://doi.org/10.1162/dint_a_00032 (visited on 2023-02-16), doi:10.1162/dint_a_00032

  10. Ananya Choudhury, Johan van Soest, Stuti Nayak, and Andre Dekker. Personal Health Train on FHIR: A Privacy Preserving Federated Approach for Analyzing FAIR Data in Healthcare. In Arup Bhattacharjee, Samir Kr. Borgohain, Badal Soni, Gyanendra Verma, and Xiao-Zhi Gao, editors, Machine Learning, Image Processing, Network Security and Data Sciences, Communications in Computer and Information Science, 85–95. Singapore, 2020. Springer. doi:10.1007/978-981-15-6315-7_7

  11. Luiz Olavo Bonino da Silva Santos, Luis Ferreira Pires, Virginia Martinez, João Moreira, and Renata Guizzardi. Personal Health Train Architecture with Dynamic Cloud Staging. SN Computer Science, October 2022. doi:10.1007/s42979-022-01422-4

  12. Chong Zhang, Ananya Choudhury, Leroy Volmer, Johan Soest, Inigo Bermejo, Andre Dekker, Aiara Lobo Gomes, and Leonard Wee. Secure and Private Healthcare Analytics: A Feasibility Study of Federated Deep Learning with Personal Health Train. July 2023. ISSN: 2693-5015. URL: https://www.researchsquare.com/article/rs-3158418/v1 (visited on 2024-10-24), doi:10.21203/rs.3.rs-3158418/v1

  13. Ananya Choudhury, Leroy Volmer, Frank Martin, Rianne Fijten, Leonard Wee, Andre Dekker, and Johan van Soest. Advancing Privacy-Preserving Health Care Analytics and Implementation of the Personal Health Train: Federated Deep Learning Study. JMIR AI, 4(1):e60847, February 2025. URL: https://ai.jmir.org/2025/1/e60847 (visited on 2025-06-26), doi:10.2196/60847

  14. Ahmed Roushdy Elkordy, Yahya H. Ezzeldin, Shanshan Han, Shantanu Sharma, Chaoyang He, Sharad Mehrotra, and Salman Avestimehr. Federated Analytics: A Survey. APSIPA Transactions on Signal and Information Processing, 2023. URL: http://www.nowpublishers.com/article/Details/SIP-2022-0063 (visited on 2023-05-07), doi:10.1561/116.00000063

  15. Zibo Wang, Haichao Ji, Yifei Zhu, Dan Wang, and Zhu Han. A Survey on Federated Analytics: Taxonomy, Enabling Techniques, Applications and Open Issues. IEEE Communications Surveys & Tutorials, pages 1–1, 2025. URL: https://ieeexplore.ieee.org/document/10960683/ (visited on 2025-06-19), doi:10.1109/COMST.2025.3558755

  16. Nicola Rieke, Jonny Hancox, Wenqi Li, Fausto Milletarì, Holger R. Roth, Shadi Albarqouni, Spyridon Bakas, Mathieu N. Galtier, Bennett A. Landman, Klaus Maier-Hein, Sébastien Ourselin, Micah Sheller, Ronald M. Summers, Andrew Trask, Daguang Xu, Maximilian Baust, and M. Jorge Cardoso. The future of digital health with federated learning. npj Digital Medicine, 3(1):1–7, September 2020. URL: https://www.nature.com/articles/s41746-020-00323-1 (visited on 2023-04-23), doi:10.1038/s41746-020-00323-1

  17. Zhen Ling Teo, Liyuan Jin, Nan Liu, Siqi Li, Di Miao, Xiaoman Zhang, Wei Yan Ng, Ting Fang Tan, Deborah Meixuan Lee, Kai Jie Chua, John Heng, Yong Liu, Rick Siow Mong Goh, and Daniel Shu Wei Ting. Federated machine learning in healthcare: A systematic review on clinical applications and technical architecture. Cell Reports Medicine, 5(2):101419, February 2024. URL: https://www.sciencedirect.com/science/article/pii/S2666379124000429 (visited on 2025-09-17), doi:10.1016/j.xcrm.2024.101419

  18. United Nations Committee of Experts on Big Data and Data Science for Official Statistics. The PET Guide. Technical Report, United Nations, 2023. URL: https://unstats.un.org/bigdata/task-teams/privacy/guide/ (visited on 2025-01-22). 

  19. The Royal Society. From privacy to partnership. Technical Report, The Royal Society, January 2023. 

  20. Pedro Pedreira, Orri Erling, Konstantinos Karanasos, Scott Schneider, Wes McKinney, Satya R Valluri, Mohamed Zait, and Jacques Nadeau. The Composable Data Management System Manifesto. Proceedings of the VLDB Endowment, 16(10):2679–2685, June 2023. URL: https://dl.acm.org/doi/10.14778/3603581.3603604 (visited on 2023-12-27), doi:10.14778/3603581.3603604

  21. The Composable Codex. URL: https://voltrondata.com/codex.html (visited on 2024-10-16). 

  22. Mark Raasveldt and Hannes Mühleisen. DuckDB: an Embeddable Analytical Database. In Proceedings of the 2019 International Conference on Management of Data, 1981–1984. Amsterdam Netherlands, June 2019. ACM. URL: https://dl.acm.org/doi/10.1145/3299869.3320212 (visited on 2025-06-09), doi:10.1145/3299869.3320212

  23. Felix Nahrstedt, Mehdi Karmouche, Karolina Bargieł, Pouyeh Banijamali, Apoorva Nalini Pradeep Kumar, and Ivano Malavolta. An Empirical Study on the Energy Usage and Performance of Pandas and Polars Data Analysis Python Libraries. In Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, 58–68. Salerno Italy, June 2024. ACM. URL: https://dl.acm.org/doi/10.1145/3661167.3661203 (visited on 2025-06-09), doi:10.1145/3661167.3661203

  24. Luiz Olavo Bonino da Silva Santos, Giancarlo Guizzardi, and Tiago Prince Sales. FAIR Digital Object Framework. October 2022. URL: https://fairdigitalobjectframework.org/ (visited on 2025-06-19). 

  25. Barbara Prainsack and Seliem El-Sayed. Beyond Individual Rights: How Data Solidarity Gives People Meaningful Control over Data. The American Journal of Bioethics, 23(11):36–39, November 2023. URL: https://www.tandfonline.com/doi/full/10.1080/15265161.2023.2256267 (visited on 2023-11-11), doi:10.1080/15265161.2023.2256267

  26. Jan Krewer and Zuzanna Warso. Digital Commons as Providers of Public Digital Infrastructures. Technical Report, Open Future Foundation, November 2024. URL: https://openfuture.eu/publication/digital-commons-as-providers-of-public-digital-infrastructures (visited on 2025-06-15). 
