Aggregating and analysing health data from diverse sources could deliver exciting new insights into diseases and lead to the development of new treatments. However, those responsible for databases of health data are understandably wary of sharing it due to security and privacy concerns.
But what if we could draw on these databases to generate synthetic data which can be shared without triggering privacy concerns? As the name suggests, synthetic data is data which has been created artificially to mimic real patient data. However, generating synthetic data that genuinely reflects a real population and cannot be traced back to real-life individuals is not easy, and that’s where the new IHI project SEARCH comes in.
An ‘unparalleled opportunity’ to accelerate research and innovation
The aim of SEARCH is to develop an innovative biomedical data generation and sharing solution as well as generalisable methodologies for generating and validating synthetic data. The project will use new models to create realistic synthetic replicas of diverse types of healthcare data, including data types that are often missing from synthetic data sets, such as wearable device data, image sequences, and genomic data. The project will also deliver a framework for assessing the anonymity and credibility of synthetic data, with the long-term goal of facilitating the use of synthetic data in, for example, regulatory and health technology assessment (HTA) settings.
“SEARCH offers an unparalleled opportunity to accelerate research and clinical innovation,” said project coordinator Aideen Long of the Trinity Translational Medicine Institute (TTMI). “By providing high-quality, FAIR [findable, accessible, interoperable, reusable] synthetic datasets that mimic real-world healthcare data, we can empower researchers, clinicians, and industry to collaborate like never before. This opens the door for faster drug discovery, more personalised treatments, and the ability to create new, evidence-based healthcare policies - all without compromising patient privacy.”
‘A critical step towards enhancing patient care and outcomes’
The project will focus on three disease areas: gastrointestinal diseases (including cancer, chronic inflammatory and rare bowel diseases); cardiovascular diseases (including atrial fibrillation and stroke); and gynaecological diseases (namely cervical cancer and ovarian cancer). The synthetic datasets generated in these areas will be used to create tools designed to help in the diagnosis and care of these diseases.
“By advancing the creation of synthetic datasets and facilitating data sharing in compliance with stringent ethical and privacy standards, we are committed to pushing the boundaries of AI innovation in healthcare,” added SEARCH industry lead Chrysostomos Symvoulidis of MAGGIOLI S.P.A. “The development of solutions that improve diagnostic accuracy and decision support is a critical step towards enhancing patient care and outcomes.”
Radically reshaping the future of health data sharing and research
The synthetic data, and the models that generated it, will be made available to the research community according to the FAIR principles. Furthermore, a hybrid approach using data clean rooms and federated learning will enable data to be analysed without requiring it to be shared and allow insights to be drawn across multiple datasets while keeping patient information securely stored at its source. Together, these innovations could radically reshape the future of health data sharing and research, driving innovation in healthcare tools without compromising privacy.
“SEARCH represents a paradigm shift in healthcare by enabling the generation and sharing of robust synthetic data across diverse healthcare use cases,” said Professor Dimitris Iakovidis of the University of Thessaly. “Our approach harnesses the power of federated learning and advanced SDG [synthetic data generation] methods to create synthetic datasets that replicate the statistical properties of real-world data, while ensuring patient privacy. This will empower healthcare providers and researchers with high-quality data to fuel next-generation AI and precision medicine tools.”