Reshaping the Future of Medical Care, Education and Research: The Pivotal Roles of Synthetic Data, Generative AI, and Auto-MLs

This is part five of our report on the Digital Pathology & AI Congress: USA. For the other parts, see the links toward the bottom of the article. The conference featured discussions on the practical aspects and likely impact of machine learning applications in pathology, showcasing how AI could enhance diagnostic accuracy and efficiency, improve patient care, and foster personalized medicine. Parts five through eight cover these discussions.


Hooman Rashidi of the University of Pittsburgh School of Medicine gave a keynote talk on the role of synthetic data, generative AI, and automated machine learning (auto-ML) in reshaping the future of medical care, education, and research. Dr. Rashidi highlighted the significant obstacles in accessing medical data owing to regulatory restrictions and discussed how advancements in synthetic data creation, generative AI, and auto-ML could help overcome these challenges. He emphasized the potential of these technologies to expedite data access, lower entry barriers in healthcare research and innovation, and enhance our understanding of complex biological processes.[1]

Regulations governing the storage of and access to medical data, such as the Health Insurance Portability and Accountability Act (HIPAA) in the US and the General Data Protection Regulation (GDPR) in the EU, protect patient privacy and help prevent the abuse of sensitive data. However, these restrictions have also hindered research and innovation. Recent advances in generative AI and machine learning offer promising ways to overcome some of the obstacles to accessing and sharing medical data. Synthetic data generation techniques produce artificial data that mimic the statistical properties of real data and can be used to train and test machine learning models. Generative AI models, such as ChatGPT and DALL-E, learn patterns from existing data and generate new text, images, or videos. It is also possible to create tabular data that are partially or fully synthetic. These techniques share several features: they typically use deep learning to generate new data, and they can learn complex, high-dimensional data distributions and uncover hidden patterns and relationships in the data. However, the different types of synthetic data are distinct entities with marked differences, particularly in medical applications.
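
To make the idea of fully synthetic tabular data concrete, here is a minimal Python sketch. It is not the method presented in the talk, which centered on deep generative models; as a deliberately simpler stand-in, it fits a multivariate Gaussian to a small numeric table and samples new rows from it. The function names and the toy values (loosely styled as hemoglobin and hematocrit readings) are invented for illustration and are not real patient data.

```python
import math
import random
from statistics import fmean


def fit_gaussian(rows):
    """Estimate column means and the sample covariance matrix of numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [fmean(r[j] for r in rows) for j in range(d)]
    cov = [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]
    return means, cov


def cholesky(a):
    """Return lower-triangular L with L * L^T == a (a must be positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_synthetic(means, cov, n_samples, seed=0):
    """Draw synthetic rows that follow the fitted means and covariances."""
    rng = random.Random(seed)
    low = cholesky(cov)
    d = len(means)
    rows = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        rows.append(
            [means[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        )
    return rows


# Toy, made-up table: each row is [hemoglobin, hematocrit]. Not real patient data.
real_rows = [
    [13.5, 40.1], [14.2, 42.0], [12.1, 36.5],
    [15.0, 44.8], [11.8, 35.2], [13.0, 39.0],
]
means, cov = fit_gaussian(real_rows)
synthetic_rows = sample_synthetic(means, cov, n_samples=5)
for row in synthetic_rows:
    print([round(v, 1) for v in row])
```

The sampled rows preserve the column means and the correlation between the two columns without copying any original row. A Gaussian captures only linear relationships, which is one reason deep generative models are used when the data distribution is more complex.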

Recent advances in deep neural networks, including autoencoders, generative adversarial networks (GANs), diffusion probabilistic models, and generative transformers, have revolutionized synthetic data generation. Large language models (LLMs) can generate synthetic text, whereas GANs and diffusion-based models can generate realistic synthetic medical images. Nevertheless, widely available LLM chatbots are prone to hallucinations, which makes them inappropriate for use in healthcare. Transfer learning approaches can be used to build custom LLM chatbots optimized for different medical disciplines. For example, Dr. Rashidi demonstrated a custom hematology chatbot that could accurately answer questions related to the diagnosis and classification of hematological disorders. At the end of each answer, the chatbot listed the references used to produce it, all of which had been provided to the algorithm for training. To address the data security issues of custom LLM chatbots, Dr. Rashidi’s team developed Pitt-GPT, a custom LLM that runs locally.

Pathologists work not only with images but also with large tabular datasets. Dr. Rashidi explained that cleaning tabular data is tedious and time-consuming, and he demonstrated MILO, an auto-ML preprocessing tool that can help pathologists handle real tabular datasets containing missing values in minutes instead of days or weeks.
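
As a rough illustration of what handling missing values in a table involves, the Python sketch below fills each gap with the median of the observed values in the same column. The report does not describe how MILO does its preprocessing, so median imputation is an assumption chosen for simplicity, and the function name `impute_median` and the toy table are hypothetical.

```python
from statistics import median


def impute_median(rows):
    """Replace each None with the median of the observed values in its column.

    A column with no observed values is left as None.
    """
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        medians.append(median(observed) if observed else None)
    return [
        [medians[j] if r[j] is None else r[j] for j in range(n_cols)]
        for r in rows
    ]


# Toy table with gaps marked as None (made-up values, not real data).
table = [
    [1.0, None],
    [3.0, 4.0],
    [None, 6.0],
]
print(impute_median(table))  # [[1.0, 5.0], [3.0, 4.0], [2.0, 6.0]]
```

Median imputation is only one of many strategies. Which one fits depends on why the values are missing and on the downstream model, and an automated tool can help by making such choices quickly.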

[1] Hooman Rashidi. Reshaping the future of medical care, education and research: The pivotal roles of synthetic data, generative AI, and auto-MLs. Presented at the 10th Digital Pathology & AI Congress: USA; May 7-8, 2024; San Diego, CA.

Dr. Rashidi is Professor and Associate Dean of AI at the University of Pittsburgh School of Medicine and Executive Director of the Computational Pathology & AI Center of Excellence at the University of Pittsburgh Medical Center.

Links To Other Parts Of The Series

Part 1: Highlights from the 10th Digital Pathology & AI Congress: USA

Part 2: Digital Pathology Implementation: Insights From Experts At DP&AI: USA

Part 3: Clinical Implementation Challenges And Potential Solutions

Part 4: Recent Advances In Digital Pathology


