by Christos Evangelou, MSc, PhD – Medical Writer and Editor
The use of machine learning algorithms to analyze whole slide images (WSIs) is set to transform many aspects of clinical practice. Recently, machine learning-based computational frameworks have been developed to aid in the assessment of Alzheimer’s disease and other neurological conditions. However, advances in the use of machine learning frameworks to diagnose neurological diseases lag far behind the diagnostic applications of digital pathology in oncology.
In a recent proof-of-concept study, researchers at the University of California Davis analyzed the effect of various preanalytical variables on the performance and generalizability of deep learning models in segmentation and object classification of WSIs from individuals with amyloid-β deposits.1
The study showed that scanner type and magnification were the preanalytical variables that most strongly affected the performance of deep learning algorithms in white matter/gray matter segmentation and amyloid-β plaque classification. These findings suggest that standardization of these preanalytical variables is needed to ensure the reproducibility and generalizability of deep learning algorithms for the assessment of Alzheimer’s disease.
“Despite the fact that our framework displayed similar results for most scanners, some WSIs needed Reinhard normalization to ensure that deep learning model performance could be replicated,” said Luca Cerny Oliveira, a graduate student researcher at the University of California Davis and the first author of the study.
He noted that “anyone seeking to apply deep learning models to WSI data in a clinical setting must pay attention to differences in WSI data.”
“These differences may require interventions, such as color normalization or unsupervised domain adaptation,” he added.
The report was published in the Journal of Neuropathology and Experimental Neurology.
Study Rationale: Examining the Effect of Preanalytical Variables on the Performance of Deep Learning Frameworks
In a previous study, the research team developed and tested a deep learning pipeline to automate the visualization and quantification of amyloid-β deposits. This framework was able to distinguish white matter from gray matter and detect different types of amyloid-β deposits on digitized immunohistochemically stained slides of brain tissues.
“Our previous work showed that deep learning-based automated segmentation and quantification frameworks excelled at quantifying amyloid-β plaques and segmenting white and gray matter in select neuroanatomic regions. However, it was tested on WSIs acquired from a single scanner,” said Luca.
Additionally, the deep learning pipeline was only trained and evaluated on slides stained for amyloid-β deposits, with most of the preanalytical variables kept unchanged.
Commenting on their motivation for this study, Zhengfeng Lai noted: “We wanted to further test the framework’s generalizability and decided to study how preanalytic variables may affect the performance of deep learning-based frameworks in more detail.” Zhengfeng is the study’s second author and the first to evaluate the two frameworks on a large number of WSIs.
To evaluate the effect of different preanalytical variables on the performance of deep learning frameworks in amyloid-β detection, the team used 60 WSIs from a cohort of 14 individuals with various extents of amyloid-β deposits.
The preanalytical variables examined in this study included imaging magnification (20×, 40×), scanner type (Leica Aperio AT2, Zeiss Axioscan Z1, and Leica Aperio GT 450), compression standard (JPEG 2000, JPEG XR), compression rate (45%, 75%), and storage format (SVS, CZI).
Effect of Preanalytical Variables on Amyloid-β Deposit Detection
The team found that repeated measurements of datasets created using different preanalytical variables yielded different results in terms of cored and diffuse plaque counts in the background.
Comparisons across datasets revealed that scanner type and magnification were the preanalytical variables with the most significant effect on the ability of the deep learning model to detect and count cored and diffuse plaques.
Effect of Preanalytical Variables on Segmentation Performance
To evaluate the effect of preanalytical variables on the segmentation performance of the deep learning framework, the researchers compared the white matter/gray matter ratios across WSI datasets generated using different settings and slide scanners.
Consistent with their findings on amyloid-β detection, the results of this comparison showed that magnification and scanner type were the preanalytical variables with the strongest effect on the performance of white matter/gray matter segmentation.
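To make this kind of comparison concrete, the snippet below is a minimal sketch of how a white matter/gray matter ratio might be computed from a segmentation mask and compared between two scans of the same slide. The label names, the toy masks, and the helper functions are illustrative assumptions and are not taken from the study’s pipeline or data.

```python
def wm_gm_ratio(mask):
    """Return the white matter/gray matter pixel ratio of a label mask.

    `mask` is a 2D list of hypothetical labels: "WM", "GM", or "BG".
    """
    wm = sum(row.count("WM") for row in mask)
    gm = sum(row.count("GM") for row in mask)
    if gm == 0:
        raise ValueError("mask contains no gray matter pixels")
    return wm / gm


def relative_difference(reference, other):
    """Relative change of `other` with respect to `reference`."""
    return abs(other - reference) / reference


# Toy masks standing in for the same slide segmented after scanning
# on two different scanners (made-up values, for illustration only).
mask_scanner_a = [["WM", "WM", "GM"],
                  ["GM", "GM", "BG"]]
mask_scanner_b = [["WM", "GM", "GM"],
                  ["GM", "GM", "BG"]]

ratio_a = wm_gm_ratio(mask_scanner_a)
ratio_b = wm_gm_ratio(mask_scanner_b)
print(ratio_a, ratio_b, relative_difference(ratio_a, ratio_b))
```

A large relative difference between scans of the same tissue would indicate that the segmentation output depends on the scanner rather than on the tissue itself.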
Commenting on the significance of these findings, Luca said that, despite the framework’s overall reliability and robustness to variation in preanalytical variables, “it is not possible to guarantee a deep learning model’s performance will be replicated if it is applied to data that may be different from the original dataset, or in this case, different scanners.”
“Our findings suggest that testing the generalizability of deep learning models using other data sources, such as those generated using different scanners, is needed. Such tests will ensure that deep learning models retain their performance when presented with diverse image data,” he added.
Although this pilot study showed that various preanalytical variables could significantly affect the performance of deep learning algorithms for amyloid-β detection in digitized brain slides, the cohort size was relatively small. Hence, confirmation of these findings in a larger cohort is warranted. Additional future work will include the evaluation of the effects of preanalytical variables on different tasks and across different tissue types and stains.
“We could not establish the best way to address the performance difference we observed in certain scanners. Although we tried color normalization, it is not a generalizable technique, as different scanners display different color space changes. We aim to explore unsupervised domain adaptation in the future, as it is a generalizable technique,” Luca said.
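For readers unfamiliar with the technique, the core step of Reinhard-style color normalization is to shift and scale each color channel of a source image so that its mean and standard deviation match those of a reference image. The snippet below is a minimal sketch of that step only. It assumes the pixels have already been converted to a Lab-like color space, and the function names and toy pixel values are hypothetical rather than the study’s implementation.

```python
from statistics import fmean, pstdev


def match_channel_stats(source, target):
    """Shift and scale `source` so its mean and std match `target`."""
    src_mean, src_std = fmean(source), pstdev(source)
    tgt_mean, tgt_std = fmean(target), pstdev(target)
    if src_std == 0:
        # A flat channel cannot be rescaled; only move it to the target mean.
        return [tgt_mean for _ in source]
    scale = tgt_std / src_std
    return [(value - src_mean) * scale + tgt_mean for value in source]


def reinhard_like_normalize(source_pixels, target_pixels):
    """Match per-channel statistics of two pixel lists.

    Both arguments are lists of 3-tuples, assumed to be in a
    Lab-like color space (the conversion is not shown here).
    """
    src_channels = zip(*source_pixels)
    tgt_channels = zip(*target_pixels)
    matched = [match_channel_stats(s, t)
               for s, t in zip(src_channels, tgt_channels)]
    return [tuple(pixel) for pixel in zip(*matched)]


# Made-up pixel values for illustration only.
source = [(50.0, 10.0, -5.0), (60.0, 12.0, -3.0), (70.0, 14.0, -1.0)]
target = [(40.0, 0.0, 0.0), (60.0, 4.0, 4.0), (80.0, 8.0, 8.0)]
normalized = reinhard_like_normalize(source, target)
print(normalized)
```

Because the reference statistics are tied to one image or one scanner, a correction of this kind has to be chosen per data source, which is in line with the limitation described in the quote above.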
Commenting on how their findings fit the bigger picture, Luca stated that the comparison of multiple deep learning-based frameworks under different preanalytical variables is an underexplored topic. “Different sites may have different equipment and protocols for slide scanning. Organizations like the Digital Pathology Association have put forth some great resources for pathology research; however, a standard setting for scanners used in machine learning has yet to be incorporated.”
“We hope that the results of our study will increase awareness of how deep learning frameworks may behave differently when presented with data from different scanners,” he concluded.
1. Oliveira LC, Lai Z, Harvey D, et al. Preanalytic variable effects on segmentation and quantification machine learning algorithms for amyloid-β analyses on digitized human brain slides. J Neuropathol Exp Neurol. 2023;82(3):212-220. doi:10.1093/jnen/nlac132