Deep learning algorithms have been widely adopted to automate the analysis of whole slide images (WSIs) and streamline digital pathology workflows. Supervised deep learning analysis of WSIs typically involves fragmenting each digital WSI into many small patches, which are then used to train the algorithm. Once trained and their performance validated, deep learning algorithms can be used to classify WSIs according to different disease states (e.g., cancer versus non-cancer). A key limitation of patch-based deep learning is that it does not consider WSI-level tissue features that may have clinical relevance. In addition, patch-based learning can introduce label noise, because the slide-level label assigned to every patch may not reflect the actual content of each patch.
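As an illustration of this tiling step, a minimal sketch using the openslide-python library is shown below; the file name, tile size, and the omission of tissue filtering are simplifications for illustration, not details from the study.

```python
# Minimal patch-tiling sketch (assumes openslide-python; file name and tile size are illustrative)
import openslide

TILE = 512  # patch edge length in pixels at full resolution

slide = openslide.OpenSlide("example_slide.svs")   # hypothetical WSI file
width, height = slide.dimensions                   # gigapixel-scale dimensions

patches = []
for y in range(0, height - TILE + 1, TILE):
    for x in range(0, width - TILE + 1, TILE):
        # read_region returns an RGBA PIL image; convert to RGB for model input
        patch = slide.read_region((x, y), 0, (TILE, TILE)).convert("RGB")
        patches.append(((x, y), patch))
# In practice, background patches are filtered out and each retained patch
# inherits the slide-level label for supervised training.
```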
In a recent study, researchers from the Boston University School of Medicine developed a graph-based vision transformer that predicts disease grade by fusing graph representations of WSIs with a transformer architecture. By combining the computational efficiency of transformers with the clinical information captured by graph representations of pathology WSIs, the graph transformer can improve the accuracy of WSI classification.1
“Rather than using machine learning frameworks from the field of traditional computer vision that work on natural images, we have designed a neural network to efficiently analyze a pathology image and learn overall disease information,” said Vijaya Kolachalama, PhD, Assistant Professor of Medicine at Boston University and corresponding author of the study.
Talking about the implications of their findings, Dr Kolachalama said: “This is an important advancement in the field of digital pathology and hopefully will motivate researchers to design methods based on the problem instead of borrowing them from different fields and applying them in digital pathology.”
“Our method enables us to not only predict the predominant disease label on the WSI but also precisely identify regions that are indicative of the disease. When fully validated and deployed in a clinical setting, such methods can assist pathology practice,” he added.
The study has been accepted for publication in the journal IEEE Transactions on Medical Imaging.
Study Rationale
“The field of digital pathology is undergoing a revolution due to advances in computer vision. The images generated via digitization of the pathology slides are high in magnification and large; the size is gigabytes or more,” said Dr Kolachalama. “Advanced methods, such as deep learning, have been used to analyze these large images. Most of these methods were borrowed from traditional computer science where the common forms of imaging data are natural images or pictures of objects and animals,” he added.
Dr Kolachalama noted that traditional WSI analysis methods using deep learning involve dividing WSIs into tiles, analyzing each tile independently, and aggregating the outcomes from all the tiles to extract WSI-level information. He explained that this process is not suitable for all pathology images, especially when both local tissue characteristics and WSI-level information may have clinical value. “The lack of methods to combine local and overall tissue-level information limits the clinical adoption of digital pathology,” Dr Kolachalama said.
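A toy illustration of the tile-then-aggregate pipeline Dr Kolachalama describes (not the study's method) is sketched below; it assumes per-tile class probabilities have already been computed by a patch classifier, and all names are illustrative.

```python
# Conventional tile-then-aggregate baseline: score tiles independently,
# then pool the tile scores into a single slide-level prediction.
import numpy as np

def aggregate_slide_prediction(tile_probs: np.ndarray) -> tuple[int, float]:
    """tile_probs: (n_tiles, n_classes) softmax outputs from a patch classifier."""
    slide_probs = tile_probs.mean(axis=0)          # simple mean pooling over tiles
    return int(slide_probs.argmax()), float(slide_probs.max())

# Example: three tiles, two classes (non-cancer, cancer)
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]])
label, confidence = aggregate_slide_prediction(probs)   # leans toward the cancer class
```

Pooling of this kind discards the spatial arrangement of the tiles, which is precisely the WSI-level information the graph transformer is designed to retain.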
Characteristics of the Graph Transformer
“We developed a novel deep learning algorithm to improve the accuracy of existing methods in predicting tumor grade in non-small cell lung cancer. Our approach is based on representation learning, wherein a WSI is first represented as a graph,” Dr Kolachalama noted. He explained that each tile of the WSI serves as a node, with edges from this node connecting to its neighboring tiles. “We then fused the graph with a vision transformer to capture long-range interactions. This graph-based vision transformer was used to predict output labels of interest in WSIs.”
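A minimal sketch of this graph construction, assuming tiles are indexed by their grid positions after tissue filtering (all names illustrative), could look like the following; the resulting adjacency matrix, together with per-tile feature vectors, is what a graph transformer would consume.

```python
# Sketch of a WSI graph: each retained tile is a node; edges connect tiles that are
# spatially adjacent on the slide (8-neighborhood). Names are illustrative.
import numpy as np

def build_adjacency(tile_coords: list[tuple[int, int]]) -> np.ndarray:
    """tile_coords: (row, col) grid positions of tiles kept after tissue filtering."""
    index = {rc: i for i, rc in enumerate(tile_coords)}
    adj = np.zeros((len(tile_coords), len(tile_coords)), dtype=np.float32)
    for (r, c), i in index.items():
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == dc == 0:
                    continue
                j = index.get((r + dr, c + dc))
                if j is not None:
                    adj[i, j] = 1.0   # undirected edge between neighboring tiles
    return adj

adj = build_adjacency([(0, 0), (0, 1), (1, 0)])   # three mutually adjacent tiles
```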
The Graph Transformer Accurately Classifies WSIs Into Cancer and Non-Cancer
To develop a framework for WSI analysis, the team used 4818 WSIs and clinical data from three publicly available datasets: The Cancer Genome Atlas (TCGA), the Clinical Proteomic Tumor Analysis Consortium (CPTAC), and the National Lung Screening Trial (NLST).1 NLST data were used to develop a feature extractor based on a contrastive learning model. This feature extractor computed feature vectors for individual WSI patches, which were then assembled into a graph representation and passed to the graph transformer.
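The study trains the patch feature extractor with contrastive learning; the exact objective is described in the paper, but a generic SimCLR-style NT-Xent loss, shown here purely to illustrate the idea (two augmented views of each patch are pulled together in embedding space), might look like this:

```python
# Generic SimCLR-style contrastive (NT-Xent) loss sketch in PyTorch; the study's
# precise contrastive objective may differ. z1 and z2 hold embeddings of two
# augmented views of the same batch of patches.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2n, d) unit-norm embeddings
    sim = z @ z.t() / temperature                         # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                     # exclude self-similarity
    # Positive pairs: view i matches view i + n, and vice versa
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)
```

A trained encoder of this kind can then embed every tile, and those embeddings serve as the node features of the WSI graph described above.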
Subsequently, the team used the CPTAC data (training dataset) to train the model. The model consistently showed high performance in differentiating lung adenocarcinoma and squamous cell carcinoma from adjacent non-cancerous (normal) tissues. In the CPTAC cohort, the graph transformer provided an accuracy of 91.2% (±2.5%) and an area under the curve (AUC) value of 97.7 (±0.9), outperforming current state-of-the-art methods (TransMIL and AttPool). When applied to the validation dataset (TCGA), the model provided 82.3% (±1.0%) accuracy and 92.8 (±0.3) AUC.1
WSI-level Saliency Maps Identify Disease-Related Regions of Interest
Going one step further, the team developed a new graph-based class activation mapping (GraphCAM) method to produce WSI-level saliency maps of tissue areas associated with the class label. Comparing GraphCAM outputs with a pathologist’s manual WSI annotations showed significant overlap between the two,1 suggesting that the saliency maps can identify image regions with clinicopathological value.
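GraphCAM itself propagates class-specific relevance through the model, as described in the paper; as a much simpler stand-in that only illustrates the general idea of mapping a slide-level prediction back onto tiles, one could compute a gradient-based node saliency as sketched below (the model signature and all names are assumptions):

```python
# Simplified gradient-based node saliency (NOT the paper's GraphCAM): attribute a
# slide-level class score back to individual tile nodes.
import torch

def node_saliency(model, node_feats: torch.Tensor, adj: torch.Tensor, target_class: int) -> torch.Tensor:
    """node_feats: (n_tiles, d) tile embeddings; adj: (n_tiles, n_tiles) adjacency.
    Returns one saliency score per tile, which can be painted back onto the WSI."""
    node_feats = node_feats.clone().requires_grad_(True)
    logits = model(node_feats, adj)          # assumed to return slide-level class logits
    logits[target_class].backward()          # gradient of the chosen class score
    # Grad-CAM-style weighting: gradient times activation, summed over feature dims
    scores = (node_feats.grad * node_feats).sum(dim=1)
    return torch.relu(scores)                # keep tiles that contribute positively
```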
Factors Affecting Graph Transformer’s Performance in WSI-Level Classification
To identify the features affecting the ability of the graph transformer to accurately classify WSIs, the team performed multiple ablation studies in which they replaced different components of the model. When contrastive learning was used, the graph transformer outperformed other WSI-level classification methods. However, replacing contrastive learning with a pre-trained architecture or a convolutional autoencoder decreased the model’s accuracy from 91.4% (±1.1%) to 80.8% (±1.1%) and 80.2% (±1.6%), respectively.1
The team also assessed the effect of different graph configurations and model hyperparameters on the classification performance of the graph transformer.1 They found that reducing the number of graph convolutional network layers from 3 to 1, increasing the patch size, or reducing the dimension of the hidden state from 128 to 64 led to lower classification performance. In contrast, increasing the number of min-cut nodes improved the model performance.1
Bigger Picture
Commenting on how their findings fit the bigger picture, Dr Kolachalama said: “The future of AI and computer vision for digital pathology has a long way to go, at least in terms of developing novel approaches that work efficiently for the problems that are relevant to us. Representation learning is a promising field, and the idea of using graphs to approximate WSIs is interesting. We plan to further explore this framework to address more questions in digital pathology.”
References
- Zheng Y, Gindra RH, Green EJ, et al. A graph-transformer for whole slide image classification. IEEE Trans Med Imaging. 2022 May 20;PP. doi: 10.1109/TMI.2022.3176598. Online ahead of print. PMID: 35594209