Researchers Develop a New Tool for Improving Representation Learning of Histopathology Features with Low Requirements for Labeled Data

by Christos Evangelou, MSc, PhD – Medical Writer and Editor

In a recent study, researchers at Dartmouth College and the Geisel School of Medicine at Dartmouth in Hanover, New Hampshire, USA, developed a new view generation method that, when used with joint embedding architectures, enhances representation learning for histology images.

The researchers used this new view generation method to analyze two histology image datasets, one for celiac disease and one for renal cell carcinoma (RCC). These validation studies demonstrated that the approach consistently improved patch-level and slide-level classification performance across several metrics.1

“Our findings indicate that representation learning approaches can perform on par or even better than fully supervised methods in some instances. This means that our proposed approach offers a potential path forward for developing models in computational pathology that can assist pathologists with diagnoses and prognoses when access to labeled data for training the model is limited,” said Saeed Hassanpour, PhD, Associate Professor at Geisel School of Medicine at Dartmouth and senior author of the study.

He added that this approach lowers the barrier to entry for developing models in computational pathology by reducing the amount of time and effort required by pathologists to label datasets.

The report was published in the Journal of Pathology Informatics.

Study Rationale: Developing a View Generation Method for Enhanced Histology Image Classification with Limited Labeled Data

Mounting evidence suggests that deep learning can improve pathology workflows for histology image analysis. However, standard computer vision models require large datasets of labeled images and regions of interest, which are difficult and time-consuming to obtain for histopathology images.

Additionally, annotated histopathology datasets often carry slide-level rather than patch-level labels, and the class-relevant regions typically make up only a small portion of a whole-slide image, a setting that existing self-supervised methods are not designed for.

“Representation learning approaches have been behind remarkable advances in the computer vision field, but these advancements have not transferred over to pathology yet. Prior works used representation learning methods directly from computer vision without making alterations to make it better suited for the unique aspects of pathology images,” Dr. Hassanpour noted.

To address these challenges, the authors developed HistoPerm, a flexible, model-agnostic view generation method for representation learning that improves histology image classification by utilizing both labeled and unlabeled data.1

Specifically, the authors aimed to develop a technique that takes advantage of abundant unlabeled histology image data alongside limited labeled data to learn improved representations of histopathology features. Their goal was to show that this leads to better downstream classification performance than current representation learning approaches that do not make use of labels.

Approach

The authors used joint embedding architectures to develop a view generation method that enhances representation learning for histology images. They termed this method "HistoPerm."

HistoPerm is a model-agnostic view generation technique that permutes augmented views of patches extracted from whole-slide histology images to improve representation learning and classification performance. This method utilizes the weakly labeled nature of histology slides, where the label applies to the whole slide, but only small regions are class-relevant.1

Dr. Hassanpour explained that joint embedding-based representation learning approaches operate on paired views, where each view represents a unique augmentation of a source image. The objective of this approach is to learn a model that maps each of the paired views to the same region in the latent space.
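To make this objective concrete, the following is a minimal sketch (ours, not the authors' code) of one common joint-embedding loss: a BYOL-style cosine-similarity term that pulls the embeddings of two views of the same image toward the same point in latent space. The function name and tensor shapes are illustrative assumptions.

    # Illustrative joint-embedding objective (not from the paper).
    # Two augmented views of the same image are encoded, and their embeddings
    # are pulled together via a cosine-similarity loss, as in BYOL-style methods.
    import torch
    import torch.nn.functional as F

    def view_alignment_loss(z_a: torch.Tensor, z_b: torch.Tensor) -> torch.Tensor:
        """Return a loss that is small when the two view embeddings coincide."""
        z_a = F.normalize(z_a, dim=-1)  # unit-normalize each embedding
        z_b = F.normalize(z_b, dim=-1)
        # 2 - 2 * cosine similarity: zero when the views map to the same direction
        return (2 - 2 * (z_a * z_b).sum(dim=-1)).mean()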

Unlike the natural images traditionally used in computer vision, histology images are typically weakly labeled; that is, they carry labels at the slide level. In addition, histology slides may contain regions that do not match the whole-slide label (e.g., normal tissue in a resected tumor slide).

“Given these complexities, we view patches from a slide belonging to the same slide-level label as candidate views for mapping together in the latent space. We perform a permutation by slide-level label, so the paired views now only belong to the same slide-level label but not the same source image,” Dr. Hassanpour added.
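The permutation step itself can be illustrated with a short, hypothetical sketch; again, this is not the authors' implementation. Assuming a PyTorch-style training loop in which each mini-batch holds two augmented views per patch plus a slide-level label, the second view is shuffled within each label group before the joint-embedding loss is computed, so paired views share a slide-level label but no longer come from the same source patch.

    # Illustrative sketch of label-wise view permutation (not the authors' code).
    import torch

    def permute_views_by_label(view_b: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        """Shuffle the second view within each slide-level label group."""
        permuted = view_b.clone()
        for lbl in labels.unique():
            idx = (labels == lbl).nonzero(as_tuple=True)[0]
            permuted[idx] = view_b[idx[torch.randperm(idx.numel())]]
        return permuted

    # Hypothetical usage inside a joint-embedding step (e.g., BYOL, SimCLR, VICReg):
    # view_a, view_b = augment(patches), augment(patches)
    # view_b = permute_views_by_label(view_b, slide_labels)
    # loss = view_alignment_loss(encoder(view_a), encoder(view_b))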

Commenting on the novelty of this approach, he said, “Prior works have used either labeled or unlabeled data reducing the flexibility of their proposed approaches, and the usage of labeled and unlabeled data in a shared scheme is unique.”

Findings

The team studied the performance of HistoPerm on two histology image datasets for celiac disease and RCC. To this end, they employed three commonly used joint embedding architecture-based representation learning methods: BYOL, SimCLR, and VICReg.

These analyses showed that adding HistoPerm to existing representation learning frameworks such as BYOL, SimCLR, and VICReg improved patch- and slide-level classification performance across the celiac disease and RCC datasets in almost all instances. Specifically, HistoPerm improved patch-level classification accuracy on the celiac disease dataset by 8% for BYOL, 3% for SimCLR, and 8% for VICReg.1 At the slide level, HistoPerm improved classification accuracy on the celiac disease dataset by 6% for BYOL, 5% for SimCLR, and 2% for VICReg. In addition, the models integrating HistoPerm outperformed the fully supervised baseline model by 6%, 5%, and 2% for BYOL, SimCLR, and VICReg, respectively.

On the RCC dataset, HistoPerm increased patch-level classification accuracy by 2% for BYOL and VICReg and by 1% for SimCLR. It also narrowed the classification accuracy gap between these models and the fully supervised baseline by up to 10%, despite the models being trained on unlabeled data.

“This method outperforms a fully supervised method on the celiac disease dataset, indicating that HistoPerm can enable existing models to obtain performance previously unattainable to models using unlabeled data. This could reduce annotation requirements for pathologists,” noted Dr. Hassanpour when asked about the implications of these findings.

Future Work

Dr. Hassanpour noted that future studies are needed to determine the effect of mini-batch size on HistoPerm's performance and to examine how incorporating unlabeled data from diverse sources affects its generalizability across histology datasets with varied preparation and scanning procedures.

“In future work, we plan on exploring how we can reduce the labeled data requirements even further. This would enable HistoPerm to be used in more labeled data-constrained scenarios,” he added.

References

  1. DiPalma J, Torresani L, Hassanpour S. HistoPerm: A permutation-based view generation approach for improving histopathologic feature representation learning. J Pathol Inform. 2023;14:100320. doi:10.1016/j.jpi.2023.100320
