HEROHE Challenge Brings Us One Step Closer to Predicting HER2 Status in Breast Cancer from H&E Whole Slide Images

by Christos Evangelou, MSc, PhD – Medical Writer and Editor.

Breast cancer is the most common cancer in women, causing more than half a million deaths every year worldwide. The expression levels of biomarkers can help clinicians to determine the appropriate treatment method for each patient.

Immunohistochemistry (IHC) alone or in combination with in situ hybridization (ISH) is the standard method used in the clinic to determine the expression of breast cancer-specific biomarkers, including human epidermal growth factor receptor 2 (HER2). However, evaluating the levels of biomarkers in biopsies using immunohistochemistry is tissue-consuming and prone to analytical or interpretation errors.

Emerging technologies based on artificial intelligence (AI) and machine learning may provide novel methods for predicting the expression of biomarkers in hematoxylin and eosin (H&E)-stained breast cancer tissues, eliminating the need for labor-intensive immunohistochemical staining assays.

In a recent article, pathology experts reported the key findings from the HER2 on H&E (HEROHE) Challenge, which was held as a parallel event of the 16th European Congress on Digital Pathology.1

“We show that the status of HER2 can be predicted by the analysis of the morphology of the tumor with a common H&E staining,” said António Polónia, MD, PhD, surgical pathologist at Ipatimup Diagnostics and co-chair of the IT (Computational) Working Group of the European Society of Pathology.

“In contrast to IHC, ISH, and other molecular techniques that are only available in some labs, H&E staining is cheap, fast, and feasible in every lab in the world,” he added.

Although the feasibility of predicting HER2 status in H&E-stained tissues has been reported in previous studies, Dr Polónia noted that some of the models developed by the teams that participated in the challenge achieved better results than those reported in the literature. “The performance of some of the models discussed in the HEROHE Challenge is so high that [it] renders these methods suitable for use in a clinical setting,” Dr Polónia added.

The HEROHE Challenge was the first challenge developed to predict HER2 status from H&E-stained whole slide images (WSIs). The findings of the challenge were published in the Journal of Imaging.


The HEROHE Challenge: Predicting HER2 Status in Breast Cancer from H&E WSIs

“IHC and ISH are tissue-consuming, they require a few days to run, and have inherent costs,” Dr Polónia noted. “Recently, there has been a lot of interest in the field of AI in pathology. Given that pathology has a strong component of image analysis, traditionally performed by the pathologist at the optical microscope, the recent implementation of digital pathology in many labs provides the basis for the application of AI for the analysis of pathology images,” he added.

The HEROHE Challenge aimed to generate new tools for predicting HER2 status in H&E slides of biopsy samples from patients with invasive breast cancer. A total of 21 teams from across the world participated in the HEROHE Challenge.1

The teams had access to a large dataset of 509 annotated WSIs, which were specifically collected for the challenge. The corresponding IHC and ISH images were used for ground truth classification of HER2 status, but no IHC or ISH slides were provided.1

Dr Polónia explained that because breast cancer diagnosis begins with the evaluation of H&E-stained tissues, H&E slides are available for all patients with suspected breast cancer. “The rationale is to use the microscopic morphology of the tissue on H&E slides to predict the expression of biomarkers and, in turn, decide the best treatment for each patient. Instead of a qualitative or semi-quantitative evaluation manually performed by expert pathologists, we now move to a more quantitative evaluation that could be more objective and precise.”

The performance of the models developed by the teams was compared using different evaluation metrics, and the network architectures and key parameters of the best-performing models were presented.


Performance of HER2 Status Prediction Models

The performance of the prediction models was evaluated using the F1 score, a classification metric defined as the harmonic mean of precision and recall. Other evaluation metrics included the receiver operating characteristic (ROC) curve and the area under that curve (AUC).1
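As a rough illustration of how these metrics relate, the sketch below computes precision, recall, and the F1 score from binary predictions. The function name and the toy labels are invented for this example and are not taken from the challenge data or code.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels, where 1 = HER2-positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Precision: fraction of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of true positives that the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy labels (not HEROHE data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 0, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model cannot reach a high F1 score by maximizing recall alone while precision stays low.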

The models of six of the 21 teams exhibited high performance in predicting HER2 status from H&E WSIs, with some of the models providing AUC scores higher than 0.8. The F1 scores ranged from 0.29 (low performance) to 0.68 (high performance), while AUC scores ranged from 0.44 to 0.84. Precision and recall ranged from 0.31 to 0.77 and from 0.27 to 1.00, respectively.1

In the analysis, WSIs were classified as positive if the prediction was greater than or equal to the threshold of each model; WSIs with predictions lower than the threshold were classified as negative. The choice of the positivity threshold was found to be one of the factors with the greatest influence on the F1 score. Therefore, the optimal threshold for the test dataset was identified for each model to maximize the prediction performance. This threshold adjustment improved the performance of three of the top four models.1
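The thresholding rule and the search for an F1-maximizing threshold can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the candidate thresholds (each distinct predicted score), and the toy scores are hypothetical and do not reproduce the procedure used in the published analysis.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 score when a slide is called positive if its score >= threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn  # equivalent form: F1 = 2TP / (2TP + FP + FN)
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, scores):
    """Try every distinct score as a threshold and keep the one with the highest F1.

    On ties, the lowest such threshold is returned.
    """
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at_threshold(y_true, scores, t))


# Hypothetical per-slide scores and ground-truth labels (not HEROHE data).
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.35, 0.40, 0.55, 0.70, 0.90]

default_f1 = f1_at_threshold(y_true, scores, 0.5)
tuned = best_threshold(y_true, scores)
tuned_f1 = f1_at_threshold(y_true, scores, tuned)
print(default_f1, tuned, tuned_f1)
```

In this toy case, moving the threshold away from 0.5 raises the F1 score, which mirrors the sensitivity to threshold choice described above. A threshold selected on the same data used for evaluation tends to give an optimistic estimate, so in general practice it is usually tuned on separate validation data.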

AUC scores for HER2 prediction were used to compare the performance of the submitted models with those reported in the literature. The models of two of the teams achieved AUC scores similar to those reported in previous studies, and the model of one team yielded an AUC score that was higher than those reported in the literature (AUC 0.84).1
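Unlike the F1 score, the AUC does not depend on any single positivity threshold, which makes it convenient for comparing models across studies. One common way to read it is as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes the AUC using that pairwise definition; the function name and toy data are assumptions made for illustration only.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct ranking.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-slide scores and ground-truth labels (not HEROHE data).
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.35, 0.40, 0.55, 0.70, 0.90]
auc = roc_auc(y_true, scores)
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of positive and negative cases.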

Validation of the submitted models in the subset of equivocal IHC cases confirmed the ability of some of the models to predict HER2 status with high accuracy, as evidenced by the high F1 and AUC scores.



Dr Polónia noted that although some of the models achieved HER2 status prediction accuracies higher than those previously reported in the literature, interpretability and explainability are limitations that remain to be addressed.

“The methods are too complex for us to understand how they work—even for the researchers that developed them. This brings important questions of trust and usability in the clinical setting because clinicians do not want to use a technology they do not understand and, therefore, cannot verify their accuracy.”

He noted, however, that the limited interpretability and explainability of current HER2 prediction methods do not mean that these models cannot be used in settings other than diagnosis, such as quality control or reflex testing.



Commenting on the bigger picture and future of diagnostic AI algorithms, Dr Polónia stated: “Many labs still struggle to perform molecular techniques, such as IHC. Although H&E is also prone to interlaboratory variability, deep-learning methods can theoretically overcome this limitation and provide a result that is not only correct but is not tissue-consuming, does not require several days, and could be cheaper than standard methods.”

The HEROHE dataset is publicly available on the HEROHE ECDP2020 website.




1. Conde-Sousa E, Vale J, Feng M, et al. HEROHE Challenge: Predicting HER2 Status in Breast Cancer from Hematoxylin-Eosin Whole-Slide Imaging. J Imaging. 2022;8(8). doi:10.3390/jimaging808021
