Using deep learning to predict biomarker status of breast tumors

Deep Learning and AI in Pathology and Medicine

Artificial intelligence (AI) and machine learning entail the use of machines and algorithms to carry out complicated tasks and solve complex problems. Deep learning is a machine learning method that involves the generation and implementation of artificial neural networks with three or more layers.

By simulating aspects of human intelligence, these neural networks can “learn” from large amounts of data. Because “trained” algorithms can make predictions and data-driven decisions, they have immense potential in various medical fields, including pathology and cancer diagnosis from tissue biopsies.

The need for AI in breast cancer diagnosis and management

Determining the expression status of the hormone receptors estrogen receptor (ER) and progesterone receptor (PR), together with human epidermal growth factor receptor 2 (HER2), in breast tumors has become a standard process in breast cancer diagnosis.

Based on the expression status of these receptors, breast tumors are classified into different subtypes. Accurate diagnosis of these clinical subtypes is imperative in cancer management, as tumors of different subtypes respond differently to anticancer treatments.1–3
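As a rough illustration of how receptor status translates into a subtype label, the sketch below maps ER/PR/HER2 calls to a simplified four-group scheme. The grouping and the names `ReceptorStatus` and `clinical_subtype` are assumptions made for this example; they are not taken from the cited studies, and clinical classification relies on additional criteria.

```python
from typing import NamedTuple


class ReceptorStatus(NamedTuple):
    """Binary biomarker calls for one tumor (hypothetical container)."""
    er_positive: bool
    pr_positive: bool
    her2_positive: bool


def clinical_subtype(status: ReceptorStatus) -> str:
    """Map ER/PR/HER2 calls to a simplified subtype label.

    Assumption: a tumor counts as hormone receptor positive (HR+)
    if either ER or PR is positive.
    """
    hormone_receptor_positive = status.er_positive or status.pr_positive
    if hormone_receptor_positive and status.her2_positive:
        return "HR+/HER2+"
    if hormone_receptor_positive:
        return "HR+/HER2-"
    if status.her2_positive:
        return "HR-/HER2+"
    return "triple-negative"


print(clinical_subtype(ReceptorStatus(True, True, False)))    # HR+/HER2-
print(clinical_subtype(ReceptorStatus(False, False, False)))  # triple-negative
```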

However, current biomarker scoring systems based on immunohistochemistry (IHC) involve multiple time-consuming, laborious, and technically challenging tissue preparation steps, which makes staining data difficult to interpret and hard to standardize.4 In turn, misdiagnosis of breast cancer subtypes or delays in diagnosis may have a tremendous impact on treatment outcomes.

A deep learning method to determine biomarker status and morphological features in breast tumors

In a recent study appearing in Communications Medicine, researchers from Google Health developed three independent deep learning models to predict the status of ER, PR, and HER2 in focal tissue regions (patches) and slides of breast cancer tissues. These two-stage deep learning models were trained using paired, pathologist-annotated digital images of hematoxylin and eosin (H&E)-stained and IHC-stained tissues from three sources: a medical laboratory, a tertiary hospital, and The Cancer Genome Atlas (TCGA).5
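To make the two-stage idea concrete, the sketch below assumes that a first-stage model has already produced a probability for each patch and shows one possible second stage that turns those patch scores into a slide-level call. The aggregation rule (the fraction of patches above a threshold), the thresholds, and the function names are illustrative assumptions, not the method used in the study.

```python
from typing import Sequence


def slide_score(patch_probs: Sequence[float], patch_threshold: float = 0.5) -> float:
    """Illustrative stage 2: fraction of patches called biomarker-positive."""
    if not patch_probs:
        raise ValueError("a slide needs at least one patch prediction")
    positive_patches = sum(1 for p in patch_probs if p >= patch_threshold)
    return positive_patches / len(patch_probs)


def slide_call(
    patch_probs: Sequence[float],
    patch_threshold: float = 0.5,
    slide_threshold: float = 0.1,
) -> bool:
    """Call the slide positive if enough of its patches are positive.

    Both thresholds are hypothetical; in practice they would be tuned
    on held-out data.
    """
    return slide_score(patch_probs, patch_threshold) >= slide_threshold


# Hypothetical stage 1 output for five patches from one slide.
example_patch_probs = [0.05, 0.10, 0.92, 0.88, 0.07]
print(slide_score(example_patch_probs))  # 0.4
print(slide_call(example_patch_probs))   # True
```

A simple rule like this discards spatial context, which may be one reason slide-level prediction is harder than patch-level prediction.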

After training the algorithms, researchers applied the models to test samples to evaluate their performance in predicting the status of ER, PR, and HER2 directly from histologic features of H&E-stained tissues. They found that the models could robustly predict biomarker status, especially for focal tumor regions.

Specifically, the areas under the receiver operating characteristic curve (AUCs) in focal tissue regions (>135 million patches) were 0.939 (95% CI, 0.936–0.941) for ER, 0.938 (0.936–0.940) for PR, and 0.808 (0.802–0.813) for HER2. In a large set of tissue slides (3274 slides from 1249 patients at 37 sites), AUCs were 0.86 (95% CI, 0.84–0.87) for ER, 0.75 (0.73–0.77) for PR, and 0.60 (0.56–0.64) for HER2.5
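For readers unfamiliar with how such figures are obtained, the following minimal sketch computes an AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, and estimates a 95% confidence interval with a plain percentile bootstrap. The toy labels and scores are invented, and the bootstrap shown here is an assumption for illustration; the study may have computed its intervals differently.

```python
import random
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    aucs: List[float] = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_labels = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(resampled_labels) < n:
            aucs.append(roc_auc(resampled_labels, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data: 1 = biomarker-positive by IHC, scores = model outputs.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(roc_auc(toy_labels, toy_scores))  # 8 of 9 pairs ranked correctly
print(bootstrap_ci(toy_labels, toy_scores))
```

The pairwise computation is quadratic in the number of samples, so it suits small examples only; for large datasets a rank-based implementation is preferable.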

They also found strong associations between predicted biomarker status and certain morphological features of tissues. For instance, ER+/PR+ status was associated with low-grade and lobular morphology, while ER-/PR-/HER2- status was correlated with increased inflammation. These associations between biomarker profiles and histologic features supported the interpretability of the predictions made by the deep learning models.5

Deep learning-based tissue “fingerprints” predict ER/PR/HER2 status in H&E-stained breast tumors

Determining clinical and histological breast cancer subtypes — and consequently, the method of treatment — is often based on molecular testing for ER, PR, and HER2. However, molecular testing does not take into account the morphological characteristics of tumors, which can also affect patient prognosis and treatment outcomes.6 Therefore, there is a pressing clinical need for diagnostic methods that can robustly distinguish breast cancer subtypes by integrating both molecular and morphological information.

To improve the accuracy, reproducibility, and speed of pathological diagnosis of breast cancer based on tissue morphology, researchers from the Ellison Institute for Transformative Medicine developed and tested a deep learning algorithm trained to predict the clinical subtypes of breast cancer and ER/PR/HER2 immunoreactivity from the histologic features of H&E-stained images.7

By leveraging large datasets of unannotated images, this deep learning algorithm can learn H&E features and distinguish clinical subtypes of breast cancer. To achieve this, researchers trained the algorithm to learn histomorphological patterns (“tissue fingerprints,” as the authors call them) and to predict clinical tumor subtypes and molecular characteristics from these morphological features.

To validate the performance of the deep learning algorithm, the researchers applied the trained neural network to predict the status of ER, PR, and HER2 in two independent datasets of H&E images: a discovery dataset consisting of 939 images from 40 sites and a large test set consisting of 2531 whole slide images from the Australian Breast Cancer Tissue Bank (ABCTB). In the test dataset, the algorithm provided AUCs of 0.89 for ER, 0.81 for PR, and 0.79 for HER2.7

Importantly, this fingerprint-based approach outperformed traditional transfer-learning and direct patch-based classifications, especially when the training sets were stain-normalized. In addition, the fingerprint-based ER classifier could “learn” and distinguish epithelial patterns in breast tumors, consistent with previous reports correlating tissue architecture with clinical ER status.8

Future perspectives

The immense potential of AI, and especially of deep learning, in pathology is becoming increasingly evident. Compared with manual evaluation of histopathology images, deep learning models can predict the status of biomarkers in a fraction of the time and at a fraction of the cost while maintaining high accuracy, reliability, and interpretability.

Moreover, using deep learning algorithms to identify tissue fingerprints can provide valuable insight into tumor tissue biology directly from H&E images, accelerating diagnosis and improving clinical decision making.

Nonetheless, the implementation of high-performance deep learning pipelines in histopathology is often limited by the lack of large datasets of well-annotated pathology images. Although such systems may accelerate diagnosis and decrease biomarker detection costs, more work is needed to improve the accuracy, reliability, and reproducibility of deep learning models in predicting biomarker status in breast cancer, especially for rare histologic subtypes with limited training data.

Future research is needed to develop and validate algorithms that can objectively identify novel histologic features or histomorphology-biomarker associations.


  1. Jin Y-H, Hua Q-F, Zheng J-J, et al. Diagnostic Value of ER, PR, FR and HER-2-Targeted Molecular Probes for Magnetic Resonance Imaging in Patients with Breast Cancer. Cell Physiol Biochem. 2018;49(1):271-281. doi:10.1159/000492877
  2. Allison KH, Hammond MEH, Dowsett M, et al. Estrogen and Progesterone Receptor Testing in Breast Cancer: ASCO/CAP Guideline Update. J Clin Oncol. 2020;38(12):1346-1366. doi:10.1200/JCO.19.02309
  3. Finn RS, Press MF, Dering J, et al. Estrogen Receptor, Progesterone Receptor, Human Epidermal Growth Factor Receptor 2 (HER2), and Epidermal Growth Factor Receptor Expression and Benefit From Lapatinib in a Randomized Trial of Paclitaxel With Lapatinib or Placebo As First-Line Treatment in HER2-Negative or Unknown Metastatic Breast Cancer. J Clin Oncol. 2009;27(24):3908-3915. doi:10.1200/JCO.2008.18.1925
  4. Welsh AW, Moeder CB, Kumar S, et al. Standardization of Estrogen Receptor Measurement in Breast Cancer Suggests False-Negative Results Are a Function of Threshold Intensity Rather Than Percentage of Positive Cells. J Clin Oncol. 2011;29(22):2978-2984. doi:10.1200/JCO.2010.32.9706
  5. Gamble P, Jaroensri R, Wang H, et al. Determining breast cancer biomarker status and associated morphological features using deep learning. Commun Med. 2021;1(1):14. doi:10.1038/s43856-021-00013-3
  6. Turashvili G, Brogi E. Tumor Heterogeneity in Breast Cancer. Front Med. 2017;4:227. doi:10.3389/fmed.2017.00227
  7. Rawat RR, Ortega I, Roy P, et al. Deep learned tissue “fingerprints” classify breast cancers by ER/PR/Her2 status from H&E images. Sci Rep. 2020;10(1):7275. doi:10.1038/s41598-020-64156-4
  8. Rawat RR, Ruderman D, Macklin P, Rimm DL, Agus DB. Correlating nuclear morphometric patterns with estrogen receptor status in breast cancer pathologic specimens. npj Breast Cancer. 2018;4(1):32. doi:10.1038/s41523-018-0084-4


Christos received his Master's in Cancer Biology from Heidelberg University and his PhD from the University of Manchester. After working as a scientist in cancer research for ten years, Christos decided to switch gears and start a career as a medical writer and editor. He is passionate about communicating science and translating complex science into clear messages for the scientific community and the wider public.
