Publication Review – The human-in-the-loop: an evaluation of pathologists’ interaction with artificial intelligence in clinical practice

Digital Pathology Review

Anna C S Bodén et al. Histopathology 2021 Aug;79(2):210-218.


Review by Suchandrima Bhowmik


Digital pathology, also known as virtual microscopy or whole slide imaging, is being adopted in clinical practice to assist with diagnostic tasks. It can be defined as the conversion of the image of a slide produced by a light microscope into one that can be accessed and viewed on a computer workstation.

Introduction of digital image analysis (DIA) in diagnostics offers several advantages such as increased reproducibility, accuracy, and efficiency. DIA has been shown to provide great benefit in research and is starting to be useful in clinical practice as well.

Ki67 scoring in breast cancer is a popular candidate for assessing the role of AI in diagnostics. Recent studies have shown that DIA performs as well as or better than manual scoring by pathologists.

However, DIA can fail because of poor slide quality, and its performance can degrade when it is applied to data from sources that were not represented in the training data. To overcome these problems, research is under way on human-in-the-loop (HITL) AI systems, in which pathologists interact with DIA to improve performance.

The study therefore aims to assess the clinical use of DIA for Ki67 scoring, especially the human-in-the-loop interplay between DIA and pathologists.

How was the study carried out?

The study analyzed 200 tumor-containing areas taken from Ki67-stained slides of 200 different breast cancer cases. Of the 200 cases, 184 were primary invasive breast cancers and the rest were distant metastases, local metastases, or suspected cancers. The patient data, as well as the data on the reviewing pathologists, were anonymized at the time of extraction. The extracted areas were grouped as V1 and V2, corresponding to two versions of the algorithm: V1 from May 2015 and V2 from February 2017.

No additional staining or scanning was performed for the study. All the specimens were processed according to in-house clinical procedures and the NordiQC external quality programme. Scanning was carried out with either an Aperio Scanscope AT Turbo scanner or a Hamamatsu NanoZoomer XR.

The Ki67 tool uses a HITL workflow that involves four steps:

  • The pathologist selects the area for analysis, using a circle of fixed radius to mark the hotspot.
  • The tool then detects positive and negative cells in this area and presents the result to the pathologist as an interactive visualization. The pathologist can add or remove cells and change the positivity of cells, and the Ki67 index is updated accordingly.
  • The pathologist marks the result as verified.
  • Finally, an estimate of ground truth is obtained using three experienced breast pathologists. The median Ki67 index obtained is used as ground truth.
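
The arithmetic behind these steps can be sketched as follows. This is a minimal illustration, not the tool's actual code: it assumes the Ki67 index is the percentage of positive cells among all counted cells, which the review does not spell out, and the names `CellCounts` and `apply_corrections` are hypothetical. All numbers are made up.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class CellCounts:
    """Hypothetical tally of cells detected inside the selected hotspot."""
    positive: int
    negative: int

    def ki67_index(self) -> float:
        """Return the percentage of positive cells among all counted cells."""
        total = self.positive + self.negative
        if total == 0:
            raise ValueError("no cells counted in the selected area")
        return 100.0 * self.positive / total


def apply_corrections(counts, removed_negative=0, negative_to_positive=0):
    """Return new counts after two kinds of reviewer edits.

    removed_negative: detections deleted by the reviewer, for example
        stroma that was counted as negative tumor cells.
    negative_to_positive: cells whose positivity the reviewer flipped.
    """
    negative = counts.negative - removed_negative - negative_to_positive
    positive = counts.positive + negative_to_positive
    if negative < 0:
        raise ValueError("more negative cells edited than were detected")
    return CellCounts(positive=positive, negative=negative)


# Made-up numbers for illustration only; they are not taken from the study.
automatic = CellCounts(positive=60, negative=240)
verified = apply_corrections(automatic, removed_negative=50, negative_to_positive=10)
print(automatic.ki67_index())  # index from automatic detection
print(verified.ki67_index())   # index after the reviewer's edits

# Ground-truth estimate: median of three independent expert indices.
expert_indices = [18.0, 22.0, 25.0]
ground_truth = median(expert_indices)
print(ground_truth)
```

Using the median rather than the mean keeps a single outlying expert score from shifting the ground-truth estimate.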

Three different scoring methods were compared: eyeballing, automatic scoring, and HITL scoring. Eyeballing was performed before the ground-truth scoring and consisted of estimating the proportion of Ki67-positive cells by visual assessment of the digital image alone, without actual counting. The automatic score was obtained from the nuclear detection algorithm. The HITL score corresponded to the Ki67 index obtained from clinical review.
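
One simple way to compare the three methods against the ground truth is to look at the error of each method's Ki67 index. The sketch below assumes mean absolute error and the standard deviation of the signed error as summary measures; the review does not say which statistics the authors used, the function names are hypothetical, and the scores are invented and do not reproduce the study's results.

```python
from statistics import mean, stdev


def signed_errors(scores, ground_truth):
    """Per-case difference between a method's index and the ground truth."""
    if len(scores) != len(ground_truth):
        raise ValueError("scores and ground truth must have the same length")
    return [s - g for s, g in zip(scores, ground_truth)]


def summarize(scores, ground_truth):
    """Mean absolute error and standard deviation of the signed error."""
    errors = signed_errors(scores, ground_truth)
    return {
        "mean_abs_error": mean([abs(e) for e in errors]),
        "error_sd": stdev(errors),
    }


# Invented Ki67 indices (percent) for four cases; not data from the study.
ground_truth = [10.0, 20.0, 30.0, 40.0]
methods = {
    "eyeballing": [15.0, 10.0, 40.0, 30.0],
    "automatic": [11.0, 19.0, 32.0, 38.0],
    "hitl": [12.0, 19.0, 32.0, 37.0],
}

summary = {name: summarize(scores, ground_truth) for name, scores in methods.items()}
for name, stats in summary.items():
    print(name, stats)
```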

Accuracy was calculated at three levels: Ki67 status, Ki67 index, and individual cells. The statistical analysis was performed with pandas, SciPy, and statsmodels.
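
To make the status level and the cell level concrete, here is a hedged sketch. It assumes that Ki67 status is a high/low call at a cutoff, and it uses a hypothetical cutoff of 20% because the review does not state one. It also assumes that cell-level accuracy can be expressed as precision and recall of detected cells against reference annotations, which may differ from the measure the authors used. All names and numbers are illustrative.

```python
def ki67_status(index, cutoff=20.0):
    """Classify an index (percent) as 'high' or 'low'; the cutoff is an assumption."""
    return "high" if index >= cutoff else "low"


def status_accuracy(scores, ground_truth, cutoff=20.0):
    """Fraction of cases where a method's status matches the ground-truth status."""
    if len(scores) != len(ground_truth):
        raise ValueError("scores and ground truth must have the same length")
    matches = [
        ki67_status(s, cutoff) == ki67_status(g, cutoff)
        for s, g in zip(scores, ground_truth)
    ]
    return sum(matches) / len(matches)


def cell_level_metrics(true_pos, false_pos, false_neg):
    """Precision, recall, and F1 for detected cells against reference cells."""
    detected = true_pos + false_pos
    reference = true_pos + false_neg
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / reference if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented values for illustration; not data from the study.
accuracy = status_accuracy([15.0, 25.0, 19.0, 40.0], [18.0, 22.0, 21.0, 35.0])
precision, recall, f1 = cell_level_metrics(true_pos=90, false_pos=10, false_neg=30)
print(accuracy, precision, recall, f1)
```

A case whose index sits close to the cutoff can change status on a small index error, which is why status agreement and index error are reported as separate levels.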

What did the results indicate?

The results indicated that automatic scoring improved accuracy with both versions of the algorithm. On the V2 data, the Ki67 index error for eyeballing was much higher than for automatic scoring and HITL scoring. Furthermore, HITL corrections produced no overall improvement over automatic scoring.

The results showed that all three levels analyzed in this study supported the safe use of DIA. The Ki67 status and the Ki67 index were both more accurate, with a lower standard deviation of the error. DIA was found to have several advantages over visual assessment in determining the Ki67 index in breast cancer in the following areas: tumor heterogeneity, misidentification of tumor cells, visual assessment error, poor immunostaining or slide quality, and estimation of non-tumor cells.

One pathologist visually inspected the 200 annotated cases before and after HITL correction, which helped to determine the causes of suboptimal results. In many cases, HITL led to deterioration. However, the visual inspection also revealed cases in which a failure of the DIA system had been corrected by HITL. The major causes of failure included misidentification of stroma and stroma morphology, overlapping positive nuclei, and staining quality, along with a variety of other factors.

The researchers concluded that individual cases with poor DIA results can be remedied by a pathologist's HITL intervention, but that human adjustments can also worsen the result. Whether pathologist interaction is justified was found to depend on the time taken to perform the HITL task. In the current study, the time spent was lower in V2, which is attributed to the higher accuracy of the nuclear detection algorithm and the resulting need for fewer corrections.


Although the current study suggests that, for these 200 breast cancer cases, the pathologist could have been removed with the added benefit of saving time, this finding may not generalize beyond the cases studied.

Another drawback of this study is that it did not evaluate the pathologists’ experience of using the HITL tool.

The HITL approach provides an important safety mechanism for detecting and correcting algorithmic errors, so retaining human oversight remains important for safety and quality control in digital image analysis.



  • Boden A C S, Molin J, Garvin S, West R A, Lundstrom C, Treanor D. The human-in-the-loop: an evaluation of pathologists’ interaction with artificial intelligence in clinical practice. Histopathology 2021, 79, 210–218. DOI: 10.1111/his.14356.
  • Snead D R J, Tsang Y, Meskiri A et al. Validation of digital pathology imaging for primary histopathological diagnosis. Histopathology 2016 Jun;68(7):1063-72. doi: 10.1111/his.12879.