Evaluating the Accuracy of Image Analysis Algorithms Applied to Digital Histological Images of Cutaneous Melanoma


Digital histological images can be created with a camera mounted on a light microscope or by generating a whole slide image (WSI) with a whole slide scanner, and image analysis algorithms can then be applied to these images. Before any such tool is integrated into the clinical workflow, its accuracy should be carefully evaluated and summarised. The objective of this review was therefore to evaluate the accuracy of existing image analysis algorithms applied to digital histological images of cutaneous melanoma.


This systematic review and meta-analysis was performed and reported in accordance with our protocol (PROSPERO ID 336,714) and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses of Diagnostic Test Accuracy Studies (PRISMA-DTA) Statement. Of the sixteen studies included in this systematic review, six had data that could be meta-analysed: data for five of these were extracted from published work, and additional data for one were provided by its authors. Across these five studies, 1,935 specimens were included, of which at least 1,088 were melanoma specimens.


The intended use of image analysis (IA) varied among the studies: most focused on binary classification tasks such as melanoma detection and localisation, while some performed more complex diagnostic classifications. Several IA techniques were used. Convolutional neural networks (CNNs) were the most common, although their architectures varied across studies; other approaches included support vector machines (SVMs), image processing with adaptive thresholding, and a combination of CNN and SVM.

The reference standard used for evaluation also varied and included pathologists' diagnostic labels, histological features, manual annotation, and immunohistochemical staining. Likewise, the unit of analysis used to report performance differed across studies, spanning pixel-level, cell-level, patch-level, and slide-level classification. Some studies did not specify the unit of analysis, and one appeared to focus on WSIs.
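Because the unit of analysis changes what counts as a true or false positive, the same predictions can yield different sensitivity and specificity depending on whether they are scored per patch or per slide. The sketch below illustrates this with invented patch-level predictions for four hypothetical slides. The slide-level rule (a slide is truly melanoma if any patch is, and is predicted melanoma if at least a `threshold` fraction of its patches are predicted positive) is an assumption chosen for illustration, not a method taken from the reviewed studies.

```python
from dataclasses import dataclass

# Hypothetical toy data, not drawn from any reviewed study:
# each slide maps to a list of (is_melanoma, predicted_melanoma) pairs, one per patch.
SLIDES = {
    "A": [(True, True), (True, False), (False, False), (False, False)],
    "B": [(False, False), (False, True), (False, False), (False, False)],
    "C": [(True, True), (True, True), (False, True), (False, False)],
    "D": [(False, False), (False, False), (False, False), (False, False)],
}


@dataclass
class Counts:
    """2x2 confusion-matrix counts for a binary melanoma classifier."""
    tp: int = 0
    fp: int = 0
    tn: int = 0
    fn: int = 0

    def add(self, truth: bool, pred: bool) -> None:
        # Assign one classified unit to its confusion-matrix cell.
        if truth and pred:
            self.tp += 1
        elif truth:
            self.fn += 1
        elif pred:
            self.fp += 1
        else:
            self.tn += 1

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def patch_level(slides: dict) -> Counts:
    """Score every patch as an independent unit."""
    counts = Counts()
    for patches in slides.values():
        for truth, pred in patches:
            counts.add(truth, pred)
    return counts


def slide_level(slides: dict, threshold: float = 0.5) -> Counts:
    """Score each slide once, using the hypothetical aggregation rule above."""
    counts = Counts()
    for patches in slides.values():
        truth = any(t for t, _ in patches)
        positive_fraction = sum(p for _, p in patches) / len(patches)
        counts.add(truth, positive_fraction >= threshold)
    return counts


patch = patch_level(SLIDES)
slide = slide_level(SLIDES)
print(f"patch: sens={patch.sensitivity():.2f} spec={patch.specificity():.2f}")  # patch: sens=0.75 spec=0.83
print(f"slide: sens={slide.sensitivity():.2f} spec={slide.specificity():.2f}")  # slide: sens=0.50 spec=1.00
```

With the default threshold of 0.5, moving from patches to slides lowers sensitivity and raises specificity; calling `slide_level(SLIDES, threshold=0.25)` reverses that trade-off. This is one reason results reported at different units are difficult to compare directly.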

Across the meta-analysed studies, the mean sensitivity was 90% (CI 82% to 95%) and the mean specificity was 92% (CI 79% to 97%). Performance metrics for the studies that could not be included in the meta-analysis because of deficiencies in reporting are summarised in Supplementary Table 3.
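Diagnostic test accuracy meta-analyses usually pool sensitivity and specificity jointly with a bivariate random-effects model. As a simpler illustration of how a pooled estimate and its confidence interval can be obtained, the sketch below pools a single proportion (for example sensitivity, i.e. true positives over all melanoma specimens) across studies on the logit scale with the DerSimonian-Laird random-effects estimator. It is a hedged sketch under stated assumptions: it ignores the correlation between sensitivity and specificity, applies a 0.5 continuity correction to every study for simplicity, and uses invented counts. It is not the model or the data used in this review.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def expit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def pool_proportions(events, totals, z=1.96):
    """Univariate DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    Simplification (assumption): the bivariate DTA model is replaced by pooling one
    proportion at a time. Returns (estimate, lower, upper); z=1.96 gives a 95% CI.
    """
    ys, vs = [], []
    for e, n in zip(events, totals):
        # 0.5 continuity correction keeps logits finite when e == 0 or e == n.
        e_c, n_c = e + 0.5, n + 1.0
        ys.append(logit(e_c / n_c))
        vs.append(1 / e_c + 1 / (n_c - e_c))  # approximate variance of the logit

    # Fixed-effect (inverse-variance) estimate, used to measure heterogeneity.
    w = [1 / v for v in vs]
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    df = len(ys) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return expit(y_re), expit(y_re - z * se), expit(y_re + z * se)


# Invented per-study counts: true positives and total melanoma specimens.
sens, lower, upper = pool_proportions([45, 88, 30], [50, 95, 36])
print(f"pooled sensitivity {sens:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The same function applied to true negatives over all non-melanoma specimens would give a pooled specificity. Back-transforming the interval from the logit scale keeps it within 0 to 1 and makes it asymmetric around the estimate, as is the case for the intervals reported above.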


Based on limited and heterogeneous data, IA offers high accuracy when applied to histological images of melanoma. Work to date has focused on developing the technology in this field, and that development has accelerated over the past decade. Going forward, future work should address the clinical application of such models and evaluate their use as a screening/triage tool or for generating prognostic/predictive biomarkers. The quality of existing studies is variable but improving with time; it is important that authors report their data according to AI-specific reporting guidelines once these are published.

