The Significance of Robust Validation in Automated Quality Control Solutions


In digital pathology, where precision and accuracy are paramount, automated quality control (QC) solutions are becoming increasingly vital. These solutions detect slide and tissue artifacts so the laboratory can flag affected slides, which makes the workflow more efficient and improves data quality for research and analysis.

However, the efficacy of an automated quality control solution depends on two crucial components: a robust development dataset and a meticulous validation and testing process. Proscia’s Automated QC solution has been trained and tested on the most diverse and extensive dataset of any automated quality control solution on the market today: development involved training on over 70,000 whole slide images (WSIs), and validation assessed performance on over 10,000 WSIs.

How Automated Quality Control is Developed and Validated

For any automated quality control solution in digital pathology, the foundation of accuracy rests on the datasets used to train the algorithm and to test its performance. A comprehensive and diverse dataset includes a wide variety of scanners, organs, stains, and sites. The algorithm is trained on the development dataset, so that dataset must contain enough variety for the algorithm to perform effectively on the variable data types found in real-world laboratories.

However, a training dataset alone is not enough to determine performance in a laboratory. Once the solution is developed, it is tested on a subset of the data to verify functionality and derive performance statistics. This testing process is critical to rigorously assessing the solution’s accuracy and reliability. With a limited testing dataset (tens or hundreds of slides, for example), the narrow range of scanners, organs, stains, and sites can make the measured results differ from actual performance in a laboratory. The testing dataset must also be large enough to confidently demonstrate performance across scanners, organs, stains, and sites with differing slide preparation workflows.
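To make the sample-size point concrete, here is a minimal sketch of how the uncertainty around a measured accuracy shrinks as the test set grows. It uses the standard Wilson score interval for a proportion. The test-set sizes and the 96% observed accuracy are hypothetical values chosen for illustration, and nothing here is meant to describe how any vendor actually computes its statistics.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = successes / total
    denom = 1 + z**2 / total
    center = (p_hat + z**2 / (2 * total)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)
    )
    return center - half_width, center + half_width


# Hypothetical test sets that all show the same observed accuracy of 96%.
for n in (50, 500, 10_000):
    correct = round(0.96 * n)
    low, high = wilson_interval(correct, n)
    print(f"n={n:>6}: plausible accuracy range {low:.3f} to {high:.3f}")
```

With only tens of slides, the plausible range around the same observed accuracy is many times wider than with thousands, which is one reason small test sets can misrepresent real-world performance.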

Why a Robust Validation Dataset Matters to Laboratories

An extensive validation and testing dataset can both speed up deployment of the automated quality control solution and allow the laboratory to realize its workflow efficiency and accuracy benefits.

A solution that has been trained and tested on a limited dataset is less likely to produce accurate outputs when it encounters the variability present in a real laboratory environment. As a result, the solution may flag artifacts that are not present on the laboratory’s whole slide images or miss artifacts that are.

Poor performance may make the solution unusable, preventing the laboratory from realizing its benefits. Automated quality control is designed to create critical efficiency and accuracy benefits for laboratories, which often dedicate between 0.5 and 1 full-time equivalent (FTE) per scanner to manually spot check each image.

For example, using Proscia’s Automated QC in a simulated manual review process resulted in a 6X faster review of artifacts, a quality control time savings of 83% per slide. Additionally, automated quality control solutions create data consistency by eliminating variability in what manual reviewers in a laboratory consider “poor quality.” However, if an automated quality control solution performs poorly, users may not be able to trust its output, and the laboratory may need to manually recheck each slide, which defeats the purpose of the solution.
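The 6X and 83% figures describe the same result in two ways: a review that runs six times faster takes one sixth of the original time. A minimal sketch of that arithmetic follows; the function name is purely illustrative.

```python
def time_saved_fraction(speedup: float) -> float:
    """Fraction of time saved when a task runs `speedup` times faster."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1 - 1 / speedup


# A 6X faster review takes 1/6 of the original time per slide.
print(f"{time_saved_fraction(6):.0%}")  # prints 83%
```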

To address a poorly performing solution, the laboratory will likely need to allocate resources to manually curate hundreds or thousands of whole slide images and then provide this proprietary data to its solution provider. The provider can then retrain the algorithm on the laboratory’s dataset to improve performance and redeploy. This work can add months to the deployment process.

In contrast, a solution trained and tested on tens of thousands of images and a multitude of data variables more reliably performs well “out of the box.” Robust validation and testing streamline deployment by eliminating the need for time-consuming calibration to achieve optimal performance on a specific dataset. An extensive and diverse dataset is therefore critical to realizing enhanced data quality and workflow efficiency benefits in the real-world laboratory.

Proscia’s Approach to Validation

Proscia’s Automated QC application analyzes whole slide images generated from formalin-fixed paraffin-embedded (FFPE) tissue stained with hematoxylin and eosin (H&E), immunohistochemistry (IHC) or special stains and identifies the image or slide artifacts present. Detected quality issues include Out of Focus, Pen Mark, Air Bubble, Cut Off Tissue, Unscanned Tissue, Tissue Fold, Coverslip Line, Striping, Printed Slide Mark, and No Macro.

Proscia’s Automated QC solution is trained and tested on a repository of over 70,000 whole slide images, 500,000 tasks, and 2 million annotations. Its performance reflects this diverse and extensive dataset, which includes 6 scanner makes, 49 organs, and 402 unique stains, with a distribution of H&E, IHC, and special stains.

Proscia’s validation process involved the assessment of approximately 10,000 whole slide images and demonstrated an average accuracy of over 96% and a sensitivity of over 99%. Accuracy is defined as the total number of true positives and true negatives divided by the total number of slides. A true positive means that Automated QC correctly detected that an artifact is present, while a true negative means that Automated QC correctly detected that an artifact is not present. Sensitivity is how often the algorithm correctly identifies the artifact when it is present: true positives divided by the sum of true positives and false negatives.
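The definitions above can be sketched in a few lines of code. The class name, function names, and slide counts below are hypothetical, invented only to show how the two metrics are calculated from a set of results; they are not Proscia’s validation data or implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Slide-level outcomes for one artifact type."""
    true_positive: int   # artifact present, flagged
    true_negative: int   # artifact absent, not flagged
    false_positive: int  # artifact absent, flagged
    false_negative: int  # artifact present, missed

    @property
    def total(self) -> int:
        return (self.true_positive + self.true_negative
                + self.false_positive + self.false_negative)


def accuracy(c: ConfusionCounts) -> float:
    """(TP + TN) divided by the total number of slides."""
    if c.total == 0:
        raise ValueError("no slides to evaluate")
    return (c.true_positive + c.true_negative) / c.total


def sensitivity(c: ConfusionCounts) -> float:
    """TP divided by all slides where the artifact is actually present."""
    present = c.true_positive + c.false_negative
    if present == 0:
        raise ValueError("no slides with the artifact present")
    return c.true_positive / present


# Made-up counts for illustration only.
counts = ConfusionCounts(true_positive=90, true_negative=850,
                         false_positive=50, false_negative=10)
print(f"accuracy:    {accuracy(counts):.1%}")
print(f"sensitivity: {sensitivity(counts):.1%}")
```

Because artifacts are usually present on a minority of slides, accuracy is driven largely by true negatives, which is why sensitivity is reported alongside it.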

Leading with Laboratory Quality

The efficacy of automated quality control solutions in digital pathology hinges on the quality of the dataset and the precision of the validation process. As artificial intelligence becomes more routine as an aid in the digital pathology laboratory, robustly validated solutions are paving the way for a new era in accurate and efficient diagnostic practices.

Automated quality control solutions are becoming more commonplace in digital pathology laboratories today. Research organizations use Automated QC to start studies faster and drive better results by increasing efficiency and improving data quality. Laboratories of all types use Automated QC to improve operational efficiency and turnaround times, while optimizing resource allocation to reduce the cost of review.

Your automated quality control solution should be trained and validated on a dataset that is as varied as your laboratory’s. With a robust dataset behind your automated quality control solution, your laboratory can confidently reallocate time and costs to higher-value efforts.
