Hyperspectral Imaging and Machine Learning Bring Automatic Annotation of Pathology Images One Step Forward

by Christos Evangelou, MSc, PhD – Medical Writer and Editor

Artificial intelligence (AI) is increasingly being adopted in the clinic for pathological diagnosis. Nevertheless, AI-assisted classification of tissues relies on training AI algorithms with high-quality datasets that are manually annotated by expert pathologists. Manual annotation hinders full automation of the process and is time consuming, laborious, and prone to human error.

In a recent study, researchers at the Shanghai Key Laboratory of Multidimensional Information Processing and Engineering Research Center of Nanophotonics & Advanced Instrument developed an automated method for generating an annotated benchmark dataset for the classification of pathological tissue images. The dataset was generated using double-stained tissue images captured by hyperspectral imaging.1

The AI algorithm trained using the generated dataset was able to accurately classify different types of tissues and pathological conditions, demonstrating the potential of hyperspectral imaging and machine learning for the development of automated image analysis algorithms in digital pathology.

“We proposed a new strategy for generating annotated pathological benchmark datasets from hyperspectral images of double-stained tissues. Our findings suggest that this method overcomes mismatches between tissues and annotation on separate stained sections and achieves automatic annotation by utilizing hyperspectral images,” said Qingli Li, PhD, professor at East China Normal University and the lead investigator of the study.

Commenting on the clinical implications of their findings, Dr. Li stated that their method can be applied to multi-gigapixel whole slide images for automated annotation, “which is especially applicable to the imperceptible and sporadic lesion tissues or cells that the pathologists are concerned about.”

“The quantitative analysis of whole slide images provides clinicians with accurate pathological indicators to predict disease pathology and select appropriate treatment,” he added.

The report was published in the journal Optics & Laser Technology.


Study Rationale and Approach: Using Hyperspectral Imaging and Machine Learning to Generate Pathological Benchmark Datasets

Pathological tissue classification is an integral part of the diagnosis of several conditions, including cancer. It is essential that pathologists are able to accurately and efficiently classify tissues to provide patients with the most effective treatments. However, traditional manual methods of tissue classification can be time consuming and subject to human error, leading to inaccurate diagnoses and treatment plans.

“High-quality annotated pathology datasets are expensive and difficult to access, and pathological images need to be labeled by experienced pathologists manually. However, the performance of AI algorithms is effectively data-driven, which leaves substantial room for improvement in the computer-aided diagnosis domain,” said Dr. Li when asked about their rationale for conducting this study.

The researchers used hyperspectral imaging to capture images of double-stained tissues and then used a machine learning algorithm to automatically generate a benchmark dataset.1 To overcome the problem of staining differences in RGB color images, the researchers leveraged the abundant spectral information of double-stained pathology tissues captured using hyperspectral imaging. In this pipeline, H&E staining information was used to obtain pathological information, whereas immunohistochemical (IHC) staining information was used to overcome the requirement for image annotation.

The authors then used a spatial-spectral-based hyperspectral generative adversarial network (GAN), which is an unsupervised machine learning structure, to identify the different tissue components in the images.1 This approach allowed them to classify the tissues into different categories, such as normal or pathological, based on their spectral characteristics.

“We added a band selection strategy before the style transfer network to help us synthesize pseudo-color images from hyperspectral images and eliminate the spectral differences of slides,” Dr. Li explained. He also noted that the integration of spectral characteristics and IHC staining information into the segmentation network was intended to make tumor region identification more accurate.
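To make the idea of band selection concrete, the sketch below picks three bands from a toy hyperspectral pixel list and maps them to a pseudo-color image. It is only an illustration: the variance-ranking criterion, the function names `select_bands` and `to_pseudo_color`, and the sample values are assumptions made here, and the selection strategy used in the study may differ.

```python
from statistics import pvariance


def select_bands(pixels, k=3):
    """Return indices of the k bands with the highest variance across pixels.

    Variance ranking is an assumed, simple criterion, not the study's method.
    """
    n_bands = len(pixels[0])
    variances = [pvariance([p[b] for p in pixels]) for b in range(n_bands)]
    ranked = sorted(range(n_bands), key=lambda b: variances[b], reverse=True)
    return sorted(ranked[:k])


def to_pseudo_color(pixels, bands):
    """Rescale each selected band to 0-255 and return one tuple per pixel."""
    channels = []
    for b in bands:
        values = [p[b] for p in pixels]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid division by zero for flat bands
        channels.append([round(255 * (v - lo) / span) for v in values])
    return [tuple(c) for c in zip(*channels)]


# Toy data: four pixels, five spectral bands (hypothetical values).
pixels = [
    [0.10, 0.50, 0.20, 0.90, 0.30],
    [0.10, 0.60, 0.80, 0.10, 0.30],
    [0.10, 0.40, 0.50, 0.50, 0.30],
    [0.10, 0.50, 0.20, 0.80, 0.30],
]
bands = select_bands(pixels)
rgb = to_pseudo_color(pixels, bands)
print(bands)
print(rgb)
```

In this toy example the two constant bands carry no information and are discarded, which is the intuition behind reducing a hyperspectral cube to a few informative channels before style transfer.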


AI-assisted Analysis of Hyperspectral Images of Double-stained Tissues Provides Accurate Tissue Classification

The use of the spatial-spectral-based hyperspectral GAN allowed the researchers to transform hyperspectral images into standard histological images. Moreover, the integration of a graph-cut method with a gradient boosting decision tree (GBDT) enabled the automated generation of tissue annotations based on the spectral and spatial characteristics of images of tissues stained with standard H&E and IHC.

A comparison of the tissue classification results obtained with segmentation networks and those of experienced pathologists showed that the proposed automated method of generating a pathological benchmark dataset from hyperspectral images achieved an accuracy of over 90% in classifying different types of tissues, including lung adenocarcinoma, intrahepatic cholangiocarcinoma, gastric cancer, and colorectal cancer tissues.1

The study also showed that the proposed method could capture almost all target regions and automatically annotate small tumor regions. This is particularly important when analyzing whole slide images, which typically contain mixed tumor and non-tumor regions.


Future Work

Although this novel method brings automatic annotation of pathology images one step forward, the dataset used in this study was relatively small (310 images), and validation of the proposed method in a larger dataset with more diverse pathological images is necessary. In addition, the performance of the GBDT classifier could be improved to further enhance the accuracy of tissue classification. Future studies are also needed to compare the proposed method with other state-of-the-art methods in terms of accuracy, sensitivity, and specificity.
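For readers less familiar with the three metrics named above, the sketch below shows how accuracy, sensitivity, and specificity are commonly computed when an automatically generated tumor mask is compared with a pathologist's annotation. The pixel labels are made-up toy values (1 for tumor, 0 for non-tumor), and the function names are hypothetical; the study's own evaluation protocol is not described here and may differ.

```python
def confusion_counts(predicted, reference):
    """Count true/false positives and negatives for binary labels."""
    tp = fp = tn = fn = 0
    for p, r in zip(predicted, reference):
        if p and r:
            tp += 1
        elif p and not r:
            fp += 1
        elif not p and r:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(predicted, reference):
    """Return accuracy, sensitivity, and specificity as a dictionary."""
    tp, fp, tn, fn = confusion_counts(predicted, reference)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Sensitivity: fraction of true tumor pixels that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: fraction of non-tumor pixels correctly left unmarked.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Toy labels for ten pixels (hypothetical): 1 = tumor, 0 = non-tumor.
predicted = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
reference = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
scores = classification_metrics(predicted, reference)
print(scores)
```

Reporting all three values matters for whole slide images, where non-tumor pixels usually dominate and a high accuracy alone can hide missed tumor regions.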

In current clinical practice, H&E and IHC staining are used independently for pathological diagnosis because of the large gap between the two staining methods in terms of staining technology and pathological information.

“When we produced double-staining samples, we found that excessive staining depth existed in some double-stained tumor samples, which resulted in uncertainties in tissue features,” Dr. Li noted.

He also said that future studies are needed to determine the generalizability of the generated dataset to other types of tissues and staining protocols, as well as the potential biases introduced by the machine learning algorithm used to generate the dataset.

“We are interested in exploring hyperspectral image information of additional practical dyes and biological tissues to provide a new method to assist pathologists in fast tumor localization and diagnosis,” Dr. Li stated.



  1. Wang J, Mao X, Wang Y, Tao X, Chu J, Li Q. Automatic generation of pathological benchmark dataset from hyperspectral images of double stained tissues. Opt Laser Technol. 2023;163:109331. doi:10.1016/j.optlastec.2023.109331
