New Computational Tool for Automated Cell Type Detection and Morphological Feature Extraction from H&E-Stained Slides

by Christos Evangelou, MSc, PhD – Medical Writer and Editor

Accurate identification of cell types and extraction of morphological features from histopathological images are crucial for understanding disease processes and guiding treatment decisions. However, manual analysis of these images is time-consuming and prone to subjectivity.

To address these challenges, a team of researchers at the University of Helsinki, Helsinki University Hospital, the University of Turku, and Turku University Hospital introduced the hematoxylin and eosin (H&E) image processing pipeline (HEIP), an automated tool designed to detect cell types and extract morphological features from digitized H&E-stained slides.1

This modular software pipeline detects, classifies, and extracts dozens of morphological properties from millions of individual nuclei in H&E-stained histological images. Quantitative whole-slide data can then be correlated with genomic and other omics measures gathered from the same tumors.

Overall, HEIP demonstrates the potential of computational pathology methods to generate quantitative, multi-parametric data from tissue images that can be integrated with genomic and other data for translational cancer research.1

“Our validation studies suggest that HEIP is an efficient methodology for detecting cell types and their morphological features from whole-slide images, enabling various downstream analyses,” said Professor Sampsa Hautaniemi, DTech, who was the corresponding author of this study.

The report was published in the Journal of Pathology Informatics.

Rationale: Automating Quantitative Histopathology

Cancer researchers rely on the visual examination of tumor tissue samples stained with H&E to evaluate features such as cell morphology, spatial heterogeneity, and mitotic activity. However, manual image feature extraction is laborious and subject to human bias. Moreover, the diversity of cell morphology under a microscope poses significant challenges for the accurate characterization of tumors.1

As digital pathology becomes increasingly important in cancer research, there is a growing need for computational tools that facilitate the extraction of information from whole-slide images (WSIs).

“For many applications, such as combining cell morphology with genomic data, it is necessary to extract and annotate cell types from WSIs,” noted Prof Hautaniemi. “Our goal was to develop an efficient computational approach that can be used to detect various cell types from WSIs for downstream analyses.”


HEIP is open-source software for processing H&E histology images to detect cell nuclei, classify them into different cell types, and extract morphological features.1 The core of HEIP is a modified version of the HoVer-Net architecture, which performs simultaneous segmentation and classification of nuclei using a multi-task convolutional neural network. The pipeline is designed to be modular and customizable.

“We developed a pipeline that automatically detects cell types and extracts various morphological features from WSIs. The use of a modified HoVer-Net deep learning architecture with an advanced post-processing approach enabled us to perform simultaneous segmentation and annotation of cells,” explained Prof Hautaniemi.
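To make the idea of a per-nucleus morphological feature concrete, the sketch below computes one such feature, the major (primary) axis length, from the pixel coordinates of a single segmented nucleus. It is a minimal illustration and not HEIP's own code: the function name `major_axis_length` and the toy pixel data are assumptions made for this example, and the calculation uses the common convention of fitting an ellipse with the same second moments as the region.

```python
import math


def major_axis_length(pixels):
    """Major axis length of the ellipse with the same second moments as a region.

    `pixels` is a list of (row, col) coordinates belonging to one nucleus.
    Hypothetical helper for illustration; not part of HEIP.
    """
    n = len(pixels)
    if n == 0:
        raise ValueError("region must contain at least one pixel")
    mean_r = sum(r for r, _ in pixels) / n
    mean_c = sum(c for _, c in pixels) / n
    # Central second moments (population covariance of the pixel coordinates).
    var_r = sum((r - mean_r) ** 2 for r, _ in pixels) / n
    var_c = sum((c - mean_c) ** 2 for _, c in pixels) / n
    cov = sum((r - mean_r) * (c - mean_c) for r, c in pixels) / n
    # Largest eigenvalue of the 2x2 covariance matrix, in closed form.
    half_trace = (var_r + var_c) / 2
    spread = math.sqrt(((var_r - var_c) / 2) ** 2 + cov ** 2)
    largest = half_trace + spread
    # For a solid ellipse with semi-axis a, the variance along that axis is
    # a^2 / 4, so the full axis length is 4 * sqrt(variance).
    return 4 * math.sqrt(largest)


# Toy nucleus (made-up data): a filled block of 3 rows by 9 columns.
toy_nucleus = [(r, c) for r in range(3) for c in range(9)]
toy_area = len(toy_nucleus)  # pixel count, another simple feature
toy_length = major_axis_length(toy_nucleus)
```

In a real pipeline, a feature like this would be computed for every detected nucleus and then summarized per cell type and per slide.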

The research team used the PanNuke dataset, which contains semi-automatically generated nuclei instance segmentation and classification from various tissue types and cancers.1 Additionally, the researchers trained the model using a dataset of H&E images from patients with ovarian high-grade serous carcinoma (HGSC). The modular design of HEIP facilitates the simultaneous segmentation and annotation of cells from digitized H&E WSIs. The model was validated using subsets of images from the prospective DECIDER study to assess its performance across different datasets.

Proof of Concept: Detecting Cell Types and Extracting Morphological Features from H&E-Stained Slides

The research team evaluated the performance of HEIP using a dataset containing images from patients with HGSC, a genetically complex malignancy.1 The researchers first trained HEIP’s convolutional neural network module using over 200,000 cell annotations from different reference datasets and HGSC sample regions pre-classified by an expert pathologist.

HEIP showed high precision in detecting and classifying HGSC tumor cells, as well as in distinguishing them from inflammatory, connective, and normal epithelial cells. However, its performance was lower for connective and inflammatory cells.

“Our key finding is that HEIP is trustworthy for annotating cell types from WSIs, as its estimates for cell type annotations agreed well with the pathologist’s ground-truth annotations,” noted Prof Hautaniemi.
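For readers unfamiliar with how agreement with a pathologist's annotations is typically scored, the following sketch computes per-class precision and recall from paired labels. It only illustrates the standard definitions; the article does not describe the study's evaluation code, and the function name and toy labels below are invented for this example and are not results from the study.

```python
from collections import Counter


def per_class_precision_recall(true_labels, predicted_labels):
    """Return {class: (precision, recall)} for two paired label lists.

    Hypothetical helper for illustration; not part of HEIP.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    true_counts = Counter(true_labels)
    predicted_counts = Counter(predicted_labels)
    # Count nuclei whose predicted class matches the reference class.
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    scores = {}
    for cls in sorted(set(true_labels) | set(predicted_labels)):
        # Precision: of the nuclei predicted as cls, the fraction that are cls.
        precision = (
            correct[cls] / predicted_counts[cls] if predicted_counts[cls] else 0.0
        )
        # Recall: of the nuclei that are cls, the fraction predicted as cls.
        recall = correct[cls] / true_counts[cls] if true_counts[cls] else 0.0
        scores[cls] = (precision, recall)
    return scores


# Made-up labels for ten nuclei; these are NOT data from the study.
toy_truth = ["tumor"] * 4 + ["inflammatory"] * 3 + ["connective"] * 3
toy_predicted = (
    ["tumor"] * 4
    + ["inflammatory", "inflammatory", "connective"]
    + ["connective", "connective", "inflammatory"]
)
toy_scores = per_class_precision_recall(toy_truth, toy_predicted)
```

Reporting the scores per class, rather than as a single overall accuracy, is what makes it possible to see that some cell types are classified less reliably than others.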

The team also evaluated the utility of HEIP in downstream analysis of the association between nuclear morphological features and genomic ploidy values computed from whole-genome sequencing data of matched tumor samples. There was a moderate but statistically significant positive correlation between the nuclear primary axis length of neoplastic nuclei and ploidy.1

“This suggests that WSIs are a rich source for studying genetic aberrations that have an impact on cell phenotype levels, such as ploidy aberrations here,” Prof Hautaniemi explained.
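A downstream analysis of this kind can be sketched in a few lines: summarize a nuclear feature per tumor sample, then correlate it with a per-sample genomic value. The example below is a rough illustration under stated assumptions. The article does not say which correlation coefficient the authors used, so Pearson's r is an assumption here, the function name is hypothetical, and the numbers are made-up toy values rather than study data. Significance testing is omitted.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with 2+ values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Made-up per-sample values; these are NOT data from the study.
# One entry per tumor sample: a summary of neoplastic major axis length
# (arbitrary units) and the ploidy estimated for the matched sample.
toy_axis_length = [10.0, 11.5, 12.0, 13.0, 15.0]
toy_ploidy = [2.0, 2.1, 3.0, 2.8, 3.6]
toy_r = pearson_r(toy_axis_length, toy_ploidy)
```

A rank-based coefficient such as Spearman's would be a reasonable alternative when the relationship is monotonic but not linear.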

Future Work

Despite the promising performance of HEIP, the authors acknowledge that the software has limitations. Inconsistent identification of dead cells and other rare populations reduces classification accuracy across the tissue. Training the algorithm using larger datasets and images with high diversity in terms of cell type and morphology could improve the classification of rare cell populations.

The tool also faces challenges in properly segmenting large and irregularly shaped nuclei, which could skew feature quantifications. Upgrading the architectures of HEIP to handle a broader spectrum of nuclear morphologies could improve software performance. Broader validation across other cancer types and larger longitudinal sample sets is also warranted.

“Here, we focused on instance segmentation to detect cell nuclei and compute their features. To account for more general context, such as growth patterns, cell types, and cell morphological features, we are now focusing on integrating instance and semantic segmentation,” said Prof Hautaniemi.

Despite these limitations, automated pipelines like HEIP could accelerate quantitative analysis of histology images and facilitate extraction of cell features and spatial distributions across whole slides. This can provide insights into tumor morphology and biology beyond what the human eye can discern.

According to Prof Hautaniemi, as HEIP allows for the estimation of cell composition, it could be especially useful in addressing questions related to heterogeneity and the tumor microenvironment, which have clinical relevance.

The study was funded by the European Union’s Horizon 2020 Research and Innovation Programme, the Sigrid Jusélius Foundation, and the Cancer Foundation Finland.


  1. Ariotta V, Lehtonen O, Salloum S, et al. H&E image analysis pipeline for quantifying morphological features. J Pathol Inform. 2023;14:100339. Published 2023 Oct 5. doi:10.1016/j.jpi.2023.100339
