by Christos Evangelou, MSc, PhD – Medical Writer and Editor
For many diseases, molecular testing can provide rich information for both diagnosis and treatment planning, and combining tissue morphology with molecular data may improve the accuracy of patient matching. However, molecular analyses such as RNA sequencing are expensive and time-consuming and may require additional tissue samples. Predicting gene expression values directly from whole slide images (WSIs) is a weakly supervised problem because pixel-level annotations are not available for the individual tiles of a WSI. By contrast, searching for slides with similar anatomic features has proven to be a powerful unsupervised technique.
In a recent study, researchers at Mayo Clinic and the University of Waterloo developed a deep learning model called tRNAsformer to predict RNA sequence expressions from WSIs without requiring expensive and time-consuming molecular tests.1 By using multiple instance learning and attention-based topology, tRNAsformer can learn to predict gene expression values from WSIs while also representing the internal structure of the image. The authors believe this approach has potential clinical applications in digital pathology for improving diagnosis and prognosis of diseases.
“Our findings suggest that it is possible to estimate molecular data from tissue morphology, at least for bulk RNA sequencing and at least for some cancer subtypes,” said Hamid R. Tizhoosh, PhD, professor at Mayo Clinic and corresponding author of the study.
“By increasing search accuracy, combined with the rigorous majority voting among matched patients who have been evidently diagnosed, technologies like tRNAsformer make image search a much more reliable diagnostic tool with explainable outcomes,” he added.
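The majority voting that Dr. Tizhoosh describes can be illustrated with a simple sketch: retrieve the most similar previously diagnosed patients from an archive of slide embeddings, then take the most common diagnosis among them. The archive, embeddings, and subtype labels below are random stand-ins, not the authors' data or implementation.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(1)

# Hypothetical archive: slide embeddings of 20 previously diagnosed patients.
archive = rng.normal(size=(20, 8))
labels = ["ccRCC", "pRCC", "chRCC"] * 6 + ["ccRCC", "pRCC"]

def diagnose_by_search(query, archive, labels, k=5):
    """Retrieve the k most similar archived patients (Euclidean distance)
    and return the majority diagnosis among them."""
    dists = np.linalg.norm(archive - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

query = rng.normal(size=8)
print(diagnose_by_search(query, archive, labels))
```

The reliability of such a vote depends entirely on the quality of the embeddings, which is why the study emphasizes learning representations that capture both morphology and molecular information.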
The report was published in the journal Communications Biology.
Approach: Combining Multiple Instance Learning and Attention-based Topology
The authors proposed a multi-task deep learning model, called tRNAsformer, that predicts RNA sequence expression from WSIs while simultaneously providing a primary diagnosis. By combining multiple instance learning with attention-based topology, tRNAsformer can predict gene expression values without requiring pixel-level annotations for each tile in the image.
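The paper's actual architecture is more involved, but the core idea of attention-based multiple instance learning with two output heads can be sketched as follows. All dimensions, weight matrices, and embeddings here are illustrative placeholders, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical dimensions: 49 tiles per WSI, 384-d tile embeddings,
# 1,000 predicted genes, 3 diagnosis classes (illustrative only).
n_tiles, d_embed, n_genes, n_classes = 49, 384, 1000, 3

# A "bag" of tile embeddings extracted from one WSI (random stand-in).
tiles = rng.normal(size=(n_tiles, d_embed))

# Attention-based MIL pooling: score each tile, then take a weighted sum,
# so no per-tile (pixel-level) label is needed -- only slide-level ones.
w_attn = rng.normal(size=d_embed)
attn = softmax(tiles @ w_attn)           # one weight per tile, sums to 1
slide_embedding = attn @ tiles           # (d_embed,) slide-level representation

# Two heads share the slide embedding: bulk gene expression (regression)
# and a primary diagnosis (classification) -- the multi-task idea.
w_genes = rng.normal(size=(d_embed, n_genes)) * 0.01
w_diag = rng.normal(size=(d_embed, n_classes)) * 0.01
pred_expression = slide_embedding @ w_genes       # (n_genes,) expression values
pred_subtype = softmax(slide_embedding @ w_diag)  # probabilities over subtypes

print(pred_expression.shape, pred_subtype.shape)
```

Because both heads are trained from the same slide-level embedding, the representation is shaped by morphology and molecular targets at once, which is what makes it useful for downstream search.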
To train the model, the authors gathered kidney WSIs and their matched RNA sequencing data from The Cancer Genome Atlas (TCGA). They conducted several experiments to evaluate the performance of tRNAsformer against existing methods such as Yottixel and HE2RNA, measuring accuracy and convergence speed as performance metrics. To assess the generalizability of the model, they also used an external kidney cancer dataset from The Ohio State University (external validation dataset).
tRNAsformer Outperforms State-of-the-art Algorithms in Predicting Gene Expression Values From WSIs
The authors compared the performance of tRNAsformer with that of existing methods for predicting RNA sequence expressions from WSIs. The results showed that tRNAsformer learned rich representations of WSIs and, notably, outperformed the search engine Yottixel in the accuracy of those representations.
The authors also compared the performance of tRNAsformer to that of HE2RNA, a state-of-the-art model for predicting gene expression values from WSIs. The results showed that tRNAsformer achieved comparable performance to HE2RNA while requiring significantly fewer computational resources and simultaneously providing a primary diagnosis.
“Our findings suggest that tRNAsformer is a promising approach for predicting RNA sequence expressions from images and outperforms existing methods in terms of accuracy and efficiency, particularly for tissue representation,” Dr. Tizhoosh noted.
tRNAsformer WSI Classification
Deep learning methods can be applied in digital pathology to address clinical challenges such as prognosis and diagnosis. “By extracting molecular features from WSIs, we can gain insights into the underlying biology of diseases and potentially identify new biomarkers for diagnosis and treatment,” Dr. Tizhoosh explained. Therefore, the authors also investigated whether the tRNAsformer could be used for image search and classification by learning rich information to represent WSIs.
“Beyond a potential computerized framework for estimating whole-transcriptome sequencing, we were interested in how such models that understand both tissue morphology and molecular data can contribute to implicit or explicit ‘multi-modal’ search in archives of histopathology images of digitized tissue samples,” Dr. Tizhoosh added.
Slide-level classification results demonstrated that the tRNAsformer predicted renal cell carcinoma subtype with an accuracy of up to 96.25% in the TCGA dataset and 82.39% in the external validation dataset.
“The experiments with TCGA and external validations showed a very promising increase in the accuracy of image search. The designed and trained tRNAsformer is a multi-task model that not only predicts bulk RNA-seq but also provides a primary diagnosis for the input whole slide image. The embeddings of such networks are intrinsically better suited for patient matching,” said Dr. Tizhoosh.
Commenting on the clinical implications of these findings, he noted: “The results of this study suggest that the proposed tRNAsformer can assist as a computational pathology tool to facilitate a new generation of search and classification methods by combining the tissue morphology and the molecular fingerprint of the biopsy samples.”
Looking Ahead
Although the study proposed a promising approach for predicting RNA sequence expression from images, there are still some unanswered questions and limitations that need to be addressed in future work.
The study focused only on WSIs from patients with kidney cancer, and it is unclear whether tRNAsformer can be applied to other types of cancer or disease. Future studies are needed to confirm whether the model can predict other types of molecular information, such as spatial transcriptomics, to evaluate its generalizability on larger and more diverse datasets, and to compare it against additional existing methods to further validate its effectiveness.
“Going from bulk to single cell and to spatial transcriptomics may not be a straightforward path. Generalization of such models may also depend heavily on population diversity, such that single hospitals cannot train a globally useful model. Here, technologies such as federated learning become paramount to make highly diverse data accessible for model training,” Dr. Tizhoosh said.
References
- Alsaafin A, Safarpoor A, Sikaroudi M, Hipp JD, Tizhoosh HR. Learning to predict RNA sequence expressions from whole slide images with applications for search and classification. Commun Biol. 2023;6(1):304. doi:10.1038/s42003-023-04583-x