How do you establish the science of characterising biological heterogeneity within a tissue? That step is, in my view, the best way for us to build sufficient knowledge upon which to derive algorithms that are robust enough and competent enough to address the right questions.
Interview with Gustavo Rohde
Professor of Biomedical Engineering and Electrical and Computer Engineering, University of Virginia, Charlottesville, Virginia, United States
BIOSKETCH: Gustavo Rohde is a professor of Biomedical Engineering and Electrical and Computer Engineering at the University of Virginia, USA, where he directs the Imaging and Data Science Laboratory (imagedatascience.com). He earned a bachelor’s degree in physics and mathematics and a master’s degree in engineering from Vanderbilt University in 1999 and 2001, respectively, and a Ph.D. in applied mathematics and scientific computation from the University of Maryland, USA, in 2005. He has published over 100 peer-reviewed publications related to biomedical imaging, signal and image analysis, applied mathematics, computational biology, and machine learning. He has served on the editorial boards of the IEEE Transactions on Image Processing, IEEE Signal Processing Letters, IEEE Journal of Biomedical and Health Informatics, BMC Bioinformatics, and Cytometry A journals. He currently serves as a regular member of the National Institutes of Health Biodata Management and Analysis (BDMA) study section. Dr. Rohde served as program co-chair for the IEEE International Symposium on Biomedical Imaging (ISBI) 2021.
Interview by Jonathon Tunstall – 1 July 2021
Published – 22 September 2021
JT – Professor Rohde, you are a computational pathologist. At what stage in your career did you come across the field of digital pathology? Was this something that you discovered by accident, or did you already have a prior interest in imaging technologies?
GR – Maybe between 10 and 15 years ago, when I first joined the academic community as a professor, it became obvious that digital pathology was a new field emerging from the introduction of hardware such as whole slide imaging. It was clear that it was a growing field and that there were opportunities for folks like us to make contributions in the application of digital image analysis techniques through math and modeling, particularly in cancer. With the introduction of whole slide imaging came the realization from a lot of pathology departments that the future was digital. More and more pathologists were opening up, being willing to be adventurous and step into the world of seeing what digital image analysis could contribute to their field. I started to think, “how can we improve things from their perspective?” and the more I met people, the more it became clear that there were important contributions to be made.
JT – Did you come across a particular scanning platform or an application which sparked your interest?
GR – Back then, I was in Pittsburgh and we had collaborations with the Pittsburgh Medical Center. At that time, they had a partnership with GE and were starting a company called Omnyx, but there were already digital scanners around from other companies such as Aperio. There were several people there in the Pathology Department that we collaborated with, and we wrote a few papers together.
JT – So you saw a fit with this new digital imaging field and the computer analysis side. Was that an overlap in terms of image analysis or data management initially?
GR – My interest, from my training, was in the computational modeling, the mathematical modeling, of the content depicted in the image data. In pathology nowadays you have more and more resolution, so you get to see individual cells and subcellular structures as well as the tissue organization. You get the best of both worlds, but you also have a lot of data. I realized that with all the advancements that had been going on for 15 or maybe 20 years, a lot of contributions had been made, and a lot of them were in what we can call workflow improvements. For example, the ability to digitize the whole slide at high resolution, and the ability to store and retrieve it remotely and in an easy fashion. These kinds of advancements have facilitated improved workflow on the part of the pathologist. If they want to look at something from home or to get a second opinion from a colleague across the street or across the country, it takes about the same time nowadays. There are a lot of these improvements that I broadly term workflow, but at the same time, I look at this field and say, “what is the biggest question that is in my area, in my field?” And that has to do with how we can leverage these contributions in hardware, in imaging technology, in data storage, data transmission, computation and mathematical modeling. How can we put these things together to do the following two things?
The first thing is to improve the science behind digital pathology: to build scientific understanding of the composition of cells and the composition of tissues in different malignancies, whether they be cancers or otherwise. Pathology has, as I have come to know it (and let me make it clear that I am not a pathologist), suffered from the fact that a lot of the information pathologists utilize on a daily basis hasn’t been collected in a strictly scientific manner. It has been a gestalt approach, with, for example, some pathologists who like to operate at high magnification and some at lower magnification, some who give more importance to counting cells and some who give more importance to other aspects. I am not saying that it is in any way poor science, just that there may be opportunities, through providing more solid, more verifiable, more quantitative measurements, to improve the science behind pathology.
The second thing would be: how can we then leverage this improved understanding to make better diagnoses, give more accurate prognoses, and understand which therapies to give to which patients and when, so as to improve patient care in a broad sense?
The combination of these two things provides what I feel is an opportunity for folks like myself to contribute to the field.
JT – Perhaps we could argue that in a world where pathology is digital, your speciality of mathematical modeling has become the main branch of the field. That is, if we consider that the central component of pathology is an image and we perform analysis on that image, whether with the human brain looking for patterns, or with some sort of computer algorithm doing the same thing. We could take the view that recent advancements in pathology have not been led by enhanced biological understanding, but more by a simultaneous and rapid evolution of a series of computer-based technologies such as image handling, data management, networking and image analysis.
GR – It’s never a matter of one parameter. Mathematical modeling is required, but so are a series of other factors, starting from an understanding of the biology and the biophysics: which molecules, which proteins are important to image, to target. We need to know how those factors differ between more benign and more malignant tumors. That understanding has to be part of the equation, and then once you have some hypotheses, some goals, some targets, you have to consider the appropriate laboratory techniques. Once you understand whether you want to measure chromatin, or maybe actin, tubulin, microtubules or mitochondria, the next question is: what are the appropriate laboratory techniques? Which stains do we use? What is the appropriate preparation method? Then you move on to: what hardware do we need? What imaging technique can we use to obtain the required information? Putting all that together, the biology, the physics of imaging, the techniques for the mathematical modeling of what information is relevant, is probably the way we will see the real advancements in scientific understanding that translate into improvements in patient care.
JT – Do you think that the standardization of the preparation is one of the biggest challenges that we face at the moment?
GR – Yes, certainly. We have worked with a lot of data from different centers. Even within the same center you can sometimes see changes from month to month. Maybe you have changes in personnel or a change in the basic routine preparation method. Even H&E staining can sometimes require normalization, and for people working in my area that can be a central question: how do you normalize intensities across multiple images? That is a question at the heart of image analysis. It can be done digitally if we understand the origin of the data from basic principles. It can also be done through further standardization of laboratory technique, but how are we going to get there? I think basic understanding, basic science, will set the stage. We need to establish what is the most important and appropriate data that we need to gather, how to measure that data, and how to do that in a systematic way that can be calibrated and standardized more easily.
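As one concrete illustration of the intensity-normalization problem described above, the minimal sketch below matches the color histogram of one H&E image tile to a chosen reference tile using scikit-image. This is only one simple approach among many, not a method advocated in the interview, and the filenames are hypothetical placeholders.

```python
# Minimal sketch (illustration only): normalize the intensity/color distribution
# of one H&E image tile to a chosen reference tile via histogram matching.
# Filenames are hypothetical placeholders for 8-bit RGB tiles.
import numpy as np
from skimage import exposure, io

reference = io.imread("reference_tile.png")   # tile scanned under "standard" conditions
moving = io.imread("new_tile.png")            # tile whose intensities have drifted

# Match each RGB channel's histogram to the reference
# (channel_axis=-1 requires scikit-image >= 0.19).
matched = exposure.match_histograms(moving, reference, channel_axis=-1)

io.imsave("new_tile_normalized.png", np.clip(matched, 0, 255).astype(np.uint8))
```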
JT – Have you used any commercial software packages in your work, or do you assess those packages to see how they perform against certain standards?
GR – We did a study with the people in Pittsburgh where we investigated a cancer grading package and looked at its ability to give the correct grading. Our question was: how stable and how reliable is this package with respect to basic imaging parameters? We already knew that with some scanning instruments there was a variability in illumination depending upon the time of day. For example, if you scan an image in the morning when the machine is at a certain temperature and hasn’t been used for a while, then you will get a certain amount of illumination and intensity distribution. However, if you scan an image later in the day when the machine is warmed up and has been used for several hours, then you get a different type of intensity distribution. That is often almost imperceptible to the eye, but we were trying to see how stable this software package was with respect to small variations in image intensity, brightness, contrast, etc. The results were not pretty. There was a lot of variation in the analysis of the same preparations.
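The kind of sanity check described here can be sketched very simply: apply small, nearly imperceptible brightness and contrast perturbations to an image and compare the algorithm’s output before and after. In the sketch below, `score_image` is a deliberately trivial placeholder (mean pixel intensity) standing in for whatever grading algorithm is under test, and the filename is hypothetical; this is not the protocol of the study being described.

```python
# Illustrative robustness check (not the study's actual protocol): apply small
# linear brightness/contrast perturbations to an 8-bit image and see whether
# an analysis score stays stable.
import numpy as np
from skimage import io

def score_image(img):
    """Placeholder 'analysis' standing in for a real grading algorithm."""
    return float(img.mean())

def perturb(img, gain, offset):
    """Simple linear brightness/contrast change: out = gain * img + offset."""
    out = gain * img.astype(np.float64) + offset
    return np.clip(out, 0, 255).astype(np.uint8)

image = io.imread("biopsy_tile.png")     # hypothetical input tile
baseline = score_image(image)

for gain in (0.95, 1.0, 1.05):           # roughly +/-5% contrast change
    for offset in (-5, 0, 5):            # +/-5 gray-level brightness shift
        score = score_image(perturb(image, gain, offset))
        print(f"gain={gain:.2f} offset={offset:+d} score={score:.2f} baseline={baseline:.2f}")
```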
JT – So variables such as temperature and ambient light can actually affect the output of an image analysis package?
GR – Potentially, yes. If the illumination lamp is not perfectly calibrated, it can have an effect, as can many other factors such as sample preparation, or one technician versus another. Several of these factors can cause minor changes in the appearance of the tissue preparation, and it turns out that many of the algorithms that we as a community have been using for a while, if you study them carefully, show correspondingly large variations. I think we have our work cut out for us, as we have to improve that situation dramatically.
JT – Yes, I see that, reduce the variables. I must admit, I hadn’t considered before that there could be environmental effects on the output of image analysis packages.
GR – I think, from an academic perspective, when publishing a paper, more and more people are going to be asking for these sorts of basic sanity checks to be met before publication.
JT – We are also now seeing some quite competent open-source image analysis applications emerging. Would you consider using one of these packages in your day-to-day research?
GR – I know about some of these, but since we are engineers by training, we all pretty much have our own pet techniques for segmenting cells and for extracting numerical features from the pixel intensities, and we have our own techniques for modeling. That is our business; we just use what we understand best. Some people have been working in this area for over 20 years, so in some ways it’s surprising that we haven’t seen more widespread open-source environments for digital pathology image analysis. Certainly, the field of image analysis has had no shortage of open-source code in programming environments such as Python, MATLAB, etc. For people like us, engineers and mathematicians, all of these open-source tools are already there, and it’s often only a matter of a few hours or days to find some equivalent code on GitHub or similar to download and just run. Considering this, I think these open-source packages are in some sense overdue, but we all stick to our own methods.
JT – Do you have a specific tumor type or tissue type that you specialize in, maybe a target therapy?
GR – Yes. In fact, the question we ask every day is: what is the contribution that can be fundamental, that you can do basic science with, that will last? What can we contribute that won’t be a flash in the pan and that in ten years’ time will still be useful and relevant? From that perspective we look at the cell and ask: what information can be easily extracted? Is it informative in different kinds of cancers? Has there been prior evidence that this is a fruitful path? For these reasons we have decided to focus on finding ways to measure and quantify nuclear structure information from digital pathology images, and we have written a series of papers on a variety of cancers: melanoma, lung, thyroid, liver, breast, etc. This focus comes about because, over the years, through working with collaborators in pathology, real pathologists, we have sensed that one of the most well-established features to extract from tissue images has been the nucleus, and there are a lot of stains that are specific to nuclear structure, chromatin density, nuclear morphology, etc. One project we have had for a number of years, and written a series of papers on, is: how can we properly model the information content of the chromatin distribution inside nuclei as depicted by these different types of stains? How can we do this in such a way that we can build real models into which we can integrate data from different scanners, different centers, even different cancer types? We can then ask questions such as: how does the distribution look for a malignant cancer of type A, and how is it related to cancer B? We know that a lot of work has been done in this area and these are not new questions, but in the past people have mainly focused on measuring certain standalone properties that are in themselves very physically defensible but don’t provide a complete perspective on what is happening in the nucleus.
JT – My understanding is that nuclear size and nuclear volume are key parameters for distinguishing cancerous from non-cancerous cells. What I think you are saying, is that there are many other nuclear parameters that we should be taking into consideration?
GR – Yes, but the question is which parameters and how do you combine them? People realized a long time ago that a single parameter such as nuclear area may only give a small amount of discriminative capability when trying to distinguish between a more benign and a more malignant or invasive version of a type of cancer. Then people realized that if you include a second parameter, such as nuclear perimeter, you can get an improvement, and of course you can add other parameters to get further refinement. At that point you have to think: what does it mean to combine these different parameters when, for example, area is measured in square meters and perimeter is measured in meters? If you look at the world of physics, when do people do that? What sense does that make? We need to answer these basic questions and then use that knowledge to characterize the entire nucleus. That means not using just a single feature but using the entire pixel information so that you are sure not to leave anything behind. How can you take in all of the feature information in a way that is self-coherent and that makes physical sense? How do we do that in a way that you could talk to a physicist about, that you could explain to a pathologist, and that captures the entire information so that it can still be used for statistical regressions?
That’s what I’m talking about: finding the fundamental ways to turn this into a scientific discipline. If we can contribute that, and it remains to be seen, then maybe that will be a useful contribution. Nuclear morphology has had a very strong tradition in pathology, and we thought it would be a good target to start with. Once you figure out some of these basic principles, you can then maybe apply the same techniques to ask, for example, what mitochondria have to do with the energetics of the cell. We have to remember that when it comes to image analysis, we are dealing with high content. Even from an image of a single nucleus you may have a 100×100 pixel distribution. That’s very high content, and you have to figure out how to treat it scientifically.
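The unit problem raised above (area in squared units, perimeter in linear units) can be illustrated in a few lines: before features with different physical dimensions are combined in a statistical model, one simple and common convention is to place them on a dimensionless common scale, for example by z-scoring each feature across the population. The values below are fabricated, and this sketch only shows the bookkeeping, not how such features should ultimately be modeled.

```python
# Minimal sketch: putting nuclear area (um^2) and perimeter (um) on a common,
# dimensionless scale before combining them. Feature values are made up for
# illustration only.
import numpy as np

# rows = nuclei, columns = [area (um^2), perimeter (um)]
features = np.array([
    [52.0, 27.1],
    [61.5, 29.8],
    [48.2, 26.0],
    [75.9, 33.4],
])

# Standardize each column to zero mean and unit variance so both features
# become dimensionless and comparable in scale.
z_scores = (features - features.mean(axis=0)) / features.std(axis=0)
print(z_scores)
```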
JT – I look at biological systems as entirely heterogeneous and unpredictable and yet you are attempting to apply rational sciences such as physics and math to these highly variable biological systems. How can you model the biological heterogeneity in a logical, verifiable and repeatable fashion?
GR – Right, that is the next-level question. Once you figure out what to measure and how to measure those parameters in a scientific manner, you still have to think about how to properly characterize the biological heterogeneity. If you look at a tumor, you are looking at thousands, maybe even millions of cells, especially with whole slide imaging. You also know from biology that even within malignant tissue, not all cells will display the cancerous phenotype that you are looking for. You know that there is a biological heterogeneity in the background, but how do you model this mathematically? This question overlaps with the field of statistics as well: what is the appropriate overlap? What is the appropriate modeling for the field? There are a lot of fundamental questions to ask, and many that we still can’t answer. People have done a lot of work and they have applied a lot of techniques, but still, if you ask different people what the standard is, what the scientifically accepted standard way to do these things is, you will get too many different answers. This is why I talk about establishing the foundations of the discipline. In order to understand which model to pursue, we need to ask: what is the difference in the subcellular localization of this molecule between benign and cancer? What is the difference in the tissue organization of the different types of cells, and how do you measure them? Which laboratory techniques can be standardized, and how do you measure those with high enough resolution? How do you extract the right pixel information? How do you model the statistical heterogeneity in the tissue? There are a lot of questions to answer in order to establish a foundation, to establish universally accepted ways to build the field.
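One common way this kind of cell-to-cell heterogeneity is represented statistically, offered here purely as an illustration and not as the laboratory’s own approach, is to treat each cell as a point in a feature space and fit a mixture model, so that a tissue is described by a distribution over subpopulations rather than by a single average cell. All per-cell features in the sketch below are synthetic.

```python
# Illustrative sketch: describing cell-to-cell heterogeneity with a Gaussian
# mixture model over per-nucleus features. All data here are synthetic.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two hypothetical subpopulations of nuclei in a (standardized) 2-D feature
# space, e.g. [nuclear size, chromatin texture].
population_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(300, 2))
population_b = rng.normal(loc=[2.0, 1.5], scale=0.7, size=(100, 2))
cells = np.vstack([population_a, population_b])

gmm = GaussianMixture(n_components=2, random_state=0).fit(cells)
print("estimated subpopulation fractions:", gmm.weights_)
print("estimated subpopulation means:\n", gmm.means_)
```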
JT – And are you using machine learning principles to assist with that process of improved modeling and better understanding?
GR – Yes, in part. That is a hard field in the sense that there are a lot of people trying to provide different advances with machine learning methods. Also, in my view, these machine learning approaches tend to bypass the scientific underpinning that we have been discussing: figuring out which molecules to pursue, how to measure the right things, how to treat the biological heterogeneity, how to model those parameters mathematically and statistically. Personally, I think that’s a misguided approach. A lot of work has already been done in which people directly take pixel intensities and correlate those with patient outcomes via some classifier, some deep learning hierarchy, the evolution of a neural network, etc. It remains to be seen, but my hunch is that for a critical system such as this (and cancer diagnosis is a critical system, in that there is a definite cost, at least to the patient and the relatives, of making a mistake), although machine learning may be a part of the solution, it is not where we should be focusing to establish the underpinnings of the discipline.
JT – So, if we accept your point that we are still working out the ground rules and are very much in the infancy of image analysis and of artificial intelligence applied to image analysis, I wonder what your perspective on the future is as a computational biologist? Does the algorithm get ever more sophisticated and ultimately push the pathologist to the sidelines? What will the future role of the pathologist look like once we have tamed the biological heterogeneity through computer science? Does he or she become confined to signing off computer-generated diagnoses?
GR – It remains to be seen. I think it is certainly within the realm of possibility that something like this could happen. It could be that in the future we have image analysis algorithms which are surprisingly good. This could happen, and I couldn’t rule it out right now. The question is: will we get there, and if so, how? My view is that if we are going to get there, or at least have the best chance of getting there, we are going to have to address the fundamental questions, all the questions that we have been discussing. What do you measure? How do you measure it from pixel data? Do you have the right laboratory techniques combined with the right imaging method? Then, in the future, we will see highly useful digital aids for pathologists, who will be able to click a button to achieve an accurate diagnosis.
You also said that image analysis is in its infancy, and here you have to remember that this is not a new science. There has already been a lot of time and investment. If you look at neural networks, for example, in some respects they were invented 50 years ago. The first cell imaging work that attempted to characterize the differences between different types of cells dates back to the 1950s. At that time, they were significantly hampered by the lack of imaging capabilities and computational power, and that is the aspect that has changed significantly in the last 20 or 30 years. So, from my perspective, we are in the infancy of our ability to tackle these problems, but we have been looking at machine learning and at image analysis for a very long time.
JT – My feeling is that establishing the fundamentals to tackle biological heterogeneity, will also take a very long time. When does this happen? If we look at the near future, say in 20 years’ time. What does that look like? Does the pathologist have a small batch of image analysis tools to assist with screening processes or can we contemplate fully automated analysis?
GR – Well, let’s take a look at what people have been trying to do in the last 20 years. There are companies that have been trying to automate cervical screening, for example. This is a heavy-duty, high-volume workload for a lot of technicians and pathologists to cope with. Some companies have tried to make systems for cytology specifically to alleviate and improve this situation, and they have built systems for automatically handling the images. They look at specific cells to try to extract information on the nuclei and the cytoplasm and to measure the nuclear-to-cytoplasmic ratio. Then they produce some basic statistics. The focus is on removing the tedious and mundane tasks so that the pathologist can use his or her time more efficiently. Have they been successful? Yes and no. Of course, pathologists are buying the systems and they are having some productivity gains. Is it 2x, 3x? It depends on who you talk to, and the results are actually quite mixed.
So, for the future, it’s a tough call. Can we build systems that take this productivity, that is, the success rate of digital image analysis, to a much higher level? I think it is possible; you can never underestimate the ingenuity of human beings when faced with critical needs. It’s certainly within the realm of possibility. We have already learned how to do engineering, and it’s very rare that we make a situation worse. So, year by year, we are at least getting incremental improvements. That part you can count on. The question is: are we going to see an exceptionally large gain in that productivity? I believe we might, but I think if we do, it will probably come from us putting together fundamental concepts and then building upon those concepts. To the extent that we try to take shortcuts and go directly from the image to predictive models, leaving the fundamentals out, I think you will see a lot of flashes in the pan. In five to ten years we may even be talking about a different methodology, which prevents us from being able to build on top of each other’s work. So the question you ask is very tough to answer; I cannot make a prediction.
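Returning to the nuclear-to-cytoplasmic ratio measurement mentioned in the cytology example above, the sketch below shows only the arithmetic of that measurement once nucleus and whole-cell segmentation masks are available; the segmentation itself, which is the hard part, is not shown, and the masks here are toy data.

```python
# Minimal sketch: nuclear-to-cytoplasmic (N/C) ratio from boolean segmentation
# masks. Segmentation (the hard part) is assumed to have been done already;
# the masks below are toy data for illustration.
import numpy as np

def nc_ratio(nucleus_mask: np.ndarray, cell_mask: np.ndarray) -> float:
    """N/C ratio: nuclear area divided by cytoplasmic (cell minus nucleus) area."""
    nuclear_area = np.count_nonzero(nucleus_mask)
    cytoplasm_area = np.count_nonzero(cell_mask & ~nucleus_mask)
    return nuclear_area / cytoplasm_area if cytoplasm_area else float("nan")

# Toy example: a 5x5-pixel "cell" containing a 2x2-pixel "nucleus".
cell = np.zeros((10, 10), dtype=bool)
cell[2:7, 2:7] = True
nucleus = np.zeros((10, 10), dtype=bool)
nucleus[3:5, 3:5] = True
print(nc_ratio(nucleus, cell))   # 4 / (25 - 4) is roughly 0.19
```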
JT – So you are suggesting that we should be going back to basics and, for example, getting sample preparation standardized. Then we get together and all agree that this is a standard preparation and that this is the method by which we achieve that standard preparation. Of course, that may require the use of a highly standardized automated staining system which is itself computer controlled and not validated for every possible nuance of biological heterogeneity, but at least it generates an input standard. Even in this situation, however, you believe we need to deal with lower-level variability to develop truly automated analyses?
GR – Yes, the basic biology. The questions are: what is different, and what is practical to image? What is a practical biomarker that you can more easily image? If you have that, then you have a laboratory process which is standardized and easy to reproduce. How do you image in a standardized way? Which image analysis algorithm is robust enough? How do you extract the right features? How can you model the biological heterogeneity mathematically? What are the right statistics to use? All of those things put together, I think, will be necessary for us to get there.
There are also parallels that you can draw from other disciplines, including radiology. Basic projection radiography has been around for a long time, but computed tomography, MRI and so on have been digital from the beginning. Since the early 80s people have been pursuing automated diagnosis of basic conditions, illnesses, trauma, etc., but even now, if you ask how well these automated techniques are really able to suggest that a person has a certain type of malignancy, you will get mixed answers. So, there is room for skepticism and there is room for optimism. Sometimes I see myself landing in the middle. It could be that digital pathology follows the same path and 20 years down the road we are having a similar conversation. That could well be the future, but taking the optimistic view, even if that does happen, there will certainly be improvements along the way in workflow, communications, storage, etc. We will see advances; sometimes it’s just hard to predict where exactly they will be.
JT – You used the term ‘flash in the pan’ and I’d like to clarify if by that you are referring to specific point solutions for very defined environments?
GR – Or things that seem like solutions at the moment but then, following further investigation a few years down the road, we find fundamental flaws and people give up on that particular solution. That is the more profound consequence of a flash in the pan. I’m not saying this will or does happen, but it’s certainly a possibility.
JT – I’m very interested by the possibility of mass screening by computer systems because there is clearly a shortage of pathologists at the current time. Pathologists are getting older and young people are not coming into the science. If you think of a discipline such as prostate analysis, then an algorithm for pre-screening (with the caveat of course of having to deal with the false negatives) would be very valuable, wouldn’t it?
GR – Yes, of course, and there is another aspect of image analysis that we sometimes neglect to talk about: the aids that not only facilitate the analysis itself but also facilitate teaching and learning. That is the ability to retrieve similar images and similar cases and to put a bunch of data in front of a group of pathology trainees. It is true that big innovations have already started appearing which are going to make a difference. Image analysis can play a big role here in retrieving the right images and comparable data sets, and in doing some calculations that are safe and reliable. These types of application already exist. There is no doubt that exciting things are going to happen in the future, but it’s hard to predict which ones will be truly useful and have longevity. People in our field, the machine learners, are also going to come up with innovations that will last. Some will be flashes in the pan, and some will continue and be pervasive and persistent in the future. If you ask me, however, the solutions that are based on first discovering the fundamentals and then exploiting that knowledge to put together the algorithms are the ones that will succeed, and that is where I tend to focus my time.
JT – And then the FDA will say no! Regulation, that is the ultimate barrier, isn’t it? Don’t you think that the most difficult long-term challenge for all these solutions will be achieving and maintaining regulatory approval?
GR – That is another reason to focus on fundamentals. If you can show not only when a solution works but also when it fails, if you can explain when and why it fails, and if you have all that understanding with you, then you can teach people when to use it, when to trust it, and when not to trust it. If those answers come along with the solution, then I suspect that will create an easier pathway through the regulatory agencies.
JT – Professor Rohde, we’ll leave it there. Thank you for your time today.