Unrecognized clinical deterioration during illness requiring hospitalization is associated with high risk of mortality and long-term morbidity among children. In this podcast hosted by Maureen A. Madden, DNP, RN, CPNP-AC, CCRN, FCCM, Anoop Mayampurath, PhD, discusses a novel machine learning model that identifies ICU transfers in hospitalized children more accurately than current tools. The discussion centers on the article “Development and External Validation of a Machine Learning Model for Prediction of Potential Transfer to the PICU,” published in the July 2022 issue of Pediatric Critical Care Medicine (Mayampurath A, et al. Pediatr Crit Care Med. 2022;23:514-523). Dr. Mayampurath is an assistant professor of biostatistics and medical informatics at the University of Wisconsin in Madison, Wisconsin.
Category: PCCM Podcast
Dr. Madden: Hello and welcome to the Society of Critical Care Medicine Podcast. I’m your host, Dr. Maureen Madden. Today, I’ll be speaking with Dr. Anoop Mayampurath, PhD, and we’ll be talking about “Development and External Validation of a Machine Learning Model for Prediction of Potential Transfer to the PICU,” published in Pediatric Critical Care Medicine this month, July 2022. To access the full article, visit pccmjournal.org. Dr. Mayampurath is an assistant professor of biostatistics and medical informatics at the University of Wisconsin in Madison, Wisconsin. Welcome, Dr. Mayampurath. Before we start, do you have any disclosures to report?
Dr. Mayampurath: No, I do not have any disclosures to report.
Dr. Madden: Very good. It’s my pleasure to have the opportunity to speak with you today. If you would not mind starting out by giving a little bit of background about yourself and how you became interested in machine learning.
Dr. Mayampurath: Sure. My PhD was in informatics and I largely concentrated on bioinformatics. I was developing algorithms for analyzing mass spectrometry data for proteomics applications. But when I joined University of Chicago, I started collaborating with Matthew Churpek, who’s the senior author on this paper. Together, we started working on a few clinical informatics projects. From bioinformatics, I made the transition to clinical informatics, having realized the potential of using electronic health record data and statistics to improve patient health. During our time together, we started going from more stats to prediction modeling and developing clinical prediction models. Dr. Churpek has a history of developing clinical prediction tools on the adult side. We started working together and, because I was in pediatrics, I started exploring opportunities for machine learning in pediatrics and landed upon my field in that sense.
Dr. Madden: I’m really appreciative that you have taken to heart the pediatric population to work on this, because that’s the background that I come from. Many things start in the adult population before they filter down to pediatrics. Sometimes it’s not always necessarily extrapolated in the correct way. So I’m really excited to see this article in print. But before we go into discussing the article a little bit, would you mind explaining, in plainer language, what machine learning and predictive models are?
Dr. Mayampurath: Sure. Machine learning largely is a field where you instill the ability of a computer to learn from data. You do this without giving explicit instructions on how to do this learning. On the back end of machine learning are several algorithms that intersect statistics and computer science. What these algorithms do is they look for patterns within the data, and you point the algorithm toward an outcome that you’re trying to predict, and the algorithm uses patterns of the data to solve the question of how to best predict that outcome.
There’s a little bit of engineering to it as well. Once you develop the model, you have to engineer the model so that it can operate without bias, without what we call overfitting, so you can apply it to a variety of patients and a variety of populations. What’s different about clinical predictions and clinical machine learning is that, not only do we have the ability to impact patients directly, but the whole idea of machine learning views are different because the data that we use have to be curated very carefully. Things like model accuracy, which is so much more popular in computer science machine learning, is of relatively lesser importance to us. What we look for are things like positive predictive value and negative predictive value because the predictions that come out of the model are used to impact clinical care, so it is only useful if people actually use it, right?
The second thing that’s different is we have to have the ability to trust a clinical prediction model for the same reason that people need to use it, to actually see the benefit to the patient. Thirdly, we have to have the ability to generalize to external settings, patients outside of your cohort. We have to do a deep analysis. Especially in pediatrics, we have to do an analysis by age group or stratified by patient type and so on.
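To illustrate why accuracy alone can mislead when the outcome is rare, here is a minimal sketch that computes accuracy, sensitivity, positive predictive value, and negative predictive value from confusion-matrix counts. The cohort size and all counts are invented for illustration and do not come from the study.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, PPV, and NPV from confusion-matrix counts."""
    total = tp + fp + tn + fn
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "ppv": tp / (tp + fp) if (tp + fp) else nan,  # undefined if no alerts fire
        "npv": tn / (tn + fn) if (tn + fn) else nan,
    }


# Hypothetical cohort: 1,000 ward encounters, 20 of which end in ICU transfer.
# A model that never alerts still looks accurate because transfers are rare.
never_alerts = confusion_metrics(tp=0, fp=0, tn=980, fn=20)

# A model that alerts on 60 encounters and catches 15 of the 20 transfers.
some_alerts = confusion_metrics(tp=15, fp=45, tn=935, fn=5)

for name, metrics in [("never alerts", never_alerts), ("some alerts", some_alerts)]:
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

In this made-up cohort the model that never fires has the higher accuracy yet identifies no deteriorating child, which is the reason predictive values and sensitivity carry more weight than accuracy here.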
Dr. Madden: Okay. I know in the article that you’re the lead author on, you talked about pCART, which is a gradient-boosted machine model. Can you tell me a little bit about that?
Dr. Mayampurath: Absolutely. The very motivation for this work is basically we want to develop a new early warning score that will tell clinicians and care personnel in the hospital which kid is at risk for experiencing a deterioration event in the future. Why we set down this path was, when we looked at early warning scores in the pediatric world, we saw how there was a dominant one, the Pediatric Early Warning Score, or PEWS. There are different flavors of it. There’s the bedside PEWS, which is the simpler seven-item scoring scale. There’s Monaghan PEWS and Brighton PEWS, all of which are variations of the same concept. We saw that these are limited because of a couple of reasons. One is, since they’re item-based scoring schemes, they had these cutoffs based on age groups, right? So if you’re a four-year-old and your heart rate was this much, then you are given a score of two, if it is between a certain upper bound and lower bound of heart rate. You total all those scores up across systems, heart rate, respiratory rate, oxygen saturation, and so on, then you get assigned a particular score. A higher score means you are more likely to deteriorate.
There were two problems with that. One is these cutoffs that are developed are usually developed from a limited population and probably not generalizable to a large extent. Secondly, these current schemes also had subjective components in them, such as the characteristic of the patient, defining them as irritable versus consolable. You would agree, with your background, that what is identified as irritable by one nurse could easily be recorded as consolable by another nurse, depending on their experience. So now you have these subjective variations in the scoring scheme itself that, even within the hospital, different people might score things differently and, across hospitals, you can see variations in scoring as well.
What we wanted to see was, if you remove the subjective part, just look at raw vital signs, without any subjective cutoffs, without any subjective scoring elements, can we do better than PEWS? One of our early publications, a couple of years ago, basically showed that this was possible. So then, the question was: All right, we just use vital signs, now can we improve the model further? There were two avenues that we decided to pursue: we wanted to incorporate labs, then we wanted to go from a statistical model to a machine learning model. So we went from a logistic regression to gradient-boosted machines. Finally, we wanted to validate it externally and out of the hospital, so that we could understand how it performs in a data set that it hasn’t seen before.
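The item-based scheme described in this answer (age-banded cutoffs per vital sign, with points summed across items) can be sketched as follows. Every age band, interval, and point value below is hypothetical and chosen only to show the mechanics; none of them are the thresholds of bedside PEWS or any published score.

```python
# Hypothetical age bands and cutoffs, for illustration only.
# Each interval is (low, high, points) and matches when low <= value < high.
HYPOTHETICAL_BANDS = [
    {"max_age": 5, "items": {
        "heart_rate": [(80, 130, 0), (130, 150, 1), (150, 170, 2)],
        "resp_rate": [(20, 30, 0), (30, 40, 1), (40, 50, 2)],
    }},
    {"max_age": 18, "items": {
        "heart_rate": [(60, 110, 0), (110, 130, 1), (130, 150, 2)],
        "resp_rate": [(12, 22, 0), (22, 30, 1), (30, 40, 2)],
    }},
]


def band_for_age(age_years):
    """Pick the first band whose upper age limit exceeds the patient's age."""
    for band in HYPOTHETICAL_BANDS:
        if age_years < band["max_age"]:
            return band
    return HYPOTHETICAL_BANDS[-1]


def item_points(value, intervals, out_of_range_points=4):
    """Points for one vital sign; values outside every interval get the maximum."""
    for low, high, points in intervals:
        if low <= value < high:
            return points
    return out_of_range_points


def total_score(age_years, vitals):
    """Sum the per-item points for the patient's age band."""
    band = band_for_age(age_years)
    return sum(item_points(value, band["items"][name]) for name, value in vitals.items())


vitals = {"heart_rate": 155, "resp_rate": 35}
print(total_score(4, vitals))   # same vitals, younger age band
print(total_score(12, vitals))  # same vitals, older age band
```

The same vital signs produce different totals in different age bands, which is why the choice of cutoffs, and the population they were derived from, matters so much for a score of this kind.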
Dr. Madden: I totally can appreciate the subjective nature of a lot of the existing early warning scoring tools. It doesn’t necessarily take into account, one classic example is if a child is febrile and, based on age, you have to anticipate that there’s going to be an impact on their heart rate as well as their respiratory rate, but it doesn’t necessarily mean that it puts them into a deterioration of their clinical status, yet it puts you in a scenario where people are heightened to them and brings them to further attention or further evaluation. When you talk about bringing all of the additional lab values in, how does it account for the fact that those items that are part of the model actually haven’t been documented for an individual patient? I know when I read the article, it said the greatest items that seem to have risk of deterioration were FIO2, heart rate, and respiratory rate. How does that account for potentially not having all of the data points that you’re using in the model?
Dr. Mayampurath: Right. That’s essentially what we in the machine learning world call the missing value problem. What do you do when you have missing values in your observations? Traditionally, in clinical prediction modeling, one way of doing it is to carry forward the last recorded value. In other words, if your heart rate was measured at 8:00 a.m., temperature was measured at 12:00 p.m., and you don’t have a heart rate at 12:00 p.m., you carry forward that heart rate at 8:00 a.m. to the current time point, so you have the ability to make a prediction as well as frame the model. For observations that you can’t carry forward, for example, if lactate was not taken until 24 hours after a patient is admitted, those beginning few hours will not have any lactate values because there’s nothing to carry forward. What we do is we impute it using the median by location. All patients in a similar ward will basically get the same median imputed value for observations that can’t be carried forward.
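The two-step approach described here (carry the last recorded value forward, then fall back to a median by location) might look roughly like the sketch below. The field names, the row layout, and the choice to compute ward medians from the rows passed in are all assumptions for illustration; the transcript does not say which data the study's medians were computed from.

```python
from statistics import median


def carry_forward(rows, fields):
    """Fill missing (None) fields with the same patient's last recorded value.

    Assumes rows are sorted by time within each patient.
    """
    last_seen = {}
    filled = []
    for row in rows:
        new_row = dict(row)
        patient_last = last_seen.setdefault(row["patient_id"], {})
        for field in fields:
            if new_row[field] is None:
                new_row[field] = patient_last.get(field)  # stays None if nothing earlier
            else:
                patient_last[field] = new_row[field]
        filled.append(new_row)
    return filled


def impute_location_median(rows, fields):
    """Replace remaining None values with the median observed in the same ward."""
    medians = {}
    for field in fields:
        by_ward = {}
        for row in rows:
            if row[field] is not None:
                by_ward.setdefault(row["ward"], []).append(row[field])
        medians[field] = {ward: median(values) for ward, values in by_ward.items()}
    imputed = []
    for row in rows:
        new_row = dict(row)
        for field in fields:
            if new_row[field] is None:
                new_row[field] = medians[field].get(row["ward"])
        imputed.append(new_row)
    return imputed


# Hypothetical observations: patient A has no lactate at all on the ward.
rows = [
    {"patient_id": "A", "ward": "W1", "hour": 8, "heart_rate": 120, "lactate": None},
    {"patient_id": "A", "ward": "W1", "hour": 12, "heart_rate": None, "lactate": None},
    {"patient_id": "B", "ward": "W1", "hour": 8, "heart_rate": 100, "lactate": 1.0},
    {"patient_id": "C", "ward": "W1", "hour": 9, "heart_rate": 110, "lactate": 3.0},
]
fields = ["heart_rate", "lactate"]
filled = impute_location_median(carry_forward(rows, fields), fields)
for row in filled:
    print(row)
```

Patient A's 12:00 heart rate is carried forward from 8:00, while the lactate values, which have nothing to carry forward, take the ward median.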
Dr. Madden: When I was reading the article, I was thinking about how each institution has different arrangements in terms of what’s acceptable for an ICU environment and what’s acceptable for a ward environment. What was catching my attention, because you used FIO2 as a parameter, as part of this algorithm, I was thinking about the various oxygen delivery devices that we can use. Depending upon the specific institution, they may choose to use something like high flow only in the ICU setting, or they may use it in the ward setting. How do you account for the technology that may be in place already? Or is it really just looking at those data points in isolation?
Dr. Mayampurath: We consider the latter case, where basically we look at the observations of recorded fraction of inspired oxygen. The other devices that you mentioned, basically being high-flow nasal cannula, we could consider them as interventions of their own, right? We could think about it as: The patient is already showing signs of deterioration, so they get put on high flow before they crash to the point where they need mechanical ventilation, for example. We could definitely use those devices as proxies for deterioration, but we actually want to capture the deterioration ahead in time. So at the point at which the high flow is ordered, it’s probably too late, right? In a certain sense, the model is not really useful in that sense of identifying children at risk for future deterioration. What we want to do is look at the FIO2 readings six hours before and try and predict whether or not they can get worse. That was our thinking in terms of looking at raw observations, rather than interventions, in our model.
Dr. Madden: Also then, you discussed the concept that it’s reducing the number of alerts in comparison right now to some of the other scorings that were a little bit more subjective in nature. You’re decreasing alarm fatigue, you’re decreasing false alarms, and you’re really honing in on the population that truly has the potential to be deteriorating. Correct? Did I understand that properly?
Dr. Mayampurath: Yes, that is correct. I think that’s one of the most important things in clinical prediction models: If you have a model that fires all the time, then people are not going to use it, and your model becomes essentially useless. On the other hand, if you fire too little, then people are not going to trust it, right? Or people are going to miss patients that are more at risk for deterioration. There’s this balance point that we have to look at where you try to maximize true positive rate or sensitivity of the model while trying to minimize the number of alerts. We have this analysis where we looked at the number of patients needed to alert, which is basically, how many patients do you need to be alerted on to get one positive outcome, one true label versus sensitivity? At levels of sensitivity that are clinically relevant, pCART had a lower number needed to alert than bedside PEWS, essentially saying that, at the same true positive rate, it will fire less, so you won’t miss any patients and you will have fewer alarms during a particular shift.
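The comparison described here, number needed to alert at a matched sensitivity, can be sketched as below. The definition used (alerts fired per true positive, which equals one over the positive predictive value) is a common one and is an assumption about how the study counted alerts. The scores and labels are made up, and `model_a` and `model_b` are placeholders, not pCART or bedside PEWS output.

```python
def alert_stats(scores, labels, threshold):
    """Sensitivity and number needed to alert when alerting at score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    positives = sum(labels)
    sensitivity = tp / positives if positives else float("nan")
    nna = (tp + fp) / tp if tp else float("inf")  # alerts per true positive
    return sensitivity, nna


def lowest_nna_at_sensitivity(scores, labels, target_sensitivity):
    """Smallest number needed to alert among thresholds that reach the target sensitivity."""
    best = float("inf")
    for threshold in sorted(set(scores)):
        sensitivity, nna = alert_stats(scores, labels, threshold)
        if sensitivity >= target_sensitivity:
            best = min(best, nna)
    return best


# Hypothetical data: 1 = transferred to ICU, 0 = not transferred.
labels = [1, 1, 0, 0, 0, 0, 0, 0]
model_a = [0.9, 0.7, 0.8, 0.2, 0.1, 0.3, 0.2, 0.1]
model_b = [0.9, 0.4, 0.8, 0.6, 0.5, 0.3, 0.2, 0.1]

print(lowest_nna_at_sensitivity(model_a, labels, 1.0))
print(lowest_nna_at_sensitivity(model_b, labels, 1.0))
```

At the same sensitivity, the model with the lower number needed to alert fires on fewer patients for each true case it catches.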
Dr. Madden: That’s amazing because that’s what people really want and to be able to focus their resources in the best possible manner. The reduction in false alarms and alarm fatigue, etc., is really an important part of this. It’s very exciting in that regard. I have another big concept question for you. This is more due to my lack of knowledge and understanding of this: How is this to be applied in most environments? What does it take to have this algorithm in place? How universally can this be employed? Tell me more about it.
Dr. Mayampurath: Sure. That’s a great question. Essentially, because we are using data from the electronic health record, we essentially need an EHR infrastructure in the hospital to implement our model. Our model has been implemented at the University of Chicago over the last year or so, since March 2021, to be precise, where it’s actually being used in real-time assessment of risk. Nurses on the floor have a screen that they can see where they can look at the pCART score, the scores get divided into zero to a hundred. There are cutoffs that we have set based on gray, yellow, and red, gray basically meaning that the patient is okay, yellow basically saying that the patient is showing signs of early derangement, and red basically showing that the patient is showing signs of severe derangement.
Every time a patient crosses each threshold, there’s a list of steps that the nurses have to go through, there’s a pathway that they have to go through, click on steps, and say that they did this. We call this compliance, and we measure compliance at those pathways at points at which the transition between patient states happens. On the back end, we use Epic at University of Chicago. Our model can interface directly with Epic pretty easily. Having said that, the other major EHR is Cerner, and I believe that our model is pretty transportable to Cerner as well. The model exists in isolation through basically a binary object. If you have the capability to interface with that, then you’re set. Assuming you have an electronic health record system that can talk to a very simple representation of the model, then you’ll be able to get predictions from it.
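As a rough illustration of the color zones and threshold crossings described above, the sketch below maps a 0 to 100 score to gray, yellow, or red and reports each move into a more severe zone. The cutoff values of 50 and 80 are hypothetical, since the deployed cutoffs are not stated in this conversation, and treating only upward moves as pathway triggers is likewise an assumption.

```python
YELLOW_CUTOFF = 50  # hypothetical cutoff, not the deployed value
RED_CUTOFF = 80     # hypothetical cutoff, not the deployed value

SEVERITY = {"gray": 0, "yellow": 1, "red": 2}


def zone(score):
    """Map a 0-100 risk score to a color zone."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= RED_CUTOFF:
        return "red"
    if score >= YELLOW_CUTOFF:
        return "yellow"
    return "gray"


def escalations(scores):
    """Return (index, from_zone, to_zone) for each move into a more severe zone."""
    events = []
    previous = None
    for index, score in enumerate(scores):
        current = zone(score)
        if previous is not None and SEVERITY[current] > SEVERITY[previous]:
            events.append((index, previous, current))
        previous = current
    return events


# Hypothetical sequence of scores for one patient over a stay.
print(escalations([20, 55, 60, 85, 40, 52]))
```

Each reported event marks a point where, in a workflow like the one described, a response pathway would be presented and compliance could be measured.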
Dr. Madden: Okay, so it’s still reliant then upon populating the data or extracting the data that’s been inputted. You still have somewhat of a human component to it and possibly a lag associated with it, if I’m understanding it correctly.
Dr. Mayampurath: Yes, it is definitely sort of augmented intelligence, right? The first thing that you are asked to do is verify the recording. Every time a nurse takes vitals, pCART gives you a score. Every time there’s a new lab measurement, pCART gives you a score. The first step that they have to do is verify those recordings, that those observations are correct. When they verify, there’s a score that’s generated automatically, but that’s instantaneous, that’s a matter of milliseconds getting that particular probability. But it is dependent on data coming in through routine checks of vitals, which I believe in a standard hospital is about once every four hours or so. And labs would come in intermittently between those two times so, on average, you should probably get two or three readings of pCART score within four hours.
Dr. Madden: If all of these data points are inputted and everything and the change is being detected, is there a hard stop associated with it? You get yellow or you get red, because it’s detecting that there’s a clinical change in the patient for deterioration, can they ignore it? Do they have to acknowledge it? What happens?
Dr. Mayampurath: To be compliant with the system, basically they have to go through and click on a pathway that asks them to consider several options, depending on whether they went from gray to yellow or from yellow to red. Going from yellow to red is probably easier to explain. That transition is the most severe, at which point it gets escalated to the PICU team or the PICU response team in a particular hospital. The change from gray to yellow is accompanied by options where they can call the physician in charge of the patient. They can redo the vitals out of the two hours, so they can say that this is expected and what we’re going to do is check again more frequently as opposed to a four-hour checkup. Or they have the ability to order labs or they can screen for conditions like sepsis for the patient, where they ask them to think about why this is happening. They have to click on what their choice is and they have to enter free text that explains what they think is happening. All the data are being collected on the back end and that’s relevant data to us, because it tells us how the care personnel are interacting with the model and how we can make it better.
Dr. Madden: Since I work in the critical care unit, it’s intriguing to me, first of all, to try and have the opportunity to intervene and prevent individuals from needing a transfer to the ICU. But I want to take it a step further. What else do you see this machine learning being used for in critical care? What other aspects may it be used on? Can it be translated into the ICU and still have a predictive component for deterioration? It’s one of the things I’ve struggled with as I’ve been working within our system to try and create a tool for sepsis. Because for our population in the ICU, there are so many alarms and so many changes in their hemodynamics that may or may not be resulting in deterioration. It goes back to that alarm fatigue. If you could tell me your big dream about how we can expand this use, I’d love to hear it.
Dr. Mayampurath: Absolutely. What else could it be used for? Right now, pCART is only used for ward patients. You have to be an admitted patient under the age of 18 in a ward, and it predicts the risk of going from a ward to the ICU within the next 12 hours. We have tested it on other outcomes. We tested it on the risk of going to the ICU and then being mechanically ventilated or put on vasopressors or dying within 12 hours of ICU admission. The performance is better than PEWS. It’s pretty high. But our primary outcome is still whether this patient runs the risk of going to the ICU from the ward within the next 12 hours. We are not really preventing ICU transfer. We’re saying there’s an early risk of ICU transfer. What they could do is maybe a transfer that happens earlier, depending on ICU conditions, of course.
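One way to picture the primary outcome, ICU transfer within the next 12 hours, is as a label attached to each ward observation. The sketch below is an assumption about how such labels could be built and is not taken from the paper; the timestamps are invented.

```python
from datetime import datetime, timedelta

HORIZON = timedelta(hours=12)


def label_observations(observation_times, transfer_time, horizon=HORIZON):
    """Label each ward observation 1 if ICU transfer follows within `horizon`, else 0.

    `transfer_time` is None for patients who were never transferred.
    Observations are assumed to be ward observations, taken before any transfer.
    """
    labels = []
    for observed_at in observation_times:
        if transfer_time is None:
            labels.append(0)
        else:
            within = observed_at < transfer_time <= observed_at + horizon
            labels.append(1 if within else 0)
    return labels


# Hypothetical patient: vitals every four hours, transferred to the ICU at 20:00.
observations = [datetime(2022, 7, 1, hour) for hour in (4, 8, 12, 16)]
transfer = datetime(2022, 7, 1, 20)

print(label_observations(observations, transfer))
print(label_observations(observations, None))
```

Only the observations that fall within 12 hours of the transfer are labeled positive, so earlier readings from the same patient still count as negative examples.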
Dr. Madden: Or potentially have interventions that are appropriate, that change the patient’s clinical status, who then doesn’t require the transfer. That’s how I was visualizing it as well.
Dr. Mayampurath: Yeah, it’s certainly possible. That is an option too. The other question that you had was: How do you translate this to ICU patients? I think, for ICU patients, the outcome becomes a little bit more delicate, right? Because no longer are we predicting ICU transfer, you have to think about what defines deterioration in the ICU, then you also have to think about what elements of data you have in the ICU, which is different from the ward. You have more frequent checkups, right? Vitals, you have waveform data that you can use from the monitor. The machine learning there tends to be separate from a ward population. The ICU population is more homogeneous in a certain sense. They have more data, they have a lot more interventions and things happening to them. In our current research we’ve started thinking about expanding pCART into the ICU, but we are still at the stage of trying to define what the outcome that we are trying to predict is, what data we need to use, how often do we need to fire this score, and so on and so forth.
Dr. Madden: Well, I really need you to come up with that, so I’m looking forward to additional work on this. Do you have any final words?
Dr. Mayampurath: Thank you so much for the opportunity. I’d love to discuss this more. Every time I talk to somebody on the medical side, my knowledge only gets improved. Thank you so much for chatting with me about this.
Dr. Madden: Well, thank you. You’ve certainly expanded my knowledge and hopefully the audience’s. This concludes another edition of the Society of Critical Care Medicine Podcast. For the Society of Critical Care Medicine Podcast, I’m Dr. Madden.
Maureen A. Madden, DNP, RN, CPNP-AC, CCRN, FCCM, is a professor of pediatrics at Rutgers Robert Wood Johnson Medical School, and a pediatric critical care nurse practitioner in the pediatric intensive care unit at Bristol-Myers Squibb Children’s Hospital in New Brunswick, New Jersey.
The SCCM Podcast is the copyrighted material of the Society of Critical Care Medicine, and all rights are reserved. Find more episodes at sccm.org/podcast.
This podcast is for educational purposes only. The material presented is intended to represent an approach, view, statement, or opinion of the presenter that may be helpful to others. The views and opinions expressed herein are those of the presenters and do not necessarily reflect the opinions or views of SCCM. SCCM does not recommend or endorse any specific test, physician, product, procedure, opinion, or other information that may be mentioned.
Some episodes of the SCCM Podcast include a transcript of the episode’s audio. Although the transcription is largely accurate, in some cases it is incomplete or inaccurate due to inaudible passages or transcription errors and should not be treated as an authoritative record.