Working Smarter
Episode 8: Sophia Wang on the new tools helping her find answers in old data
 
On our final episode this season of Working Smarter, we talk to Sophia Wang, an assistant professor of ophthalmology at Stanford University.
Wang leads the school’s ophthalmic informatics and artificial intelligence group, which uses the latest machine learning techniques to analyze electronic health records. In practice, that means looking at disparate sources of data—from doctors’ notes and eye exam data to diagnostic imagery and billing codes—and finding the sorts of patterns that can be difficult for humans to spot.
Hear Wang talk about using AI to extract useful information from a sea of unstructured data, and how to make better decisions with the data you already have—which, in Wang’s case, means improving outcomes for glaucoma patients and providing a better quality of care.
Show notes:
- Learn more about Sophia and her research
- Visit Stanford University’s Ophthalmic Informatics and Artificial Intelligence Group
~ ~ ~
 
Working Smarter is a new podcast from Dropbox about how AI is changing the way we work and get stuff done.
You can listen to more episodes of Working Smarter on Apple Podcasts, Spotify, YouTube Music, Amazon Music, or wherever you get your podcasts. To read more stories and past interviews, visit workingsmarter.ai.
This show would not be possible without the talented team at Cosmic Standard, namely: our producers Samiah Adams and Aja Simpson, technical director Jacob Winik, and executive producer Eliza Smith. Special thanks to Benjy Baptiste for production assistance, our marketing and PR consultant Meggan Ellingboe, and our illustrators, Fanny Luor and Justin Tran. Our theme song was created by Doug Stuart. Working Smarter is hosted by Matthew Braga. 
 
 Thanks for listening!
Full episode transcript
This might sound strange, but I love a good container. You know, the idea of having everything arranged all neat and tidy and stored in one place. And in case you think this is a passing fad, let me assure you: I’ve been like this my whole life. 
At the age of eight I had “Matt’s Camping Folder”—like, a literal paper folder with “classified” scrawled across the front that contained all of the information our family would need for our annual summer camping trips. Road maps and park guides and to-do lists galore.
When I was in high school, I begged my parents to go to IKEA to look at all the desks and shelves. Very cool!
And now, in my old age, I’ve embraced the bin: big plastic containers for towels, shoes, holiday decorations, patio tiles, and—full circle—camping gear. I stack them in my closets, my storage locker, and I keep a running inventory of everything inside, so I always know where something is.
Contrast that with my digital life where it feels like chaos is the name of the game.
I’ve given up on my inbox. I barely organize my files. And it’s not like any of it matters anyway, because most of my work is happening in apps and tabs—each with their own file systems and search filters and approaches to organization. It’s a mess!
So naturally, I’ve been intrigued by people who are using AI to find a signal in all that noise. The people who are figuring out how to take all their digital containers, and neatly organize them in one big AI-powered box.
I’m your host Matthew Braga, and on today’s episode of Working Smarter we’re talking to Sophia Wang. She’s an assistant professor of ophthalmology at Stanford University, where she leads the ophthalmic informatics and artificial intelligence group.
Sophia and her colleagues also have their containers—electronic health records, doctors’ notes, diagnostic imagery and more. But they think artificial intelligence can help bring some order to that chaos, and help them uncover new insights that might not have been as obvious had they just gone rummaging around on their own.
What does that mean for Sophia, her patients, and how healthcare professionals do their jobs? That’s coming up next on this episode of Working Smarter.
~ ~ ~
Sophia, thank you so much for joining us today.
Thank you so much for having me. 
 
I've been looking forward to this. So you're an assistant professor of ophthalmology at Stanford University where you lead the ophthalmic informatics and artificial intelligence group. Can you tell me more about that group and the work you do there?
That's our research group. We are very interested in building predictive models for various ophthalmic outcomes. So, practically speaking, that means we use all different kinds of data sources related to ophthalmology to predict how our patients will do in the future—whether our glaucoma patients' vision will get worse or other kinds of outcomes. 
 
Got it. You mentioned glaucoma a second ago. Just briefly, for folks who might not be familiar, what is that?
Glaucoma is a progressive disease of the optic nerve. And the optic nerve is kind of like the cable that connects your eye to your brain. It transmits all the signal about what you're seeing to your brain for further processing so that you are conscious of what you're seeing. And of course, if that nerve is diseased then it's not transmitting the signal properly and you will experience that as having blind spots in your vision. So in glaucoma, that nerve becomes diseased and you get characteristic blind spots in your vision that then become disabling.
 
Okay. So today, how are doctors making glaucoma-related diagnoses and decisions? I'm thinking, if I go for an appointment or an exam, what are the sources of data or the things that ophthalmologists are using as part of that evaluation or part of that process?
The very first pass is just the general ophthalmic exam. Of course, we check things like your vision. We look at the optic nerve using our special lamps and magnifying glasses. We can actually see it with our own eyes. And if it looks a certain way, perhaps a little bit suspicious—the structure of it maybe looks a little different, or perhaps it's asymmetric between your right and left eyes—that kind of raises one of the first flags of, oh, maybe this optic nerve ought to be evaluated with more in-detail testing for glaucoma.
We also check for your eye pressure, which is one of those factors that is highly related to glaucoma. And then once you raise that flag, there's a number of additional tests that we can do. For example, we can do visual field testing, which is a formal test for those blind spots, because our brain is really good at ignoring blind spots or filling them in or masking them in. Sometimes in early disease we don't really notice that until we do formal testing for it. 
 
And so all of this information—like the test data, the stuff that you're observing—where does all of that go?
A lot of our information flows, in general, to what we call electronic health records. And that's kind of an umbrella term for all the health records, all the health data, that is generated during your encounters with physicians and the healthcare system. Within that umbrella term, there's lots of different kinds of pots of different data. I might put your visual acuity or what letter you see on the eye chart in one spot and your eye pressure in one spot. But then the pictures that we take of your optic nerve might go in a different system, which is still electronic health records but, you know, kind of lives in its own area so to speak.
 
And then I imagine on top of that you have doctor's notes and then other kinds of information.
That's right, we take detailed notes, doctor's notes. Some people like to dictate them. Some people type them out. Sometimes it's very templated language, and sometimes it's free text, essentially.
 
Are you the sort of person who can read your own handwriting back or do you sometimes encounter something and, like, it seems inscrutable in hindsight?
[Laughs.] Well, most of us are not keeping handwritten records anymore, thank goodness.
 
So is it fair to say that—it doesn't sound like you're lacking for data, but it's more about, you know, how do we effectively analyze all this information and make sense of all the data we have?
That's right. We have so much data that we're capturing just on a routine basis. Everyone who walks through the door gets all this eye exam data that's captured—all the pictures, all the testing, all the notes. But sometimes it's very hard to extract the specific piece of information that we want from this sea of data. So that's certainly a challenge.
 
And if I may ask, why is it so difficult to pull all that information out?
Some of that information is unstructured, so to speak. If your optic nerve looks a certain way, or we observe a particular finding, sometimes we just type that out in words in our notes. And doctors will use different kinds of words depending on their preference. And so, if you're looking for that finding, sometimes it's very hard to actually identify that on a large scale from, let's say, thousands of notes for thousands of patients.
We do have structured ways of storing data. There's always the billing codes or procedure codes that tell insurance companies what diseases we treated or what was done that day. But they were never meant for research purposes. They were really meant to communicate with insurance companies. So the data might not always be as accurate or as detailed as we might want in research.
 
So you have all of this data and you want to figure out how you can analyze it and make sense of it. When did AI become part of that process for you?
Through my career I've stepped up or graduated through data sets that have become increasingly complex and messy and unstructured. And I'll give you an example of this. There's data that we collect specifically for research—survey data and things like that—where someone thinks up ahead of time exactly what they want to capture and in what format and so forth. And actually, our Centers for Disease Control and Prevention (CDC) runs a number of these sorts of national surveys. So the data that comes out of those national surveys is very, very clean, and very, very structured. But as you get into more of the routinely generated healthcare data that's available—for example, insurance databases—it becomes more complex and more difficult to understand.
As electronic health records became more and more widely adopted over the last 10 years or so, we were getting increasing amounts of even more complex unstructured data, like those free text notes, like those imaging databases. And my initial interest was to be able to understand details about the patient from these unstructured data sets. Because, you know, as a doctor, we're always typing into the notes and we're taking all these very detailed observations down. And I thought, well, wouldn't it be great if we could do research on a large scale using that data, which is no longer in handwritten notes, but now all typed and collected in electronic health records. So my initial angle for the artificial intelligence part was through natural language processing of our clinical free text notes.
 
You've co-authored a number of papers now over the last couple years looking at ways of using machine learning—stuff like deep learning, you mentioned natural language processing, LLMs, computer vision—in the field of ophthalmology. What have been some of the most promising areas for applying AI in your work that you've explored so far?
I have been mainly focused on predicting which patients with glaucoma will progress—meaning their disease will get worse. And this is a really important topic, because once your glaucoma disease progresses and your optic nerve becomes more and more sick, because it's nerve tissue you can't really reverse it. You can't regenerate it, at least not right now. So, it's really important to be able to figure out which patients are going to get worse and lose their vision in the future. Because if we knew that before they actually got worse, then maybe we could change the way we treat those patients—being more proactive and maybe give them more medications earlier, or do procedures earlier.
But while we do know that glaucoma is a progressive disease, not everyone will progress. And some will remain stable with less invasive treatments early on. So that's the application or use case that I've been focusing on in my research. And glaucoma is very interesting because, as I mentioned, there are many different ways that we evaluate the disease. There's all the measurements we take of the eye, the vision, the pressure. There are the pictures that we take. There are the visual field tests looking for the blind spots. All these different kinds of data come together to tell us about the state of the disease. And so it's been my interest to see if we can build models which integrate multiple different kinds of data, and therefore integrate multiple different techniques—for example, natural language processing and computer vision to look at both the notes and the photos—to better predict the future outcomes.
 
Why is the progression of the disease so difficult to predict with traditional methods or past analyses that you've used?
Well, it's always hard to predict the future, I think. [Laughs.]
  
[Laughs.] Yeah, that's fair.
I mean, even if you ask a doctor, a glaucoma specialist, “Who do you think is going to get worse? Who do you think are going to be those fast progressors?”—we have some idea. We know of some risk factors. But a lot of the factors that we think about are also not factors that are easily captured in those kinds of structured code data—so things like, are the patients taking their medications regularly, or maybe what kinds of medications are being used, or do they have a family history of glaucoma.
 
Interesting. And in terms of the actual process—like, the way in which you are applying ML—are there particular bottlenecks or sources of friction in your job and the way that you're analyzing information or trying to make decisions that ML is uniquely suited to helping with?
Well, a lot of the bottlenecks for developing a system like this have to do with actually assembling the data ahead of time. All this data sort of lives in its own areas within the electronic health records. So trying to put together or match up the photos, the visual field tests, the scans and all the notes and the measurements for a patient or many patients—that's very hard to integrate and to put it all together in a research-ready database.
Another challenge of this particular kind of data is its longitudinal nature. It's not like determining whether a picture is of a cat or a dog, for example, where you can just look at one picture and you can decide as a human, right? Sometimes the evaluation of glaucoma patients takes place over a number of visits. You have to see evolution of the disease to make a real determination. Sometimes the time interval between visits is different for different patients, and putting that all together into a machine learning model—which, in some ways, can be very inflexible as to the structure of the data that goes into it—can be a challenge too.
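To make the point about uneven visit intervals concrete, here is a minimal sketch of one common workaround: placing each patient's measurements on a fixed time grid by carrying the most recent observation forward. This is an illustrative assumption, not the method Wang's group uses. The function name, the 180-day step, and the eye pressure values are all hypothetical.

```python
def to_regular_grid(visits, step_days, n_steps):
    """Place irregular visits on a fixed grid (0, step_days, 2*step_days, ...).

    visits: list of (day, value) pairs sorted by day.
    Each grid point takes the most recent value recorded at or before it,
    or None if the patient has not been seen yet.
    """
    grid = []
    idx = 0
    last = None
    for k in range(n_steps):
        t = k * step_days
        # Consume every visit that happened up to this grid point.
        while idx < len(visits) and visits[idx][0] <= t:
            last = visits[idx][1]
            idx += 1
        grid.append(last)
    return grid


# Hypothetical eye pressure readings (day of visit, value) for two patients
# who were seen on different schedules.
patient_a = [(0, 18.0), (95, 17.0), (400, 21.0)]
patient_b = [(30, 15.0), (200, 16.0), (210, 16.5), (500, 19.0)]

grid_a = to_regular_grid(patient_a, step_days=180, n_steps=4)
grid_b = to_regular_grid(patient_b, step_days=180, n_steps=4)
```

Both patients end up as sequences of the same length, which is the fixed shape many models expect. The trade-off is that closely spaced visits collapse into one grid point and long gaps are filled with stale values.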
 
Interesting. Is there also an opportunity there? And I guess what I mean by that is, I imagine for humans looking at data over a really long time scale it can sometimes be hard to spot those patterns. But my understanding is that's actually one of the things that machine learning can be really good at—being able to pick those patterns out that we miss. Is that fair to say about this particular work?
Yes, I think that is a really great insight, and that is definitely one of the things we're hoping that machine learning can help us achieve. Oftentimes as clinicians, we might see a new patient and they might come with years of records from their prior glaucoma specialist. You know, they just moved into the area, and we're looking through all of this as we're evaluating the patient. But it's a lot of data for even a human clinician to process. So if we had a really effective machine learning algorithm that can help with that, that would certainly be really helpful.
 
And just for listeners who might not know, you're talking about years of data here. What is the typical age of a glaucoma patient?
Right, so it’s an age-related disease. Typically our glaucoma patients are older—60s or 70s. So if someone was diagnosed with glaucoma in their 60s but now they're 75, then you've got 15 years of potential glaucoma records to look through for their entire history.
 
Very interesting. So with this work that you've been doing so far of applying AI, and machine learning specifically, to ophthalmology, what's surprised you most so far?
Some of the structured information in the electronic health records is very good at predicting outcomes even without the supplemental, let's say, unstructured data—clinical free text or imaging. And we found this out because those kinds of structured electronic health records data are basically the easiest to assemble first. So oftentimes when we're starting off in research we can pull that data more quickly. And when we started building our models predicting glaucoma outcomes the performance was better than we expected for predicting the future using really just basic eye exam data—demographics, billing codes—for their diagnoses.
And I thought that we would need a lot more information coming from imaging and clinical notes to make these models perform as well as they did. But actually, when we added those additional modalities of data, yes, we got better performance, but the baseline performance using just the structured data is pretty good.
 
It sounds like it's not just about having more data necessarily but also starting with the right data. Would you say that's fair?
Yes. It's not just having more data, but it's the right data, and you have to be a little bit thoughtful about how you organize that data even before it goes into the model. You could put in more data, but it might be just more noise and not more signal so to speak.
 
You were talking about the quality of the predictions a second ago, and I think I read in one of your papers that one of the algorithms you were working on was able to predict patient outcomes with, like, 80 percent accuracy—which seems quite high, quite impressive. But how does that compare to human performance? How do we put this into context?
This is a great question. It's always important to put these kinds of performance metrics in human context. And I'll start with a contrasting example, which is, again, classification of photographs. We know that cats versus dogs in classification of photographs, a human can pretty much always do this. So you're really aiming for, like, 100 percent accuracy for any kind of machine task. And similarly, for certain ophthalmology tasks like looking at a picture of a retina and determining whether there might be diabetic eye disease in it, that's something that a human looking at a photo can classify very accurately. So all those algorithms have to be super high performing.
But a more complex task like figuring out if a patient will progress in the future—that's not looking at something and classifying it. That's predicting the future. And so, as I mentioned, even humans are not very good at this. And in order to demonstrate that for one of my studies, I just went through a small group of patients, maybe 300 patients or something, and I just looked at their records. And I tried to predict as a glaucoma specialist, oh, who's going to need glaucoma surgery and who's not going to need glaucoma surgery. I like to think I'm a pretty good doctor, but it turns out that my guesses were not good at all. [Laughs.] I think if you had flipped the direction of my guesses, it might even have been like a better performing future. So, you know, it doesn't take much, actually, to do better than a glaucoma specialist.
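The remark about flipping the direction of the guesses reflects a simple property of yes/no predictions: if a set of guesses is right a fraction a of the time, the opposite guesses are right a fraction 1 - a of the time. A small sketch with made-up data (the outcomes and guesses below are hypothetical, not taken from the study):

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the actual outcome."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Hypothetical toy data: 1 = patient went on to need surgery, 0 = did not.
outcomes = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
guesses = [0, 1, 0, 1, 1, 0, 0, 1, 0, 1]

# Invert every guess: "will need surgery" becomes "will not", and vice versa.
flipped = [1 - g for g in guesses]

original_acc = accuracy(guesses, outcomes)
flipped_acc = accuracy(flipped, outcomes)
```

In this toy example the original guesses match 2 of 10 outcomes and the flipped guesses match the other 8, so any predictor that scores below 50 percent on a yes/no task does better when inverted.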
 
Well, and that's so interesting to me because, nevertheless, I think when you bring technology into a lot of contexts—especially health-related contexts—people might have questions or they might have some concerns or things they want to be reassured about. How do you build trust in these kinds of tools and techniques? Both for doctors, but also for patients?
There are many reservations about the actual use of AI in healthcare. Almost always, when we try to publish one of our recent papers and say, “Oh, look, this algorithm does quite well compared to humans at predicting outcomes,” there are a lot of questions that come up. What are the features, what are the inputs that these algorithms are looking at? Are we looking at reasonable inputs, or are they just detecting noise, basically? So almost always we have to investigate our algorithms for what are the important features that they're relying on and make sure that those are in line with our clinical reasoning, so to speak.
Other questions that are increasingly coming up now are how do these algorithms perform in subsets of patients? Either demographic subgroups or subgroups of patients with particular diseases or particular features—like, patients who have already had surgeries or already had other procedures. And I think it's very important to do all of these analyses before you get to the point where you are advocating that they immediately be implemented.
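One way to picture the subgroup question is to compute the same performance metric separately for each group rather than once over everyone. The sketch below uses plain accuracy and invented records. The group labels, the predictions, and the helper name accuracy_by_group are all hypothetical, and a real evaluation would likely use more metrics and far more patients.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return {group: accuracy} for (group, predicted, actual) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        correct[group] += int(predicted == actual)
    return {group: correct[group] / total[group] for group in total}


# Hypothetical records: (subgroup, model prediction, true outcome).
records = [
    ("prior_surgery", 1, 1),
    ("prior_surgery", 0, 1),
    ("prior_surgery", 1, 0),
    ("prior_surgery", 0, 0),
    ("no_prior_surgery", 1, 1),
    ("no_prior_surgery", 0, 0),
    ("no_prior_surgery", 1, 1),
    ("no_prior_surgery", 0, 1),
]

per_group = accuracy_by_group(records)
overall = sum(int(p == a) for _, p, a in records) / len(records)
```

Here the overall figure hides a gap: the model is right for 3 of 4 patients without prior surgery but only 2 of 4 with prior surgery, which is the kind of difference a single headline number would not reveal.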
 
That makes sense. For doctors and clinicians, what does success look like when you're using these tools? Is it time saved? Is it the ability to help more patients? Make better decisions? How would you characterize that?
I think for a tool that is a clinical decision support tool we want to see that it's helping our patients. That it is really, for example, improving vision or improving surgical outcomes. If we built an algorithm that predicted how well patients would do after different kinds of surgeries—which is another area of research—we would want our surgeons who use this algorithm to ultimately have better surgical outcomes because they've chosen the appropriate surgery for the appropriate patients. So those are, I think, clinical benchmarks for success. I think in order to prove that an AI algorithm really is helping patients we eventually will need to do more traditional studies—like randomized controlled trials of patients who were treated with the help of an algorithm versus patients who were not—and really look at those outcomes to see that it's helping patients.
 
How far off would you say those are?
In our field, not that close, I would say. 
 
Why is that?
I think there's a lot of work to be done in evaluating fairness of our algorithms before we deploy them—in finding out whether algorithms that I've trained using Stanford data are going to work anywhere else, on patients that are not at Stanford and things like that. I think to get to the point where we can test an algorithm in a large randomized controlled trial—which takes a lot of resources to design and execute a study like that—we have to have all these kinks worked out, all these evaluations done, before we get to that point. But that's the goal.
 
So with those goals in mind, what are you working on right now?
We've done a lot of predictive modeling for glaucoma patients in our Stanford patients. But the next step is to go beyond one site and access multi-center data—training bigger models, testing whether models trained in one place will work in another place. So one of the things that I'm working on now is training these glaucoma outcome models using data from SOURCE, which is short for Sight Outcomes Research Collaborative. It is a multi-center collaborative made up of 17 academic centers that see eye patients, and we are all contributing our electronic health records on our eye patients to this consortium or registry. It's run out of the University of Michigan right now by the chief data officer Dr. Joshua Stein, an ophthalmologist and informaticist who is harmonizing all of this data and putting it together and bringing on more sites. It's 17 now, but there's many more in the process of joining. I think this kind of initiative is very exciting because we can now do a lot more of the studies that we couldn't do before.
 
Before you had used the word supportive when you were describing these tools, and I like that because I know that at Dropbox we often talk about AI as being something assistive, augmentative—something that helps you do your job and doesn't do your job for you. How have you been thinking about this framing when you've been working on some of the research that you've been doing?
We are trying to build these algorithms to support physicians in their treatment of patients—to provide some additional data for them to look at and decide, “Okay, well, you know, this algorithm suggests that this patient might be a high risk patient.” And like many of our other data inputs, eventually it may become one that we decide to pay attention to if the algorithm is very good, right? But ultimately I think it's up to the user or the physician to decide how to use that AI-generated advice. Because as the human physician that is treating the human patient in front of us, we actually know a lot more about that patient than maybe can be captured even with the astounding breadth and depth of our electronic health records. An algorithm might suggest that a patient could do well with a certain kind of surgery—but I might know that patient isn't capable of doing the complex regimen of treatment that's required to make that surgery a success, for example. I'll have all these factors in addition to whatever might be going into the algorithm, to help further personalize the treatments to that patient. 
 
That makes sense. I mean, empathy seems like something that only doctors can bring to the table.
Sure. Empathy and explaining in a way that the patient can understand where they are and what we're trying to do for them—and presenting the choices to the patients, too. Ultimately, it's up to them what their medical care is going to look like.
 
What about for patients as well? You sort of alluded to this a little bit before, but could AI-powered tools and some of the algorithms you're working on help improve access to care or the quality of care especially among underserved populations?
That is definitely a goal. We hope that AI will improve care for underserved populations. We don't want to deploy systems that actually worsen care for underserved populations or widen the disparities that we know already exist in glaucoma care and in ophthalmic care. If we are able to build algorithms that can be deployed to, for example, detect glaucoma earlier rather than waiting till it's symptomatic and advanced for those patients to present to us, that's one way that it can really expand access to care. 
 
More broadly speaking, in the field of medicine, how does the work and the research that you've been doing with AI compare to what other people are doing in other medical fields? I'm curious about where this fits into the range of AI applications or possibilities that people are currently exploring.
I think ophthalmology as a field has been one of the specialties in medicine that has been leading artificial intelligence applications. And a lot of that early work, and current work, is in computer vision, because we are such a heavily visual—no pun intended [laughs]—field. We take so many pictures and scans of the eye, which is sort of a natural substrate for computer vision. We actually have an autonomous system that can detect diabetic retinopathy. And I will just caveat this by saying that I am not an expert in this system per se, but it's one of the few FDA approved machines that can take a picture of your retina and determine whether there is diabetic retinopathy that needs to be referred to a higher level of ophthalmic care. There's not a lot of autonomous AI medical tools out there that are really in use or FDA approved. So I would say that, in this area, we are certainly leading the way.
 
What about in terms of lower level tools and techniques? Like, on the research level, are there things that you've seen in other fields that you think could help you improve the level of care you can provide or how you do your work?
One of the areas that I would like to bring more to ophthalmology actually is the fair AI evaluation and techniques. This has been very important in the computer science field in general. It has come to health care and AI as well, and it's only really just starting to be an issue that we as ophthalmologists are also thinking about for our AI algorithms. That involves looking at how our algorithms are performing in different groups of populations and making sure that we wouldn’t be harming groups of people if we were to deploy our AI algorithms. This is something that I've been very inspired by and want to bring more awareness to in our ophthalmology field.
 
That's a good segue because I wanted to ask, what are you looking forward to the most? What upcoming advancements in AI that you've been tracking do you think are going to have the most impact on how you do your work?
Of course it's hard to talk about what we're looking forward to most, or what we're most excited about, without talking about the latest generation of large language models, right? When I wanted to do natural language processing initially as a trainee—this is probably 2015 or something like that, I started thinking about this—we had BERT models. These were kind of the previous generation of natural language processing type of models, which were orders of magnitude smaller than the current ones that have now entered the public discourse, let's say, or become available. So when we were doing things like trying to see if we could extract certain eye exam components from clinical free text notes, we were using all of those kind of smaller models, and they were doing okay but not perfect performance.
But now, with the advent of these modern large language models, there is so much excitement about what more information we might be able to glean from our free text clinical notes that could be helpful for finding the right patients for study or just understanding different features of the patients. So yeah, that's certainly a huge area of excitement. 
 
This is a bit of a bigger picture question, but why does this work matter to you?
We spend so much of our time generating these electronic health records. I think that there's so much insight about disease processes and outcomes that can be gleaned from all of this data that is just being generated routinely every day. And I think that untapped potential really excites me and motivates me to find ways to use that information to help make better decisions for our patients.
 
Sophia, this has been really fascinating. Thank you so much for joining us.
Thank you for having me, and all very good discussion questions. This has been really fun.
 
I appreciate it.
  
~ ~ ~
 
The thing about my bins and boxes and closet organizers is that they’re finite. They can only fit so much. Whereas online, it’s the seeming limitlessness of our digital lives that makes them so alluring—but also so chaotic at the same time.
When I work, I try my best to leave breadcrumbs for myself. Little reminders of what’s stored where, hoping it’s stuff I’ll eventually need—that one, elusive insight, or the missing ingredient that turns a troublesome project around. But I know I’m just swimming against the current. No inventory can help me here. The data’s flowing too fast to keep up.
Short of collectively deciding to return to the era of the floppy disk and live like it’s 1986—a great decade for music, mind you—it seems unlikely that we’ll ever be able to throw our arms around all of our apps and tabs and far-flung files ever again.
And so you’re going to see a lot more people like Sophia, who are thinking of new ways, new AI-powered tools, that make better use of all the valuable information that’s hidden away in our endless digital containers, out of sight, out of mind. That makes all those insights as easy to grab as a box of camping gear on a shelf.
If you want to learn more about Sophia and her work, you can find some links in the show notes.
Working Smarter is brought to you by Dropbox. We make AI-powered tools that help knowledge workers get things done no matter how or where they work.
You can listen to more episodes on Apple Podcasts, YouTube Music, Spotify, or wherever you get your podcasts. And you can also find more interviews on our website, workingsmarter.ai.
This show would not be possible without the talented team at Cosmic Standard:
Our producers Samiah Adams and Aja Simpson, our technical director Jacob Winik, and our executive producer Eliza Smith.
At Dropbox, special thanks to Benjy Baptiste for production assistance and our illustrators Fanny Luor and Justin Tran.
Our theme song was created by Doug Stuart.
And I’m your host, Matthew Braga. Thanks for listening.
~ ~ ~
  
This transcript has been lightly edited for clarity.