Animation by Justin Tran

Working Smarter

How Cambridge scientists use machine learning to improve climate modeling


Published on June 09, 2023

In our new Working Smarter series, we hear from AI experts about how they’re leveraging machine learning to solve interesting problems and dramatically change the way we work for the better. 

When I first heard about the supergroup of scientists at The Institute of Computing for Climate Science who are banding together to fight the climate crisis, I couldn’t help but picture The Avengers joining forces against Thanos.

It’s easy to start ruminating on worst-case scenarios when you think about climate change. But it gives me hope to hear about the engineers, scientists, and physicists at the University of Cambridge collaborating on ideas that could bring about best-case scenarios in an uncertain time. 

The ICCS team is dedicated to improving climate forecasting models and using that information to change policy. It’s an effort composed of brilliant minds with skills so exceptionally diverse that Cambridge Zero director and ICCS leader Emily Shuckburgh considers it a “radical collaboration.”

We spoke to one of her collaborators, research software engineer Jack Atkinson, to learn how he and the ICCS team are using machine learning to improve computational modeling as they pursue a solution to the climate crisis.

What first sparked your interest in climate science?  
I've always been interested in geography and the natural world. As a kid, I would do lots of hiking, walking, mountain stuff, and outdoor activities. When I went to university, I liked science [and] maths, but I didn't know exactly what I wanted to do. I was studying engineering. There happened to be one lecturer in the department who did a lot of work based on geoscience. His particular interest was the internals of the earth—how magma and the core generate the dynamo. But he'd also done some work with atmosphere stuff. It was through speaking to him that I realized I could pursue the science I found interesting: fluid dynamics.

Did your love of the natural world inspire you to pursue a job that would work to protect it? 
Yes, I'd been interested in geoscience and climate science because of that, but I'd say it's something that, as I've been working, has grown to be more of a driver. In my PhD work, there was some of that in the background, but it was driven by what I was interested in—basic maths, basic science. Part of my reason for working at ICCS was to move towards where some of this science is actually being applied with the CMIP (Coupled Model Intercomparison Project) data sets. 

CMIP is a global project that brings together many scientists and models of the climate to evaluate and improve our modeling and understanding by comparing different models to one another and to real-world data. The results from CMIP are one of the key inputs to the assessment reports of the IPCC (Intergovernmental Panel on Climate Change).

My expertise is still on the science side, but it's the science that’s slightly closer to driving those things to inform policy.

A recent ICCS video discusses the three dimensions of scaling impact in science. Could you describe how machine learning is helping with computational scaling?
Computational scaling is the fact that computers are always getting more and more powerful. This allows us to add more and more detail to our simulations and run them at higher resolutions. The famous example is Moore’s Law.

You can imagine that we have these big, private models. They do calculations of all sorts—for the flow of air, the flow of the oceans, the evaporation of water, the formation of clouds, the rain that comes out of those clouds. But it turns out, some of this can get very expensive in terms of computational resources. 

One of the bigger drivers now is to say, can we leverage machine learning to actually make these computations a lot more simple? If we've got something we've constructed from strong underlying physics, can we then map out that parameter space and train a neural net so we can run things within the model much faster? [Editor’s note: A neural network is a computer system trained to process data the way a human brain does.]

Whenever we build these models, we put in as much physics and understanding as we can, but it's always possible to put in more attributes. The other thing people are looking at is, rather than building the best physical models, can we take real world observational data and construct models that do a better job?
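To make the emulation idea concrete, here is a minimal sketch in Python, using only the standard library. It is not ICCS code. The `expensive_physics` function is a hypothetical stand-in for a costly physics routine, and the network size, learning rate, and sample grid are arbitrary assumptions chosen for illustration. The sketch samples the routine across its input range, then trains a tiny neural network by plain gradient descent to reproduce those samples, so that the cheap network could be called in place of the slow routine.

```python
import math
import random


def expensive_physics(x, steps=2000):
    """Hypothetical costly routine: midpoint-rule integral of exp(-t^2) from 0 to x."""
    h = x / steps
    return sum(math.exp(-(((i + 0.5) * h) ** 2)) for i in range(steps)) * h


def init_net(hidden, rng):
    """One hidden tanh layer and a linear output; the sizes are arbitrary choices."""
    return {
        "w1": [rng.uniform(-1.0, 1.0) for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "w2": [rng.uniform(-0.1, 0.1) for _ in range(hidden)],
        "b2": 0.0,
    }


def predict(net, x):
    """Cheap emulator call: one pass through the small network."""
    acts = [math.tanh(w * x + b) for w, b in zip(net["w1"], net["b1"])]
    return sum(w * a for w, a in zip(net["w2"], acts)) + net["b2"]


def mse(net, xs, ys):
    """Mean squared error of the emulator against the sampled physics."""
    return sum((predict(net, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(net, xs, ys, epochs=2000, lr=0.05):
    """Full-batch gradient descent on the mean squared error."""
    hidden = len(net["w1"])
    scale = 2.0 * lr / len(xs)
    for _ in range(epochs):
        gw1 = [0.0] * hidden
        gb1 = [0.0] * hidden
        gw2 = [0.0] * hidden
        gb2 = 0.0
        for x, y in zip(xs, ys):
            acts = [math.tanh(w * x + b) for w, b in zip(net["w1"], net["b1"])]
            err = sum(w * a for w, a in zip(net["w2"], acts)) + net["b2"] - y
            gb2 += err
            for j in range(hidden):
                gw2[j] += err * acts[j]
                # Backpropagate through tanh: d tanh(z)/dz = 1 - tanh(z)^2
                back = err * net["w2"][j] * (1.0 - acts[j] ** 2)
                gw1[j] += back * x
                gb1[j] += back
        for j in range(hidden):
            net["w1"][j] -= scale * gw1[j]
            net["b1"][j] -= scale * gb1[j]
            net["w2"][j] -= scale * gw2[j]
        net["b2"] -= scale * gb2


# "Map out the parameter space": sample the slow routine on a grid of inputs.
rng = random.Random(0)
xs = [i / 20 for i in range(21)]
ys = [expensive_physics(x) for x in xs]

net = init_net(6, rng)
loss_before = mse(net, xs, ys)
train(net, xs, ys)
loss_after = mse(net, xs, ys)
print(f"MSE before training: {loss_before:.5f}, after: {loss_after:.5f}")
```

In a real climate model the routine would take many inputs and the network would be trained with an established machine learning library. The structure is the same, though: sample the trusted physics, fit the network, then call `predict` where the slow routine used to run.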

Is the idea that you could build better models for climate forecasting?
The idea is that we could be capturing more information in the data than we understand. For climate forecasting we try to capture as much as we can—for example, global temperatures, sea ice, changes to clouds and precipitation, changes to the atmospheric composition/chemistry, etc.

Perhaps the bigger picture is to capture how the large-scale processes in the atmosphere and ocean might change with a changing climate: the jet stream, El Niño, the ocean ‘conveyor belt’, and the Antarctic current.

“Rather than building the best physical models, can we take real world observational data and construct models that do a better job?”

I think the difficulty with these is the nature of machine learning. Even if we can get better performance, we don't know why we're getting that better performance. What I would like is to see more people using machine learning methods to achieve better forecasts, but then using that to look back and say: What are we missing from the physical models? How can we improve our physical understanding to better match reality?

The physics pairing is extremely important as well, because a lot of machine learning applications are end-to-end machine learning. Say you want to count cells in an image. You can train a neural net, but you give it an image and you get out a number of cells. Or if you want to learn to spot cancer, you can take some medical data and get an estimation of the likelihood of cancer or not. Whereas here, we're not trying to replace the entirety of a simulation or the entirety of a forecast. We're still keeping those traditional, large computational models, but we're just trying to replace a little bit—an expensive bit—with machine learning. 

The difficulty is, you need to make sure this unit you're replacing stays physically consistent with all the other physics around it. That's one of the things I've done a lot of work on. I think there's gonna be a lot developed in this area to inform how to efficiently make use of these machine learning advantages.
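One simple way to picture the physical-consistency problem described above is a conservation check. The Python sketch below is an illustration only and does not describe ICCS's actual method. `learned_tendency` is a hypothetical stand-in for a trained network's output, and the column values, layer weights, and time step are made up. The learned component proposes a rate of change for each layer of a model column, and a correction step removes any net gain or loss so that the weighted total is conserved when the rest of the model uses the result.

```python
def learned_tendency(q):
    """Hypothetical stand-in for a trained network: per-layer rate of change.

    It is deliberately imperfect, so its output does not conserve the total.
    """
    return [
        0.1 * (q[i - 1] if i > 0 else 0.0) - 0.08 * q[i] + 0.001
        for i in range(len(q))
    ]


def enforce_conservation(tendency, weights):
    """Subtract the weighted-mean tendency so the weighted sum becomes zero."""
    imbalance = sum(w * t for w, t in zip(weights, tendency)) / sum(weights)
    return [t - imbalance for t in tendency]


def weighted_total(q, weights):
    """Conserved quantity for the column, e.g. total water weighted by layer mass."""
    return sum(w * v for w, v in zip(weights, q))


def hybrid_step(q, weights, dt):
    """Advance the column one step using the corrected learned tendency."""
    corrected = enforce_conservation(learned_tendency(q), weights)
    return [v + dt * t for v, t in zip(q, corrected)]


# Made-up column state and layer weights, purely for illustration.
q = [4.0, 3.0, 2.0, 1.0]
weights = [1.0, 1.0, 2.0, 2.0]

raw_imbalance = sum(w * t for w, t in zip(weights, learned_tendency(q)))
q_next = hybrid_step(q, weights, dt=0.5)

print("net source in raw learned tendency:", raw_imbalance)
print("total before:", weighted_total(q, weights))
print("total after: ", weighted_total(q_next, weights))
```

Spreading the imbalance evenly across layers is the crudest possible correction and can distort individual layers. It is used here only because it makes the constraint easy to see.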

Emily Shuckburgh, the director of Cambridge Zero, mentioned a kind of multidisciplinary “radical collaboration” happening there. What does collaborative teamwork look like at ICCS?
Within the group, we sit as a central computing, numerics, and software team. Then we collaborate with a number of groups who are part of the Schmidt Futures-funded Virtual Earth System Research Institute (VESRI), which has funded a number of multi-institution climate modeling projects around the world. For example, one is looking at sea ice and how it behaves across scales, from individual melt ponds all the way up to how it influences the global climate system.

Which tools do you use to share your findings when you’re collaborating with colleagues? 
It's reached this point where people can be a bit more distributed. I work in Cambridge two or three days a week, but I have colleagues who are in Glasgow or the Isle of Mull or down in Bristol. So a lot of our team meetings are video or [hybrid].

Within the software side of things, we tend to do a lot of work via GitHub. One of the big things we’re doing at Schmidt Futures is driving open source software, and what's often called FAIR software: Findable, Accessible, Interoperable, Reusable. If you can have at least a record of technical discussions on GitHub issues or pull requests, it means that anyone can go and see the development process that's gone into something. We [also] make a lot of use of shared documents on Overleaf (which is for processing LaTeX documents), Google documents and Dropbox.

How does your team use Dropbox?
Some of our project collaborators will use it if they want to share data or a presentation or some work they're doing. They'll put it on Dropbox and we can then access it more easily than trying to share it by email.

“Sharing code is the real way for progress. Having that knowledge be shared would help speed up progress everywhere AI is being applied.”

Since your team members have varied fields of expertise, do you use AI to summarize content in those shared documents so it takes less time to interpret each other’s findings?
Before the large language models were around, a few academic journals, particularly those of the American Geophysical Union, started doing something where an academic paper will always start off with a summary or an abstract, which gives you the paper in a nutshell. In the last two or three years, what they’ve been pushing for is the same abstract but with a simple language summary as well, translating the technical scientific writing into something that could be understood by journalists or the layperson. 

I think that's important with communicating science. So much science is publicly funded, but public access to it has been very difficult. Within the UK, there's been a push for open access. In America within the last year, the government policy actually changed so that now people have a right to government-funded research, so things have to be published open access. 

I think driving the ability for people to actually understand science and present it in a simplified way is really important, especially with things like the environment and climate change. I think there’s a good option for perhaps using AI to summarize in a non-specialist manner. The other thing is, science now is so multidisciplinary that being able to communicate even to other scientists without using domain-specific language—or present things in a way that other people can understand—is also important.

What kind of tasks do you wish machine learning could take off your plate?
I would like help with automating simple tasks. Within technology or code, often you have a lot of what we call boilerplate stuff. Designing websites is an example. You can end up spending a lot of time messing around trying to get things to line up perfectly, adding all these attributes, setting colors, setting shapes. If you're able to provide a description of what you want, or sketch out something, feed it through and get something back, that would be very useful. 

Do you see AI playing a role in developing languages that could support scientific process? 
The data we have from both observations and modeling now is so vast. I think there's a real opportunity to use AI in processing this data and drawing out relationships from this data. The AI might not be able to tell us why certain things are linked, but if it can spot links and correlations in something a human just couldn't begin to process, you can then feed that back and have that human interaction to explore this. 

If you've got AI models that can take data and build better performance than any physics or science-based models, we don't want to just stop there. We want to say, why is that the case? What are we missing? How can we make our understanding better? I think the use of AI with big data and data science, there's a lot to be done there. There’s quite a bit of optimism for that. 

A big concern, as we do more and more computationally intensive stuff, is processing data and making it accessible. When someone does experiments to produce some data, there's no longer an excuse to just provide a number or a picture or a plot at the end. We're at the stage now where that data should be properly archived [and] accessible to others, especially if it's publicly funded or altruistic work. The more that can be driven within science as a whole, the better. 

Sharing data and sharing code is the real way for progress. It's not holding things to yourself, being private with what you have. Within medical research, obviously there are bits of data [that are] essential to keep private, but so much of it is closed off and difficult to build on. I suspect there are a lot of places that have AI with far bigger capabilities than we see in research papers. Having that knowledge be shared would help speed up progress everywhere AI is being applied.

This interview has been edited and condensed.