Q&A with Environmental Data Science Editor: Jakob Runge

Jakob Runge has headed the Climate Informatics working group at the German Aerospace Center’s Institute of Data Science in Jena since 2017 and has been a guest professor of computer science at TU Berlin since 2021. His group combines innovative data science methods from different fields (graphical models, causal inference, nonlinear dynamics, deep learning) and works closely with experts in the climate sciences and beyond. He is part of the editorial team of Environmental Data Science, a new open access journal from Cambridge University Press.
Use-Inspired Basic Research: Developing Causality Methods for Climate Science
Can you tell us a bit about your background?
I’m a physicist by training and came from the complexity sciences. I got the idea to work on the climate when I was looking for a thesis topic and wanted to do something interesting from a complexity science perspective. I went to the Potsdam Institute for Climate Impact Research and there I found this nice connection between problems that were intellectually stimulating and that also, down the line, could really be relevant for climate science. That really caught me, this connection, and from then on I started to work at the interface of complexity and climate science.
Tell me more about your experience in interdisciplinary research at the Potsdam Institute.
The Potsdam Institute had one department on transdisciplinary concepts and methods, and the others were on climate impact research, atmospheric dynamics, these kinds of topics. This was really inspiring because I would talk to the other PhD students, and they would work more on the climate side and I would work on the methods side. It was an immensely stimulating place to work. The overarching frame was “use-inspired basic research” coming from Pasteur’s quadrant – I really like that phrase. It has kind of shaped my personal mission and agenda. In my case, I spell it out as developing methods that tackle challenges arising in climate research.
How did you then get involved in causality and climate science?
When I came to complexity science and started working in networks, I started to think more and more: What do these links in the network mean? At that time the networks were just correlation-based, so all links were just correlations between different locations on earth – these networks were called climate networks. I more and more began to think: Okay, but we want to have some causal interpretation. Otherwise, why do we draw these networks, if they are not causally interpretable and we have to be suspicious that every link is just a correlation? This made me look more into causal inference, which I didn’t know much about at the time. And that really shaped me, because then I had a kind of a mission to bring causal science to climate science and figure out how to adapt these methods to the challenges of the climate system and climate data.
This is the theme of use-inspired basic research coming up again. The methods we develop now in my group are quite general; it’s not just for climate that they’re applicable. You build a research software tool and then you get so much feedback from all kinds of people applying it in many different fields. That really inspires me to improve the tools, so that people use them and I get to learn about all these interesting areas.
Do you have examples of other fields of science where the tools you’ve built have been useful?
My advisor, Jürgen Kurths, has always had this idea of parallelism between the earth and the human brain: they’re both complex networks in essence. So, we’ve always looked at applications in neuroscience. Also, space physics and health and physiological data. But you have to work with someone from that field – it’s always more beneficial for both sides if you talk together and don’t just use their data. And, of course, climate science is where I’ve really collaborated heavily.
When you choose a research question, how much is driven by the science application and how much is driven by the methods?
It often happens that you start developing a method, like these causal methods, and initially you get terrible results on climate data. And then you start building toy models that mimic the properties of climate data, but where you know the ground truth and can handle every feature of the model. In climate science, autocorrelation is a big problem; the data are non-i.i.d. Trying to overcome this led to the PCMCI method that I’ve developed and that works quite well on such data. That method was inspired by this autocorrelation feature of climate time series and helped to overcome the challenge. But when you talk to applied people, they bring up all kinds of methodological challenges: non-stationarity, missing data, nonlinearity, and so on. I spend a lot of time translating climate questions into methodological challenges and then working on them. We have a sort of wish-list of these challenges that we enumerate in a 2019 perspectives paper in Nature Communications.
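As a concrete illustration of the autocorrelation issue mentioned here, consider a hypothetical toy example (a sketch for this article, not taken from the interview or the cited paper): two independent but strongly autocorrelated series will pass a naive correlation significance test far more often than the nominal 5% level, because that test assumes i.i.d. data.

# Hypothetical toy example: two independent AR(1) series with strong
# autocorrelation, tested with a naive Pearson correlation test that
# assumes i.i.d. samples. The false-positive rate comes out far above 5%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
T, n_trials, alpha = 500, 500, 0.05
false_positives = 0
for _ in range(n_trials):
    x = np.zeros(T)
    y = np.zeros(T)
    for t in range(1, T):
        x[t] = 0.9 * x[t - 1] + rng.normal()  # autocorrelated, independent of y
        y[t] = 0.9 * y[t - 1] + rng.normal()
    _, p = stats.pearsonr(x, y)               # the i.i.d. assumption is violated here
    false_positives += p < alpha

print(false_positives / n_trials)             # well above the nominal 0.05

On data like this, every naively “significant” correlation becomes suspect, which is exactly the kind of feature that toy models with a known ground truth are built to reproduce.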
What are some other big open questions in causal inference and climate informatics?
Well, one big topic is trying to use causal inference to improve future projections of climate change. Climate models are our source of future projections, and they’re not perfect. We need smart tools that leverage these future simulations to reduce the uncertainties of projections. You could just try to use a deep learning model, but you want to trust it, so you need more explainable AI research before climate scientists and physicists will trust such predictions, and causal inference is at the core of explainable or interpretable AI. It’s fascinating how questions of linking climate model experiments with observations and including expert knowledge can be framed intuitively in terms of causal inference.
Do you publish a lot of open-source code for the methods you write? Should there be more open-source data available relating to the climate sciences?
Yeah, I have one software package called Tigramite on GitHub, and this is where I always update methods and add new features. I wrote a lot of tutorials that people can use to engage, and this has been one of the main drivers of people using the methods. Also, CauseMe is a benchmark platform for causal discovery, and one of its benchmarks comes from climate. We had a challenge at NeurIPS last year on climate-related problems. For deep learning, more and more challenges are arising; there are now more climate scientists involved in deep learning, and they come up with big benchmarks on forecasting or nowcasting of weather. Also, there’s a climate informatics workshop with a hackathon that happens every year and also exposes climate benchmarks.
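For readers who want to try the methods, the sketch below shows roughly what a PCMCI run with Tigramite looks like on toy data of the kind discussed earlier. It is based on the publicly documented API, but import paths and defaults have changed between Tigramite versions, so the tutorials in the repository remain the authoritative reference.

# Rough sketch of causal discovery with Tigramite's PCMCI on toy data
# (autocorrelated X drives Y at lag 1). Import paths may differ between
# Tigramite versions; older releases expose ParCorr directly under
# tigramite.independence_tests.
import numpy as np
from tigramite.data_processing import DataFrame
from tigramite.independence_tests.parcorr import ParCorr
from tigramite.pcmci import PCMCI

rng = np.random.default_rng(0)
T = 1000
data = np.zeros((T, 2))
for t in range(1, T):
    data[t, 0] = 0.8 * data[t - 1, 0] + rng.normal()
    data[t, 1] = 0.8 * data[t - 1, 1] + 0.4 * data[t - 1, 0] + rng.normal()

dataframe = DataFrame(data, var_names=["X", "Y"])
pcmci = PCMCI(dataframe=dataframe, cond_ind_test=ParCorr())
results = pcmci.run_pcmci(tau_max=2, pc_alpha=0.05)

# p_matrix[i, j, tau] holds the p-value for the link X_i(t - tau) -> X_j(t);
# the ground-truth link X(t-1) -> Y(t) should come out clearly significant.
print(results["p_matrix"].round(3))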
Does the Environmental Data Science journal fit a publishing niche for your group in an exciting way?
I remember one of my papers that got rejected more than once from interdisciplinary journals because they would say, “Where’s the application? This is just a method.” I would write cover letters to say, “Hey, we need a journal where you can write about a new method that has broad applicability and that is presented in a way that is broadly accessible, but where you don’t necessarily have to come up with a new scientific discovery on some specific question. Where’s the space for that?” EDS is the journal for the type of research I do, at the intersection of methods and application. While it’s still an early journal, it’s going to grow. I think it will really be a journal where people go to look for new inspiring methodologies to apply to their research.
Environmental Data Science (cambridge.org/eds) is open for submissions. For more information, see the Call for Papers and the Instructions for Authors.
1. Does the journal accept research in the area of electromagnetics in buildings?
2. How much time for first and final acceptance?
3. How much are the fees?
Thanks for your interest:
1. EDS encourages submissions that use data-driven methods (such as artificial intelligence, machine learning, statistics, data mining, computer vision, econometrics and other data science approaches) that enhance understanding of the environment, including climate change.
Read more here: https://www.cambridge.org/core/journals/environmental-data-science/information/instructions-for-authors#scope
2. We’re aiming for a decision within 6-8 weeks of submission if the article goes to peer review, but we cannot guarantee this.
3. The article processing charge (APC) for publishing open access is 1740 GBP / $2435, but an increasing number of authors are covered by institutional agreements (Read & Publish). Authors in many developing countries are exempt courtesy of the Research4Life programme. We also have a discretionary waiver process that enables unfunded authors to publish.
Read more here:
https://www.cambridge.org/core/journals/environmental-data-science/information/instructions-for-authors#open-access