Geospatial Data Demystified: Satellites, AI, and Earth’s Hidden Data
A conversation with geospatial expert Yohan Iddawela
This week, my guest was Yohan Iddawela. Yohan is a geospatial data scientist at the Asian Development Bank and previously worked for the World Bank. He has a PhD in economic geography from the London School of Economics. In this episode, we talked about all things related to geospatial analysis, including fascinating use cases for geospatial data, the integral role of satellites, how AI and machine learning are helping improve geospatial data quality, and a grab bag of other geospatial topics.
If you enjoyed this conversation, be sure to check out Yohan’s newsletter. It's called Spatial Edge, and you can find it on Substack. It covers all the latest innovations in geospatial analysis.
A condensed transcript is provided below.
Yohan Iddawela, welcome to the podcast.
Thanks, James. Great to be here.
I wanted to get started by giving listeners a taste of the magic of geospatial data. It's quite fascinating, and as I've been following you on Twitter and reading your newsletter, it struck me that there's a huge variety of uses for geospatial data. I made a short list here just to give listeners a taste: predicting drinking water shortages, detecting marine litter from space, detecting illegal mining, estimating the volume of shipping traffic and maritime trade, estimating the height of buildings and their function within a city, identifying where informal settlements are located, and calculating subnational GDP. That list could go on for a while. I was wondering if you could take your favorite example from that list, or maybe another example, and give an overview of the problem in that space and how geospatial data is helping fill the gap.
Sure, happy to. It sounds like you've really done your research into the geospatial space. Maybe we can focus on the last topic you raised, subnational GDP or GDP in general. I think the best place to start is how official GDP data is actually calculated. For most countries, you rely on surveys—consumer confidence, business surveys, trade data, agricultural data—and then distill them through models to estimate GDP. Most countries produce national-level GDP once a quarter, but with a bit of a lag. For example, quarterly data for the first quarter of the year might be delayed by a month or two. There are two key issues here: the methodologies vary between countries, and there's a delay in calculating national-level GDP. For subnational GDP, such as state or municipality level in the US or in Europe, it’s often calculated once a year, with a year-and-a-half lag. Data for 2024, for example, might only be released in July 2026.
What we’ve seen in the geospatial space is the use of various datasets, with one of the most popular being nighttime luminosity—nighttime satellite images showing the brightness of a place at night, used as a proxy for economic activity. The great thing about this is that it’s a consistent methodology for the entire world. Unlike survey-based methods, the lag with luminosity data is small—it’s available the next day. So these are the two main constraints geospatial data can overcome. Also, a lot of countries don’t have granular GDP estimates, but with luminosity data, you can look at very small areas within a country and assess economic activity there. These are the kinds of use cases that geospatial data unlocks.
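The nightlights-as-GDP-proxy approach described above is usually implemented as a log-log regression of official GDP on total luminosity, then used to nowcast GDP where surveys lag. A minimal sketch, with made-up luminosity and GDP figures for illustration only:

```python
import numpy as np

# Hypothetical sum-of-lights (total nighttime luminosity per region) and
# official GDP figures for a few subnational regions -- illustrative only.
sum_of_lights = np.array([120.0, 540.0, 60.0, 980.0, 310.0])
gdp_billions = np.array([4.1, 18.2, 2.0, 33.5, 10.4])

# Common approach in the literature: fit log(GDP) ~ a + b*log(lights),
# then apply the fitted elasticity where official statistics are delayed.
X = np.column_stack([np.ones(len(sum_of_lights)), np.log(sum_of_lights)])
y = np.log(gdp_billions)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b = coef

def nowcast_gdp(lights):
    """Predict GDP (billions) from a new luminosity reading."""
    return float(np.exp(a + b * np.log(lights)))

print(nowcast_gdp(400.0))
```

Because luminosity data is available the next day, the same fitted relationship can produce near-real-time estimates for areas far smaller than official statistics cover.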
I think the starkest example of this, which listeners might have seen, is the contrast between North and South Korea. It’s often used to demonstrate the power of capitalism—whether you’re a fan of capitalism or not, that’s the message. North Korea is completely dark while South Korea is brightly lit. But I think what you’re saying is that even when the contrast is less extreme, you can still use this methodology at a more granular level to do more interesting calculations than just pointing out that North Korea is not industrialized.
Yeah, exactly. We even looked at this during the pandemic. You had localized lockdowns, particularly in Europe, where local governments would decide to shut down the economy. We used nightlights to measure the local economic impact of those lockdowns. In the UK, for example, central London had a significant reduction in luminosity because people weren’t commuting. But in the suburbs, where people lived, you actually saw an increase in luminosity as people spent more time at home, turned on lights, and so on.
That's interesting. In the European example you gave, where official subnational results take a year and a half to publish, is the delay because collecting and processing the survey data is so costly and time-consuming? Is that why traditional survey approaches are less frequent, and nightlights provide a faster, cheaper alternative?
Yeah, for sure. I’m not entirely sure of all the reasons, but I know that for national-level calculations, you require representative samples, which are much easier to gather than creating 500 different samples at a subnational level. That’s likely more time-consuming and resource-intensive.
Yeah, it must get quite complicated with the subnational estimates. That’s where nightlights really shine. Another thing you posted on Twitter—I know you said the methodology for nightlights is consistent, but I saw from your Twitter account that due to the growing use of LEDs, the wavelengths of light are changing. This is causing some difficulty with the traditional nightlights dataset because different lights give off different wavelengths, so they’re detected at different magnitudes, potentially skewing the data. Can you explain that a bit more?
Yeah, you got it. This highlights a broader issue with nightlights. It's a great proxy, but it has limitations. One issue is the angle of the satellite. If an image is taken directly head-on (nadir), it will have a different luminosity value than an image taken from an angle (off-nadir). Time of day also matters—capturing an image at 7 p.m. versus 4 a.m. affects luminosity, not because of economic activity but because of the time. The LED example is another limitation. LED lights emit more on the blue end of the spectrum, which traditional nightlight satellites can't fully capture. As cities transition from older sodium-vapor streetlights to LEDs, the satellite may register that as a reduction in luminosity, even though it's just not capturing the full wavelength. For example, Milan transitioned to LED lights, and the satellite showed a decrease in luminosity, but official GDP statistics from 2014–2016 showed an increase. So, it's important to be aware of these limitations.
I guess that’s a challenge not just in Milan but in other parts of the world too. Are there any technological innovations or new datasets to counteract these limitations with nightlight data?
I’m not aware of any solutions that have fully tackled the LED issue. Once you’re aware that LED transitions are happening, you can account for that in the model. But detecting where these transitions are happening is difficult. It probably involves knowing the region well, reading news reports, and so on.
I wanted to continue talking about satellites, because they’re fascinating. You've written a lot about satellite resolution, and I think listeners would find it interesting too. One thing you wrote in your newsletter is about the cost of very high-resolution images—50 cm resolution can cost about $7 per square kilometer. For a country the size of the UK, one day of data would cost about $1.7 million. Taking those kinds of images every day worldwide would cost billions. So, how often are high-resolution images being taken, and what are they used for? Are nightlights using this high resolution?
With those numbers, I should clarify that’s related to purchasing existing satellite data, called archival data. You can go into the archives of companies that run satellites and buy data they've already collected. For 50 cm resolution, it costs about $7 per square kilometer. Most users need time series data, so they want data monthly or daily, which adds to the cost. Tasking a satellite—asking it to capture bespoke images—costs even more, around $15 per square kilometer. As for how often these high-resolution images are taken, it depends. Companies like Planet have daily data for every location in the world at 3 meters of resolution, but I’m not sure how often higher resolution images are taken. It varies by provider and resolution.
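The cost figures above reduce to simple arithmetic: area times price per square kilometre times number of acquisitions. A quick sketch using the approximate per-square-kilometre prices mentioned (archival ~$7, tasking ~$15 at 50 cm) and a rounded UK land area:

```python
# Back-of-the-envelope imagery costs using the prices quoted above.
# Area and prices are approximate, for illustration only.

def imagery_cost(area_km2: float, price_per_km2: float, days: int = 1) -> float:
    """Total cost of covering an area once per day for `days` days."""
    return area_km2 * price_per_km2 * days

UK_AREA_KM2 = 244_000  # approximate land area of the UK

# One day of archival 50 cm coverage of the UK (roughly $1.7 million):
print(f"${imagery_cost(UK_AREA_KM2, 7):,.0f}")
# A month of tasked coverage at $15/km^2:
print(f"${imagery_cost(UK_AREA_KM2, 15, days=30):,.0f}")
```

The monthly tasking figure makes clear why most users restrict high-resolution purchases to small areas of interest rather than whole countries.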
What are the different resolutions people use, and what are the use cases? For example, when would you need 50 cm resolution versus 1 km?
The most commonly used satellite data is often freely available. Two frequently used datasets are Sentinel-2, with a 10-meter resolution, and Landsat, with a 30-meter resolution. They’re used for land classification—determining if an area is built-up, forest, cropland, or water. If you want to identify crops or measure road quality, you’ll need higher resolution data. For roughness of road surfaces, for instance, freely available data is too low-resolution.
What does "resolution" mean, exactly? If I have 1 km resolution versus 50 cm, does it mean I can’t see anything smaller than 1 km in an image?
Good question. When we talk about resolution, we mean ground sample distance. Every image is made up of pixels, and each pixel represents a certain area on the Earth's surface. A 1 km resolution means each pixel in the image represents 1 km by 1 km on the ground. A 1-meter resolution means each pixel represents 1 meter by 1 meter. The smaller the pixel footprint, the more detailed the image.
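Because resolution is a per-pixel ground footprint, the pixel count needed to cover an area grows with the square of the resolution improvement. A small sketch (illustrative only; real scenes overlap and are re-projected):

```python
# How many pixels it takes to tile a given area at a given ground sample
# distance (GSD), ignoring scene overlap and re-projection.

def pixels_to_cover(area_km2: float, gsd_m: float) -> int:
    """Number of pixels needed to tile `area_km2` at `gsd_m` metres per pixel."""
    pixel_area_m2 = gsd_m * gsd_m
    return int(area_km2 * 1_000_000 / pixel_area_m2)

# One square kilometre at different resolutions:
print(pixels_to_cover(1, 1000))  # 1 km GSD  -> 1 pixel
print(pixels_to_cover(1, 10))    # 10 m GSD  -> 10,000 pixels
print(pixels_to_cover(1, 0.5))   # 50 cm GSD -> 4,000,000 pixels
```

Going from Sentinel-2's 10 m to a commercial 50 cm product means 400 times as many pixels for the same ground area, which is part of why storage, processing, and price all scale so sharply with resolution.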
Who is using this satellite data? Is it mainly governments, researchers, or companies? And what about nightlights data?
There are many users. Traditionally, commercial satellite providers' main clients have been governments, often for national security reasons, such as defense. Satellite imagery is also used for things like agricultural statistics, deforestation tracking, and cropland analysis. Academics and environmental scientists also use satellite imagery for research purposes.
As for nightlights specifically, governments don’t use them much because they have other resources for creating official statistics. Nightlights data can be too noisy for official use. However, it's widely used by researchers, think tanks, and development organizations like the World Bank, UN, and the Asian Development Bank. It's also growing in popularity in the economics space as a proxy for economic activity. In finance, nightlights are being used more as well. For example, there’s work being done on tracking “dark shipping.” Many ships have AIS (Automatic Identification System) devices that send GPS signals for tracking maritime trade. But with sanctions on oil and gas, some ships turn off their GPS to avoid detection. Satellite imagery is being used to find these dark ships and track illicit trade, which is useful for commodities markets, especially for futures trading in oil and gas.
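The dark-shipping workflow described above amounts to cross-referencing ship positions detected in imagery against recent AIS broadcasts and flagging detections with no nearby ping. A toy sketch with made-up coordinates and an arbitrary 5 km matching threshold (real systems also match on time, heading, and vessel size):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def flag_dark_ships(detections, ais_pings, max_km=5.0):
    """Return imagery detections with no AIS ping within `max_km`."""
    return [
        d for d in detections
        if all(haversine_km(d[0], d[1], p[0], p[1]) > max_km for p in ais_pings)
    ]

detections = [(25.10, 55.20), (25.90, 54.10)]  # ships seen in imagery
ais_pings = [(25.11, 55.21)]                   # ships broadcasting AIS

print(flag_dark_ships(detections, ais_pings))  # second ship has gone "dark"
```

Any detection left over after matching is a candidate "dark ship" worth closer inspection.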
Interesting. So they're using this data to make smarter investments, buy the right stocks, or invest in hedge funds or countries.
Exactly, especially in commodities and futures trading.
One more question about resolution: You mentioned earlier that AI is being used to upscale lower-resolution images to higher resolution. There are AI tools like Magnific that can take a low-quality image and make it look high-quality, but it can introduce artifacts. When you're dealing with satellite data, introducing artifacts isn't ideal. Can you give an overview of how upscaling works with satellite data?
Sure. The process you're talking about is called super-resolution, where we use AI to increase the resolution of freely available satellite images. This is especially useful for governments and organizations that can’t afford expensive high-resolution data.
For example, some governments use satellite images to detect buildings and compare them to the official land registry, updating it when new buildings are detected. Other governments use it to spot illegal construction on agricultural land. Since high-resolution data is costly, super-resolution offers a way to use low-resolution images and enhance them with AI techniques. However, super-resolution isn’t perfect and won’t match the accuracy of ground-truth data.
There are two main approaches to super-resolution: multi-image super-resolution and single-image super-resolution. Multi-image involves combining multiple images taken from different angles or at different times to enhance details. This method reduces the likelihood of artifacts. Single-image super-resolution, on the other hand, uses generative AI models, such as Generative Adversarial Networks (GANs). These models use training data to "guess" the details that should be in an image, which can introduce artifacts or hallucinations. For geospatial data, we prefer the more conservative multi-image approach, but it doesn’t allow for as much resolution improvement—usually only by a factor of two to four.
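The intuition behind the multi-image approach can be shown with an idealized "shift-and-place" sketch: four low-resolution frames, each offset by half a pixel, interleave onto a grid with twice the resolution. This is a deliberately simplified illustration; real pipelines must estimate the sub-pixel shifts from the imagery itself and handle non-integer offsets, noise, and changing scenes.

```python
import numpy as np

def upscale_2x(frames):
    """frames: dict mapping (row_offset, col_offset) in {0,1}^2 -> 2D array.
    Interleaves four half-pixel-shifted low-res frames into one 2x grid."""
    h, w = frames[(0, 0)].shape
    hi = np.zeros((2 * h, 2 * w))
    for (dr, dc), frame in frames.items():
        hi[dr::2, dc::2] = frame  # each frame fills one sub-pixel phase
    return hi

# Simulate: a high-res "ground truth" sampled at four half-pixel phases.
truth = np.arange(16, dtype=float).reshape(4, 4)
frames = {(dr, dc): truth[dr::2, dc::2] for dr in (0, 1) for dc in (0, 1)}

recovered = upscale_2x(frames)
print(np.array_equal(recovered, truth))  # True: all detail recovered
```

In this noiseless toy case the full-resolution image is recovered exactly, which is why multi-image methods avoid hallucination: every output pixel comes from a real observation rather than a generative model's guess.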
So is super-resolution something that’s still in development, or has it already been deployed?
It’s already been deployed in some cases. I’m currently working on an open-source model for Asia, where we’re developing super-resolution techniques specific to this region. We started last month, and we’re hoping to see exciting results by the end of next year.
That’s exciting! Shifting gears a little, you’ve talked about accessibility issues with geospatial data. What are the current challenges, and how can we make it more accessible to a wider audience?
There are three main challenges to accessibility: technical literacy, cost, and awareness of available data.
First, technical literacy varies between users. For example, an economist might have some programming skills and can use Python libraries to analyze data. But a farmer, who just wants to know about crop health, probably doesn’t have the same skill set. The focus in the industry should be on the insights rather than the technical details, making it easier for people to use the data without needing advanced technical skills.
Second, cost is a big issue, as we’ve discussed. With more satellite companies entering the market, competition should drive down prices, making the data more affordable. There are also platforms that allow users to access satellite data with just a few clicks, which makes the process more user-friendly.
Finally, awareness is key. There’s so much data out there, but people don’t always know what exists or how to access it. That’s one of my goals with my work—to help people discover the available geospatial data.
Sounds like your newsletter could be a solution for that.
Thank you! It’s one small step toward that goal.
Are there specific innovations in the space that address these issues? You mentioned a company that’s like an "Uber for geospatial data." Can you talk more about that or other innovations?
Yeah, that company is called SkyFi. I’ve talked about them on several podcasts, so I’m hoping the check is in the mail! SkyFi was founded by Bill Perkins, who was using satellite data for commodities trading. He saw an opportunity to democratize access to satellite images, so he launched SkyFi, a platform that connects users to satellite providers. It’s essentially a marketplace for satellite data, where you can purchase imagery with just a few clicks. This makes it much easier to access insights without the complex procurement processes that were typical in the past. They’ve only been around for about four years, but they’re growing fast, and I’m very bullish on them.
That’s interesting. So the long-term hope is that more platforms like SkyFi will emerge, and then smaller users—like farmers, for instance—will be able to access crop health data directly without needing to analyze raw satellite images themselves. Is that the idea?
Exactly. And platforms like SkyFi are already offering not just raw satellite images but derivative products, such as insights on crop health, so that users don’t need to run complex analyses themselves.
That's exciting. We have a few minutes left, so I thought we could do a quick lightning round. I gathered some topics from your newsletter and Twitter, and maybe you can say a few words about each. Let’s keep it light. First up: the "Tropical Moist Forest" dataset.
I don’t know why they had to call it the Tropical Moist Forest dataset. "Rainforest" would’ve been fine! I’m sure there are experts who will say there’s a difference, but I don’t know—“moist” just doesn’t sit right with me.
Fair enough! What about the phrase "Classifying trees is cooler than Drake"?
Well, it’s just a joke to point out how much data there is on trees and vegetation in the geospatial space. Trees are everywhere—just like Drake!
What should people know about the poppy ban in Afghanistan?
The Taliban introduced a poppy ban in 2022 to gain international credibility. Before that, they financed their operations by selling poppy and opium. Now, satellite images are being used to measure the reduction in poppy crops across the country.
Interesting! Last one: "Spatial Autocorrelation."
That’s a big topic! But in short, it’s about how data points near each other tend to be correlated. For example, house prices in the same neighborhood are likely to be similar. When running regression models, this violates the assumption that all data points are independent. Spatial autocorrelation can skew results if it’s not accounted for, so special techniques are used to handle it.
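The standard statistic for measuring spatial autocorrelation is Moran's I, which compares each value's deviation from the mean with its neighbours' deviations. A minimal sketch on a toy one-dimensional "street" of house prices with an adjacent-neighbour weight matrix (made-up numbers):

```python
import numpy as np

def morans_i(values, weights):
    """Moran's I: near +1 clustered, near 0 random, near -1 dispersed."""
    z = values - values.mean()
    n = len(values)
    num = n * (weights * np.outer(z, z)).sum()
    den = weights.sum() * (z ** 2).sum()
    return num / den

# Toy data: a cheap cluster next to an expensive cluster of houses.
prices = np.array([100.0, 110.0, 105.0, 300.0, 310.0, 295.0])
n = len(prices)
W = np.zeros((n, n))
for i in range(n - 1):          # each house neighbours the next one
    W[i, i + 1] = W[i + 1, i] = 1.0

print(morans_i(prices, W))  # strongly positive: neighbours resemble each other
```

A strongly positive value like this is exactly the situation where ordinary regression's independence assumption breaks down, motivating spatial econometric models that build the weight matrix into the estimation.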
Thanks for that! Last question before we wrap up: What are you most excited about in the geospatial data space, and what are you working on right now?
I’m really excited about super-resolution, which we talked about earlier. I’ve also been exploring 3D reconstructions using neural networks, which could be game-changing for things like disaster risk management. Imagine creating a 3D visualization of a neighborhood and showing government officials what a flood would look like, so they can prioritize where to build flood defenses. There are a lot of exciting possibilities with 3D digital twins and visualizations in the geospatial space.
That sounds amazing. Yohan Iddawela, thanks so much for being on the podcast!
It was a pleasure to chat about all things geospatial. Thanks for having me!