Open Rasters in Python without Downloading

There is so much raster data hosted on the web. We have access to nearly endless imagery and elevation datasets. While these data are exceptionally rich and useful, they can produce a problem of their own.

The downsides of raster data

Data requires storage. And storage gets messy. It’s inevitable that we’re going to download multiple copies of the same dataset and store them in different locations. If only there was a way we could just read the data from the source and not have to manage those data on our local machines, networks, or servers.

Benefits of eliminating raster downloads with Python

Well, there is a way. With Python and rasterio we can read many raster datasets directly from their web-hosted location into our machine’s RAM, and never have to download the data to our hard drive. This allows us to use the data and perform analysis without the downloading step.

An added benefit is that since the raster data are tied to a URL we can easily pass our analysis scripts to other users. We don’t have to pass the data and they don’t have to download it. Let me show you how easy this is to do.

How to read raster data in Python without downloading

First, you’ll need to have rasterio installed. I recommend using Mamba or Conda to manage your packages and installing rasterio from the conda-forge channel. GDAL is a dependency of rasterio and if you’re using Windows, you can run into installation hangups if you try to install with pip.

import rasterio

Find a raster URL

The trickiest part of reading raster data from the web can be finding the raster URL. I’m going to give you one example of a place to go, and you can find others for yourself.

In the United States, the USGS makes most of its elevation data available through the National Map applications or the USGS Lidar Explorer. Search for either of those applications. Once you are inside the application, search for the data you’re interested in.

Your search will return a list of downloadable objects.

You can get the download URL by right-clicking on the download link. Make sure the link is to a geographic file type and not a compressed folder or webpage.
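If you want a quick sanity check before using a link, a small helper like the one below can flag URLs that point to an archive or a web page rather than a raster file. This is only a sketch: the function name `looks_like_raster_url` and the list of suffixes are my own assumptions for illustration, not part of rasterio or any USGS tool, and a matching suffix does not guarantee the file will open.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed, illustrative list of raster suffixes; extend it for your own data
RASTER_SUFFIXES = {".tif", ".tiff", ".img"}


def looks_like_raster_url(url):
    """Return True when the URL path ends in a suffix from RASTER_SUFFIXES."""
    # urlparse drops any query string, so only the path is inspected
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return suffix in RASTER_SUFFIXES


print(looks_like_raster_url("https://example.com/elevation/tile.tif"))
print(looks_like_raster_url("https://example.com/elevation/tiles.zip"))
```

The first call should report a raster-like link and the second should not, since it ends in .zip.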

Once you have the link, add it to your Python script.

url = "https://prd-tnm.s3.amazonaws.com/StagedProducts/Elevation/1m/Projects/UT_FEMA_FS_FlamingGorge_2020_B20/TIFF/USGS_1M_12_x44y465_UT_FEMA_FS_FlamingGorge_2020_B20.tif"

Open a raster from a URL with rasterio

Now that you have a link to a raster file, the rest is easy. Just use rasterio.open to open the file as you would any other raster.

The raster will be read from its source on the web.

Note that opening the raster may take longer than if the file were saved to your hard drive. This is especially true for larger rasters on slow internet connections.

dem = rasterio.open(url)
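Once the dataset is open, its header information can be inspected before any pixel values are requested. The sketch below gathers a few attributes that rasterio datasets expose (width, height, count, crs, nodata). The helper name `summarize_dataset` is hypothetical, and the stand-in object uses made-up values so the example runs without a network connection.

```python
from types import SimpleNamespace


def summarize_dataset(dataset):
    """Collect header attributes from an open dataset without reading pixels."""
    return {
        "width": dataset.width,      # number of columns
        "height": dataset.height,    # number of rows
        "bands": dataset.count,      # number of raster bands
        "crs": str(dataset.crs),     # coordinate reference system
        "nodata": dataset.nodata,    # declared no data value, or None
    }


# Stand-in with made-up values, used only so this sketch runs offline.
# In practice, pass the dataset returned by rasterio.open(url).
fake = SimpleNamespace(width=100, height=80, count=1, crs="EPSG:4326", nodata=-9999.0)
print(summarize_dataset(fake))
```

With the real dataset, `summarize_dataset(dem)` is a cheap way to check the raster's size and no data value before committing to a full read.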

Where the real work happens

So far we’ve created a rasterio dataset from a URL. Doing this doesn’t actually read any data, so the operation happens quite quickly. The place this will slow us down is when we read the data.

For example, like this.

band1 = dem.read(1)

Depending on your internet connection and the size of the raster, this will likely take a few minutes.
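If you only need part of the raster, one way to shorten the wait may be to read a window instead of the whole band. The sketch below uses rasterio's `Window` class and the `window` argument of `read`. The helper names `center_window` and `read_center` and the default size of 512 are assumptions for illustration, and how much time this saves depends on how the remote file is organized internally.

```python
def center_window(width, height, size=512):
    """Return (col_off, row_off, win_width, win_height) for a centered window."""
    # Clamp the window so it never exceeds the raster dimensions
    win_width = min(size, width)
    win_height = min(size, height)
    col_off = (width - win_width) // 2
    row_off = (height - win_height) // 2
    return col_off, row_off, win_width, win_height


def read_center(dataset, size=512):
    """Read band 1 for only the central window of an open rasterio dataset."""
    # Imported here so center_window stays usable without rasterio installed
    from rasterio.windows import Window

    window = Window(*center_window(dataset.width, dataset.height, size))
    return dataset.read(1, window=window)
```

With the dataset opened earlier, `read_center(dem)` would return a 2D array of at most 512 by 512 cells from the middle of the raster.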

The following code takes about 5 minutes to generate a plot of the raster band with my internet connection using the URL above. You’ll notice that this produces a single-color plot because we haven’t masked out the no data values.

from rasterio.plot import show

band1 = dem.read(1)
show(band1)

We can mask the no data values like so, to produce an image that is actually useful.

import numpy as np

# Flag every cell that equals the dataset's no data value
msk = band1 == dem.nodata
# Masked cells are left out when the array is plotted
masked = np.ma.array(band1, mask=msk)
show(masked)
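The comparison above assumes the dataset declares a numeric no data value. Some rasters declare none (rasterio then reports `None`), and others use NaN, which never compares equal to itself. A slightly more defensive sketch, using a hypothetical helper named `mask_nodata` and a small made-up array in place of the real band, might look like this:

```python
import numpy as np


def mask_nodata(band, nodata):
    """Return a masked array that hides cells matching the no data value."""
    if nodata is None:
        # No declared no data value, so nothing is hidden
        return np.ma.masked_array(band)
    if isinstance(nodata, float) and np.isnan(nodata):
        # NaN never equals itself; masked_invalid hides NaN (and inf) cells
        return np.ma.masked_invalid(band)
    return np.ma.masked_equal(band, nodata)


# Made-up 2x2 array standing in for band1
demo = np.array([[1.0, -9999.0], [3.0, 4.0]])
print(mask_nodata(demo, -9999.0))
```

With the real data you would call `mask_nodata(band1, dem.nodata)` and pass the result to `show`. rasterio's `read` also accepts `masked=True`, which returns a masked array directly.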

How will you use online rasters?

Hopefully, this little trick will help keep you organized and save you time and storage space.

This is how I’m presenting data in all of my online teaching. It’s just so clean! I don’t have to maintain download directories and update data.

As long as students set up their Python environments correctly, they’re able to access the same data I’m using.

The biggest issue that can arise is data links changing. There’s not much that can be done about this and it’s a risk I think is worth taking (at least in my applications).
