GDAL Python Tutorial: Reading and Writing Raster Datasets
The Geospatial Data Abstraction Library (GDAL) is the standard for managing spatial data formats. GDAL has been incorporated into many different enterprise and open source GIS projects. Because GDAL is open source, it can be used by all. Learning to use GDAL with Python can help you automate workflows and implement custom raster processing solutions. This tutorial (in the notebook below) will walk you through the basics of reading raster datasets with GDAL and using GDAL to create new raster datasets.
Much more is possible with GDAL than is covered in this basic tutorial, including:
- Reprojecting and resizing rasters
- Updating raster band values and performing map algebra
- Clipping rasters to extents and shapes
- Performing terrain analysis, interpolation, and polygon conversion
All of these topics are covered in-depth with code and example datasets in my full-length course.
Install the GDAL Python Bindings¶
Before you can get started writing code, you’ll need to install the gdal
python bindings (aka the gdal
package or library). If you use the Windows operating system this may not be a simple task. I have developed a couple of tutorials to help you with gdal
installation using both Anaconda/Conda (recommended) and pip (a little more compicated).
Imports¶
To start, we’ll import gdal
, for raster operations, and numpy
, for array operations. Depending on your gdal
version, you may need to replace from osgeo import gdal
with import gdal
. I’ve also imported matplotlib
for raster band display.
from osgeo import gdal
import numpy as np
import matplotlib.pyplot as plt
Open a Raster¶
Let’s use gdal
to open a raster file. To follow this example you’ll need to have access to a raster file on a local drive. I’ve specified the path to my raster file with the fn
variable.
fn = "C:/qgis/raster/USGS_1_n44w115.tif"
Once the raster location is intialized, use gdal.Open
to open the raster as a GDAL dataset. Two arguments can be specified for gdal.Open
. The raster dataset path (required) and the access method (optional, one of gdal.GA_ReadOnly
(default) or gdal.GA_Update
).
ds = gdal.Open(fn)
Now, let’s check the type of the ds
variable. As you can see, it’s a GDAL Dataset.
print("'ds' type", type(ds))
'ds' type <class 'osgeo.gdal.Dataset'>
Raster Dataset Information and Properties¶
Once a raster dataset is opened, you can obtain information and metadata from the dataset to preform additional analysis. With gdal
we can access many raster properties, including, projection, size (columns and rows), the number of bands, and the geotransform (upper left corner, cell width, etc.). Let’s start by obtaining the projection, size, and band count for the raster.
print("Projection: ", ds.GetProjection()) # get projection
print("Columns:", ds.RasterXSize) # number of columns
print("Rows:", ds.RasterYSize) # number of rows
print("Band count:", ds.RasterCount) # number of bands
Projection: GEOGCS["NAD83",DATUM["North_American_Datum_1983",SPHEROID["GRS 1980",6378137,298.257222101004,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6269"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4269"]] Columns: 3612 Rows: 3612 Band count: 1
You’ll notice the raster projection is defined by a string that represents the EPSG 4269 projection parameters. The raster has 3612 rows and columns and one band.
Now let’s take a look at the raster’s affine transformation information, which is also referred to as the GeoTransform. To do this call the GetGeoTransfrom()
method of the GDAL dataset, as follows.
print("GeoTransform", ds.GetGeoTransform())
GeoTransform (-115.00166666668, 0.00027777778000000013, 0.0, 44.00166666668, 0.0, -0.00027777778000000013)
The gdal
GeoTransform is a tuple that contains six elements. Each GeoTransform element is expalined below in order of its pythonic index (0-5), that is, its position within the tuple.
- [0] left X coordinate
- [1] pixel width
- [2] row rotation (usually zero)
- [3] top Y coordinate
- [4] column rotation (usually zero)
- [5] pixel height, this will be negative for north up images
Data from the GeoTransform are used to determine the image extent, pixel centroids and corners, and pixel addresses that correspond to geographic coordinates.
Read Raster Data¶
You should now undertstand how to obtain basic metadata from a raster dataset with GDAL. The next step is accessing the raster data. With GDAL, raster data is accessed by raster bands. First, we’ll retrieve the raster band. Next we’ll retrieve the band data as a numpy
array. This can be done in a single line of code.
data_array = ds.GetRasterBand(1).ReadAsArray()
data_array.shape
(3612, 3612)
You’ll notice that the resulting array has the same number of rows and columns as the GDALDataset
(ds
), as it should. Also, notice that indexing for raster bands starts at 1, not at 0.
Display a Raster Band¶
With Python, you can easily display a raster band with the imshow
funciton of matplotlib
. I’ve imported matplotlib
at the beginning of this tutorial. Below I set the figure size, then use imshow
to plot data_array
, which contains the elevation data from the raster band. As you can see, the resulting figure shows the elevation data. You can check out the imshow documentation for more customization options.
plt.figure(figsize=(10, 10))
plt.imshow(data_array)
plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x287f111b520>
Getting No Data Values¶
It is common for a raster band to have a designated value that represents a pixel with no data. Obtaining the no data value is important so that you can exclude those values from calculations you may perform on the data. Let’s retreive and print the no data value for the elevation raster band.
ndv = data_array = ds.GetRasterBand(1).GetNoDataValue()
print('No data value:', ndv)
No data value: -999999.0
As you can see, the no data value for this raster band is -999999.0
Setting No Data Values¶
When creating a new raster, you may also need to set a no data value. This is done by calling SetNoDataValue
on the raster band and passing the no data value. I’ll demonstrate this in detail in a following section when we create a new raster.
Properly Close a Raster¶
Once you are finished with a raster, it’s important to close it properly. Otherwise, you may experience problems accessing the data. To close a raster dataset, set it equal to None
, like so.
ds = None
Create and Write Data to a Raster¶
Now that you’ve learned how to use gdal
to read data as a numpy
array, I’ll demonstrate how create new raster and write to a new raster dataset.
There are two primary ways to create a new raster with gdal
. You can copy an existing raster dataset or create a new one from scratch. I’ll demonstrate both methods here.
With gdal
, all raster creation is done with a driver. So the first step to create a raster to instatiate a gdal
driver. With a driver, we’re basically letting gdal
know which file format we want to write so gdal
can manage the file properly. You can find a full list of the gdal
drivers in the documentation.
For these examples we’ll use the GeoTiff (GTiff
) driver.
Start by instantiating a GeoTiff driver as follows. Below, I print out the object type for demonstration purposes.
driver_gtiff = gdal.GetDriverByName('GTiff')
print(type(driver_gtiff))
<class 'osgeo.gdal.Driver'>
Now that we have a driver we’re ready to start writing GeoTiff files.
Copy an Exisiting Raster Dataset¶
The easiest way to create a new raster with gdal
is to use a driver to copy an existing raster dataset. This is accomplished with the driver’s CreateCopy
method. When using CreateCopy
we just need to pass an existing raster dataset and an output file name. Below, I demonstrate how to copy the intial dataset we read in (ds
)
# reopen the dataset because we closed it above
ds = gdal.Open(fn)
fn_copy = "C:/qgis/raster/gdal_copy.tif"
ds_copy = driver_gtiff.CreateCopy(fn_copy, ds)
Running the code above will create a new raster at the specified file path. It will be identical to the DEM we opened at the beginning of this exercise.
Using CreateCopy
is an excellent method when you want to create a new raster with the same extent, resolution, and data type as an existing raster.
Write Data to the Copied Raster¶
Once you’ve copied the existing raster you can then overwrite the data for any band. Let’s go through an example.
In this case we may want to apply an exaggeration to the elevation data (data_array
). We’ll subtract the minimum elevation from all the pixels, then multiply by 1.5 to create an exaggeration. Finally, we’ll write the array to the raster we just created (ds_copy
).
data_array = ds.GetRasterBand(1).ReadAsArray() # read band data from the existing raster
data_nan = np.where(data_array == ndv, np.nan, data_array) # set all the no data values to np.nan so we can easily calculate the minimum elevation
data_min = np.min(data_nan) # get the minimum elevation value (excluding nan)
data_stretch = np.where(data_array == ndv, ndv, (data_array - data_min) * 1.5) # now apply the strech algorithm
ds_copy.GetRasterBand(1).WriteArray(data_stretch) # write the calculated values to the raster
ds_copy = None # properly close the raster
Now let’s check our work by reopening the the dataset, reading the raster band data, and plotting it.
ds_copy = gdal.Open(fn_copy)
data_stretch = ds_copy.GetRasterBand(1).ReadAsArray()
plt.figure(figsize=(10, 10))
plt.imshow(data_stretch)
plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x28783205fd0>