GDAL Python Tutorial: Reading and Writing Raster Datasets
The Geospatial Data Abstraction Library (GDAL) is the standard for managing spatial data formats. GDAL has been incorporated into many different enterprise and open source GIS projects. Because GDAL is open source, it can be used by all. Learning to use GDAL with Python can help you automate workflows and implement custom raster processing solutions. This tutorial (in the notebook below) will walk you through the basics of reading raster datasets with GDAL and using GDAL to create new raster datasets.
Much more is possible with GDAL than is covered in this basic tutorial, including:
- Reprojecting and resizing rasters
- Updating raster band values and performing map algebra
- Clipping rasters to extents and shapes
- Performing terrain analysis, interpolation, and polygon conversion
All of these topics are covered in-depth with code and example datasets in my full-length course.
Install the GDAL Python Bindings¶
Before you can get started writing code, you’ll need to install the gdal
python bindings (aka the gdal
package or library). If you use the Windows operating system this may not be a simple task. I have developed a couple of tutorials to help you with gdal
installation using both Anaconda/Conda (recommended) and pip (a little more compicated).
Imports¶
To start, we’ll import gdal
, for raster operations, and numpy
, for array operations. Depending on your gdal
version, you may need to replace from osgeo import gdal
with import gdal
. I’ve also imported matplotlib
for raster band display.
from osgeo import gdal
import numpy as np
import matplotlib.pyplot as plt
Open a Raster¶
Let’s use gdal
to open a raster file. To follow this example you’ll need to have access to a raster file on a local drive. I’ve specified the path to my raster file with the fn
variable.
fn = "C:/qgis/raster/USGS_1_n44w115.tif"
Once the raster location is intialized, use gdal.Open
to open the raster as a GDAL dataset. Two arguments can be specified for gdal.Open
. The raster dataset path (required) and the access method (optional, one of gdal.GA_ReadOnly
(default) or gdal.GA_Update
).
ds = gdal.Open(fn)
Now, let’s check the type of the ds
variable. As you can see, it’s a GDAL Dataset.
print("'ds' type", type(ds))
'ds' type <class 'osgeo.gdal.Dataset'>
Raster Dataset Information and Properties¶
Once a raster dataset is opened, you can obtain information and metadata from the dataset to preform additional analysis. With gdal
we can access many raster properties, including, projection, size (columns and rows), the number of bands, and the geotransform (upper left corner, cell width, etc.). Let’s start by obtaining the projection, size, and band count for the raster.
print("Projection: ", ds.GetProjection()) # get projection
print("Columns:", ds.RasterXSize) # number of columns
print("Rows:", ds.RasterYSize) # number of rows
print("Band count:", ds.RasterCount) # number of bands
Projection: GEOGCS["NAD83",DATUM["North_American_Datum_1983",SPHEROID["GRS 1980",6378137,298.257222101004,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6269"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4269"]] Columns: 3612 Rows: 3612 Band count: 1
You’ll notice the raster projection is defined by a string that represents the EPSG 4269 projection parameters. The raster has 3612 rows and columns and one band.
Now let’s take a look at the raster’s affine transformation information, which is also referred to as the GeoTransform. To do this call the GetGeoTransfrom()
method of the GDAL dataset, as follows.
print("GeoTransform", ds.GetGeoTransform())
GeoTransform (-115.00166666668, 0.00027777778000000013, 0.0, 44.00166666668, 0.0, -0.00027777778000000013)
The gdal
GeoTransform is a tuple that contains six elements. Each GeoTransform element is expalined below in order of its pythonic index (0-5), that is, its position within the tuple.
- [0] left X coordinate
- [1] pixel width
- [2] row rotation (usually zero)
- [3] top Y coordinate
- [4] column rotation (usually zero)
- [5] pixel height, this will be negative for north up images
Data from the GeoTransform are used to determine the image extent, pixel centroids and corners, and pixel addresses that correspond to geographic coordinates.
Read Raster Data¶
You should now undertstand how to obtain basic metadata from a raster dataset with GDAL. The next step is accessing the raster data. With GDAL, raster data is accessed by raster bands. First, we’ll retrieve the raster band. Next we’ll retrieve the band data as a numpy
array. This can be done in a single line of code.
data_array = ds.GetRasterBand(1).ReadAsArray()
data_array.shape
(3612, 3612)
You’ll notice that the resulting array has the same number of rows and columns as the GDALDataset
(ds
), as it should. Also, notice that indexing for raster bands starts at 1, not at 0.
Display a Raster Band¶
With Python, you can easily display a raster band with the imshow
funciton of matplotlib
. I’ve imported matplotlib
at the beginning of this tutorial. Below I set the figure size, then use imshow
to plot data_array
, which contains the elevation data from the raster band. As you can see, the resulting figure shows the elevation data. You can check out the imshow documentation for more customization options.
plt.figure(figsize=(10, 10))
plt.imshow(data_array)
plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x287f111b520>
Getting No Data Values¶
It is common for a raster band to have a designated value that represents a pixel with no data. Obtaining the no data value is important so that you can exclude those values from calculations you may perform on the data. Let’s retreive and print the no data value for the elevation raster band.
ndv = data_array = ds.GetRasterBand(1).GetNoDataValue()
print('No data value:', ndv)
No data value: -999999.0
As you can see, the no data value for this raster band is -999999.0
Setting No Data Values¶
When creating a new raster, you may also need to set a no data value. This is done by calling SetNoDataValue
on the raster band and passing the no data value. I’ll demonstrate this in detail in a following section when we create a new raster.
Properly Close a Raster¶
Once you are finished with a raster, it’s important to close it properly. Otherwise, you may experience problems accessing the data. To close a raster dataset, set it equal to None
, like so.
ds = None
Create and Write Data to a Raster¶
Now that you’ve learned how to use gdal
to read data as a numpy
array, I’ll demonstrate how create new raster and write to a new raster dataset.
There are two primary ways to create a new raster with gdal
. You can copy an existing raster dataset or create a new one from scratch. I’ll demonstrate both methods here.
With gdal
, all raster creation is done with a driver. So the first step to create a raster to instatiate a gdal
driver. With a driver, we’re basically letting gdal
know which file format we want to write so gdal
can manage the file properly. You can find a full list of the gdal
drivers in the documentation.
For these examples we’ll use the GeoTiff (GTiff
) driver.
Start by instantiating a GeoTiff driver as follows. Below, I print out the object type for demonstration purposes.
driver_gtiff = gdal.GetDriverByName('GTiff')
print(type(driver_gtiff))
<class 'osgeo.gdal.Driver'>
Now that we have a driver we’re ready to start writing GeoTiff files.
Copy an Exisiting Raster Dataset¶
The easiest way to create a new raster with gdal
is to use a driver to copy an existing raster dataset. This is accomplished with the driver’s CreateCopy
method. When using CreateCopy
we just need to pass an existing raster dataset and an output file name. Below, I demonstrate how to copy the intial dataset we read in (ds
)
# reopen the dataset because we closed it above
ds = gdal.Open(fn)
fn_copy = "C:/qgis/raster/gdal_copy.tif"
ds_copy = driver_gtiff.CreateCopy(fn_copy, ds)
Running the code above will create a new raster at the specified file path. It will be identical to the DEM we opened at the beginning of this exercise.
Using CreateCopy
is an excellent method when you want to create a new raster with the same extent, resolution, and data type as an existing raster.
Write Data to the Copied Raster¶
Once you’ve copied the existing raster you can then overwrite the data for any band. Let’s go through an example.
In this case we may want to apply an exaggeration to the elevation data (data_array
). We’ll subtract the minimum elevation from all the pixels, then multiply by 1.5 to create an exaggeration. Finally, we’ll write the array to the raster we just created (ds_copy
).
data_array = ds.GetRasterBand(1).ReadAsArray() # read band data from the existing raster
data_nan = np.where(data_array == ndv, np.nan, data_array) # set all the no data values to np.nan so we can easily calculate the minimum elevation
data_min = np.min(data_nan) # get the minimum elevation value (excluding nan)
data_stretch = np.where(data_array == ndv, ndv, (data_array - data_min) * 1.5) # now apply the strech algorithm
ds_copy.GetRasterBand(1).WriteArray(data_stretch) # write the calculated values to the raster
ds_copy = None # properly close the raster
Now let’s check our work by reopening the the dataset, reading the raster band data, and plotting it.
ds_copy = gdal.Open(fn_copy)
data_stretch = ds_copy.GetRasterBand(1).ReadAsArray()
plt.figure(figsize=(10, 10))
plt.imshow(data_stretch)
plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x28783205fd0>
While the image looks the same as the first one we created, notice that the color bar has a much different range. This indicates the raster values are different than what we started with. You have now successfully written values to a new raster!
Create a New Raster¶
In some instances you may need to create an entirely new raster. You may want to do this when the new raster covers a different extent, has a different spatial resolution, has a different data type, or has a different number of bands than an existing raster. Creating a new raster is more involved than copying an existing raster, but can still be accomplished quickly.
We will use the Create
method of a gdal
driver to create a new raster dataset. We need to pass a file path, number of rows (ysize
), number of columns (xsize
), number of bands and data type (etype
) to Create
. The following code creates a new raster with 10 columns and 10 rows that has one band. It is given a byte (values range from 0-255) data type.
fn_create = "C:/qgis/raster/gdal_create.tif" # filename for new raster
ds_create = driver_gtiff.Create(fn_create, xsize=10, ysize=10, bands=1, eType=gdal.GDT_Byte)
The code above creates the raster dataset, but we’re not done yet. We still need to set the spatial reference and GeoTransform for the raster, and write data values.
Set Raster Projection¶
To set the raster’s projection we’ll need to import the osr
module (osr
is installed as part of gdal
, as is ogr
). Let’s set the projection of this raster to be WGS84, which is a geographic coordinate reference system, meaning it’s based on degrees instead of linear units. We’ll setup a new spatial reference object, import a spatial reference using the EPSG code (4326), and set the raster projection by exporting the spatial reference to well-known text (WKT). Then we’ll check our work by printing out the projection information for the new raster.
from osgeo import osr
srs = osr.SpatialReference()
srs.ImportFromEPSG(4326)
ds_create.SetProjection(srs.ExportToWkt())
print(ds_create.GetProjection())
GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]
Things look good. The last EPSG code is the one that defines the spatial reference, and it is 4326.
Set the GeoTransform¶
Now that we have defined the raster’s projection, we need to place that raster in space using the GeoTransform. Let’s place the top left corner of this raster at 45 degrees north, 110 degrees west, and give it a pixel resolution of 1 degree. Then we’ll check our work by printing out the GeoTransform information we just set.
geot_create = [-110.0, 1.0, 0.0, 45.0, 0.0, -1.0]
ds_create.SetGeoTransform(geot_create)
print(ds_create.GetGeoTransform())
(-110.0, 1.0, 0.0, 45.0, 0.0, -1.0)
Write Data Values to the Raster¶
The last thing we need to do is write some data values to our raster. We’ll do this by creating a numpy
array and writing it to the raster band as we’ve done before. We’ll also specify a no data value (0) for the raster.
data_create = np.zeros((10, 10))
data_create[1:-1, 1:-1] = 1 # values to 1, leave outer as 0 (no data)
data_create[3:-3, 3:-3] = 2 # set center group as 2
data_create[4:-4, 4:-4] = 3 # set center pixels to 3
ds_create.GetRasterBand(1).WriteArray(data_create) # write the array to the raster
ds_create.GetRasterBand(1).SetNoDataValue(0) # set the no data value
ds_create = None # properly close the raster
# plot the data values we created
plt.figure(figsize=(10, 10))
plt.imshow(data_create)
plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x287864ac7f0>
Now, you’ll be able to use this as any other raster. You’ll even be open and interact with it in GIS software.
Conclusion¶
Reading rasters, creating rasters, and writing data to rasters are the buidling blocks of gdal
. These simple operations are very powerful and will help you automate spatial data workflows and pipelines. Reading rasters as numpy
arrays allows you to perform custom operations to create new derivative raster datasets. If you know numpy
and gdal
you have everything you need to work with raster data in Python.
Want to learn more about GDAL?
I have a full-length course on using GDAL for raster analysis with Python. The course contains six hours of video instruction and is complete with code and example datasets so you can easily follow along and replicate the concepts that are taught. Check it out at geospatialschool.com!
Whether you’re looking to take your GIS skills to the next level, or just getting started with GIS, we have a course for you! We’re constantly creating and curating more courses to help you improve your geospatial skills.
All of our courses are taught by industry professionals and include step-by-step video instruction so you don’t get lost in YouTube videos and blog posts, downloadable data so you can reproduce everything the instructor does, and code you can copy so you can avoid repetitive typing