The Geospatial Data Abstraction Library (GDAL) is the standard for managing spatial data formats. GDAL has been incorporated into many different enterprise and open source GIS projects. Because GDAL is open source, it can be used by all. Learning to use GDAL with Python can help you automate workflows and implement custom raster processing solutions. This tutorial (in the notebook below) will walk you through the basics of reading raster datasets with GDAL and using GDAL to create new raster datasets.
Install the GDAL Python Bindings¶
Before you can get started writing code, you’ll need to install the
gdal python bindings (aka the
gdal package or library). If you use the Windows operating system this may not be a simple task. I have developed a couple of tutorials to help you with
gdal installation using both Anaconda/Conda (recommended) and pip (a little more compicated).
To start, we’ll import
gdal, for raster operations, and
numpy, for array operations. Depending on your
gdal version, you may need to replace
from osgeo import gdal with
import gdal. I’ve also imported
matplotlib for raster band display.
from osgeo import gdal import numpy as np import matplotlib.pyplot as plt
Open a Raster¶
gdal to open a raster file. To follow this example you’ll need to have access to a raster file on a local drive. I’ve specified the path to my raster file with the
fn = "C:/qgis/raster/USGS_1_n44w115.tif"
Once the raster location is intialized, use
gdal.Open to open the raster as a GDAL dataset. Two arguments can be specified for
gdal.Open. The raster dataset path (required) and the access method (optional, one of
gdal.GA_ReadOnly (default) or
ds = gdal.Open(fn)
Now, let’s check the type of the
ds variable. As you can see, it’s a GDAL Dataset.
print("'ds' type", type(ds))
'ds' type <class 'osgeo.gdal.Dataset'>
Raster Dataset Information and Properties¶
Once a raster dataset is opened, you can obtain information and metadata from the dataset to preform additional analysis. With
gdal we can access many raster properties, including, projection, size (columns and rows), the number of bands, and the geotransform (upper left corner, cell width, etc.). Let’s start by obtaining the projection, size, and band count for the raster.
print("Projection: ", ds.GetProjection()) # get projection print("Columns:", ds.RasterXSize) # number of columns print("Rows:", ds.RasterYSize) # number of rows print("Band count:", ds.RasterCount) # number of bands
Projection: GEOGCS["NAD83",DATUM["North_American_Datum_1983",SPHEROID["GRS 1980",6378137,298.257222101004,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6269"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4269"]] Columns: 3612 Rows: 3612 Band count: 1
You’ll notice the raster projection is defined by a string that represents the EPSG 4269 projection parameters. The raster has 3612 rows and columns and one band.
Now let’s take a look at the raster’s affine transformation information, which is also referred to as the GeoTransform. To do this call the
GetGeoTransfrom() method of the GDAL dataset, as follows.
GeoTransform (-115.00166666668, 0.00027777778000000013, 0.0, 44.00166666668, 0.0, -0.00027777778000000013)
gdal GeoTransform is a tuple that contains six elements. Each GeoTransform element is expalined below in order of its pythonic index (0-5), that is, its position within the tuple.
-  left X coordinate
-  pixel width
-  row rotation (usually zero)
-  top Y coordinate
-  column rotation (usually zero)
-  pixel height, this will be negative for north up images
Data from the GeoTransform are used to determine the image extent, pixel centroids and corners, and pixel addresses that correspond to geographic coordinates.
Read Raster Data¶
You should now undertstand how to obtain basic metadata from a raster dataset with GDAL. The next step is accessing the raster data. With GDAL, raster data is accessed by raster bands. First, we’ll retrieve the raster band. Next we’ll retrieve the band data as a
numpy array. This can be done in a single line of code.
data_array = ds.GetRasterBand(1).ReadAsArray() data_array.shape
You’ll notice that the resulting array has the same number of rows and columns as the
ds), as it should. Also, notice that indexing for raster bands starts at 1, not at 0.
Display a Raster Band¶
With Python, you can easily display a raster band with the
imshow funciton of
matplotlib. I’ve imported
matplotlib at the beginning of this tutorial. Below I set the figure size, then use
imshow to plot
data_array, which contains the elevation data from the raster band. As you can see, the resulting figure shows the elevation data. You can check out the imshow documentation for more customization options.
plt.figure(figsize=(10, 10)) plt.imshow(data_array) plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x287f111b520>
Getting No Data Values¶
It is common for a raster band to have a designated value that represents a pixel with no data. Obtaining the no data value is important so that you can exclude those values from calculations you may perform on the data. Let’s retreive and print the no data value for the elevation raster band.
ndv = data_array = ds.GetRasterBand(1).GetNoDataValue() print('No data value:', ndv)
No data value: -999999.0
As you can see, the no data value for this raster band is -999999.0
Setting No Data Values¶
When creating a new raster, you may also need to set a no data value. This is done by calling
SetNoDataValue on the raster band and passing the no data value. I’ll demonstrate this in detail in a following section when we create a new raster.
Properly Close a Raster¶
Once you are finished with a raster, it’s important to close it properly. Otherwise, you may experience problems accessing the data. To close a raster dataset, set it equal to
None, like so.
ds = None
Create and Write Data to a Raster¶
Now that you’ve learned how to use
gdal to read data as a
numpy array, I’ll demonstrate how create new raster and write to a new raster dataset.
There are two primary ways to create a new raster with
gdal. You can copy an existing raster dataset or create a new one from scratch. I’ll demonstrate both methods here.
gdal, all raster creation is done with a driver. So the first step to create a raster to instatiate a
gdal driver. With a driver, we’re basically letting
gdal know which file format we want to write so
gdal can manage the file properly. You can find a full list of the
gdal drivers in the documentation.
For these examples we’ll use the GeoTiff (
Start by instantiating a GeoTiff driver as follows. Below, I print out the object type for demonstration purposes.
driver_gtiff = gdal.GetDriverByName('GTiff') print(type(driver_gtiff))
Now that we have a driver we’re ready to start writing GeoTiff files.
Copy an Exisiting Raster Dataset¶
The easiest way to create a new raster with
gdal is to use a driver to copy an existing raster dataset. This is accomplished with the driver’s
CreateCopy method. When using
CreateCopy we just need to pass an existing raster dataset and an output file name. Below, I demonstrate how to copy the intial dataset we read in (
# reopen the dataset because we closed it above ds = gdal.Open(fn) fn_copy = "C:/qgis/raster/gdal_copy.tif" ds_copy = driver_gtiff.CreateCopy(fn_copy, ds)
Running the code above will create a new raster at the specified file path. It will be identical to the DEM we opened at the beginning of this exercise.
CreateCopy is an excellent method when you want to create a new raster with the same extent, resolution, and data type as an existing raster.
Write Data to the Copied Raster¶
Once you’ve copied the existing raster you can then overwrite the data for any band. Let’s go through an example.
In this case we may want to apply an exaggeration to the elevation data (
data_array). We’ll subtract the minimum elevation from all the pixels, then multiply by 1.5 to create an exaggeration. Finally, we’ll write the array to the raster we just created (
data_array = ds.GetRasterBand(1).ReadAsArray() # read band data from the existing raster data_nan = np.where(data_array == ndv, np.nan, data_array) # set all the no data values to np.nan so we can easily calculate the minimum elevation data_min = np.min(data_nan) # get the minimum elevation value (excluding nan) data_stretch = np.where(data_array == ndv, ndv, (data_array - data_min) * 1.5) # now apply the strech algorithm ds_copy.GetRasterBand(1).WriteArray(data_stretch) # write the calculated values to the raster ds_copy = None # properly close the raster
Now let’s check our work by reopening the the dataset, reading the raster band data, and plotting it.
ds_copy = gdal.Open(fn_copy) data_stretch = ds_copy.GetRasterBand(1).ReadAsArray() plt.figure(figsize=(10, 10)) plt.imshow(data_stretch) plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x28783205fd0>