Create NetCDF Files with Python
Gridded spatial data are commonly stored in NetCDF files, and this is especially true for climate data. NetCDF files offer more flexibility and transparency than some traditional raster formats by supporting multiple variables and detailed metadata. Because of that metadata and file structure, NetCDF files can also be more difficult to access than traditional raster formats. This article covers the basics of creating a NetCDF file and writing data values to it in Python. I previously wrote about accessing metadata and variables from a NetCDF file with Python.
Create a NetCDF Dataset
Import the netCDF4 and numpy modules. Then define a file name with the .nc or .nc4 extension. Call Dataset and specify write mode with 'w' to create the NetCDF file. The NetCDF file is now established and can be written to. When finished, be sure to call close() on the dataset.
Add Dimensions
NetCDF files generally contain three dimensions: time, width (x or longitude), and height (y or latitude). In most files the width and height dimensions are fixed in size. The time dimension is dynamic (it can grow), which allows time steps to be added to the file. Dynamic, or growing, dimensions are termed ‘unlimited’ in NetCDF.
Unlimited dimensions can be appended to and are specified by passing None as the size. We’ll use an unlimited dimension for time so that it can grow. In other words, we can keep appending time steps to the file. Also create latitude and longitude dimensions. lat and lon define the geographical extent and dimensions of our file. Here we’re just creating dimensions of size 10, which means the resulting grid will have just 10 rows and 10 columns. The geographic coordinates along lat and lon are specified as variables. In fact, each dimension will have a corresponding variable.
Add NetCDF Variables
Variables contain the actual data of the file. They also define the grid the data are referenced to. This file will contain four variables. Latitude and longitude define the grid values and data location. times defines the layers in the data file. value contains the actual data. To create a variable, specify the variable name, data type, and shape. Shape is defined as a tuple of dimension names. Additional metadata can also be specified as attributes. Here we define the units of value as Unknown.
Assign Latitude and Longitude Values
Create a simple grid with cells that measure 1 degree by 1 degree with numpy.arange. Assign y values to lats and x values to lons. Now we just need to assign data values that match the dimensions of the grid we’ve created.
Assign NetCDF Data Values
Add data for two time steps to the value variable that we created. Each time step is represented by a 2D numpy array. The shape of each array must match the lat and lon dimensions. Create an array of random numbers ranging from 0 to 100 with numpy.random. This array contains data for the first time step.
Next, create an array with values that increase linearly from 0.5 to 5.0. To do this, create two 1D arrays with numpy.linspace and add them together across opposing axes. The code below shows how it’s done. Close ds after you’ve created the arrays and assigned them to value. Your NetCDF file is now saved and ready. Open the file in QGIS to visualize it, or plot the arrays in Python. Images of the result are shown below.
Conclusion
Once you understand the basic structure of a NetCDF file, it can be a very useful way to work with spatial data. In this example we created a file with only one data variable, but multiple variables can be added to a single file, potentially reducing the number of files required to manage your data. One of the most useful aspects of NetCDF files is the documentation and metadata that clarify the data they contain.