
Intro to NetCDF with Python (netCDF4)

Many geographic data are stored in netCDF format. This is especially true for climate data. Working with netCDF files is an important skill for any geographer, physical scientist, or data scientist. This tutorial will get you started using the netCDF4 Python package to read netCDF data.

First, you’ll need to install netCDF4. Depending on your Python setup this may be very easy or quite complicated. For simplicity, I recommend using Anaconda for the installation. In the Anaconda prompt simply type:

conda install netcdf4

Then follow the prompts to install the package.

Now, open PyCharm (or another IDE, or a Python console) and import netCDF4. You should be able to run the import without any problems.

import netCDF4 as nc

Once you’ve confirmed that you can import netCDF4, define a variable with the path to a netCDF file.

fn = 'C:/path/to/file.nc4'  # path to netcdf file

Then read the file as a netCDF dataset.

ds = nc.Dataset(fn)  # read as netcdf dataset

If you print the dataset with print(ds), you should get output similar to the following. It describes the source of the data and the dimensions and variables the file contains.

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4_CLASSIC data model, file format HDF5):
    start_year: 1980
    source: Daymet Software Version 3.0
    Version_software: Daymet Software Version 3.0
    Version_data: Daymet Data Version 3.0
    Conventions: CF-1.6
    citation: Please see for current Daymet data citation information
    references: Please see for current information on Daymet references
    dimensions(sizes): time(1), nv(2), y(8075), x(7814)
    variables(dimensions): float32 time_bnds(time,nv), int16 lambert_conformal_conic(), float32 lat(y,x), float32 lon(y,x), float32 prcp(time,y,x), float32 time(time), float32 x(x), float32 y(y)

You can show information for a specific variable by indexing the dataset with the variable name, for example print(ds['prcp']). The output below is for the ‘prcp’ (precipitation) variable.

<class 'netCDF4._netCDF4.Variable'>
float32 prcp(time, y, x)
    _FillValue: -9999.0
    coordinates: lat lon
    grid_mapping: lambert_conformal_conic
    missing_value: -9999.0
    cell_methods: area: mean time: sum within days time: sum over days
    units: mm
    long_name: annual total precipitation
unlimited dimensions: time
current shape = (1, 8075, 7814)

Data from a variable are accessed with the same indexing and slicing syntax you would use on a NumPy array. So, to read all the data for the ‘prcp’ variable into memory, we use the following code.

prcp = ds['prcp'][:]  # get data for variable

Now, print out a small slice of the data to see what the precipitation values are.

print(prcp[0, 4000:4005, 4000:4005])  # print slice of data
[[341.0 347.0 336.0 329.0 353.0]
 [336.0 339.0 341.0 332.0 349.0]
 [337.0 340.0 334.0 336.0 348.0]
 [342.0 344.0 332.0 338.0 350.0]
 [350.0 351.0 342.0 346.0 348.0]]

The video below gives a step-by-step demonstration of accessing data in a netCDF.
