NetCDF with Python (netCDF4): Metadata, Dimensions, and Variables

NetCDF files are self-describing: information about the data they contain (i.e., metadata) is embedded directly in the file. Reading this metadata can help you automate tasks, debug code, and validate results. This tutorial covers how to access the global metadata, dimensions, and variables of a netCDF file. For the basics of opening netCDF files and reading data, see the netCDF introduction tutorial.

First, import netCDF4 and open a netCDF file.

import netCDF4 as nc

fn = 'path/to/file.nc4'
ds = nc.Dataset(fn)

Once the file is open, you can view a lot of information by simply printing the dataset object.

print(ds)

The output gives you basic metadata including the names of variables and their dimensions.

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4_CLASSIC data model, file format HDF5):
    start_year: 1980
    source: Daymet Software Version 3.0
    Version_software: Daymet Software Version 3.0
    Version_data: Daymet Data Version 3.0
    Conventions: CF-1.6
    citation: Please see http://daymet.ornl.gov/ for current Daymet data citation information
    references: Please see http://daymet.ornl.gov/ for current information on Daymet references
    dimensions(sizes): time(1), nv(2), y(8075), x(7814)
    variables(dimensions): float32 time_bnds(time,nv), int16 lambert_conformal_conic(), float32 lat(y,x), float32 lon(y,x), float32 prcp(time,y,x), float32 time(time), float32 x(x), float32 y(y)
    groups: 

You can obtain a dictionary of all global (dataset-level) attributes with ds.__dict__. This returns the name of each attribute and its value.

print(ds.__dict__)
OrderedDict([('start_year', 1980), ('source', 'Daymet Software Version 3.0'), ('Version_software', 'Daymet Software Version 3.0'), ('Version_data', 'Daymet Data Version 3.0'), ('Conventions', 'CF-1.6'), ('citation', 'Please see http://daymet.ornl.gov/ for current Daymet data citation information'), ('references', 'Please see http://daymet.ornl.gov/ for current information on Daymet references')])

Information for individual dictionary items can be accessed as follows.

print(ds.__dict__['start_year'])
1980

The dimensions of the netCDF file are accessed with ds.dimensions, which returns a dictionary that maps each dimension name to a Dimension object holding its size. Printing ds.dimensions shows the entire dictionary. To print each dimension individually, loop over the dictionary values as follows.

for dim in ds.dimensions.values():
    print(dim)
<class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 1

<class 'netCDF4._netCDF4.Dimension'>: name = 'nv', size = 2

<class 'netCDF4._netCDF4.Dimension'>: name = 'y', size = 8075

<class 'netCDF4._netCDF4.Dimension'>: name = 'x', size = 7814

Information for an individual dimension can be accessed with ds.dimensions['dimension name'], for example ds.dimensions['x'].

Variables are accessed with ds.variables, which also returns a dictionary, so variable information is accessed the same way as dimension information. The following code prints the metadata for each variable.

for var in ds.variables.values():
    print(var)
<class 'netCDF4._netCDF4.Variable'>
float32 time_bnds(time, nv)
    time: days since 1980-01-01 00:00:00 UTC
unlimited dimensions: time
current shape = (1, 2)
filling on, default _FillValue of 9.969209968386869e+36 used

<class 'netCDF4._netCDF4.Variable'>
int16 lambert_conformal_conic()
    grid_mapping_name: lambert_conformal_conic
    longitude_of_central_meridian: -100.0
    latitude_of_projection_origin: 42.5
    false_easting: 0.0
    false_northing: 0.0
    standard_parallel: [25. 60.]
    semi_major_axis: 6378137.0
    inverse_flattening: 298.25723
unlimited dimensions: 
current shape = ()
filling on, default _FillValue of -32767 used

<class 'netCDF4._netCDF4.Variable'>
float32 lat(y, x)
    units: degrees_north
    long_name: latitude coordinate
    standard_name: latitude
unlimited dimensions: 
current shape = (8075, 7814)
filling on, default _FillValue of 9.969209968386869e+36 used

<class 'netCDF4._netCDF4.Variable'>
float32 lon(y, x)
    units: degrees_east
    long_name: longitude coordinate
    standard_name: longitude
unlimited dimensions: 
current shape = (8075, 7814)
filling on, default _FillValue of 9.969209968386869e+36 used

<class 'netCDF4._netCDF4.Variable'>
float32 prcp(time, y, x)
    _FillValue: -9999.0
    coordinates: lat lon
    grid_mapping: lambert_conformal_conic
    missing_value: -9999.0
    cell_methods: area: mean time: sum within days time: sum over days
    units: mm
    long_name: annual total precipitation
unlimited dimensions: time
current shape = (1, 8075, 7814)
filling on

<class 'netCDF4._netCDF4.Variable'>
float32 time(time)
    long_name: time
    calendar: standard
    units: days since 1980-01-01 00:00:00 UTC
    bounds: time_bnds
unlimited dimensions: time
current shape = (1,)
filling on, default _FillValue of 9.969209968386869e+36 used

<class 'netCDF4._netCDF4.Variable'>
float32 x(x)
    units: m
    long_name: x coordinate of projection
    standard_name: projection_x_coordinate
unlimited dimensions: 
current shape = (7814,)
filling on, default _FillValue of 9.969209968386869e+36 used

<class 'netCDF4._netCDF4.Variable'>
float32 y(y)
    units: m
    long_name: y coordinate of projection
    standard_name: projection_y_coordinate
unlimited dimensions: 
current shape = (8075,)
filling on, default _FillValue of 9.969209968386869e+36 used

