NetCDF with Python (netCDF4): Metadata, Dimensions, and Variables
NetCDF files embed information describing their data (i.e., metadata) directly in the file. Reading this metadata can help you automate tasks, debug code, and validate results. This tutorial covers how to access the metadata, dimensions, and variables contained in a netCDF file. For the basics of opening netCDF files and reading data, see the netCDF introduction tutorial.
First, import netCDF4 and open a netCDF file.
import netCDF4 as nc
fn = 'path/to/file.nc4'
ds = nc.Dataset(fn)
Once the file is open, you can view much of its metadata by simply printing the dataset object.
print(ds)
The output gives you basic metadata including the names of variables and their dimensions.
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4_CLASSIC data model, file format HDF5):
    start_year: 1980
    source: Daymet Software Version 3.0
    Version_software: Daymet Software Version 3.0
    Version_data: Daymet Data Version 3.0
    Conventions: CF-1.6
    citation: Please see http://daymet.ornl.gov/ for current Daymet data citation information
    references: Please see http://daymet.ornl.gov/ for current information on Daymet references
    dimensions(sizes): time(1), nv(2), y(8075), x(7814)
    variables(dimensions): float32 time_bnds(time,nv), int16 lambert_conformal_conic(), float32 lat(y,x), float32 lon(y,x), float32 prcp(time,y,x), float32 time(time), float32 x(x), float32 y(y)
    groups:
You can obtain a dictionary of all dataset-level (global) attributes with ds.__dict__. It maps the name of each attribute to its value.
print(ds.__dict__)
OrderedDict([('start_year', 1980), ('source', 'Daymet Software Version 3.0'), ('Version_software', 'Daymet Software Version 3.0'), ('Version_data', 'Daymet Data Version 3.0'), ('Conventions', 'CF-1.6'), ('citation', 'Please see http://daymet.ornl.gov/ for current Daymet data citation information'), ('references', 'Please see http://daymet.ornl.gov/ for current information on Daymet references')])
Information for individual dictionary items can be accessed as follows.
print(ds.__dict__['start_year'])
1980
The dimensions of the netCDF file are accessed with ds.dimensions, which returns a dictionary mapping each dimension name to a Dimension object that holds its size. ds.dimensions gives the entire dictionary. To print information about each dimension, loop over the dictionary's values as follows.
for dim in ds.dimensions.values():
    print(dim)
<class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 1
<class 'netCDF4._netCDF4.Dimension'>: name = 'nv', size = 2
<class 'netCDF4._netCDF4.Dimension'>: name = 'y', size = 8075
<class 'netCDF4._netCDF4.Dimension'>: name = 'x', size = 7814
Information for an individual dimension can be accessed with ds.dimensions['dimension name']. For example, ds.dimensions['x'].
Variables are accessed with ds.variables, which also returns a dictionary, this time mapping each variable name to a Variable object. Variable information is accessed the same way as dimension information. The following code prints the information for each variable.
for var in ds.variables.values():
    print(var)
<class 'netCDF4._netCDF4.Variable'>
float32 time_bnds(time, nv)
    time: days since 1980-01-01 00:00:00 UTC
unlimited dimensions: time
current shape = (1, 2)
filling on, default _FillValue of 9.969209968386869e+36 used
<class 'netCDF4._netCDF4.Variable'>
int16 lambert_conformal_conic()
    grid_mapping_name: lambert_conformal_conic
    longitude_of_central_meridian: -100.0
    latitude_of_projection_origin: 42.5
    false_easting: 0.0
    false_northing: 0.0
    standard_parallel: [25. 60.]
    semi_major_axis: 6378137.0
    inverse_flattening: 298.25723
unlimited dimensions:
current shape = ()
filling on, default _FillValue of -32767 used
<class 'netCDF4._netCDF4.Variable'>
float32 lat(y, x)
    units: degrees_north
    long_name: latitude coordinate
    standard_name: latitude
unlimited dimensions:
current shape = (8075, 7814)
filling on, default _FillValue of 9.969209968386869e+36 used
<class 'netCDF4._netCDF4.Variable'>
float32 lon(y, x)
    units: degrees_east
    long_name: longitude coordinate
    standard_name: longitude
unlimited dimensions:
current shape = (8075, 7814)
filling on, default _FillValue of 9.969209968386869e+36 used
<class 'netCDF4._netCDF4.Variable'>
float32 prcp(time, y, x)
    _FillValue: -9999.0
    coordinates: lat lon
    grid_mapping: lambert_conformal_conic
    missing_value: -9999.0
    cell_methods: area: mean time: sum within days time: sum over days
    units: mm
    long_name: annual total precipitation
unlimited dimensions: time
current shape = (1, 8075, 7814)
filling on
<class 'netCDF4._netCDF4.Variable'>
float32 time(time)
    long_name: time
    calendar: standard
    units: days since 1980-01-01 00:00:00 UTC
    bounds: time_bnds
unlimited dimensions: time
current shape = (1,)
filling on, default _FillValue of 9.969209968386869e+36 used
<class 'netCDF4._netCDF4.Variable'>
float32 x(x)
    units: m
    long_name: x coordinate of projection
    standard_name: projection_x_coordinate
unlimited dimensions:
current shape = (7814,)
filling on, default _FillValue of 9.969209968386869e+36 used
<class 'netCDF4._netCDF4.Variable'>
float32 y(y)
    units: m
    long_name: y coordinate of projection
    standard_name: projection_y_coordinate
unlimited dimensions:
current shape = (8075,)
filling on, default _FillValue of 9.969209968386869e+36 used
Watch the video below for a detailed, step-by-step demonstration of accessing netCDF metadata with Python.