Last modified on 2018-07-24 11:20:29

Background and Details for making new netCDF data files EPIC compliant

Our files all follow some common rules that allow for interoperability. The rules for standardization come from the Unidata Common Data Model (CDM), the EPIC conventions, and additions we've made to work with new types of data like bursts and wave spectra. By following these rules, we can easily convert to CF-1.6 and newer conventions.

There are some key elements that must be in all files:

  • dimension or coordinate variables (used to define the shapes of data variables) time, lat, lon, depth
  • global attributes - provide metadata about the file as a whole
  • data variables (shape determined by the coordinate variables) contain measurements
  • variable attributes - provide metadata specifically about that variable

Our data variables need to fit in one of the defined container shapes. The standard ones are:

  • time series – time, depth, lat, lon (time is a 1-d array and depth is scalar)
  • profile time series – time, depth, lat, lon (time and depth are 1-d arrays)
  • burst files – time, sample/profile, (depth) (time is a 2-d matrix; sample (or profile) is burst#; depth, if it exists, is a 1-d array)
  • waves files with spectra variables - stats variables are time, lat, lon; spectra vars are time, frequency, lat, lon (time is a 1-d array; depth is not a dimension, except CD and CS for DIWASP)
  • sonar image files – time, x,y (time is a 1-d array)

Instantiation is the process by which a variable is created by the program. It can be done in a variety of ways, so this page doesn't prescribe how to do it, only what must be there when it's done. Typically, the dimensions are defined, then the dimension variables are created, and then the dimensions are used to define and create data variables of the proper shapes.
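The order described above can be sketched with Matlab's low-level netcdf functions. This is only an illustration, not the way our files have to be built: the file name, dimension lengths, data values and the T_28 variable name are made up for the example.

```matlab
% Hypothetical file name and sizes, for illustration only.
ncfile = 'example_lowlevel.nc';
nt = 4;
ncid = netcdf.create(ncfile, 'CLOBBER');    % overwrite any existing file

% 1. define the dimensions
dimTime  = netcdf.defDim(ncid, 'time',  nt);
dimDepth = netcdf.defDim(ncid, 'depth', 1);
dimLat   = netcdf.defDim(ncid, 'lat',   1);
dimLon   = netcdf.defDim(ncid, 'lon',   1);

% 2. create the dimension (coordinate) variables
varTime  = netcdf.defVar(ncid, 'time',  'NC_INT',   dimTime);
varDepth = netcdf.defVar(ncid, 'depth', 'NC_FLOAT', dimDepth);
varLat   = netcdf.defVar(ncid, 'lat',   'NC_FLOAT', dimLat);
varLon   = netcdf.defVar(ncid, 'lon',   'NC_FLOAT', dimLon);

% 3. use the dimensions to define a data variable (time series shape).
% Matlab lists dimensions fastest-varying first, so this appears as
% (time, depth, lat, lon) in ncdump.
varT = netcdf.defVar(ncid, 'T_28', 'NC_FLOAT', ...
                     [dimLon dimLat dimDepth dimTime]);

netcdf.endDef(ncid);                        % leave define mode

% write made-up values, cast to the declared types
netcdf.putVar(ncid, varTime,  int32(2458324 + (0:nt-1)'));
netcdf.putVar(ncid, varDepth, single(10));
netcdf.putVar(ncid, varLat,   single(41.5));
netcdf.putVar(ncid, varLon,   single(-70.7));
netcdf.putVar(ncid, varT, reshape(single([12.1 12.4 12.2 12.0]), [1 1 1 nt]));

netcdf.close(ncid);
```

The high-level nccreate function does steps 1 and 2 in one call, since it creates any dimension that doesn't exist yet.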

Some things to know about variables before instantiation:

  • Is it a dimension?
  • Is it a burst?
  • What shape is it? (how dimensioned)
  • What data type is it? (integer, float, double? use the smallest type possible)
  • What kind of data? (temperature, currents)
  • What name to use?
  • What are the units?
  • What attributes are required?
  • What is the fill value?

Here is a snippet of code that creates the "usual" coordinate variables. Note that EPIC conventions express "time" with two variables, time and time2; both must be defined as datatype='int32' in order to plot correctly in ncBrowse. More info on converting EPIC time to and from Matlab's datenum is on the "convert EPIC time to Matlab datenum" page.

% create coordinate variables
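A minimal sketch of this step, assuming Matlab's high-level nccreate and ncwrite functions. The file name, position and times are made up, and the datenum offset assumes the usual EPIC definition of time as the true Julian day (2440000 is 1968-05-23) with time2 as milliseconds since midnight.

```matlab
% Hypothetical file name and example values, for illustration only.
ncfile = 'example_epic_coords.nc';
if exist(ncfile, 'file'), delete(ncfile); end

dn    = datenum(2018, 7, 24, 0, 0, 0) + (0:3)' / 24;  % hourly Matlab datenums
depth = 10;       % m, scalar for a time series
lat   = 41.5;     % degrees north
lon   = -70.7;    % degrees east

% Convert datenum to EPIC time and time2.
% EPIC day 2440000 is 1968-05-23, which is datenum 718941,
% so the offset between the two is 1721059.
jd   = floor(dn) + 1721059;
msec = round((dn - floor(dn)) * 86400000);
% carry any value that rounded up to a full day into the next day
rollover = msec >= 86400000;
jd(rollover)   = jd(rollover) + 1;
msec(rollover) = 0;
time  = int32(jd);
time2 = int32(msec);

% Create the coordinate variables; nccreate also creates each dimension
% the first time it is named with a length.
nt = numel(time);
nccreate(ncfile, 'time',  'Dimensions', {'time', nt}, ...
         'Datatype', 'int32', 'Format', 'classic');
nccreate(ncfile, 'time2', 'Dimensions', {'time'},     'Datatype', 'int32');
nccreate(ncfile, 'depth', 'Dimensions', {'depth', 1}, 'Datatype', 'single');
nccreate(ncfile, 'lat',   'Dimensions', {'lat', 1},   'Datatype', 'single');
nccreate(ncfile, 'lon',   'Dimensions', {'lon', 1},   'Datatype', 'single');

% Write the values, cast to the declared types.
ncwrite(ncfile, 'time',  time);
ncwrite(ncfile, 'time2', time2);
ncwrite(ncfile, 'depth', single(depth));
ncwrite(ncfile, 'lat',   single(lat));
ncwrite(ncfile, 'lon',   single(lon));
```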

These attributes are required for Coordinate (dimension) variables in our EPIC files. Coordinate variables should not have _FillValue, minimum or maximum attributes, and must not contain missing values. Any other attributes may be added.

  • Name (char)
  • Type (char)
  • Units (char)
  • Datum (char) (time doesn't need a datum)
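As a sketch, the attributes in the list above could be written with ncwriteatt. The lowercase attribute spellings, the 'EVEN' type and the datum text are assumptions for the example, as is the file name; check an existing EPIC file for the exact spelling and values we use.

```matlab
% Hypothetical file with a single coordinate variable, for illustration only.
ncfile = 'example_coord_atts.nc';
if exist(ncfile, 'file'), delete(ncfile); end
nccreate(ncfile, 'depth', 'Dimensions', {'depth', 1}, ...
         'Datatype', 'single', 'Format', 'classic');
ncwrite(ncfile, 'depth', single(10));

% Required coordinate-variable attributes (all char).
% Attribute spellings and values here are assumed, not taken from our files.
ncwriteatt(ncfile, 'depth', 'name',  'depth');
ncwriteatt(ncfile, 'depth', 'type',  'EVEN');
ncwriteatt(ncfile, 'depth', 'units', 'm');
ncwriteatt(ncfile, 'depth', 'datum', 'placeholder datum description');

% No _FillValue, minimum or maximum is written for a coordinate variable.
```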

Here's a Matlab example of how to instantiate global attributes (showing expected type).
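A minimal sketch of writing global attributes with ncwriteatt, using '/' as the location. The attribute names and values below are placeholders picked to show one char and several numeric types; they are not the list of globals our files require.

```matlab
% Hypothetical file, for illustration only. The file must exist before
% attributes are written, so a coordinate variable is created first.
ncfile = 'example_global_atts.nc';
if exist(ncfile, 'file'), delete(ncfile); end
nccreate(ncfile, 'depth', 'Dimensions', {'depth', 1}, ...
         'Datatype', 'single', 'Format', 'classic');

% '/' addresses the file as a whole (global attributes).
% Names and values are placeholders showing the type of each value.
ncwriteatt(ncfile, '/', 'MOORING',     '1234');         % char
ncwriteatt(ncfile, '/', 'WATER_DEPTH', single(25.3));   % numeric (float)
ncwriteatt(ncfile, '/', 'latitude',    41.5);           % numeric (double)
ncwriteatt(ncfile, '/', 'longitude',   -70.7);          % numeric (double)
ncwriteatt(ncfile, '/', 'DELTA_T',     int32(3600));    % numeric (integer)
```

The type stored in the file follows the Matlab type of the value passed in, so cast numeric attributes explicitly if the type matters.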

If you don't explicitly cast the 'datatype', Matlab will create all variables as type "double". All our data fits OK in "single" (float) containers, so taking care to cast to the appropriate size results in smaller files. In some cases, like sonar data, the measurements are integers, so that's what the data variable should be. Here is information about netCDF datatypes vs Matlab datatypes and what they can contain. Note that you should also cast the data being written into the file at the time it's written.

Here are a couple of examples of how to cast to a specific datatype:

  nccreate(netnam,varname,'Dimensions',{'depth' 'lat' 'lon' 'time'},'Datatype','single');
  nccreate(netnam,varname,'Dimensions',{'depth' 'lat' 'lon' 'time'},'Datatype','int32');
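Casting at write time might look like the sketch below. The file name, the T_28 variable name and the values are made up for the example, and the dimension order is one possible choice, not necessarily the one used in our files.

```matlab
% Hypothetical file and values, for illustration only.
ncfile = 'example_cast.nc';
if exist(ncfile, 'file'), delete(ncfile); end

nt   = 4;
temp = 10 + 0.5 * (1:nt);        % Matlab makes this a double by default

% Declare the variable as single in the file. Dimensions are listed
% fastest-varying first, so ncdump shows (time, depth, lat, lon).
nccreate(ncfile, 'T_28', ...
         'Dimensions', {'lon', 1, 'lat', 1, 'depth', 1, 'time', nt}, ...
         'Datatype', 'single', 'Format', 'classic');

% Cast the data to the declared type as it is written.
ncwrite(ncfile, 'T_28', reshape(single(temp), [1 1 1 nt]));
```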

These attributes are required for Data (non-dimension) variables in our EPIC files (going forward):

  • Name (char)
  • Long_name (char)
  • Units (char)
  • Epic_code (numeric)
  • height_depth_units (char)
  • _FillValue (numeric - must match the type of the variable, usually 1e35)
  • minimum (numeric)
  • maximum (numeric)
  • initial_sensor_height (numeric)
  • sensor_type (char)
  • sensor_depth (numeric)
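The list above could be filled in as sketched here. The lowercase attribute spellings and every value (variable name, EPIC code, sensor description, heights and depths) are assumptions for the example; the real values for a variable should come from the variable info file and the deployment metadata.

```matlab
% Hypothetical file and values, for illustration only.
ncfile = 'example_data_atts.nc';
if exist(ncfile, 'file'), delete(ncfile); end

fillval = single(1e35);                   % same type as the variable
data    = single([12.1 12.4 NaN 12.0]);   % NaN marks a missing sample
nt      = numel(data);

% Set the fill value when the variable is created.
nccreate(ncfile, 'T_28', ...
         'Dimensions', {'lon', 1, 'lat', 1, 'depth', 1, 'time', nt}, ...
         'Datatype', 'single', 'FillValue', fillval, 'Format', 'classic');

% Replace missing samples with the fill value before writing.
good = ~isnan(data);
data(~good) = fillval;
ncwrite(ncfile, 'T_28', reshape(data, [1 1 1 nt]));

% Required attributes; spellings and values are assumed for this example.
ncwriteatt(ncfile, 'T_28', 'name',                  'T');
ncwriteatt(ncfile, 'T_28', 'long_name',             'TEMPERATURE (C)');
ncwriteatt(ncfile, 'T_28', 'units',                 'C');
ncwriteatt(ncfile, 'T_28', 'epic_code',             int32(28));
ncwriteatt(ncfile, 'T_28', 'height_depth_units',    'm');
ncwriteatt(ncfile, 'T_28', 'minimum',               min(data(good)));
ncwriteatt(ncfile, 'T_28', 'maximum',               max(data(good)));
ncwriteatt(ncfile, 'T_28', 'initial_sensor_height', single(1.0));
ncwriteatt(ncfile, 'T_28', 'sensor_type',           'placeholder sensor');
ncwriteatt(ncfile, 'T_28', 'sensor_depth',          single(10));
```

The minimum and maximum are computed from the good samples only, so the fill value never ends up in them.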

The top 5 attributes for each variable may be read from this repository in trunk\dolly\all_EPICvarinfo.csv.

The EPIC_code attribute is needed for the conversion to CF, so don't omit it. We’re dropping the Fortran_format, valid_range and generic_name attributes; any other variable attributes may be added. “minimum” and “maximum” are changed to the preferred “actual_min” and “actual_max” in the conversion to CF. For consistency with the existing files, we’ll leave them as “minimum” and “maximum” in the EPIC version. We’re also leaving out the “standard_name” attribute in the EPIC version; it is added in the CF conversion.