potential requirements for CMIP7 #762

Open
taylor13 opened this issue Oct 4, 2024 · 2 comments
@taylor13
Collaborator

taylor13 commented Oct 4, 2024

(FYI @sashakames, @durack1, @matthew-mizielinski, @wolfiex, even though this is primarily for Chris)

It looks likely that some changes to the output requirements for CMIP7 will be agreed shortly and that "branded variables" will be relied on to identify variables in the CMOR output files. It would be good to consider now how this might affect CMOR, so I'm raising this issue:

How difficult would it be to implement the following?

  1. The user specifies “frequency” as one of the entries in the CMIP6_input.json file rather than it being specified in a CMOR variable table. CMOR then handles “frequency” in the same way it handles, for example, “experiment_id”, and writes it as a global attribute. (We would also remove “frequency” and “approx_interval” from the CMOR variable tables.) I know that CMOR checks that users have sent a time coordinate that is approximately consistent with "approx_interval", but that check could be dropped if it impairs implementation of this new approach.
  2. The user specifies “region” as one of the entries in the CMIP6_input.json file and then CMOR handles it in the same way as, for example, “experiment_id” and writes it as a global attribute?
  3. CMOR writes as a global attribute the "branding suffix", which it would need to obtain by extracting the suffix in the "table_entry" (i.e., the part following the underscore). See below for an example.
  4. CMOR writes as global attributes the values of the elements comprising the branding suffix: temporal_sampling, vertical_sampling, horizontal_sampling, and area_sampling? These would either be extracted, along with other metadata, from the CMOR variable table (as shown in the table example below), or be obtained from a look-up table given the branding suffix.
  5. In constructing file names and directory structure, rely on a somewhat different set of global attributes than in CMIP6. For example instead of including “table name” in the file name, include instead the “branded variable suffix”. (My guess is that this is trivially done by simply specifying a different template in the CMIP6_input.json file.)
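As a rough illustration of item 3, here is a minimal Python sketch of extracting the branding suffix from a "table_entry". The function name `split_branded_name` is hypothetical and not part of CMOR, and the sketch assumes the root name never contains an underscore, so the first underscore separates the root from the suffix.

```python
def split_branded_name(table_entry: str) -> tuple[str, str]:
    """Split a branded variable name into (root name, branding suffix).

    Assumption: the root name contains no underscore, so the first
    underscore marks the boundary between root and suffix.
    """
    root, sep, suffix = table_entry.partition("_")
    if not sep or not root or not suffix:
        raise ValueError(f"not a branded variable name: {table_entry!r}")
    return root, suffix


root, suffix = split_branded_name("tas_tavg-z0-hxy-x")
# Values that would be written as global attributes in this sketch.
global_attributes = {"variable_id": root, "branding_suffix": suffix}
```

Here `root` is "tas" and `suffix` is "tavg-z0-hxy-x", matching the example entry given later in this comment.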

To implement the above, new CMOR variable tables will need to be generated with the following changes (which could be implemented by someone other than Chris):

  1. Remove "approx_interval" from the header of each table.
  2. Remove “frequency” from each entry in the variable tables.
  3. Replace all the variable “table_entries” with branded variable names.
  4. Make sure out_name is set to the root name prefix of the branded variable (i.e., the part of the branded variable name preceding the underscore).
  5. Add 5 new attributes to each variable in the tables: branding_suffix, temporal_type, vertical_type, horizontal_type, and area_type. These will be written by CMOR as global attributes in the netCDF files.
  6. Reorganize and rename tables to group the variables more rationally and independently of frequency and region.

I should think most of the above changes to the variable tables should have little impact on the CMOR code itself.
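The table changes listed above could probably be scripted. The following is only a sketch of changes 2 to 4 for a single entry: the helper `to_branded_entry` is hypothetical, the branding suffix is assumed to be supplied from elsewhere (how it is derived is not covered here), and the sample input is a trimmed-down stand-in for a CMIP6 entry.

```python
import copy


def to_branded_entry(entry: dict, branding_suffix: str) -> tuple[str, dict]:
    """Return (new table_entry key, new entry) for one variable.

    Assumptions: `entry` is a CMIP6-style variable entry with an
    "out_name" key, and `branding_suffix` is provided by the caller.
    """
    new_entry = copy.deepcopy(entry)       # leave the input table untouched
    new_entry.pop("frequency", None)       # change 2: drop "frequency"
    root = new_entry["out_name"]           # change 4: out_name is the root name
    new_entry["branding_suffix"] = branding_suffix
    return f"{root}_{branding_suffix}", new_entry  # change 3: branded key


# Trimmed-down stand-in for a CMIP6 table entry (not a full entry).
cmip6_tas = {"frequency": "mon", "out_name": "tas", "units": "K"}
name, new_entry = to_branded_entry(cmip6_tas, "tavg-z0-hxy-x")
```

The original entry is copied rather than modified so that the CMIP6 tables can still be compared against the generated ones.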

A new CMOR7 table variable entry would include 5 new attributes (the first 5 lines below), and the "frequency" would be removed from the table (in CMIP6 it appeared just before the "long_name" attribute), resulting in the following:

"tas_tavg-z0-hxy-x": {

      "branding_suffix": "tavg-z0-hxy-x",
      "temporal_type": "mean",
      "vertical_type": "no vertical dimension",
      "horizontal_type": "gridded",
      "area_type": "unmasked",

      "cell_measures": "area: areacella",
      "cell_methods": "area: time: mean",
      "comment": "near-surface (usually, 2 meter) air temperature",
      "dimensions": [
        "longitude",
        "latitude",
        "time",
        "height2m"
      ],

      "long_name": "Near-Surface Air Temperature",
      "modeling_realm": [
        "atmos"
      ],
      "ok_max_mean_abs": "",
      "ok_min_mean_abs": "",
      "out_name": "tas",
      "positive": "",
      "standard_name": "air_temperature",
      "type": "real",
      "units": "K",
      "valid_max": "",
      "valid_min": ""
    },

Note that the table_entry has been changed from "tas" to the branded variable name: "tas_tavg-z0-hxy-x". Also note that the "out_name" will now without exception be just the root name (in this case tas) appearing before the underscore in the branded variable name. In CMIP6, sometimes the out_name differed from the table_entry.

We could elect to have CMOR generate "temporal_type", "vertical_type", "horizontal_type", and "area_type" by parsing the elements comprising the branding_suffix and then looking up in CVs the associated short text descriptions. That would mean these 4 global attributes would not have to be added to the existing tables.
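A minimal sketch of that parsing approach follows. It assumes the suffix always has four hyphen-separated elements in the order temporal, vertical, horizontal, area, and it uses stand-in dictionaries in place of real CVs; they contain only the four labels that appear in the example entry above. The name `describe_suffix` is hypothetical.

```python
# Hypothetical CV fragments, limited to the labels used in the example entry.
TEMPORAL_CV = {"tavg": "mean"}
VERTICAL_CV = {"z0": "no vertical dimension"}
HORIZONTAL_CV = {"hxy": "gridded"}
AREA_CV = {"x": "unmasked"}


def describe_suffix(branding_suffix: str) -> dict:
    """Map a branding suffix to the four descriptive global attributes."""
    parts = branding_suffix.split("-")
    if len(parts) != 4:
        raise ValueError(f"expected 4 elements in {branding_suffix!r}")
    names = ("temporal_type", "vertical_type", "horizontal_type", "area_type")
    cvs = (TEMPORAL_CV, VERTICAL_CV, HORIZONTAL_CV, AREA_CV)
    attributes = {}
    for name, cv, label in zip(names, cvs, parts):
        if label not in cv:
            raise KeyError(f"{label!r} is not a known {name} label")
        attributes[name] = cv[label]
    return attributes


attributes = describe_suffix("tavg-z0-hxy-x")
```

With this approach an unknown label fails loudly, which would also serve as a check that a table entry uses a valid branding suffix.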

@mauzey1
Collaborator

mauzey1 commented Oct 4, 2024

Is this what the mip-cmor-tables will look like? Would the removal of "frequency" reduce the number of tables since they are currently grouped by modeling realm and frequency?

Are users supposed to select which "branded variables" from a table they are going to use instead of "variable_id"?

I assume "region" is going to be like "realm" in global attributes where its valid entries will be found in the CV, correct?

Will the "approx_interval" come from the CV or some other table? CMOR currently uses this value for a test.

@mauzey1 mauzey1 added this to the 4.0/Future milestone Oct 4, 2024
@taylor13
Collaborator Author

taylor13 commented Oct 4, 2024

The tables will be structured the same as the old tables, with the changes I enumerated above. But we can group variables into tables any way we like (even placing them all into a single table), and instead of having a total of 2062 table entries (across tables), we’ll have about 1600 (because the same variable sampled at multiple frequencies will be found in only one table).

As I understand it, “variable_id” records the “out_name” found in the table, which is also the actual name of the variable array written to the netCDF file. That won’t change. As I noted, the out_name in the new tables will be the root name (i.e., prefix) of the branded variable name (e.g., “tas”, which is the prefix appearing in “tas_tavg-z0-hxy-x”).

As for realm, experiment_id, institute_id, etc., the valid regions will be found in a CV (and for CMIP7 there may be only a few options: “global”, “Antarctica”, “Greenland”, and perhaps a couple more).

We might decide to turn off the frequency check in CMIP7, which, as you say, is based on "approx_interval". Or we could provide a CV with "frequency" as the key, and the approximate interval as the value. The user would specify the "frequency" in the input table (as described above), and then CMOR would go to the frequency table and extract the approx_interval so it could perform its check. The frequency CV might look like:

"frequency": {
      "mon": {
            "label": "monthly",
            "approx_interval": "30"
      },
      "day": {
            "label": "daily",
            "approx_interval": "1"
      },
etc.
...
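To make the second option concrete, here is a sketch of how a check based on such a frequency CV might look. It is not CMOR's actual check: the function `interval_is_consistent`, the 20% tolerance, and the assumption that time values are expressed in days are all illustrative choices, and the CV fragment contains only the two entries shown above.

```python
# Hypothetical frequency CV fragment; approx_interval is assumed to be in days.
FREQUENCY_CV = {
    "mon": {"label": "monthly", "approx_interval": "30"},
    "day": {"label": "daily", "approx_interval": "1"},
}


def interval_is_consistent(frequency, time_values_days, tolerance=0.2):
    """Check that successive time steps are close to the CV's approx_interval.

    `tolerance` is a relative tolerance chosen for illustration only.
    With fewer than two time values there is nothing to compare, so the
    function returns True.
    """
    approx = float(FREQUENCY_CV[frequency]["approx_interval"])
    steps = [b - a for a, b in zip(time_values_days, time_values_days[1:])]
    return all(abs(step - approx) <= tolerance * approx for step in steps)


monthly_ok = interval_is_consistent("mon", [15.5, 45.0, 74.5])
daily_as_monthly = interval_is_consistent("mon", [0.5, 1.5, 2.5])
```

In this sketch `monthly_ok` is True and `daily_as_monthly` is False, since daily time steps are far from the 30-day interval registered for "mon".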
