3.2. Using CDEPS in HAFS

Attention

This capability is not currently being developed, maintained, or tested, but the information in this chapter is provided as a starting point for interested developers.

3.2.1. Introduction

The Community Data Models for Earth Prediction Systems (CDEPS) allows users of coupled Earth system models to reduce feedbacks within the coupled system by replacing one or more active model components with components driven by prescribed datasets. These datasets can be generated from a variety of sources, including prior model runs, reanalyses, and gridded data constructed from observations. For example, users might initialize and force the ocean model (e.g., HYCOM) in a coupled atmosphere-ocean configuration with the 5th Generation European Centre for Medium-Range Weather Forecasts Reanalysis [1] (ERA5), instead of with the UFS Weather Model atmosphere.

The CDEPS implementation in HAFS currently supports data atmosphere (DATM) and data ocean (DOCN) models. The workflow supports the ERA5 dataset for DATM and the Optimally-Interpolated Sea-Surface Temperature v2.1 [2] (OISST) and Group for High-Resolution Sea-Surface Temperature [3] (GHRSST) datasets for DOCN. Before running the workflow, the user must stage the required datasets on disk. Download scripts are provided for the supported datasets. Advice for adding support for additional datasets is provided in Appendix B.

CDEPS has been added to HAFS under the Improve Workflow Usability, Portability, and Testing Capabilities Project being executed by the National Center for Atmospheric Research (NCAR) Climate and Global Dynamics Laboratory (CGD) and the University of Colorado/Cooperative Institute for Research in Environmental Sciences (CU/CIRES). These efforts are funded under the Infrastructure portfolio of the Fiscal Year 2018 Disaster Related Appropriation Supplemental (DRAS), commonly referred to as the Hurricane Supplemental (HSUP).

3.2.2. Obtaining the Code

The HAFS-CDEPS capability is contained in the develop branch of the HAFS repository.

To obtain the code, run:

git clone --recursive -b develop https://github.com/hafs-community/HAFS

3.2.3. Data Access

Before users can perform a DATM run with ERA5, they must create a Climate Data Store account (https://cds.climate.copernicus.eu) and digitally sign the ERA5 license agreement. Users will be assigned a key, which should be added to a file called .cdsapirc in their home directory on the machine(s) they plan to use. The process is described in more detail at https://cds.climate.copernicus.eu/api-how-to.
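The resulting .cdsapirc is a small two-line text file. The sketch below shows one way to create it from the shell; the UID and API key are placeholders, so copy the exact url and key values displayed on the CDS API page after logging in.

# Store the CDS API credentials in ~/.cdsapirc using the values from your CDS account page.
cat > ~/.cdsapirc <<'EOF'
url: https://cds.climate.copernicus.eu/api/v2
key: <UID>:<API-key>
EOF
chmod 600 ~/.cdsapirc   # keep the credentials readable only by you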

There are no prerequisites to downloading supported datasets for DOCN.

3.2.4. Data Download

Before running the workflow, the user must download the necessary input data. Three scripts are provided for this purpose in the ush/cdeps_utils/ directory:

  • hafs_era5_download.py

  • hafs_oisst_download.py

  • hafs_ghrsst_download.py

The scripts must be run with Python 3.6 or higher. The required Python packages are listed at the top of each script. The hafs_era5_download.py script also requires the Climate Data Operators (CDO) module to be loaded beforehand.

The scripts can be run in the following manner:

./hafs_<dataset>_download.py [options] day [day [...]]

where <dataset> is era5, oisst, or ghrsst, matching the scripts listed above, and each day argument can be specified in any of the following ways:

  • 20210815: Specify one day (e.g., August 15, 2021)

  • 20210815-20210819: Specify a range of days (e.g., August 15-19, 2021)

  • 2018: Specify an entire year (e.g., 2018)

Data must be downloaded for the entire length of the forecast, plus one day before and one day after. For example, a user running a 126-hr forecast initialized at 2021081518 with ERA5 data should run the download script like this:

./hafs_era5_download.py 20210814-20210822
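If the same case will instead be run with DOCN, the corresponding downloads (only the dataset you plan to use is required) would be:

# OISST or GHRSST input covering a 126-hr forecast initialized at 2021081518,
# plus one day before and one day after:
./hafs_oisst_download.py 20210814-20210822
./hafs_ghrsst_download.py 20210814-20210822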

After downloading the data, specify its location using DATMdir or DOCNdir in parm/system.conf.
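For example, entries along the following lines point the workflow at the staged data. The paths are placeholders, and their placement should follow the system.conf template for your platform:

# Placeholder paths; match the layout of your platform's system.conf template.
DATMdir=/path/to/staged/era5
DOCNdir=/path/to/staged/sst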

3.2.5. Building CDEPS in HAFS

To build HAFS with support for the data models, the DAPP option in the call to ./compile.sh in ./sorc/build_forecast.sh should be set to HAFS-ALL (i.e., -DAPP=HAFS-ALL). The resulting executable can also be used for HAFS runs with active atmosphere and ocean models.

By default, the DAPP keyword should already be set to HAFS-ALL on all supported machines except wcoss_cray.
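A quick, illustrative way to confirm the setting in your checkout is to search the build script for the keyword:

# Confirm that the forecast build selects the data-model application:
grep -n "DAPP" ./sorc/build_forecast.sh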

The remainder of the build process is the same as described in the HAFS Quick Start Guide.

3.2.6. Using CDEPS in the HAFS Workflow

The HAFS workflow can be used to run data model experiments with minimal modifications, which are described below.

Modify the ./rocoto/cronjob_hafs_cdeps.sh script:

  1. Uncomment the definitions of HOMEhafs, dev, and PYTHON3 appropriate for the HPC platform that you are using (an illustrative sketch of these definitions appears below, after the list of experiment commands).

  2. Set HOMEhafs to the top-level directory that contains the HAFS scripts and source codes.

  3. Near the bottom of the script, review the commands for the three DATM and DOCN experiments, and comment out the commands for any experiments that you do not want to run:

    1. To run the DATM with ERA5, the command is:

      ${PYTHON3} ./run_hafs.py -t ${dev} 2019082900 00L HISTORY \
          config.EXPT=${EXPT} \
          config.SUBEXPT=${EXPT}_era5 \
          forecast.output_history=.true. \
          ../parm/hafs_regional_static.conf \
          ../parm/hafs_hycom.conf \
          ../parm/hafs_datm.conf \
          ../parm/hafs_datm_era5.conf
      
    2. To run the DOCN with OISST, the command is:

      ${PYTHON3} ./run_hafs.py -t ${dev} 2019082900 00L HISTORY \
         config.EXPT=${EXPT} \
         config.SUBEXPT=${EXPT}_oisst \
         forecast.output_history=.true. \
         ../parm/hafs_regional_static.conf \
         ../parm/hafs_docn.conf \
         ../parm/hafs_docn_oisst.conf
      
    3. To run the DOCN with GHRSST, the command is:

      ${PYTHON3} ./run_hafs.py -t ${dev} 2019082900 00L HISTORY \
         config.EXPT=${EXPT} \
         config.SUBEXPT=${EXPT}_ghrsst \
         forecast.output_history=.true. \
         ../parm/hafs_regional_static.conf \
         ../parm/hafs_docn.conf \
         ../parm/hafs_docn_ghrsst.conf
      

The cycle (e.g., 2019082900) and storm (e.g., 00L) can be modified. The final two files in each command configure the CDEPS data models (see Appendix A: HAFS-CDEPS Configuration Options). It is probably not necessary to change the configuration unless you want to customize the experiment.
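For reference, the definitions uncommented in step 1 typically look like the following sketch. The paths, site file, and Python location are placeholders only; use the platform-specific entries already present in cronjob_hafs_cdeps.sh:

# Illustrative values only; use the platform-specific entries already in the script.
HOMEhafs=/path/to/save/${USER}/HAFS   # top-level HAFS directory (step 2)
dev="-s sites/<platform>.ent -f"      # site file for your HPC platform
PYTHON3=/path/to/python3              # Python 3 interpreter used to launch run_hafs.py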

Before submitting the cron script, remember to create the ./parm/system.conf file and link the fix files using ./sorc/link_fix.sh, which is called from install_hafs.sh when building the application (see Section 2.1.2).

After the above steps are complete, submit the cron script (cronjob_hafs_cdeps.sh) repeatedly until the workflow completes, or add the script to your crontab. See Figure 3.1: DATM Workflow and Figure 3.2: DOCN Workflow for the steps that will be executed for a simple workflow without vortex initialization or data assimilation. (Note that vortex initialization and data assimilation options are supported for DOCN, but the resulting workflow is more complex.)
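For example, a crontab entry along the following lines resubmits the workflow every 15 minutes until all tasks are complete; the interval, paths, and log location are placeholders:

# Illustrative crontab entry; adjust the interval, paths, and log location for your system.
*/15 * * * * /path/to/HAFS/rocoto/cronjob_hafs_cdeps.sh >> /path/to/cron_hafs_cdeps.log 2>&1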


Fig. 3.1 Schematic diagram of the HAFS-CDEPS workflow for DATM. Blue text indicates the jobs that will run. Gray text indicates jobs that only run when data models are not used.


Fig. 3.2 Schematic diagram of the HAFS-CDEPS workflow for DOCN. Blue text indicates the jobs that will run. Gray text indicates jobs that only run when data models are not used.

3.2.7. Limitations and Other Considerations

HAFS-CDEPS can only be used in the HAFS regional configuration, because ocean coupling for the global-nesting configuration was still under development at the time of this project. In addition, the CDEPS DATM and DOCN are mutually exclusive and cannot be run at the same time in HAFS. Finally, the only fully supported datasets are ERA5 for DATM and OISST and GHRSST for DOCN. Some tips for adding a custom dataset are discussed in Appendix B: Considerations for Adding a New Dataset.

3.2.8. For More Information

The official documentation for CDEPS is available from https://escomp.github.io/CDEPS/index.html.

3.2.9. Appendix A: HAFS-CDEPS Configuration Options

The following table describes variables that are relevant to the HAFS-CDEPS configuration, along with some recommendations for setting them. The recommended settings have already been applied in the various configuration files.

Table 3.2 HAFS-CDEPS Configuration Options

[config] section:

  • run_datm: Whether to run the data atmosphere (DATM). Valid values: yes | no

  • run_docn: Whether to run the data ocean (DOCN). Valid values: yes | no

  • run_ocean: Whether to run the active ocean model. Must be no if run_docn=yes. Valid values: yes | no

  • run_dwav: Whether to run the data wave (DWAV). Valid values: not yet implemented

  • make_mesh_atm: Whether the workflow should generate a mesh file that describes the grid for DATM. Unless the user is providing a custom mesh file, this should be set to yes. No effect if run_datm=no. Valid values: yes | no

  • mesh_atm_in: The location of the premade DATM mesh file. Only used if run_datm=yes and make_mesh_atm=no. Valid values: file path

  • make_mesh_ocn: Whether the workflow should generate a mesh file that describes the grid for DOCN. Unless the user is providing a custom mesh file, this should be set to yes. No effect if run_docn=no. Valid values: yes | no

  • mesh_ocn_in: The location of the premade DOCN mesh file. Only used if run_docn=yes and make_mesh_ocn=no. Valid values: file path

  • datm_source: The data source used for DATM. Only ERA5 is supported. No effect if run_datm=no. Valid values: era5

  • DATMdir: The location where DATM input data are staged by the user. This variable is set in system.conf.[machine]. The workflow will not download new data if the necessary input files are already present in DATMdir. Valid values: file path

  • docn_source: The data source used for DOCN. Only OISST and GHRSST are supported. No effect if run_docn=no. Valid values: oisst | ghrsst

  • DOCNdir: The location where DOCN input data are staged by the user. This variable is set in system.conf.[machine]. The workflow will not download new data if the necessary input files are already present in DOCNdir. Valid values: file path

  • scrub_com: Whether to scrub the cycle's com directory at the end of the run. Recommend setting to no to avoid losing files generated by CDEPS that the archive job does not save. Valid values: yes | no

  • scrub_work: Whether to scrub the cycle's work directory at the end of the run. Recommend setting to no to avoid losing files generated by CDEPS that the archive job does not save. Valid values: yes | no

  • run_vortexinit: Whether to run the vortex initialization. Must be no if run_datm=yes. Valid values: yes | no

  • run_gsi_vr: Whether to run the GSI-based vortex relocation. Must be no if run_datm=yes. Valid values: yes | no

  • run_gsi_vr_fgat: Whether to run the GSI-based vortex relocation with FGAT. Must be no if run_datm=yes. Valid values: yes | no

  • run_gsi_vr_ens: Whether to run the GSI-based vortex relocation for each HAFS ensemble member. Must be no if run_datm=yes. Valid values: yes | no

  • run_gsi: Whether to run data assimilation with GSI. Must be no if run_datm=yes. Valid values: yes | no

  • run_fgat: Whether to run data assimilation using FGAT. Must be no if run_datm=yes. Valid values: yes | no

  • run_envar: Whether to run hybrid EnVar data assimilation. Must be no if run_datm=yes. Valid values: yes | no

  • run_ensda: Whether to run the HAFS ensemble. Must be no if run_datm=yes. Valid values: yes | no

  • run_enkf: Whether to run the EnKF analysis step. Must be no if run_datm=yes. Valid values: yes | no

[forecast] section:

  • layoutx: Processor decomposition in the x-direction.

  • layouty: Processor decomposition in the y-direction.

  • write_groups: Number of processor groups for I/O. Valid values: integer >= 1

  • write_tasks_per_group: Number of cores per I/O group. Valid values: integer >= 1

  • ocean_tasks: Number of cores for the ocean model. Valid values: integer >= 1

  • docn_mesh_nx_global: DOCN domain size in the x-direction. Valid values: integer >= 1

  • docn_mesh_ny_global: DOCN domain size in the y-direction. Valid values: integer >= 1

[rocotostr] section:

  • FORECAST_RESOURCES: String that describes the forecast resources. It must match an entry in the file for your platform in ./rocoto/sites/.
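As an illustration of how these options can be overridden, the sketch below re-runs the GHRSST DOCN experiment from Section 3.2.6 with a user-supplied mesh. The mesh path is a placeholder; the section.option=value overrides follow the same pattern as the existing run_hafs.py commands and are given after the .conf files so that they are applied last:

${PYTHON3} ./run_hafs.py -t ${dev} 2019082900 00L HISTORY \
    config.EXPT=${EXPT} \
    config.SUBEXPT=${EXPT}_ghrsst \
    forecast.output_history=.true. \
    ../parm/hafs_regional_static.conf \
    ../parm/hafs_docn.conf \
    ../parm/hafs_docn_ghrsst.conf \
    config.make_mesh_ocn=no \
    config.mesh_ocn_in=/path/to/my_docn_mesh.nc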

3.2.10. Appendix B: Considerations for Adding a New Dataset

While it is impossible to formally support every dataset in HAFS-CDEPS, developers who wish to use a dataset of their own choosing are encouraged to follow these steps:

  1. To prepare a data atmosphere experiment from a custom dataset, consider running DATM with ERA5 first so that you have a reference. Likewise, if preparing a data ocean experiment, run DOCN with either OISST or GHRSST data first.

  2. You may wish to write your own script (or modify the existing scripts) to download the dataset of interest. See the three ush/cdeps_utils/hafs_*_download.py scripts mentioned in Section 3.2.4. You should also set DATMdir or DOCNdir in ./parm/system.conf to the location of your staged data.

  3. The input data you provide must be in netCDF format, and the time axis in the file(s) must be CF-1.0 compliant.

  4. You will probably need to modify scripts/exhafs_datm_prep.sh or scripts/exhafs_docn_prep.sh to add a new data source, and a corresponding preprocessing script, to the workflow. Alternatively, if you have already preprocessed your data outside of the workflow and only need to copy the files to the working directory, you can modify an existing if statement in the script instead. For example, for a DOCN run:

    if [[ "$docn_source" == OISST ]] ; then
      # Deliver the already-preprocessed dataset to the DOCN input directory.
      $USHhafs/produtil_deliver.py -c "$DOCNdir/my_dataset.nc" "$docn_input_path/DOCN_input_00000.nc"
    fi

    where my_dataset.nc is your input dataset. This command will copy your input data file from DOCNdir to the correct working directory during the ocn_prep job.

  5. The mapping between the variable names in your dataset and the names used internally by CDEPS is described by the stream_data_variables keys in ./parm/cdeps/datm_era5.streams (DATM) and in ./parm/cdeps/docn_oisst.streams and ./parm/cdeps/docn_ghrsst.streams (DOCN). In each pair of variable names, the first entry should be the name of the variable in your dataset (see the sketch after this list).

  6. For a run that couples DATM to HYCOM, the variables that must be present in your input dataset (along with the expected units) are as follows:

    Table 3.3 Required Input Variable(s) for DATM to HYCOM

    CDEPS DATM variable   Unit      Description
    Sa_pslv               Pa        Mean sea-level pressure
    Faxa_rain             m         Liquid-equivalent total precipitation
    Faxa_swnet            J m-2     Surface net downward shortwave flux
    Faxa_lwnet            J m-2     Surface net upwelling longwave flux
    Faxa_sen              J m-2     Surface upward sensible heat flux
    Faxa_lat              J m-2     Surface upward latent heat flux
    Faxa_taux             N m-2 s   Surface zonal-component turbulent stress
    Faxa_tauy             N m-2 s   Surface meridional-component turbulent stress

    For a run that couples DOCN to the UFS Weather Model, the only variable that must be present in your input dataset (along with the expected unit) is as follows:

    Table 3.4 Required Input Variable(s) for DOCN to UFS Weather Model

    CDEPS DOCN variable   Unit      Description
    So_t                            Sea-surface temperature

  7. In addition to preparing the input data, you will also need to create a mesh file that describes the input data grid. It should be possible to leverage the existing ./ush/cdeps_utils/hafs_esmf_mesh.py script for this purpose, but it has only been tested with ERA5 (DATM) and OISST and GHRSST (DOCN) data. Tri-polar grids, such as those used in the Real-Time Ocean Forecast System (RTOFS) dataset, may require modifications to hafs_esmf_mesh.py. If you generate your own mesh, you should set make_mesh_atm or make_mesh_ocn to no and provide the path to the mesh using mesh_atm_in or mesh_ocn_in (see Appendix A: HAFS-CDEPS Configuration Options).
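As a concrete illustration of step 5, suppose the SST field in your dataset is named analysed_sst (a hypothetical name). The corresponding stream_data_variables entry in the DOCN streams file would then pair that name with the CDEPS-internal name So_t, dataset name first; copy the exact key name and the remaining pairs from the existing ./parm/cdeps/*.streams templates:

stream_data_variables: "analysed_sst So_t"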

Footnotes