3.2. Using CDEPS in HAFS
Attention
This capability is not currently being developed, maintained, or tested, but the information in this chapter is provided as a starting point for interested developers.
3.2.1. Introduction
The Community Data Models for Earth Prediction Systems (CDEPS) allows users of coupled Earth system models to replace one or more active model components with data components driven by canned datasets, reducing the feedback between components. These datasets can be generated from a variety of sources, including prior model runs, reanalyses, and gridded data constructed from observations. For example, users might force the ocean component of a coupled atmosphere-ocean configuration (e.g., HYCOM) with the 5th Generation European Centre for Medium-Range Weather Forecasts Reanalysis [1] (ERA5) instead of with the UFS Weather Model itself.
The CDEPS implementation in HAFS currently supports data atmosphere (DATM) and data ocean (DOCN) models. The workflow supports the ERA5 dataset for DATM and the Optimum Interpolation Sea Surface Temperature v2.1 [2] (OISST) and Group for High Resolution Sea Surface Temperature [3] (GHRSST) datasets for DOCN. Before running the workflow, the user must stage the required datasets on disk. Download scripts are provided for the supported datasets. Advice for adding support for additional datasets is provided in Appendix B.
CDEPS has been added to HAFS under the Improve Workflow Usability, Portability, and Testing Capabilities Project being executed by the National Center for Atmospheric Research (NCAR) Climate and Global Dynamics Laboratory (CGD) and the University of Colorado/Cooperative Institute for Research in Environmental Sciences (CU/CIRES). These efforts are funded under the Infrastructure portfolio of the Fiscal Year 2018 Disaster Related Appropriation Supplemental (DRAS), commonly referred to as the Hurricane Supplemental (HSUP).
3.2.2. Obtaining the Code
The HAFS-CDEPS capability is contained in the develop branch of the HAFS repository.
To obtain the code, run:
git clone --recursive -b develop https://github.com/hafs-community/HAFS
3.2.3. Data Access
Before users can perform a DATM run with ERA5, they must create a Climate Data Store account (https://cds.climate.copernicus.eu) and digitally sign the ERA5 license agreement. Users will be assigned a key, which should be added to a file called .cdsapirc in their home directory on the machine(s) they plan to use. The process is described in more detail at https://cds.climate.copernicus.eu/api-how-to.
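For reference, the .cdsapirc file contains only two lines, the API endpoint and your personal key. The values below are placeholders; copy the exact lines shown on your CDS profile/API page rather than these:

url: <CDS API endpoint from the api-how-to page>
key: <your personal API key>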
There are no prerequisites to downloading supported datasets for DOCN.
3.2.4. Data Download
Before running the workflow, the user must download the necessary input data. Three scripts are provided for this purpose in the ush/cdeps_utils/ directory:
hafs_era5_download.py
hafs_oisst_download.py
hafs_ghrsst_download.py
The scripts must be run with Python 3.6 or higher. The required Python packages are listed at the top of each script. The hafs_era5_download.py script also requires the Climate Data Operators (CDO) module to be loaded beforehand.
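On systems that use environment modules, loading CDO before running the ERA5 download script might look like the following; the exact module name and version are platform specific:

module load cdo   # the module name/version may differ on your platform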
The scripts can be run in the following manner:
./hafs_<dataset>_download.py [options] day [day [...]]
where <dataset> is era5, oisst, or ghrsst, corresponding to the scripts listed above. The day argument can be specified in any of the following ways:
20210815 : specify one day (e.g., August 15, 2021)
20210815-20210819 : specify a range of days (e.g., August 15-19, 2021)
2018 : specify an entire year (e.g., 2018)
Data must be downloaded for the entire length of the forecast, plus one day before and one day after. For example, a user running a 126-hr forecast initialized at 2021081518 with ERA5 data should run the download script like this:
./hafs_era5_download.py 20210814-20210822
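The DOCN download scripts follow the same pattern. For the same forecast window, the OISST or GHRSST data could be retrieved with:

./hafs_oisst_download.py 20210814-20210822
./hafs_ghrsst_download.py 20210814-20210822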
After downloading the data, specify its location using DATMdir or DOCNdir in parm/system.conf.
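For example, the staging locations could be recorded in parm/system.conf along these lines. The paths are placeholders, and the section placement should follow the system.conf template for your platform (the [dir] section shown here is an assumption):

[dir]
DATMdir=/path/to/staged/DATM/data
DOCNdir=/path/to/staged/DOCN/data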
3.2.5. Building CDEPS in HAFS
The DAPP keyword in the call to ./compile.sh in ./sorc/build_forecast.sh should be set to -DAPP=HAFS-ALL to build HAFS with support for data models. The resulting executable can also be used for HAFS runs with active atmosphere and ocean models.
By default, the DAPP keyword should already be set to HAFS-ALL on all supported machines except wcoss_cray.
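To confirm the setting on your platform before building, you can inspect the compile call directly, for example:

grep -n "DAPP" ./sorc/build_forecast.sh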
The remainder of the build process is the same as described in the HAFS Quick Start Guide.
3.2.6. Using CDEPS in the HAFS Workflow
The HAFS workflow can be used to run data model experiments with minimal modifications, which are described below.
Modify the ./rocoto/cronjob_hafs_cdeps.sh script:
- Uncomment the definitions of HOMEhafs, dev, and PYTHON3 appropriate for the HPC platform that you are using (a placeholder sketch of these lines appears after the example commands below).
- Set HOMEhafs to the top-level directory that contains the HAFS scripts and source code.
- Near the bottom of the script, review the commands for the three DATM and DOCN experiments, and comment out the commands for any experiments that you do not want to run:
To run the DATM with ERA5, the command is:
${PYTHON3} ./run_hafs.py -t ${dev} 2019082900 00L HISTORY \
    config.EXPT=${EXPT} \
    config.SUBEXPT=${EXPT}_era5 \
    forecast.output_history=.true. \
    ../parm/hafs_regional_static.conf \
    ../parm/hafs_hycom.conf \
    ../parm/hafs_datm.conf \
    ../parm/hafs_datm_era5.conf
To run the DOCN with OISST, the command is:
${PYTHON3} ./run_hafs.py -t ${dev} 2019082900 00L HISTORY \
    config.EXPT=${EXPT} \
    config.SUBEXPT=${EXPT}_oisst \
    forecast.output_history=.true. \
    ../parm/hafs_regional_static.conf \
    ../parm/hafs_docn.conf \
    ../parm/hafs_docn_oisst.conf
To run the DOCN with GHRSST, the command is:
${PYTHON3} ./run_hafs.py -t ${dev} 2019082900 00L HISTORY \
    config.EXPT=${EXPT} \
    config.SUBEXPT=${EXPT}_ghrsst \
    forecast.output_history=.true. \
    ../parm/hafs_regional_static.conf \
    ../parm/hafs_docn.conf \
    ../parm/hafs_docn_ghrsst.conf
The cycle (e.g., 2019082900) and storm (e.g., 00L) can be modified. The final two files in each command configure the CDEPS data models (see Appendix A: HAFS-CDEPS Configuration Options); they do not need to be changed unless you want to customize the experiment.
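As noted above, the uncommented platform definitions near the top of cronjob_hafs_cdeps.sh end up looking roughly like the following. The values here are placeholders, and the correct lines for each supported machine are already present (commented out) in the script:

HOMEhafs=/path/to/your/HAFS          # top-level HAFS directory
dev="-s sites/<site_file>.ent -f"    # platform-specific run_hafs.py options; the exact form may differ
PYTHON3=/path/to/python3             # Python 3 interpreter used by the workflow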
Before submitting the cron script, remember to create the ./parm/system.conf file and link the fix files using ./sorc/link_fix.sh, which is called from install_hafs.sh when building the application (see Section 2.1.2).
After the above steps are complete, submit the cron script (cronjob_hafs_cdeps.sh) repeatedly until the workflow completes, or add the script to your crontab. See Figure 3.1: DATM Workflow and Figure 3.2: DOCN Workflow for the steps that will be executed for a simple workflow without vortex initialization or data assimilation. (Note that vortex initialization and data assimilation options are supported for DOCN, but the resulting workflow is more complex.)
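If you prefer not to resubmit the script by hand, a crontab entry along the following lines reruns it at a fixed interval; the interval, paths, and log file are placeholders:

*/15 * * * * /path/to/HAFS/rocoto/cronjob_hafs_cdeps.sh >> /path/to/cronjob_hafs_cdeps.log 2>&1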

Fig. 3.1 Schematic diagram of the HAFS-CDEPS workflow for DATM. Blue text indicates the jobs that will run. Gray text indicates jobs that only run when data models are not used.

Fig. 3.2 Schematic diagram of the HAFS-CDEPS workflow for DOCN. Blue text indicates the jobs that will run. Gray text indicates jobs that only run when data models are not used.
3.2.7. Limitations and Other Considerations
HAFS-CDEPS can only be used in the HAFS regional configuration, because ocean coupling for the global-nesting configuration was still under development at the time of this project. In addition, the CDEPS DATM and DOCN are mutually exclusive and cannot be run at the same time in HAFS. Finally, the only fully supported datasets are ERA5 for DATM and OISST and GHRSST for DOCN. Some tips for adding a custom dataset are discussed in Appendix B: Considerations for Adding a New Dataset.
3.2.8. For More Information
The official documentation for CDEPS is available from https://escomp.github.io/CDEPS/index.html.
3.2.9. Appendix A: HAFS-CDEPS Configuration Options
The following table describes variables that are relevant to the HAFS-CDEPS configuration, along with some recommendations for setting them. The recommended settings have already been applied in the various configuration files.
Section | Variable name | Description | Valid Values
---|---|---|---
[config] | | |
 | run_datm | Whether to run data atmosphere (DATM). | yes, no
 | run_docn | Whether to run data ocean (DOCN). | yes, no
 | run_ocean | Whether to run the active ocean model. Must be no if run_docn=yes. | yes, no
 | run_dwav | Whether to run data wave (DWAV). | not yet implemented
 | make_mesh_atm | Whether the workflow should generate a mesh file that describes the grid for DATM. Unless the user is providing a custom mesh file, this should be set to yes. No effect if run_datm=no. | yes, no
 | mesh_atm_in | The location of the premade DATM mesh file. Only used if make_mesh_atm=no. | file path
 | make_mesh_ocn | Whether the workflow should generate a mesh file that describes the grid for DOCN. Unless the user is providing a custom mesh file, this should be set to yes. No effect if run_docn=no. | yes, no
 | mesh_ocn_in | The location of the premade DOCN mesh file. Only used if make_mesh_ocn=no. | file path
 | datm_source | The data source used for DATM. Only ERA5 is supported. No effect if run_datm=no. | ERA5
 | DATMdir | The location where DATM input data are staged by the user. This variable is set in parm/system.conf. | file path
 | docn_source | The data source used for DOCN. Only OISST and GHRSST are supported. No effect if run_docn=no. | OISST, GHRSST
 | DOCNdir | The location where DOCN input data are staged by the user. This variable is set in parm/system.conf. | file path
 | scrub_com | Whether to scrub the cycle's com directory. | yes, no
 | scrub_work | Whether to scrub the cycle's work directory. | yes, no
 | run_vortexinit | Whether to run the vortex initialization. Must be no if run_datm=yes. | yes, no
 | run_gsi_vr | Whether to run the GSI-based vortex relocation. Must be no if run_datm=yes. | yes, no
 | run_gsi_vr_fgat | Whether to run the GSI-based vortex relocation with FGAT. Must be no if run_datm=yes. | yes, no
 | run_gsi_vr_ens | Whether to run the GSI-based vortex relocation for each HAFS ensemble member. Must be no if run_datm=yes. | yes, no
 | run_gsi | Whether to run data assimilation with GSI. Must be no if run_datm=yes. | yes, no
 | run_fgat | Whether to run data assimilation using FGAT. Must be no if run_datm=yes. | yes, no
 | run_envar | Whether to run hybrid EnVar data assimilation. Must be no if run_datm=yes. | yes, no
 | run_ensda | Whether to run the HAFS ensemble. Must be no if run_datm=yes. | yes, no
 | run_enkf | Whether to run the EnKF analysis step. Must be no if run_datm=yes. | yes, no
[forecast] | | |
 | layoutx | Processor decomposition in the x-direction. | integer >= 1
 | layouty | Processor decomposition in the y-direction. | integer >= 1
 | write_groups | Number of processor groups for I/O. | integer >= 1
 | write_tasks_per_group | Number of cores per I/O group. | integer >= 1
 | ocean_tasks | Number of cores for the ocean model. | integer >= 1
 | docn_mesh_nx_global | DOCN domain size in the x-direction. | integer >= 1
 | docn_mesh_ny_global | DOCN domain size in the y-direction. | integer >= 1
[rocotostr] | | |
 | FORECAST_RESOURCES | String that describes the forecast resources; it must match an entry in the site file for your platform. |
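To illustrate how these options fit together, the following is a hedged sketch of the kind of [config] settings a DOCN run with OISST relies on. The supported experiments already apply these settings through the parm/hafs_docn.conf and parm/hafs_docn_oisst.conf files listed in Section 3.2.6, so this is for orientation only:

[config]
run_docn=yes
run_ocean=no
docn_source=OISST
make_mesh_ocn=yes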
3.2.10. Appendix B: Considerations for Adding a New Dataset
While it is impossible to formally support every dataset in HAFS-CDEPS, developers who wish to use a dataset of their own choosing are encouraged to follow these steps:
1. To prepare a data atmosphere experiment from a custom dataset, consider running DATM with ERA5 first so that you have a reference. Likewise, if preparing a data ocean experiment, run DOCN with either OISST or GHRSST data first.
2. You may wish to write your own script (or modify the existing scripts) to download the dataset of interest. See the three ush/cdeps_utils/hafs_*_download.py scripts mentioned in Section 3.2.4. You should also set DATMdir or DOCNdir in ./parm/system.conf to the location of your staged data.

3. The input data you provide must be in netCDF format, and the time axis in the file(s) must be CF-1.0 compliant.
4. You will probably need to modify scripts/exhafs_datm_prep.sh or scripts/exhafs_docn_prep.sh to add a new data source and a corresponding preprocessing script to the workflow. Alternatively, if you have already preprocessed your data outside of the workflow and simply need to copy the data to the working directory, you can modify an existing if statement in the script. For example, for a DOCN run:

   if [[ "$docn_source" == OISST ]] ; then
     $USHhafs/produtil_deliver.py -c "$DOCNdir/my_dataset.nc" "$docn_input_path/DOCN_input_00000.nc"
   fi

   where my_dataset.nc is your input dataset. This command copies your input data file from DOCNdir to the correct working directory during the ocn_prep job.
job.The mapping between the variable names in your dataset and the names used internally by CDEPS is described by the
stream_data_variables
keys in./parm/cdeps/datm_era5.streams
(DATM) and./parm/cdeps/docn_oisst.streams
and./parm/cdeps/docn_ghrsst.streams
(DOCN). You should make the first entry in each pair of variable names correspond to the name of the variable in your dataset.For a run that couples DATM to HYCOM, the variables that must be present in your input dataset (along with the expected units) are as follows:
Table 3.3 Required Input Variables for DATM to HYCOM

CDEPS DATM variable | Unit | Description
---|---|---
Sa_pslv | Pa | Mean sea-level pressure
Faxa_rain | m | Liquid-equivalent total precipitation
Faxa_swnet | J m⁻² | Surface net downward shortwave flux
Faxa_lwnet | J m⁻² | Surface net upwelling longwave flux
Faxa_sen | J m⁻² | Surface upward sensible heat flux
Faxa_lat | J m⁻² | Surface upward latent heat flux
Faxa_taux | N m⁻² s | Surface zonal-component turbulent stress
Faxa_tauy | N m⁻² s | Surface meridional-component turbulent stress
7. For a run that couples DOCN to the UFS Weather Model, the only variable that must be present in your input dataset (along with the expected unit) is as follows:

Table 3.4 Required Input Variable for DOCN to UFS Weather Model

CDEPS DOCN variable | Unit | Description
---|---|---
So_t | ℃ | Sea-surface temperature
8. In addition to preparing the input data, you will also need to create a mesh file that describes the input data grid. It should be possible to leverage the existing ./ush/cdeps_utils/hafs_esmf_mesh.py script for this purpose, but it has only been tested with ERA5 (DATM) and with OISST and GHRSST (DOCN) data. Tri-polar grids, such as those used in the Real-Time Ocean Forecast System (RTOFS) dataset, may require modifications to hafs_esmf_mesh.py. If you generate your own mesh, you should set make_mesh_atm or make_mesh_ocn to no and provide the path to the mesh using mesh_atm_in or mesh_ocn_in (see Appendix A: HAFS-CDEPS Configuration Options).
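As mentioned in step 5, each stream_data_variables entry pairs a variable name from your dataset with the corresponding CDEPS internal name. A hypothetical entry for a custom SST dataset whose netCDF variable is named sst_analysis might look like the following sketch; copy the exact key name and quoting from the existing ./parm/cdeps/docn_*.streams files rather than from here:

stream_data_variables: "sst_analysis So_t"

For DATM, each variable required by Table 3.3 would get its own "dataset_name CDEPS_name" pair in the same fashion.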
Footnotes