LiveOcean Architecture Notes



LiveOcean is a daily forecast version of our model framework, based initially on the PNWTOX model. Its goal is to make forecasts of Ocean Acidification that are useful for shellfish growers. This page is for notes on system development and is not meant to provide information to the public; for that, please see the homepage instead.

Code repo

All the code is available freely through GitHub. Directions here.

Naming conventions

These are used as command line arguments, and also to name output directories:

  • [gridname] specifies the grid (e.g. cascadia1)
  • [tag] specifies anything about the forcing (e.g. base)
  • [ex_name] tells which ROMS executable to use (e.g. lo1 or lobio1)
  • [f_string] is the directory name where forcing or ROMS output for a specific day ends up (e.g. f2015.09.17).
  • These are combined hierarchically, allowing you to mix and match in a flexible way (see the sketch after this list). So:
    • LiveOcean/preamble/make_resources/[gridname]/grid.nc is the grid file
    • LiveOcean_output/[gridname]_[tag]/[f_string]/ holds the forcing files for a given day, in folders atm, bio, ocn, riv, tide
    • LiveOcean_roms/output/[gridname]_[tag]_[ex_name]/[f_string]/ holds the model output for a given day.
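
As a concrete illustration, here is a minimal python sketch of how the naming pieces combine into directory paths. The helper names and the relative base directories are placeholders for this example; only the [gridname]_[tag]_[ex_name] and [f_string] conventions come from the notes above.

import os
from datetime import date

def f_string(d):
    # A run day like 2015.09.17 becomes the folder name f2015.09.17
    return 'f' + d.strftime('%Y.%m.%d')

def forcing_dir(gridname, tag, d, base='LiveOcean_output'):
    # LiveOcean_output/[gridname]_[tag]/[f_string]/ holds the forcing files
    return os.path.join(base, gridname + '_' + tag, f_string(d))

def roms_out_dir(gridname, tag, ex_name, d, base='LiveOcean_roms/output'):
    # LiveOcean_roms/output/[gridname]_[tag]_[ex_name]/[f_string]/ holds the ROMS output
    return os.path.join(base, '_'.join([gridname, tag, ex_name]), f_string(d))

print(forcing_dir('cascadia1', 'base', date(2015, 9, 17)))
# LiveOcean_output/cascadia1_base/f2015.09.17
print(roms_out_dir('cascadia1', 'base', 'lo1', date(2015, 9, 17)))
# LiveOcean_roms/output/cascadia1_base_lo1/f2015.09.17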

System Design

The system described here does two main things: prepare forcing files for a "forecast" and run ROMS for that forecast. A forecast is identified by a day (e.g. 2015.09.17). The system can also easily run "backfill," a sequence of day-long runs over some past time period, like a year.

The system is designed to be modular, separating specific tasks, and keeping code separate from data. Code is always kept in version control, while output often just sits somewhere, can be replicated in different places, and is likely to be overwritten as needed. We use a mix of bash shell scripts, python, matlab, and of course the ROMS f90.

Here is the basic structure, expressed as directories somewhere on a linux machine:

  • LiveOcean/ The main place where the system code lives (all available on GitHub)
    • alpha/ Central place for a few functions that are needed by many pieces of code. For example, Lfun.py and Lstart.m return dicts or structures with absolute path names to all the parts of the code (see the sketch after this list). They are designed to adapt automatically to whatever computer you are running on, so if you want to implement the system on a new machine or change the system directory structure, in principle you would only have to edit things here.
    • driver/ This is where you execute the two main bash scripts that run the system. These can be run from the command line or by cron. They accept command line arguments that allow you to do essentially everything you need to make forecasts or backfill. These are described in detail in a separate section below.
      • driver_forcing.sh
      • driver_roms.sh
    • forcing/ Each folder has the code for a specific aspect of the forcing. The code in folders atm, bio (maybe), ocn, riv, and tide makes the required forcing files; call these [frc]. The code in dot_in makes the liveocean.in file that ROMS uses to decide how long to run, where to look for the forcing files, etc. The code in hycom is just used for creating an archive of HYCOM ocean files that are used in the ocn forcing. The code in azu handles pushing the ROMS output files to Azure cloud storage, where other people can get at them. Each of the forcing types has two main programs: make_forcing_main.py (typically goes out to an external source, gets forcing data like river flow, and does some preprocessing) and make_forcing_worker.m (takes the preprocessed data and formats it into the NetCDF files that ROMS expects). The results end up in LiveOcean_output/[gridname]_[tag]/[f_string]/ = (*). These results are organized in a regular way, e.g. for [frc] forcing: (*)[frc]/ holds the .nc forcing files, (*)[frc]/Data/ holds the preprocessed data, if any, and (*)[frc]/Info/ holds the text files process_status.csv and screen_out.txt, which record when each process ran, what time span the variables cover, etc.
      • atm/
      • azu/
      • bio/
      • dot_in/[gridname]/
      • hycom/
      • low_pass/
      • ocn/
      • riv/ or riv1/
      • tide/
    • plotting/ Python plotting code (output goes to LiveOcean_output/plots/).
    • shared/
      • mexcdf/ MATLAB NetCDF toolbox
      • seawater/ MATLAB seawater routines
      • Z_functions/ MATLAB home-grown utility functions, e.g. for plotting or working with ROMS history files
  • LiveOcean_data/ Files with binary data that are unlikely to change, or that I want to keep out of version control for other reasons.
    • accounts/
    • coast/ Coastline data for plotting.
    • grids/ Grid files, see below for details.
    • hycom_combined/ Preprocessed HYCOM files for backfill.
    • hycom[90.0, 91.0, 91.1]/
    • rivers/
    • tide/ TPXO data
    • tracks/ Mat files of Neil's tracks for plotting.
  • LiveOcean_output/ The model output. Generally a place for big piles of binary data that often change.
    • [gridname]_[tag]/
    • plots/
  • LiveOcean_roms/ See the ROMS page
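
The role of alpha/ described above is easiest to see in code. Below is a minimal, hypothetical sketch of the kind of dict an Lfun.py-style function might return; the actual function name, keys, and root path in the repository may differ, but the idea of one central place that resolves machine-specific absolute paths is the same.

def Lstart(gridname='cascadia1', tag='base'):
    # Hypothetical sketch of a path-resolving function: build a dict of
    # absolute paths so the rest of the code never hard-codes directories.
    # Edit parent here to move the whole system to a new machine.
    parent = '/data1/parker/'  # machine-specific root (example from fjord)
    Ldir = dict()
    Ldir['parent'] = parent
    Ldir['LO'] = parent + 'LiveOcean/'
    Ldir['data'] = parent + 'LiveOcean_data/'
    Ldir['LOo'] = parent + 'LiveOcean_output/'
    Ldir['roms'] = parent + 'LiveOcean_roms/'
    Ldir['grid'] = Ldir['data'] + 'grids/' + gridname + '/'
    Ldir['gtag'] = gridname + '_' + tag
    return Ldir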

Grids

The code in ptools/pgrid/ will do everything (a quick check of the result is sketched after this list):

  • Create the directory (Grids) = LiveOcean_data/grids/[gridname]/ and add:
    • grid.nc
    • river_info.csv (see rivers page)
    • S_COORDINATE_INFO.csv
    • S.mat
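
A quick way to confirm that a new grid directory is complete is to list the expected files and peek inside grid.nc. This is just a convenience sketch: the file names come from the list above, while the example path and the availability of the netCDF4 package are assumptions.

import os
from netCDF4 import Dataset

Grids = '/data1/parker/LiveOcean_data/grids/cascadia1/'  # example path
# Check that pgrid produced the expected files.
for fn in ['grid.nc', 'river_info.csv', 'S_COORDINATE_INFO.csv', 'S.mat']:
    print(fn, 'exists' if os.path.exists(Grids + fn) else 'MISSING')
# List the variables stored in the grid file.
ds = Dataset(Grids + 'grid.nc')
print(list(ds.variables.keys()))
ds.close()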

Driver scripts

These are the multi-purpose tools that I use most to keep the forecast going when something breaks, or to do backfill runs. For example, when the forecast breaks because HYCOM was not available, you would use driver_forcing.sh from the fjord command line to try to make the ocn forcing again, and then use driver_roms.sh from the gaggle command line to do the run. These scripts and the programs they call automate a number of things that would be difficult to do correctly by hand, for example creating the .in file for a given day and then calling ROMS with the right set of cores. When they are run using cron they just do today's forecast, and cron handles doing things in the right order and at times that are likely to succeed. When you run from the command line you can do a series of days (even a whole year or longer); in this case each process keeps looking for some evidence that a day has finished before starting the next one.
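
The actual drivers are bash scripts; the python-flavored sketch below only illustrates the backfill-sequencing idea of "wait for evidence that a day finished before starting the next one." The completion test and output path are hypothetical examples (the 0025 file name follows from the Small Notes on history-file numbering).

import os, time
from datetime import date, timedelta

def day_is_done(out_dir):
    # Hypothetical completion test: the last hourly history file of a
    # one-day backfill run has appeared in the day's output folder.
    return os.path.exists(os.path.join(out_dir, 'ocean_his_0025.nc'))

d = date(2014, 2, 1)
end = date(2014, 2, 3)
roms_out = '/pmr1/parker/LiveOcean_roms/output/cascadia1_base_lo1/'  # example
while d <= end:
    f_string = 'f' + d.strftime('%Y.%m.%d')
    # ... launch the run for this day (the real scripts build the .in file
    # and call ROMS with the right set of cores) ...
    while not day_is_done(roms_out + f_string):
        time.sleep(60)  # keep checking until the day's output appears
    d += timedelta(days=1)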

driver_forcing.sh (run on fjord)

# USE COMMAND LINE OPTIONS
#
# -g name of the grid [cascadia1, ...]
# -t name of the forcing tag [base, ...]
# -x name of the ROMS executable to use (only needed if forcing is "azu") [lo1, ...]
# -f forcing type [atm, ocn, riv, tide, azu]
# -r run type [forecast, backfill]
# if backfill, then you must provide two more arguments
# -0 start date: yyyymmdd
# -1 end date: yyyymmdd
#
# example call to do backfill:
# ./driver_forcing.sh -g cascadia1 -t base -f atm -r backfill -0 20140201 -1 20140203
#
# example call to do forecast:
# ./driver_forcing.sh -g cascadia1 -t base -f atm -r forecast
#
# example call to push to Azure:
# ./driver_forcing.sh -g cascadia1 -t base -x lo1 -f azu -r backfill -0 20140201 -1 20140203
#
# you can also use long names like --ex_name instead of -x
#
# note that the --ex_name (or -x) parameter is only needed when calling azu.

driver_roms.sh or driver_roms1.sh (run on gaggle)

# USE COMMAND LINE OPTIONS
#
# -g name of the grid [cascadia1, ...]
# -t name of the forcing tag [base, ...]
# -x name of the ROMS executable to use
# -s new or continuation
# -r forecast or backfill
# if backfill, then you must provide two more arguments
# -0 start date: yyyymmdd
# -1 end date: yyyymmdd
#
# example call to do backfill:
# ./driver_roms.sh -g cascadia1 -t base -x lo1 -s new -r backfill -0 20140201 -1 20140203
#
# example call to do forecast:
# ./driver_roms.sh -g cascadia1 -t base -x lo1 -s continuation -r forecast
#
# you can also use long names like --ex_name instead of -x

Cron

The different parts of the system are run as cron jobs, which you edit using crontab -e; you can see what jobs are scheduled using crontab -l. Here is some crontab info. Below are examples of what might currently be running on fjord and gaggle:

fjord (the linux box we use for making forcing, and for later analysis)

  • 30 12 * * * /data1/parker/LiveOcean/driver/driver_forcing.sh -g cascadia1 -t base -x lo1 -f azu -r forecast > /data1/parker/LiveOcean/driver/dlog_azu
  • 00 10 * * * /data1/parker/LiveOcean/driver/driver_forcing.sh -g cascadia1 -t base -f atm -r forecast > /data1/parker/LiveOcean/driver/dlog_atm
  • 50 09 * * * /data1/parker/LiveOcean/driver/driver_forcing.sh -g cascadia1 -t base -f tide -r forecast > /data1/parker/LiveOcean/driver/dlog_tide
  • 10 09 * * * /data1/parker/LiveOcean/driver/driver_forcing.sh -g cascadia1 -t base -f ocn -r forecast > /data1/parker/LiveOcean/driver/dlog_ocn
  • 00 09 * * * /data1/parker/LiveOcean/driver/driver_forcing.sh -g cascadia1 -t base -f riv -r forecast > /data1/parker/LiveOcean/driver/dlog_riv

gaggle (the big linux cluster that runs ROMS - I have 144 cores)

  • 30 10 * * * /fjdata1/parker/LiveOcean/driver/driver_roms.sh -g cascadia1 -t base -x lo1 -s continuation -r forecast > /fjdata1/parker/LiveOcean/driver/dlog_roms

Small Notes

  • ROMS output for a given forecast day is stored in hourly history files, named e.g. ocean_his_[0002-0025].nc. This corresponds to hours relative to the start of the day of [f_string] (time is also stored inside each history file, of course). 0002 would be 1 AM UTC, and 0025 would be midnight at the end of the day. ROMS output numbering starts at 1; at the very start of a run we also have file 0001. Files made as part of a forecast typically are three days long, so they have [0002-0073], while for backfill we save time and storage by just going to 0025. Hence for a sequence of [f_string] folders made as forecasts, there is about two days of overlap in time, and to make a continuous record you would just string together data from files [0002-0025] from a series of days (see the sketch after this list). The overlapping parts between days are not identical because they rely on different forcing and initial conditions.
  • We use "seconds since 1/1/1970, UTC" as the official time for variables in the ROMS NetCDF forcing files, and the ROMS output.
  • We are using 2013 for the hindcast test year, because it has ample carbon data.
  • The raw WRF files are being loaded into /pmraid3/darr/tstwrf/tmpwrf starting with 2014050100 (150 MB, out to 20 hours). We began getting forecasts out to 84 hours (503 MB) - at least for the 00 forecast - as of 2014121900. The 12 forecast was lost starting 2014111312. There are also month folders that David made, starting with wrf_11_2013 in the same place. I believe he has older files elsewhere.
  • On fjord.ocean.washington.edu I (PM) have installed my own Enthought Canopy python system.
  • To get to python use: /home/parker/Enthought/Canopy_64bit/User/bin/python
  • The RAID disk called /data1 for fjord is mounted as /fjdata1 when gaggle uses it.
  • Currently most of the code is on fjord in /data1/parker/LiveOcean...
  • But the ROMS code is on gaggle in /pmr1/parker/LiveOcean_roms...
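
The note above about stringing together files [0002-0025] can be made concrete with a short sketch. The helper name and output path are illustrative assumptions (the path follows the [gridname]_[tag]_[ex_name]/[f_string] convention and the gaggle location noted above); any tool that concatenates NetCDF files along the time axis could then read the resulting list.

from datetime import date, timedelta

def his_list(d0, d1, roms_out='/pmr1/parker/LiveOcean_roms/output/cascadia1_base_lo1/'):
    # Collect hourly history files [0002-0025] from a series of daily folders,
    # giving a continuous hourly record with no duplicated times.
    fn_list = []
    d = d0
    while d <= d1:
        f_string = 'f' + d.strftime('%Y.%m.%d')
        for hh in range(2, 26):
            fn_list.append(roms_out + f_string + '/ocean_his_%04d.nc' % hh)
        d += timedelta(days=1)
    return fn_list

fns = his_list(date(2015, 9, 1), date(2015, 9, 3))
print(len(fns), 'files:', fns[0], '...', fns[-1])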

Parker MacCready 08/15/2016