
ROMS Notes

These are notes about specifics of setting up and running ROMS, mainly on the UCAR computer yellowstone, but also the NERSC computer hopper, and waddle.

In the lines below, $ denotes the Linux prompt.

I organize things in 5 main directories, all assumed to be at the same level (I put them all in a directory .../roms/):

1. ROMS/ (the source code)

2. makefiles/ (where compiled executables go)

3. forcing/ (the results of rtools)

4. runs/ (the code that defines and controls a run)

5. output/ (where the history files of a given run end up, e.g. in output/T2005/, and note that I have stopped using the extra OUT/ directory as of moving to yellowstone).

1. Getting the source code, ROMS/, and editing the compiler file

First execute (from any of the machines):
$ svn checkout --username pmaccc ROMS
This creates the directory ROMS with all the source code. I am using version 624, which has the new syntax for specifying boundary conditions in the .in file. The version number can be found in ROMS/ROMS/Version.

NOTE: a helpful command for looking for things is the grep command:
$ grep -rn --exclude="*.svn*" PATTERN *
which finds the string PATTERN in everything in the current directory (*), searching recursively and listing line numbers (-rn), without returning a bunch of hits from the .svn files.

Then you may have to do a few edits to the compiler file (after saving a copy with the suffix _ORIG):

For yellowstone use ROMS/Compilers/ and no edits are required.

For hopper you edit ROMS/Compilers/ so that:

FFLAGS := -e I -e m
FFLAGS := -fast

NETCDF_INCDIR ?= /usr/local/include
NETCDF_LIBDIR ?= /usr/local/lib
NETCDF_INCDIR ?= /opt/cray/netcdf/3.6.2/netcdf-pgi/include
NETCDF_LIBDIR ?= /opt/cray/netcdf/3.6.2/netcdf-pgi/lib

FFLAGS += -O 3,aggress

and these lines are commented out (#)
# $(SCRATCH_DIR)/mod_ncparam.o: FFLAGS += -free-form
# $(SCRATCH_DIR)/mod_strings.o: FFLAGS += -free-form
# $(SCRATCH_DIR)/analytical.o: FFLAGS += -free-form
# $(SCRATCH_DIR)/biology.o: FFLAGS += -free-form

For waddle you edit ROMS/Compilers/ so that:

NETCDF_INCDIR ?= /usr/local/include
NETCDF_LIBDIR ?= /usr/local/lib
NETCDF_INCDIR ?= /usr/local/netcdf-pgi7/include
NETCDF_LIBDIR ?= /usr/local/netcdf-pgi7/lib

FC := mpif90
FC := /usr/local/openmpi5-pgi7ib/bin/mpif90

FFLAGS += -fastsse -Mipa=fast -tp k8-64

2. Compiling the code, with files in makefiles/

Here you need three files to define a run. In this case I put them all in the directory makefiles/yellowstone/ptx_01/ (or other directories for different cppdefs flags, but note that the three files always have the same names):

This has the C-preprocessing flags. I wrote this one by copying over the file ROMS/Include/cppdefs.h and then editing it so that it reproduced the behavior of Sarah Giddings' run ptx_highT40_2005, but without dye. Edit it by hand to enable different flags, like dye or diagnostics.

This creates the boundary region over which nudging to climatology is done. I rewrote it so that I could understand what it did, and I checked the results carefully. The new version makes it easy to turn on and off nudging on different edges. Note that you set the nudging timescales in this code. Edit by hand only if you want to change the nudging regions.

This is copied from ROMS/makefile and then you make a few changes, mainly telling it where to look for things. Here is the output of "diff" showing the changes. For a different run you would edit this by hand to use a different MY_HEADER_DIR.

Here it is for yellowstone (lines starting with < are from the new version, lines starting with > are from the original ROMS/makefile):

375 /glade/p/cgd/oce/people/paulmac/roms -> diff makefiles/yellowstone/ptx_01 ROMS/makefile
< MY_HEADER_DIR ?= /glade/p/cgd/oce/people/paulmac/roms/makefiles/yellowstone/ptx_01
< USE_MPI ?= on
> USE_MPI ?=
< USE_MPIF90 ?= on
> USE_MPIF90 ?=
< USE_NETCDF4 ?= on
< FORT ?= ifort
> FORT ?= pgi
> BINDIR ?= .

Then to compile you (have to!) go to the directory ROMS and execute:

$ make -f /glade/p/cgd/oce/people/paulmac/roms/makefiles/yellowstone/ptx_01/makefile clean
$ make -f /glade/p/cgd/oce/people/paulmac/roms/makefiles/yellowstone/ptx_01/makefile

This takes 5-10 minutes, and results in the executable: roms/makefiles/yellowstone/ptx_01/oceanM

This executable can be used for many different runs, and this is the reason it is separated out into its own directory.

For hopper the instructions are similar, except in the makefile you use FORT ?= ftn

3. The forcing files, in forcing/

Sarah made these using rtools, or I make my own for the SciDAC runs forced by parts of CESM. I move them to yellowstone by going to the directory forcing/ and then doing something like:

$ scp -r .

which will prompt for my skua password, and then move the whole pile to roms_forcing/ptx_highT40_2_2004. This takes about a half hour per year.

4. Doing a run, in runs/

Running jobs on yellowstone (UCAR Supercomputer):

Now you operate in the directory runs/T2005/, for example, where you need to have four things:

This is the input script for ROMS, controlling all the time stepping, tiling, and many other parameters. In the new version of ROMS it is also where you control all the boundary conditions; this is to allow for 2-way nesting.

This is a text file containing commands for the yellowstone queuing system.

Input script for a restart.

Same as my_script, but designed to restart from the restart file.

These four are created using a python script I wrote (on my mac) and some templates. The script creates the four files and the directory they sit in. Email me if you want it.

To start a run you just execute the command (in T2005/):

$ bsub < my_script

and to restart a run execute:

$ bsub < re_script

Useful commands on yellowstone:

$ gladequota
to see how much space you are using in your allocation

Doing a run on hopper:

To start a run you just execute the command:

$ qsub my_script

A typical "my_script" is a text file with lines like:

#PBS -q premium
#PBS -l mppwidth=576
#PBS -l walltime=12:00:00
#PBS -N my_job
#PBS -e my_job.$PBS_JOBID.err
#PBS -o my_job.$PBS_JOBID.out
aprun -n 576 $GSCRATCH/roms_make/ptx_01/oceanM $GSCRATCH/roms_runs/ptx_01a_2002/ > log

This is running on 576 cores (!), and will use 12 hours of the premium queue.

The different queues have different priorities and allowable walltimes:

  • interactive = 00:30:00 (highest priority, for debugging)
  • premium = 12:00:00 (next highest priority, but costs more of your cpu allotment)
  • regular = 48:00:00 (regular priority, slower to get going)

For reference, the 40-level ptx run with 5 dyes takes about 1.5 days on hopper with 576 cores, and creates about 1.2 TB of history files (hourly saves).
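As a rough planning aid, the numbers above can be combined to see how a run of this size fits into each queue. This is only a sketch of the arithmetic: the run length, core count, and walltime limits are the ones quoted in these notes, the variable names are made up for illustration, and the core-hours are raw (no queue charge factor is applied, since these notes do not give one).

```python
import math

# Figures quoted in the notes above.
run_hours = 1.5 * 24   # the 40-level ptx run takes about 1.5 days
cores = 576

# Maximum walltime per queue, in hours.
queue_limits_hours = {"interactive": 0.5, "premium": 12, "regular": 48}

# Raw core-hours for the whole run (before any queue charge factor).
core_hours = run_hours * cores

# Number of separate submissions (start plus restarts) needed in each queue.
segments = {
    queue: math.ceil(run_hours / limit)
    for queue, limit in queue_limits_hours.items()
}

print(f"core-hours: {core_hours:.0f}")
for queue, n in segments.items():
    print(f"{queue}: {n} submission(s)")
```

So a run like this fits in a single regular-queue job, but in the premium queue it has to be split into several jobs chained together with restarts.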

To restart a run that stopped in the middle all you have to do is change NRREC from 0 to -1, and change the initialization file to OUT/, both changes in the .in file. This assumes you have been saving restart files.
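The NRREC change can also be scripted. The following is a minimal sketch, not the script I actually use: it assumes the .in file assigns the value on a line of the form "NRREC == 0", and the function name set_restart is hypothetical. It only handles NRREC; the initialization file entry still has to be changed separately.

```python
import re

def set_restart(in_text, nrrec=-1):
    """Return the text of a .in file with the NRREC value replaced."""
    # Match a line like "   NRREC == 0" (one or two "=" signs accepted).
    pattern = re.compile(r"^(\s*NRREC\s*==?\s*)(-?\d+)", flags=re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + str(nrrec), in_text)
    if count != 1:
        raise ValueError(f"expected exactly one NRREC line, found {count}")
    return new_text

# Hypothetical one-line excerpt of a .in file.
sample = "       NRREC == 0      ! restart record switch\n"
print(set_restart(sample))
```

Raising an error when the line is missing or duplicated is deliberate: silently leaving NRREC at 0 would start the run over from the beginning.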

Useful commands on hopper:

  • qsub my_script to submit a job to the queue
  • qstat -u USERNAME to find out queue information (including JOBID)
  • qdel JOBID to kill a job
  • edit .cshrc.ext in my home directory to set aliases

Doing a run on waddle:

Here is a command to run using MPI with 48 cores. Clearly you have to do this from the directory where oceanM is. The implied directory structure is different from my hopper notes above, so beware.

$ mpirun -np 48 -machinefile hf oceanM > log1 &

...and you are off and running (and returned to the command line because of the &). Lots of useful screen output from ROMS will end up in log1.

NOTE: np is the number of cores, and must match NtileI*NtileJ from your .in file.
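A quick way to catch a mismatch before launching is to read the tiling out of the .in file and compare it with the core count. This is a sketch under assumptions: it assumes the tiling is set on lines of the form "NtileI == 6", the function name tile_count is hypothetical, and the sample text below is a made-up excerpt rather than one of my actual input files.

```python
import re

def tile_count(in_text):
    """Return NtileI * NtileJ parsed from the text of a ROMS .in file."""
    values = {}
    for name in ("NtileI", "NtileJ"):
        # Only lines that start with the parameter name are matched,
        # so lines commented out with a leading "!" are ignored.
        match = re.search(rf"^\s*{name}\s*==?\s*(\d+)", in_text, flags=re.MULTILINE)
        if match is None:
            raise ValueError(f"{name} not found")
        values[name] = int(match.group(1))
    return values["NtileI"] * values["NtileJ"]

# Hypothetical excerpt of a .in file.
sample = """
      NtileI == 6        ! I-direction partition
      NtileJ == 8        ! J-direction partition
"""

np_requested = 48  # the value passed to mpirun -np
if tile_count(sample) != np_requested:
    raise SystemExit("np does not match NtileI*NtileJ")
print("tiling matches np")
```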

NOTE: hf is a text file with a list of waddle nodes David Darr has said you can use. My half of waddle has 8 cores per node and 12 nodes, numbered 0 to 11. For example to use nodes 7 to 12 the file hf would have 6 lines:


Killing a multi-core job on waddle (from David Darr):

On waddle do a "ps aux | grep mpirun" and find the PID number for the mpirun job with your username and then kill it with "kill -9 PID". This almost always works. However, for reasons I don't fully understand it doesn't work a small percentage of the time... in which case I just do the brute force approach.

To see what is happening on a specific node you can do "ssh n006" (e.g. to get to node 6) and then use top. Type "exit" to return to your main shell.

NOTE: you also need some special lines in the .cshrc in your home directory to get MPI to work. To get there type cd ~, and then use ls -la to see hidden files. My .cshrc has:

set path = ( /opt/pgi7/linux86-64/7.0/bin $path )
setenv PGI /opt/pgi7
setenv LM_LICENSE_FILE $PGI/license.dat
setenv SVN_EDITOR nano
alias mpirun /usr/local/openmpi5-pgi7ib/bin/mpirun
setenv LD_LIBRARY_PATH /usr/local/ofed/lib64:/usr/local/openmpi5-pgi7ib/lib:/usr/local/lib
set path = ( /usr/local/openmpi5-pgi7ib/bin $path)

And that's it!

Parker MacCready 10/16/2013