TestingExecutorBlanca

From CSDMS
Revision as of 11:12, 12 October 2019 by WikiSysop (talk | contribs)
wmt-testing executor on blanca

Instructions for installing and configuring a WMT executor on blanca.

--Mpiper (talk) 15:44, 28 February 2018 (MST)

Set build environment on blanca

Log in to summit.

ssh mapi8461@login.rc.colorado.edu

Load the correct Slurm module for blanca.

module load slurm/blanca

Log in to a compute node.

sinteractive

Build everything on the compute node.
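
Before building, it can help to confirm that the shell really is inside a Slurm allocation rather than still on the login node. The following is a minimal sketch, assuming that the session started by `sinteractive` exports `SLURM_JOB_ID` the way ordinary Slurm jobs do; the function name `in_slurm_job` is hypothetical.

```shell
# Hypothetical helper: succeed only when the shell runs inside a Slurm job.
# Assumption: the interactive session exports SLURM_JOB_ID.
in_slurm_job() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

if in_slurm_job; then
    echo "inside Slurm job ${SLURM_JOB_ID}"
else
    echo "not inside a Slurm job; run sinteractive first"
fi
```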

Set install directory

The install directory for this executor is /work/csdms/wmt/_testing.

install_dir=/work/csdms/wmt/_testing
mkdir -p $install_dir

Make sure read and execute bits are set on this directory.

chmod 0775 $install_dir
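
To confirm that the mode change took effect, the permission string reported by `ls -ld` can be inspected. This is a rough sketch; `dir_is_shared` is a hypothetical helper, and the example runs against a throwaway directory rather than the real install directory.

```shell
# Hypothetical helper: succeed if user, group, and others can all read and
# enter the directory. Parses the mode string printed by `ls -ld`.
# A setgid or sticky bit shows up as s/t in place of x and is not handled here.
dir_is_shared() {
    perms=$(ls -ld "$1" | cut -c1-10)
    case "$perms" in
        dr?xr?xr?x) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: a scratch directory set to 0775 passes the check.
demo_dir=$(mktemp -d)
chmod 0775 "$demo_dir"
if dir_is_shared "$demo_dir"; then
    echo "ok: $demo_dir is readable and searchable by everyone"
fi
rmdir "$demo_dir"
```

On blanca, the call would be `dir_is_shared "$install_dir"`.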

Install Python

Install a Python distribution to be used locally by WMT. We like to use Miniconda.

cd $install_dir
curl https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh -o miniconda.sh
bash ./miniconda.sh -f -b -p $(pwd)/conda
export PATH=$(pwd)/conda/bin:$PATH
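
After changing `PATH`, it's worth checking that `python` and `conda` resolve to the new install and not to a system copy. Here is a small sketch of such a check; `resolves_under` is a hypothetical helper, and the example uses a stand-in prefix and a fake command so that it runs anywhere.

```shell
# Hypothetical helper: succeed if command $2 resolves to a path under prefix $1.
resolves_under() {
    found=$(command -v "$2") || return 1
    case "$found" in
        "$1"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example with a stand-in prefix; on blanca the prefix would be $install_dir/conda.
prefix=$(mktemp -d)
mkdir -p "$prefix/bin"
printf '#!/bin/sh\necho stand-in\n' > "$prefix/bin/fake-python"
chmod +x "$prefix/bin/fake-python"
PATH="$prefix/bin:$PATH"
if resolves_under "$prefix" fake-python; then
    echo "fake-python comes from $prefix"
fi
```

On blanca, the calls would be `resolves_under "$install_dir/conda" python` and `resolves_under "$install_dir/conda" conda`.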

If working with an existing Miniconda install, be sure to update everything before continuing:

conda update conda
conda update --all

Install the CSDMS software stack

Using the csdms-stack conda channel (the Bakery), install the CSDMS software stack, including several pre-built components, with the `csdms-stack` metapackage.

conda install csdms-stack -c csdms-stack -c defaults -c conda-forge

This metapackage currently includes

  • pymt
  • cca-tools
  • csdms-child
  • csdms-sedflux-3d
  • csdms-hydrotrend
  • csdms-permamodel-ku
  • csdms-permamodel-frostnumber
  • csdms-permamodel-kugeo
  • csdms-permamodel-frostnumbergeo
  • csdms-cruaktemp
  • csdms-brake
  • csdms-pydeltarcm
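
One way to confirm that the metapackage pulled in everything on this list is to compare the names against the output of `conda list`. The sketch below assumes the usual `conda list` layout, with the package name in the first column; `missing_packages` is a hypothetical helper, and the version and build fields in the sample listing are placeholders, not real values.

```shell
# Hypothetical helper: read `conda list` output on stdin and print every
# package named in the arguments that does not appear in it.
missing_packages() {
    listing=$(cat)
    for pkg in "$@"; do
        if ! printf '%s\n' "$listing" | awk -v p="$pkg" '$1 == p { found = 1 } END { exit !found }'; then
            printf '%s\n' "$pkg"
        fi
    done
}

# Example with a stand-in listing in place of real `conda list` output.
# The version, build, and channel columns are placeholders.
sample='# packages in environment at /path/to/conda:
pymt                      0.0          0         placeholder
cca-tools                 0.0          0         placeholder'
printf '%s\n' "$sample" | missing_packages pymt cca-tools csdms-child
# prints: csdms-child
```

On blanca, this would be run as `conda list | missing_packages pymt cca-tools csdms-child ...`, with the rest of the list appended; no output means nothing is missing.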

Alternately, use the requirements file listed here.

Optionally install the `babelizer`, in case a component needs to be built from source.

conda install -c csdms-stack babelizer

Optionally install IPython for testing.

conda install ipython

Recall that when running IPython remotely, it's helpful to select Matplotlib's non-interactive Agg backend:

export MPLBACKEND=Agg

HDF5 and file locks

When testing the executor, I found that it couldn't write output to NetCDF4 files, and this scary-looking message was written to stdout:

HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 47342242057600:
 #000: H5F.c line 491 in H5Fcreate(): unable to create file
   major: File accessibilty
   minor: Unable to open file
 #001: H5Fint.c line 1305 in H5F_open(): unable to lock the file
   major: File accessibilty
   minor: Unable to open file
 #002: H5FD.c line 1839 in H5FD_lock(): driver lock request failed
   major: Virtual File Layer
   minor: Can't update object
 #003: H5FDsec2.c line 940 in H5FD_sec2_lock(): unable to lock file, errno = 37, error message = 'No locks available'
   major: File accessibilty
   minor: Bad file ID accessed

On further inspection, I found that I could import the `netCDF4` Python package, but calling `Dataset` threw an exception.

When googling `H5F_open(): unable to lock the file`, I found an offhand reference (here) by an HDF5 developer to issues with file locks on Lustre filesystems in the newly released HDF5 version 1.10. Interestingly, the report came from a janus user.
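
A quick way to see whether a directory supports file locks at all, independent of HDF5, is to try taking one. This is only a sketch: it uses `flock(1)` from util-linux, and it assumes the filesystem refuses that lock the same way it refused HDF5's request, which may not hold on every system. The helper name `can_lock_in` is hypothetical.

```shell
# Hypothetical helper: succeed if an advisory lock can be taken on a new file
# in directory $1. Relies on flock(1) from util-linux.
can_lock_in() {
    probe=$(mktemp "$1/lockprobe.XXXXXX") || return 2
    if flock -n "$probe" true; then
        status=0
    else
        status=1
    fi
    rm -f "$probe"
    return "$status"
}

# Example: probe the current directory.
if can_lock_in .; then
    echo "file locks are available here"
else
    echo "file locks are not available here"
fi
```

Running the probe in the directory where WMT writes its output would show whether the `No locks available` error comes from the filesystem rather than from the Python packages.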

I thought that by rolling back the HDF5 version to 1.8.x, I might be able to work around this issue. However, `esmpy` depends on HDF5, so I couldn't downgrade HDF5 directly. I found that rolling back the `netcdf-fortran` package by one build did the trick:

$ conda install netcdf-fortran=4.4.4=5 -c defaults -c conda-forge
<snip>
The following packages will be DOWNGRADED:

   esmf:           7.0.0-9              conda-forge --> 7.0.0-8              conda-forge
   hdf5:           1.10.1-h9caa474_1                --> 1.8.18-h6792536_1
   netcdf-fortran: 4.4.4-6              conda-forge --> 4.4.4-5              conda-forge

WMT can now write output to NetCDF4 on blanca.
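
Since a later `conda update --all` could bring HDF5 1.10 back, it may be useful to check which HDF5 series is installed. The following sketch parses `conda list` output, assuming the package name and version sit in the first two columns; `hdf5_series` is a hypothetical helper.

```shell
# Hypothetical helper: read `conda list` output on stdin and print the
# major.minor series of the installed hdf5 package, e.g. 1.8 or 1.10.
hdf5_series() {
    awk '$1 == "hdf5" { split($2, v, "."); print v[1] "." v[2] }'
}

# Example using the post-downgrade version and build shown above.
printf 'hdf5  1.8.18  h6792536_1\n' | hdf5_series
# prints: 1.8
```

On blanca, this would be run as `conda list | hdf5_series`; a result of 1.8 means the workaround is still in place.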

Still true. --Mpiper (talk) 10:06, 7 September 2018 (MDT)

Install executor software

Load blanca's `git` module.

module load git

Install the `wmt-exe` package from source.

mkdir -p $install_dir/opt && cd $install_dir/opt
git clone https://github.com/csdms/wmt-exe
cd wmt-exe
python setup.py develop

Create a site configuration file that describes the executor and symlink it to the executor's etc/ directory.

work_dir='/rc_scratch/$USER/wmt/_testing'  # single quotes keep $USER unexpanded
python setup.py configure --wmt-prefix=$install_dir --launch-dir=$work_dir --exec-dir=$work_dir
#ln -s "$(realpath wmt.cfg)" $install_dir/conda/etc # "realpath" not installed on blanca :(
cd $install_dir/conda/etc
ln -s $install_dir/opt/wmt-exe/wmt.cfg

Check that `$USER` didn't get expanded in the file. Lines 10-11 should be:

exec_dir = /rc_scratch/$USER/wmt/_testing
launch_dir = /rc_scratch/$USER/wmt/_testing
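
The same check can be done with `grep` instead of by eye. This is a minimal sketch: `user_is_literal` is a hypothetical helper, and the example writes a stand-in file with the two expected lines rather than reading the real wmt.cfg.

```shell
# Hypothetical helper: succeed if both exec_dir and launch_dir in config
# file $1 still contain the literal text $USER.
user_is_literal() {
    grep -q '^exec_dir *= *.*\$USER' "$1" && grep -q '^launch_dir *= *.*\$USER' "$1"
}

# Example with a stand-in file; on blanca the file would be wmt.cfg.
cfg=$(mktemp)
printf '%s\n' 'exec_dir = /rc_scratch/$USER/wmt/_testing' \
              'launch_dir = /rc_scratch/$USER/wmt/_testing' > "$cfg"
if user_is_literal "$cfg"; then
    echo "ok: \$USER was not expanded"
fi
rm -f "$cfg"
```

On blanca, the call would be `user_is_literal $install_dir/opt/wmt-exe/wmt.cfg`. If it fails, the two lines need to be edited by hand to restore `$USER`.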

Note that we're using /rc_scratch for the launch and execution directories instead of the default ~/.wmt. Also note that we needed an SbatchLauncher class for wmt-exe because blanca uses Slurm instead of Torque for job control.

Install and test CSDMS components

Each section below describes how to install and test a particular CSDMS component.

Currently installed components:

  1. BRaKE
  2. Child
  3. CRUAKTemp
  4. FrostNumberGeoModel
  5. FrostNumberModel
  6. Hydrotrend
  7. KuGeoModel
  8. KuModel
  9. PyDeltaRCM
  10. Sedflux3D