TestingExecutorBlanca

From CSDMS
Revision as of 12:03, 12 March 2018 by Mpiper (talk | contribs) (HDF5 file locks on blanca)
wmt-testing executor on blanca

Instructions for installing and configuring a WMT executor on blanca.

--Mpiper (talk) 15:44, 28 February 2018 (MST)

Set install directory

The install directory for this executor is /projects/mapi8461/wmt/_testing.

install_dir=/projects/mapi8461/wmt/_testing
mkdir -p $install_dir

Install Python

Install a Python distribution to be used locally by WMT. We like to use Miniconda.

cd $install_dir
curl -L https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh -o miniconda.sh
bash ./miniconda.sh -f -b -p $(pwd)/conda
export PATH=$(pwd)/conda/bin:$PATH
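Everything that follows assumes that `python` and `conda` resolve to this new install rather than to a system copy. A minimal sketch of a check for that is given here; `python_is_under` is a hypothetical helper name, not part of conda or WMT.

```shell
# Hypothetical helper (not part of conda or WMT): succeed only if the
# first `python` found on PATH lives under the given install prefix.
python_is_under() {
    prefix=$1
    # `command -v` prints the path the shell would execute.
    found=$(command -v python) || return 1
    case "$found" in
        "$prefix"/*)
            echo "$found"
            ;;
        *)
            echo "python resolves to $found, not under $prefix" >&2
            return 1
            ;;
    esac
}
```

For this executor, the call would be `python_is_under "$install_dir/conda"`. It prints the path of the interpreter on success and returns a nonzero status otherwise.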

If working with an existing Miniconda install, be sure to update everything before continuing:

conda update conda
conda update --all

Install the CSDMS software stack

Using the csdms-stack conda channel (the Bakery), install the CSDMS software stack, including several pre-built components, with the `csdms-stack` metapackage.

conda install csdms-stack -c csdms-stack -c defaults -c conda-forge

This metapackage currently includes:

  • pymt
  • cca-tools
  • csdms-child
  • csdms-sedflux-3d
  • csdms-hydrotrend
  • csdms-permamodel-ku
  • csdms-permamodel-frostnumber
  • csdms-permamodel-kugeo
  • csdms-permamodel-frostnumbergeo
  • csdms-brake
  • csdms-pydeltarcm

Before continuing, load the `git` module.

module load git

Next, install `wmt-exe` from source.

mkdir -p $install_dir/opt && cd $install_dir/opt
git clone https://github.com/csdms/wmt-exe
cd wmt-exe
python setup.py develop

Note that I had to write a new SbatchLauncher class for wmt-exe because blanca uses Slurm instead of Torque for job control.
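To tell which job scheduler a given host offers before choosing a launcher, a rough check is to look for the submit commands on the PATH. This is only a sketch: `detect_scheduler` is a hypothetical helper, and the presence of `qsub` is taken as a hint of Torque even though other PBS-style schedulers also provide it.

```shell
# Hypothetical helper: guess the job scheduler from the submit
# commands available on PATH. `sbatch` is Slurm's submit command;
# `qsub` is Torque's, but other PBS-style schedulers ship it too,
# so "torque" here is an assumption rather than a certainty.
detect_scheduler() {
    if command -v sbatch >/dev/null 2>&1; then
        echo slurm
    elif command -v qsub >/dev/null 2>&1; then
        echo torque
    else
        echo none
    fi
}
```

On blanca this should report `slurm`, which is the case the new launcher class was written for.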

Optionally install the `babelizer`, in case a component needs to be built from source.

conda install -c csdms-stack babelizer

Optionally install IPython for testing.

conda install ipython

Recall that when running IPython remotely, it's helpful to select matplotlib's non-interactive Agg backend:

export MPLBACKEND=Agg

HDF5 and file locks

When testing the executor, I found that it couldn't write output to NetCDF4 files. Instead, this scary-looking message was written to stdout:

HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 47342242057600:
 #000: H5F.c line 491 in H5Fcreate(): unable to create file
   major: File accessibilty
   minor: Unable to open file
 #001: H5Fint.c line 1305 in H5F_open(): unable to lock the file
   major: File accessibilty
   minor: Unable to open file
 #002: H5FD.c line 1839 in H5FD_lock(): driver lock request failed
   major: Virtual File Layer
   minor: Can't update object
 #003: H5FDsec2.c line 940 in H5FD_sec2_lock(): unable to lock file, errno = 37, error message = 'No locks available'
   major: File accessibilty
   minor: Bad file ID accessed

On further inspection, I found that I could import the `netCDF4` Python package, but calling `Dataset` raised an exception.

When googling `H5F_open(): unable to lock the file`, I found a pair of offhand references (here and here) by an HDF5 developer to issues with file locks on Lustre filesystems in the newly released HDF5 version 1.10. Interestingly, one report came from a janus user.
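One rough way to check whether a directory's filesystem grants lock requests at all, independent of HDF5, is to try taking a lock on a scratch file there. The sketch below assumes the `flock` utility from util-linux is installed and that a refused `flock` request is a fair proxy for the `No locks available` failure above. `can_lock_in` is a hypothetical helper name.

```shell
# Hypothetical helper: try a non-blocking lock on a scratch file in
# the given directory. Returns 0 if the lock is granted, 1 if it is
# refused (or the scratch file can't be created), and 2 if the
# flock utility is not installed.
can_lock_in() {
    dir=$1
    command -v flock >/dev/null 2>&1 || return 2
    probe=$(mktemp "$dir/locktest.XXXXXX") || return 1
    # flock opens the file, requests the lock, then runs `true`.
    if flock -n "$probe" true 2>/dev/null; then
        rc=0
    else
        rc=1
    fi
    rm -f "$probe"
    return $rc
}
```

Running `can_lock_in` on the directory where WMT writes its output, and again on a local directory such as `/tmp`, should show whether the problem is specific to the shared filesystem.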

I thought that by rolling back the HDF5 version to 1.8.x, I might be able to work around this issue. However, `esmpy` depends on HDF5, so I couldn't downgrade it directly. Instead, I found that rolling back the `netcdf-fortran` package by one build did the trick, pulling HDF5 back to 1.8.18 along with it:

$ conda install netcdf-fortran=4.4.4=5 -c defaults -c conda-forge
<snip>
The following packages will be DOWNGRADED:

   esmf:           7.0.0-9              conda-forge --> 7.0.0-8              conda-forge
   hdf5:           1.10.1-h9caa474_1                --> 1.8.18-h6792536_1
   netcdf-fortran: 4.4.4-6              conda-forge --> 4.4.4-5              conda-forge

WMT can now write output to NetCDF4 on blanca.
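Since a later `conda update --all` could pull HDF5 forward again, it may be worth checking which HDF5 series is installed before running the executor. A minimal sketch follows, assuming the usual `conda list` layout of package name in the first column and version in the second. `hdf5_series` is a hypothetical helper name.

```shell
# Hypothetical helper: read `conda list` output on stdin and print
# the major.minor series of the installed hdf5 package, if any.
# Assumes the name is in column 1 and the version in column 2.
hdf5_series() {
    awk '$1 == "hdf5" { split($2, v, "."); print v[1] "." v[2] }'
}
```

It is meant to be used as `conda list hdf5 | hdf5_series`. After the downgrade above, the output should be the 1.8 series rather than 1.10.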