Instructions for installing and configuring a WMT executor on blanca.
Set build environment on blanca
Log in to summit.
Get the correct Slurm for blanca.
module load slurm/blanca
Log in to a compute node.
Build everything on the compute node.
Set install directory
The install directory for this executor is /work/csdms/wmt/_testing.
install_dir=/work/csdms/wmt/_testing
mkdir -p $install_dir
Make sure read and execute bits are set on this directory.
chmod 0775 $install_dir
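To confirm that the permission bits took effect, something like the following sketch could be used. It assumes GNU `stat` (for the `-c` option), and `show_mode` is a hypothetical helper name, not part of WMT. The demonstration runs on a scratch directory rather than the real install directory.

```shell
# Hypothetical helper: print the octal permission bits of a path.
# Assumes GNU coreutils `stat`, which supports -c '%a'.
show_mode() {
    stat -c '%a' "$1"
}

# Demonstrate on a scratch directory, not the real install directory.
demo_dir=$(mktemp -d)
chmod 0775 "$demo_dir"
mode=$(show_mode "$demo_dir")
echo "mode: $mode"
rmdir "$demo_dir"
```

On the executor itself, the equivalent call would be `show_mode $install_dir`.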
Install a Python distribution to be used locally by WMT. We like to use Miniconda.
cd $install_dir
curl https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh -o miniconda.sh
bash ./miniconda.sh -f -b -p $(pwd)/conda
export PATH=$(pwd)/conda/bin:$PATH
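Because the rest of these instructions rely on the local Miniconda being first on `PATH`, a check along these lines may help. This is a sketch only: `conda_under` is a hypothetical helper, and the demonstration uses a stand-in script in a scratch directory instead of a real conda install. It also assumes `conda` is an executable on `PATH` and not a shell function.

```shell
# Hypothetical helper: succeed if the first `conda` on PATH lives under the given prefix.
conda_under() {
    found=$(command -v conda 2>/dev/null) || return 1
    case "$found" in
        "$1"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Demonstration with a stand-in executable (not a real conda) in a scratch directory.
demo=$(mktemp -d)
mkdir -p "$demo/conda/bin"
printf '#!/bin/sh\necho stand-in\n' > "$demo/conda/bin/conda"
chmod +x "$demo/conda/bin/conda"
PATH="$demo/conda/bin:$PATH"

if conda_under "$demo/conda"; then
    conda_check="ok"
else
    conda_check="wrong conda"
fi
echo "$conda_check"
```

On the executor, the call would be `conda_under $install_dir/conda`.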
If working with an existing Miniconda install, be sure to update everything before continuing:
conda update conda
conda update --all
Install the CSDMS software stack
Using the csdms-stack conda channel (the Bakery) install the CSDMS software stack, including several pre-built components, with the `csdms-stack` metapackage.
conda install csdms-stack -c csdms-stack -c defaults -c conda-forge
This metapackage currently includes
Alternately, use the requirements file listed here.
Optionally install the `babelizer`, in case a component needs to be built from source.
conda install -c csdms-stack babelizer
Optionally install IPython for testing.
conda install ipython
Recall that when running IPython remotely, it's helpful to set
HDF5 and file locks
When testing the executor, I found that it couldn't write output to NetCDF4 files, and this scary-looking message was written to stdout:
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 47342242057600:
  #000: H5F.c line 491 in H5Fcreate(): unable to create file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1305 in H5F_open(): unable to lock the file
    major: File accessibilty
    minor: Unable to open file
  #002: H5FD.c line 1839 in H5FD_lock(): driver lock request failed
    major: Virtual File Layer
    minor: Can't update object
  #003: H5FDsec2.c line 940 in H5FD_sec2_lock(): unable to lock file, errno = 37, error message = 'No locks available'
    major: File accessibilty
    minor: Bad file ID accessed
On further inspection, I found that I could import the `netCDF4` Python package, but calling `Dataset` threw an exception.
When googling `H5F_open(): unable to lock the file`, I found an offhand reference (here) by an HDF5 developer to issues with file locks on Lustre filesystems in the newly released HDF5 version 1.10. Interestingly, the report came from a janus user.
I thought that by rolling back the HDF5 version to 1.8.x, I might be able to work around this issue. However, `esmpy` depends on HDF5, so I couldn't do it directly. I found that rolling back the `netcdf-fortran` package by one build did the trick:
$ conda install netcdf-fortran=4.4.4=5 -c defaults -c conda-forge
<snip>
The following packages will be DOWNGRADED:
    esmf:           7.0.0-9 conda-forge --> 7.0.0-8 conda-forge
    hdf5:           1.10.1-h9caa474_1   --> 1.8.18-h6792536_1
    netcdf-fortran: 4.4.4-6 conda-forge --> 4.4.4-5 conda-forge
WMT can now write output to NetCDF4 on blanca.
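The `errno = 37` ('No locks available') in the trace above suggests the filesystem refused a lock request. A rough way to probe whether a directory grants advisory locks at all is sketched below. It assumes the `flock` utility from util-linux is installed, and `can_lock` is a hypothetical helper. It only approximates the lock request HDF5 makes, so a passing result is not a guarantee that HDF5 1.10 will work there.

```shell
# Hypothetical helper: report whether an advisory lock can be taken on a file
# created in the given directory. Returns 0 if the lock succeeds, 1 if it is
# refused, and 2 if the probe file cannot be created.
can_lock() {
    probe=$(mktemp "$1/locktest.XXXXXX") || return 2
    # -n: fail immediately instead of waiting if the lock cannot be taken.
    if flock -n "$probe" true 2>/dev/null; then
        status=0
    else
        status=1
    fi
    rm -f "$probe"
    return $status
}

if command -v flock >/dev/null 2>&1; then
    if can_lock "${TMPDIR:-/tmp}"; then
        lock_result="locks available"
    else
        lock_result="no locks available"
    fi
else
    lock_result="flock not installed"
fi
echo "$lock_result"
```

To probe the executor's scratch space instead of the temporary directory, pass that directory to `can_lock`.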
Install executor software
Load blanca's `git` module.
module load git
Install the `wmt-exe` package from source.
mkdir -p $install_dir/opt && cd $install_dir/opt
git clone https://github.com/csdms/wmt-exe
cd wmt-exe
python setup.py develop
Create a site configuration file that describes the executor and symlink it into the executor's etc/ directory.
work_dir="/rc_scratch/$USER/wmt/_testing"
python setup.py configure --wmt-prefix=$install_dir --launch-dir=$work_dir --exec-dir=$work_dir
#ln -s "$(realpath wmt.cfg)" $install_dir/conda/etc  # "realpath" not installed on blanca :(
cd $install_dir/conda/etc
ln -s $install_dir/opt/wmt-exe/wmt.cfg
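Where `realpath` is unavailable, an absolute path can be built from `cd` and `pwd` alone. The sketch below shows one way to do that and then create the symlink. `abs_path` is a hypothetical helper, and the directories are scratch stand-ins that mimic the layout above, not the real install tree.

```shell
# Hypothetical helper: print an absolute path for a file using only cd, pwd,
# dirname, and basename (no realpath needed). Does not resolve symlinks.
abs_path() {
    echo "$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
}

# Scratch layout standing in for $install_dir/opt/wmt-exe and $install_dir/conda/etc.
demo=$(mktemp -d)
mkdir -p "$demo/opt/wmt-exe" "$demo/conda/etc"
touch "$demo/opt/wmt-exe/wmt.cfg"

# Link from inside the source directory, as in the commented-out command above.
(cd "$demo/opt/wmt-exe" && ln -s "$(abs_path wmt.cfg)" "$demo/conda/etc")

link_target=$(readlink "$demo/conda/etc/wmt.cfg")
echo "$link_target"
```

Giving `ln -s` an absolute target keeps the link valid no matter which directory it is created from.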
Check that `$USER` didn't get expanded in the file. Lines 10-11 should be:
exec_dir = /rc_scratch/$USER/wmt/_testing
launch_dir = /rc_scratch/$USER/wmt/_testing
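This check can also be scripted. The sketch below counts the `exec_dir` and `launch_dir` lines that still contain a literal `$USER`. `count_literal_user` is a hypothetical helper, and the demonstration runs against a scratch file holding only the two lines shown above, not a real wmt.cfg.

```shell
# Hypothetical helper: count exec_dir/launch_dir lines that contain a literal $USER.
count_literal_user() {
    # -F treats the pattern as a fixed string; single quotes keep the shell
    # from expanding $USER before grep sees it.
    grep -E '^(exec_dir|launch_dir) *=' "$1" | grep -F -c '$USER'
}

# Scratch file with the two expected lines; the quoted 'EOF' prevents expansion.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
exec_dir = /rc_scratch/$USER/wmt/_testing
launch_dir = /rc_scratch/$USER/wmt/_testing
EOF

count=$(count_literal_user "$cfg")
echo "lines with literal \$USER: $count"
rm -f "$cfg"
```

A count of 2 means both settings kept the variable unexpanded. To view the lines directly, `sed -n '10,11p' wmt.cfg` prints lines 10 and 11.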
Note that we're using /rc_scratch for the launch and execution directories instead of the default ~/.wmt. Also note that we needed an SbatchLauncher class for wmt-exe because blanca uses Slurm instead of Torque for job control.
Install and test CSDMS components
Each section below describes how to install and test a particular CSDMS component.
Currently installed components: