TestingExecutorBlanca
Instructions for installing and configuring a WMT executor on blanca.
--Mpiper (talk) 15:44, 28 February 2018 (MST)
Set build environment on blanca
Log in to summit.
ssh mapi8461@login.rc.colorado.edu
Load the correct Slurm module for blanca.
module load slurm/blanca
Log in to a compute node.
sinteractive
Build everything on the compute node.
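Before building, it can help to confirm that the shell really is inside a Slurm job and not still on the login node. A minimal sketch, assuming `sinteractive` starts a regular Slurm job and so sets `SLURM_JOB_ID` (Slurm sets this variable inside jobs; whether the `sinteractive` wrapper on blanca passes it through is an assumption). The helper name `in_slurm_job` is hypothetical.

```shell
# Hypothetical helper: succeed only when SLURM_JOB_ID is set and non-empty,
# which is how Slurm marks a shell that is running inside a job.
in_slurm_job() {
    [ -n "${SLURM_JOB_ID:-}" ]
}
```

For example, `in_slurm_job || echo "not on a compute node"` prints a warning when run on the login node.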
Set install directory
The install directory for this executor is /work/csdms/wmt/_testing.
install_dir=/work/csdms/wmt/_testing
mkdir -p $install_dir
Make sure read and execute bits are set on this directory.
chmod 0775 $install_dir
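To double-check the permission step, something like the following could be used. It is only a sketch: the helper name `has_rx_bits` is made up for illustration, and it relies on `find` supporting `-maxdepth`, which GNU find does.

```shell
# Hypothetical helper: succeed only if the directory grants read and execute
# to user, group, and other (for example mode 0775 or 0755).
has_rx_bits() {
    # -perm -0555 matches only when every one of the r-x bits is set;
    # -maxdepth 0 examines the directory itself without descending into it.
    [ -n "$(find "$1" -maxdepth 0 -type d -perm -0555 2>/dev/null)" ]
}
```

Usage: `has_rx_bits "$install_dir" || echo "fix permissions on $install_dir"`.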
Install Python
Install a Python distribution to be used locally by WMT. We like to use Miniconda.
cd $install_dir
curl https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh -o miniconda.sh
bash ./miniconda.sh -f -b -p $(pwd)/conda
export PATH=$(pwd)/conda/bin:$PATH
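Since everything that follows depends on the local Miniconda being first on `PATH`, a quick check that `conda` resolves to the new install may save some confusion. A minimal sketch; `tool_in_prefix` is a hypothetical helper, and it assumes `conda` is found as an executable on `PATH` (not as a shell function).

```shell
# Hypothetical helper: succeed if the named tool resolves to <prefix>/bin/.
tool_in_prefix() {
    tool=$1
    prefix=$2
    # command -v prints the path the shell would execute for this tool
    found=$(command -v "$tool") || return 1
    case $found in
        "$prefix"/bin/*) return 0 ;;
        *)               return 1 ;;
    esac
}
```

Usage: `tool_in_prefix conda "$install_dir/conda" || echo "conda is not the local install"`.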
If working with an existing Miniconda install, be sure to update everything before continuing:
conda update conda
conda update --all
Install the CSDMS software stack
Using the csdms-stack conda channel (the Bakery), install the CSDMS software stack, including several pre-built components, with the `csdms-stack` metapackage.
conda install csdms-stack -c csdms-stack -c defaults -c conda-forge
This metapackage currently includes
- pymt
- cca-tools
- csdms-child
- csdms-sedflux-3d
- csdms-hydrotrend
- csdms-permamodel-ku
- csdms-permamodel-frostnumber
- csdms-permamodel-kugeo
- csdms-permamodel-frostnumbergeo
- csdms-cruaktemp
- csdms-brake
- csdms-pydeltarcm
Alternately, use the requirements file listed here.
Optionally install the `babelizer`, in case a component needs to be built from source.
conda install -c csdms-stack babelizer
Optionally install IPython for testing.
conda install ipython
Recall that when running IPython remotely, it's helpful to set
export MPLBACKEND=Agg
HDF5 and file locks
When testing the executor, I found that it couldn't write output to NetCDF4 files; this scary-looking message was written to stdout:
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 47342242057600:
  #000: H5F.c line 491 in H5Fcreate(): unable to create file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1305 in H5F_open(): unable to lock the file
    major: File accessibilty
    minor: Unable to open file
  #002: H5FD.c line 1839 in H5FD_lock(): driver lock request failed
    major: Virtual File Layer
    minor: Can't update object
  #003: H5FDsec2.c line 940 in H5FD_sec2_lock(): unable to lock file, errno = 37, error message = 'No locks available'
    major: File accessibilty
    minor: Bad file ID accessed
On further inspection, I found that I could import the `netCDF4` Python package, but calling `Dataset` threw an exception.
When googling `H5F_open(): unable to lock the file`, I found a pair of offhand references (here and here) by an HDF5 developer to issues with file locks on Lustre filesystems in the newly released HDF5 version 1.10. Interestingly, one report came from a Janus user.
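One way to see whether a filesystem hands out file locks at all, independent of HDF5, is to try taking one with the `flock` utility from util-linux. This is only a rough probe under stated assumptions: `can_lock` is a hypothetical helper, and a successful `flock` here does not guarantee that HDF5's own locking call will succeed on the same filesystem.

```shell
# Hypothetical helper: try to take a non-blocking lock on a scratch file
# in the given directory. Returns 0 if the lock was granted, 1 if it was
# refused, 2 if the probe file could not be created, 3 if flock is missing.
can_lock() {
    command -v flock >/dev/null 2>&1 || return 3
    probe="$1/.lock_probe.$$"
    touch "$probe" 2>/dev/null || return 2
    # flock -n fails immediately if the lock cannot be obtained
    if flock -n "$probe" true 2>/dev/null; then
        status=0
    else
        status=1
    fi
    rm -f "$probe"
    return $status
}
```

Usage: `can_lock "$install_dir" || echo "file locks look unavailable here"`.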
I thought that rolling back HDF5 to version 1.8.x might work around this issue. However, `esmpy` depends on HDF5, so I couldn't do it directly. I found that rolling back the `netcdf-fortran` package by one build did the trick:
$ conda install netcdf-fortran=4.4.4=5 -c defaults -c conda-forge
<snip>
The following packages will be DOWNGRADED:
    esmf:           7.0.0-9 conda-forge --> 7.0.0-8 conda-forge
    hdf5:           1.10.1-h9caa474_1   --> 1.8.18-h6792536_1
    netcdf-fortran: 4.4.4-6 conda-forge --> 4.4.4-5 conda-forge
WMT can now write output to NetCDF4 on blanca.
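Because a later `conda update --all` could quietly pull HDF5 back up to 1.10, it may be worth checking the installed version before running the executor. A minimal sketch; `is_hdf5_1_8` is a hypothetical helper that only inspects a version string passed to it.

```shell
# Hypothetical helper: succeed if the given version string is in the 1.8 series.
is_hdf5_1_8() {
    case $1 in
        1.8.*) return 0 ;;
        *)     return 1 ;;
    esac
}
```

The version string could be pulled from conda with something like `conda list hdf5 | awk '$1 == "hdf5" {print $2}'`, assuming the usual name/version/build column layout of `conda list` output.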
Still true. --Mpiper (talk) 10:06, 7 September 2018 (MDT)
Install executor software
Load blanca's `git` module.
module load git
Install the `wmt-exe` package from source.
mkdir -p $install_dir/opt && cd $install_dir/opt
git clone https://github.com/csdms/wmt-exe
cd wmt-exe
python setup.py develop
Create a site configuration file that describes the executor and symlink it to the executor's etc/ directory.
work_dir='/rc_scratch/$USER/wmt/_testing'  # single quotes keep $USER unexpanded
python setup.py configure --wmt-prefix=$install_dir --launch-dir=$work_dir --exec-dir=$work_dir
#ln -s "$(realpath wmt.cfg)" $install_dir/conda/etc # "realpath" not installed on blanca :(
cd $install_dir/conda/etc
ln -s $install_dir/opt/wmt-exe/wmt.cfg
Check that `$USER` didn't get expanded in the file. Lines 10-11 should be:
exec_dir = /rc_scratch/$USER/wmt/_testing
launch_dir = /rc_scratch/$USER/wmt/_testing
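The check for an unexpanded `$USER` can also be done with `grep` instead of by eye. A minimal sketch; `cfg_keeps_user_literal` is a hypothetical helper, and it assumes the `key = value` layout shown in the two lines above.

```shell
# Hypothetical helper: succeed only if both exec_dir and launch_dir in the
# given config file still contain the literal text $USER.
cfg_keeps_user_literal() {
    cfg=$1
    for key in exec_dir launch_dir; do
        # [$] matches a literal dollar sign in the regular expression
        grep -Eq "^${key}[[:space:]]*=.*[$]USER" "$cfg" || return 1
    done
}
```

Usage: `cfg_keeps_user_literal $install_dir/opt/wmt-exe/wmt.cfg || echo "\$USER was expanded; edit wmt.cfg"`.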
Note that we're using /rc_scratch for the launch and execution directories instead of the default ~/.wmt. Also note that we needed an SbatchLauncher class for wmt-exe because blanca uses Slurm instead of Torque for job control.
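The commented-out `realpath` line in the configuration step above could be kept working on hosts that lack `realpath` by falling back to Python, which is already installed at this point. A minimal sketch; `abs_path` is a hypothetical helper, and it assumes `python` is on `PATH` whenever `realpath` is not.

```shell
# Hypothetical helper: print the absolute, symlink-resolved path of a file.
abs_path() {
    if command -v realpath >/dev/null 2>&1; then
        realpath "$1"
    else
        # Fallback for hosts without realpath: let Python resolve the path
        python -c 'import os, sys; print(os.path.realpath(sys.argv[1]))' "$1"
    fi
}
```

With this in place, the symlink could be made with `ln -s "$(abs_path wmt.cfg)" $install_dir/conda/etc`.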
Install and test CSDMS components
Each section below describes how to install and test a particular CSDMS component.
Currently installed components:
- BRaKE
- Child
- CRUAKTemp
- FrostNumberGeoModel
- FrostNumberModel
- Hydrotrend
- KuGeoModel
- KuModel
- PyDeltaRCM
- Sedflux3D