TestingExecutorBlanca: Difference between revisions

From CSDMS
Latest revision as of 12:12, 12 October 2019

wmt-testing executor on blanca

Instructions for installing and configuring a WMT executor on blanca.

--Mpiper (talk) 15:44, 28 February 2018 (MST)

Set build environment on blanca

Log in to summit.

ssh mapi8461@login.rc.colorado.edu

Load the correct Slurm module for blanca.

module load slurm/blanca

Log in to a compute node.

sinteractive

Build everything on the compute node.

Set install directory

The install directory for this executor is /work/csdms/wmt/_testing.

install_dir=/work/csdms/wmt/_testing
mkdir -p $install_dir

Make sure read and execute bits are set on this directory.

chmod 0775 $install_dir
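
To confirm that the mode was applied, the directory's permission bits can be read back. The following is a minimal sketch, assuming GNU `stat` is available; it uses a temporary directory as a stand-in for the real install directory, and the variable name `demo_dir` is hypothetical.

```shell
# Sketch: create a directory with mode 0775 and confirm the result.
# A temporary directory stands in for the real install directory
# (an assumption made for illustration only).
demo_dir=$(mktemp -d)/wmt_testing
mkdir -p "$demo_dir"
chmod 0775 "$demo_dir"

# GNU stat prints the octal permission bits with -c '%a'.
mode=$(stat -c '%a' "$demo_dir")
echo "mode of $demo_dir: $mode"
```

On the executor, the same `stat` call can be pointed at `$install_dir` instead.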

Install Python

Install a Python distribution to be used locally by WMT. We like to use Miniconda.

cd $install_dir
curl https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh -o miniconda.sh
bash ./miniconda.sh -f -b -p $(pwd)/conda
export PATH=$(pwd)/conda/bin:$PATH

If working with an existing Miniconda install, be sure to update everything before continuing:

conda update conda
conda update --all

Install the CSDMS software stack

Using the csdms-stack conda channel (the Bakery), install the CSDMS software stack, including several pre-built components, with the `csdms-stack` metapackage.

conda install csdms-stack -c csdms-stack -c defaults -c conda-forge

This metapackage currently includes

  • pymt
  • cca-tools
  • csdms-child
  • csdms-sedflux-3d
  • csdms-hydrotrend
  • csdms-permamodel-ku
  • csdms-permamodel-frostnumber
  • csdms-permamodel-kugeo
  • csdms-permamodel-frostnumbergeo
  • csdms-cruaktemp
  • csdms-brake
  • csdms-pydeltarcm

Alternatively, use the requirements file listed at https://github.com/mdpiper/wmt-executor-requirements-files.

Optionally install the `babelizer`, in case a component needs to be built from source.

conda install -c csdms-stack babelizer

Optionally install IPython for testing.

conda install ipython

Recall that when running IPython remotely, it's helpful to set

export MPLBACKEND=Agg

HDF5 and file locks

When testing the executor, I found that it couldn't write output to NetCDF4 files, and this scary-looking message was written to stdout:

HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 47342242057600:
 #000: H5F.c line 491 in H5Fcreate(): unable to create file
   major: File accessibilty
   minor: Unable to open file
 #001: H5Fint.c line 1305 in H5F_open(): unable to lock the file
   major: File accessibilty
   minor: Unable to open file
 #002: H5FD.c line 1839 in H5FD_lock(): driver lock request failed
   major: Virtual File Layer
   minor: Can't update object
 #003: H5FDsec2.c line 940 in H5FD_sec2_lock(): unable to lock file, errno = 37, error message = 'No locks available'
   major: File accessibilty
   minor: Bad file ID accessed

On further inspection, I found that I could import the `netCDF4` Python package, but calling `Dataset` threw an exception.

When googling `H5F_open(): unable to lock the file`, I found an offhand reference (http://hdf-forum.184993.n3.nabble.com/h5fcreate-1-10-unable-to-lock-td4028902.html) by an HDF5 developer to issues with file locks on Lustre filesystems in the newly released HDF5 version 1.10. Interestingly, the report came from a janus user.

I thought that by rolling back the HDF5 version to 1.8.x, I might be able to work around this issue. However, `esmpy` depends on HDF5, so I couldn't do it directly. I found that rolling back the `netcdf-fortran` package by one build did the trick:

$ conda install netcdf-fortran=4.4.4=5 -c defaults -c conda-forge
<snip>
The following packages will be DOWNGRADED:

   esmf:           7.0.0-9              conda-forge --> 7.0.0-8              conda-forge
   hdf5:           1.10.1-h9caa474_1                --> 1.8.18-h6792536_1
   netcdf-fortran: 4.4.4-6              conda-forge --> 4.4.4-5              conda-forge

WMT can now write output to NetCDF4 on blanca.
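
When revisiting an environment later, it may help to check which HDF5 series is installed before testing NetCDF4 output. The following is a minimal sketch; the helper name `hdf5_is_1_10` is hypothetical, and the `conda list` line in the comment is only a suggestion for obtaining the version string on the executor.

```shell
# Hypothetical helper: succeed (return 0) when the given HDF5 version
# string belongs to the 1.10 series, where the locking error appeared.
hdf5_is_1_10() {
    case "$1" in
        1.10|1.10.*) return 0 ;;
        *) return 1 ;;
    esac
}

# On the executor, the version could be read from conda, for example:
#   version=$(conda list hdf5 | awk '$1 == "hdf5" {print $2}')
# Here the two versions from the downgrade listing are used directly.
for version in 1.10.1 1.8.18; do
    if hdf5_is_1_10 "$version"; then
        echo "hdf5 $version: 1.10 series, file locking may fail"
    else
        echo "hdf5 $version: not in the 1.10 series"
    fi
done
```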

Still true. --Mpiper (talk) 10:06, 7 September 2018 (MDT)

Install executor software

Load blanca's `git` module.

module load git

Install the `wmt-exe` package from source.

mkdir -p $install_dir/opt && cd $install_dir/opt
git clone https://github.com/csdms/wmt-exe
cd wmt-exe
python setup.py develop

Create a site configuration file that describes the executor and symlink it to the executor's etc/ directory.

work_dir="/rc_scratch/$USER/wmt/_testing"
python setup.py configure --wmt-prefix=$install_dir --launch-dir=$work_dir --exec-dir=$work_dir
#ln -s "$(realpath wmt.cfg)" $install_dir/conda/etc # "realpath" not installed on blanca :(
cd $install_dir/conda/etc
ln -s $install_dir/opt/wmt-exe/wmt.cfg
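
Since `realpath` is not available on blanca, an absolute path can also be built from standard utilities. The following is a minimal sketch, not the method used above; the helper name `abs_path` and the scratch directory layout are hypothetical stand-ins for the real paths.

```shell
# Hypothetical helper: print the absolute path of an existing file
# using only cd, dirname, basename, and pwd.
abs_path() {
    dir=$(cd "$(dirname "$1")" && pwd -P) || return 1
    echo "$dir/$(basename "$1")"
}

# Demonstration in a scratch directory (stand-in paths, not the real ones).
scratch=$(mktemp -d)
mkdir -p "$scratch/opt/wmt-exe" "$scratch/conda/etc"
touch "$scratch/opt/wmt-exe/wmt.cfg"

# Resolve the file from a relative name, then link it by absolute path.
cd "$scratch/opt/wmt-exe"
target=$(abs_path wmt.cfg)
ln -s "$target" "$scratch/conda/etc/wmt.cfg"
ls -l "$scratch/conda/etc/wmt.cfg"
```

Unlike `realpath`, this helper resolves symlinks only in the directory part of the path, which is enough when the file itself is a regular file.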

Check that `$USER` didn't get expanded in the file. Lines 10-11 should be:

exec_dir = /rc_scratch/$USER/wmt/_testing
launch_dir = /rc_scratch/$USER/wmt/_testing
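
Because `work_dir` is assigned inside double quotes, the shell expands `$USER` before `setup.py configure` sees it, which is why this check is worth doing. The following is a minimal sketch of an automated check; it runs against a temporary stand-in file (the variable `cfg` is hypothetical) rather than the real `wmt.cfg`.

```shell
# Stand-in config file holding the two expected lines. The single-quoted
# heredoc delimiter keeps $USER from being expanded by the shell.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
exec_dir = /rc_scratch/$USER/wmt/_testing
launch_dir = /rc_scratch/$USER/wmt/_testing
EOF

# grep -F treats the pattern as a fixed string, so '$USER' matches
# literally; -c counts the matching lines.
count=$(grep -c -F '/rc_scratch/$USER/' "$cfg")
if [ "$count" -eq 2 ]; then
    echo "OK: \$USER is unexpanded in both entries"
else
    echo "Problem: expected 2 literal \$USER entries, found $count"
fi
```

On the executor, setting `cfg` to the path of the real `wmt.cfg` and skipping the heredoc would run the same check; `sed -n '10,11p' "$cfg"` prints the two lines for a visual comparison.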

Note that we're using /rc_scratch for the launch and execution directories instead of the default ~/.wmt. Also note that we needed an SbatchLauncher class (https://github.com/csdms/wmt-exe/pull/12) for wmt-exe because blanca uses Slurm (https://slurm.schedmd.com/overview.html) instead of Torque for job control.
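
The practical difference between the two schedulers is the submission command: Torque jobs are submitted with `qsub`, while Slurm jobs are submitted with `sbatch`. The following is a minimal sketch of detecting which one a host provides. It is an illustration only and does not reflect how the SbatchLauncher class in wmt-exe is implemented; the function name `pick_submit_command` is hypothetical.

```shell
# Hypothetical helper: report which job submission command is on PATH.
# Prints "sbatch" for Slurm, "qsub" for Torque, or "none" if neither
# is found.
pick_submit_command() {
    if command -v sbatch >/dev/null 2>&1; then
        echo sbatch
    elif command -v qsub >/dev/null 2>&1; then
        echo qsub
    else
        echo none
    fi
}

submit_cmd=$(pick_submit_command)
echo "job submission command on this host: $submit_cmd"
```

On blanca, after `module load slurm/blanca`, this would be expected to report `sbatch`.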

Install and test CSDMS components

Each section below describes how to install and test a particular CSDMS component.

Currently installed components:

  1. BRaKE
  2. Child
  3. CRUAKTemp
  4. FrostNumberGeoModel
  5. FrostNumberModel
  6. Hydrotrend
  7. KuGeoModel
  8. KuModel
  9. PyDeltaRCM
  10. Sedflux3D