TestingExecutorBlanca
Latest revision as of 12:12, 12 October 2019
Instructions for installing and configuring a WMT executor on blanca.
--Mpiper (talk) 15:44, 28 February 2018 (MST)
Set build environment on blanca
Log in to summit.
ssh mapi8461@login.rc.colorado.edu
Load the correct Slurm module for blanca.
module load slurm/blanca
Log in to a compute node.
sinteractive
Build everything on the compute node.
Set install directory
The install directory for this executor is /work/csdms/wmt/_testing.
install_dir=/work/csdms/wmt/_testing
mkdir -p $install_dir
Make sure read and execute bits are set on this directory.
chmod 0775 $install_dir
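As a quick sanity check after the `chmod`, the permission bits can be inspected with `stat`. The sketch below assumes GNU `stat` (for the `-c '%A'` format); the function name `check_dir_perms` is hypothetical and not part of WMT.

```shell
# Hypothetical helper: confirm that group and others can read and enter a directory.
# Assumes GNU stat, whose '%A' format prints a string such as drwxrwxr-x.
check_dir_perms() {
    perms=$(stat -c '%A' "$1") || return 1
    case "$perms" in
        # Positions: d, user rwx, group r?x, other r?x (s/t cover setgid/sticky).
        d???r?[xs]r?[xt]*) echo "ok: $perms" ;;
        *) echo "too restrictive: $perms"; return 1 ;;
    esac
}
```

For this executor it would be called as `check_dir_perms "$install_dir"`.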
Install Python
Install a Python distribution to be used locally by WMT. We like to use Miniconda.
cd $install_dir
curl https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh -o miniconda.sh
bash ./miniconda.sh -f -b -p $(pwd)/conda
export PATH=$(pwd)/conda/bin:$PATH
If working with an existing Miniconda install, be sure to update everything before continuing:
conda update conda
conda update --all
Install the CSDMS software stack
Using the csdms-stack conda channel (the Bakery), install the CSDMS software stack, including several pre-built components, with the `csdms-stack` metapackage.
conda install csdms-stack -c csdms-stack -c defaults -c conda-forge
This metapackage currently includes
- pymt
- cca-tools
- csdms-child
- csdms-sedflux-3d
- csdms-hydrotrend
- csdms-permamodel-ku
- csdms-permamodel-frostnumber
- csdms-permamodel-kugeo
- csdms-permamodel-frostnumbergeo
- csdms-cruaktemp
- csdms-brake
- csdms-pydeltarcm
Alternately, use the requirements file listed at https://github.com/mdpiper/wmt-executor-requirements-files.
Optionally install the `babelizer`, in case a component needs to be built from source.
conda install -c csdms-stack babelizer
Optionally install IPython for testing.
conda install ipython
Recall that when running IPython remotely, it's helpful to set
export MPLBACKEND=Agg
HDF5 and file locks
When testing the executor, I found that it couldn't write output to NetCDF4 files, and this scary-looking message was written to stdout:
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 47342242057600:
  #000: H5F.c line 491 in H5Fcreate(): unable to create file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1305 in H5F_open(): unable to lock the file
    major: File accessibilty
    minor: Unable to open file
  #002: H5FD.c line 1839 in H5FD_lock(): driver lock request failed
    major: Virtual File Layer
    minor: Can't update object
  #003: H5FDsec2.c line 940 in H5FD_sec2_lock(): unable to lock file, errno = 37, error message = 'No locks available'
    major: File accessibilty
    minor: Bad file ID accessed
On further inspection, I found that I could import the `netCDF4` Python package, but calling `Dataset` threw an exception.
When googling `H5F_open(): unable to lock the file`, I found an offhand reference (http://hdf-forum.184993.n3.nabble.com/h5fcreate-1-10-unable-to-lock-td4028902.html) by an HDF5 developer to issues with file locks on Lustre filesystems in the newly released HDF5 version 1.10. Interestingly, the report came from a janus user.
I thought that by rolling back the HDF5 version to 1.8.x, I might be able to work around this issue. However, `esmpy` depends on HDF5, so I couldn't do it directly. I found that rolling back the `netcdf-fortran` package by one build did the trick:
$ conda install netcdf-fortran=4.4.4=5 -c defaults -c conda-forge
<snip>
The following packages will be DOWNGRADED:
    esmf:           7.0.0-9 conda-forge --> 7.0.0-8 conda-forge
    hdf5:           1.10.1-h9caa474_1 --> 1.8.18-h6792536_1
    netcdf-fortran: 4.4.4-6 conda-forge --> 4.4.4-5 conda-forge
WMT can now write output to NetCDF4 on blanca.
Still true. --Mpiper (talk) 10:06, 7 September 2018 (MDT)
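To see whether a filesystem grants advisory locks at all, a rough probe with the `flock` utility from util-linux can help. This is only a sketch under assumptions: the function name `probe_file_lock` is hypothetical, and the probe exercises `flock(2)` only, which may differ from the exact locking call HDF5 makes on a given build. It does not replace the real test of opening a `Dataset` with `netCDF4`.

```shell
# Hypothetical helper: try to take a non-blocking advisory lock on a scratch
# file inside the given directory. Assumes the util-linux `flock` command.
probe_file_lock() {
    probe="$1/.lock_probe.$$"
    : > "$probe" || return 1          # create the scratch file
    if flock -n "$probe" true 2>/dev/null; then
        echo "locks available in $1"
        status=0
    else
        echo "no locks available in $1"
        status=1
    fi
    rm -f "$probe"                    # clean up the scratch file
    return $status
}
```

On blanca, the directory of interest would be the one where WMT writes its output, for example `probe_file_lock "/rc_scratch/$USER"`.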
Install executor software
Load blanca's `git` module.
module load git
Install the `wmt-exe` package from source.
mkdir -p $install_dir/opt && cd $install_dir/opt
git clone https://github.com/csdms/wmt-exe
cd wmt-exe
python setup.py develop
Create a site configuration file that describes the executor and symlink it to the executor's etc/ directory.
work_dir="/rc_scratch/$USER/wmt/_testing"
python setup.py configure --wmt-prefix=$install_dir --launch-dir=$work_dir --exec-dir=$work_dir
#ln -s "$(realpath wmt.cfg)" $install_dir/conda/etc # "realpath" not installed on blanca :(
cd $install_dir/conda/etc
ln -s $install_dir/opt/wmt-exe/wmt.cfg
Check that `$USER` didn't get expanded in the file. Lines 10-11 should be:
exec_dir = /rc_scratch/$USER/wmt/_testing
launch_dir = /rc_scratch/$USER/wmt/_testing
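The check for an unexpanded `$USER` can also be scripted. The sketch below assumes the `key = value` layout shown above; the function name `check_cfg_user` is hypothetical and not part of wmt-exe.

```shell
# Hypothetical helper: confirm that exec_dir and launch_dir in a wmt.cfg file
# still contain the literal string $USER rather than an expanded user name.
check_cfg_user() {
    cfg=$1
    for key in exec_dir launch_dir; do
        # Select the line for this key, then look for a literal '$USER' in it.
        if ! grep -E "^[[:space:]]*${key}[[:space:]]*=" "$cfg" | grep -qF '$USER'; then
            echo "$key: \$USER missing or expanded"
            return 1
        fi
    done
    echo "ok"
}
```

Here it would be run as `check_cfg_user "$install_dir/opt/wmt-exe/wmt.cfg"`.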
Note that we're using /rc_scratch for the launch and execution directories instead of the default ~/.wmt. Also note that we needed an SbatchLauncher class (https://github.com/csdms/wmt-exe/pull/12) for wmt-exe because blanca uses Slurm (https://slurm.schedmd.com/overview.html) instead of Torque for job control.
Install and test CSDMS components
Each section below describes how to install and test a particular CSDMS component.
Currently installed components:
- BRaKE
- Child
- CRUAKTemp
- FrostNumberGeoModel
- FrostNumberModel
- Hydrotrend
- KuGeoModel
- KuModel
- PyDeltaRCM
- Sedflux3D
