= The CSDMS High Performance Computing Cluster (Code name: Blanca) =
__TOC__


The CSDMS High Performance Computing Cluster (HPCC) provides CSDMS researchers a state-of-the-art HPC cluster.


Use of the CSDMS HPCC is available free of charge to the CSDMS community!  Follow [[HPCC_account_requirements|these guidelines]] to request a one year guest account on our machine.
== Attribution and Reporting of Results ==
When reporting results obtained on the CSDMS cluster, we request that the following language be used as an acknowledgement:
 
"We acknowledge computing time on the CU-CSDMS High-Performance Computing Cluster."
 
Also, please notify us of any tech reports, conference papers, journal articles, theses, or dissertations that contain results obtained on blanca. Your assistance will help ensure that our online bibliography of results is as complete as possible. Citations should be sent to [mailto:CSDMSsupport@colorado.edu us].
 
== Hardware ==
[[File:sgi_logo_hires.jpg | right | 250px ]]
 
The CSDMS High Performance Computing Cluster is an [http://www.sgi.com SGI] [http://www.sgi.com/products/servers/altix/xe Altix XE] 1300 that consists of 64 Altix XE320 compute nodes (for a total of 512 cores).  The compute nodes are configured with two quad-core 3.0GHz E5472 (Harpertown) processors.  54 of the 64 nodes have 2 GB of memory per core, while the remaining nodes have 4 GB of memory per core.  The cluster is controlled through an Altix XE250 head node.  Internode communication is accomplished through either gigabit ethernet or over a non-blocking [http://en.wikipedia.org/wiki/InfiniBand InfiniBand] fabric.
 
Each compute node has 250 GB of local temporary storage.  However, all nodes are able to access 36TB of RAID storage through NFS.
 
The CSDMS system will be tied into the larger 7000-core (>100 Tflop) '''Front Range Computing Consortium'''.  This supercomputer will consist of 10 Sun Blade 6048 Modular System racks: nine deployed to form a tightly integrated computational plant, and the remaining rack serving as a GPU-based accelerated computing system. In addition, the Grid environment will provide access to NCAR’s mass storage system.


Some benchmarks that we've run on blanca:
* The OSU [[ CSDMS_HPCC_OMB_benchmarks |micro-benchmarks]]
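As a rough illustration of how such micro-benchmarks are typically launched (these are not the exact commands behind the published numbers): the module name below comes from the Compilers table, the node names from the Hardware Summary table, and the path to the ''osu_latency'' binary is a placeholder.

 # load an MPI stack
 module load openmpi/1.3
 # measure point-to-point latency with one rank on each of two compute nodes
 mpirun -np 2 --host cl1n001,cl1n002 ./osu_latency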
=== Hardware Summary ===
{|
! align=left width=150 | Node
! align=left width=150 | Type
! align=left width=200 | Processors
! align=left width=100 | Memory
! align=left width=150 | Internal Storage
|-
| blanca.colorado.edu
| Head (Altix XE250)
| 2 Quad-Core Xeon<ref name=proc_specs>
Processors are Quad-core Intel Xeon E5472 (Harpertown):
* Front Side Bus: 1600 MHz
* L2 Cache: 12MB
</ref>
| 16GB<ref name=mem_specs>
Memory is DDR2 800 MHz FBDIMM</ref>
| --
|-
| cl1n001 - cl1n056
| Compute (Altix XE320)
| 2 Quad-Core Xeon <ref name=proc_specs/>
| 16GB <ref name=mem_specs/>
| 250GB SATA
|-
| cl1n057 - cl1n064
| Compute (Altix XE320)
| 2 Quad-Core Xeon <ref name=proc_specs/>
| 32GB <ref name=mem_specs/>
| 250GB SATA
|}
<references />


== Software ==
[[Image:HPCC.png | 350px | right | The CSDMS HPCC]]


Below is a list of some of the software that we have installed on blanca.  If there is a particular software package that is not listed below and you would like to use it, please feel free to send an email to [mailto:CSDMSsupport@colorado.edu us] outlining what it is you need.

=== Compilers ===
{|
! align=left width=100 |  Name
! align=left width=100 |  Version
! align=left width=100 | Module Name
! align=left | Location
|-
| [http://gcc.gnu.org/ gcc]
| 4.1
| gcc/4.1
| /usr
|-
| [http://gcc.gnu.org/ gcc]
| 4.3
| gcc/4.3
| /usr/local/gcc
|-
| [http://gcc.gnu.org/wiki/GFortran gfortran]
| 4.1
| gcc/4.1
| /usr
|-
| [http://gcc.gnu.org/wiki/GFortran gfortran]
| 4.3
| gcc/4.3
| /usr/local/gcc
|-
| icc
| 11.0
| intel
| /usr/local/intel
|-
| ifort
| 11.0
| intel
| /usr/local/intel
|-
| [http://www.mcs.anl.gov/research/projects/mpich2/ mpich2]
| 1.1
| mpich2/1.1
| /usr/local/mpich
|-
| [http://mvapich.cse.ohio-state.edu/ mvapich2]
| 1.2
| mvaich2/1.2
| /usr/local/mvapich
|-
| [http://www.open-mpi.org/ openmpi]
| 1.3
| openmpi/1.3
| /usr/local/openmpi
|}
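The compilers and MPI stacks above are selected through the [http://modules.sourceforge.net/ Environment modules] package listed under Tools below. A minimal sketch, using module names exactly as they appear in the table; the source and executable names are placeholders for your own code:

 # see what is installed, then pick a compiler and an MPI stack
 module avail
 module load gcc/4.3 openmpi/1.3
 # build a serial and an MPI executable
 gcc -O2 -o mymodel mymodel.c
 mpicc -O2 -o mymodel_mpi mymodel_mpi.c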


=== Languages ===
{|
! align=left width=100 | Name
! align=left width=100 | Version
! align=left width=100 | Module Name
! align=left | Location
|-
| Python<ref>
Python 2.4 modules:
* [http://numpy.scipy.org/ numpy] 1.2.1
* [http://www.scipy.org/ scipy] 0.6.0
* [http://www.pythonware.com/products/pil Python Imaging Library (PIL)]
</ref>
| 2.4
| python/2.4
| /usr
|-
| Python<ref>
Python 2.6 modules:
* [http://numpy.scipy.org/ numpy] 1.3.0
* [http://www.scipy.org/ scipy] 0.7.1rc3
* [http://www.pyngl.ucar.edu/Nio.shtml PyNIO] 1.3.0b1
* [https://ipython.org/ iPython] 0.10
* [http://www.cython.org Cython] 0.11.3
</ref>
| 2.6
| python/2.6
| /usr/local/python
|-
| Java
| 1.5
| --
| --
|-
| Java
| 1.6
| --
| --
|-
| perl
| 5.8.8
| --
| /usr
|-
| [http://www.mathworks.com/ MATLAB]
| 2008b
| matlab
| /usr/local/matlab
|}


<references/>
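Both Python builds are selected through the module names in the table above. A quick sanity check that the build you loaded sees its bundled packages (a sketch; the versions printed should match the footnotes):

 module load python/2.6
 python -c "import numpy, scipy; print numpy.__version__, scipy.__version__"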


=== Libraries ===
{|
! align=left width=100 |  Name
! align=left width=100 |  Version
! align=left width=100 | Module Name
! align=left | Location
|-
| [http://www.unidata.ucar.edu/software/udunits Udunits]
| 1.12.9
| udunits
| /usr/local/udunits
|-
| [http://www.unidata.ucar.edu/software/netcdf netcdf]
| 4.0.1
| netcdf
| /usr/local/netcdf
|-
| [http://www.hdfgroup.org/HDF5 hdf5]
| 1.8
| hdf5
| /usr/local/hdf5
|-
| [http://xmlsoft.org/index.html libxml2]
| 2.7.3
| libxml2
| /data/progs/lib/libxml2
|-
| [http://www.gtk.org/ glib-2.0]
| 2.18.3
| glib2
| /usr/local/glib
|-
| petsc
| 3.0.0p3
| petsc
| /usr/local/petsc
|-
| [http://www.mcs.anl.gov/research/projects/mct/ mct]
| 2.6.0
| mct
| /data/progs/mct/2.6.0-mpich2-intel
|}
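To build against one of these libraries, load its module and point the compiler at the install prefix from the table. A sketch for NetCDF; the include/ and lib/ subdirectories under the prefix, and the source file name, are assumptions to adapt to your own code:

 module load netcdf
 # prefix taken from the table above; include/ and lib/ layout is assumed
 gcc -I/usr/local/netcdf/include -L/usr/local/netcdf/lib -o read_grid read_grid.c -lnetcdf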


=== Tools ===
{|
! align=left width=100 |  Name
! align=left width=100 |  Version
! align=left width=100 | Module Name
! align=left | Location
|-
| [http://www.cmake.org/ cmake]
| 2.6p2
| cmake
| /usr/local/cmake
|-
| [http://www.scons.org/ scons]
| 1.2.0
| scons
| /usr/local/scons
|-
| [http://subversion.tigris.org/ subversion]
| 1.6.2
| subversion
| /usr/local/subversion
|-
| [http://www.clusterresources.com/torquedocs21/ torque]
| 2.3.5
| torque
| /opt/torque
|-
| [http://modules.sourceforge.net/ Environment modules]
| 3.2.6
| --
| /usr/local/modules
|}
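Compute jobs are submitted through [http://www.clusterresources.com/torquedocs21/ torque], listed in the table above. A minimal sketch of a submission script; the job name, node count, walltime, core count, and executable are examples only, and any site-specific queue name is omitted:

 #!/bin/bash
 # example.pbs: ask for 2 compute nodes with 8 cores each (the XE320 nodes have 8 cores)
 #PBS -N example_run
 #PBS -l nodes=2:ppn=8
 #PBS -l walltime=01:00:00
 cd $PBS_O_WORKDIR
 module load gcc/4.3 openmpi/1.3
 mpirun -np 16 ./mymodel_mpi

Submit and monitor the job with:

 qsub example.pbs
 qstat -u $USER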

== Request an account ==
The CSDMS HPCC is freely available to all CSDMS members. To get an account on the HPCC you have to:
# Be a [[Organization#Join|member]] of one of the Working Groups (WG) or Focus Research Groups (FRG),
# Provide the community beforehand with [[Models questionnaire|metadata of the model]] you want to run on the HPCC,
# Make the source code freely available to the community, either by submitting your code to the CSDMS repository ([mailto:csdms@colorado.edu csdms@colorado.edu]) or by submitting it to one of the freely accessible online repositories ([http://sourceforge.net/ sourceforge], [http://code.google.com/opensource/ googlecode], etc.).

Once you have met the above requirements you can request a [[HPCC account request | CSDMS HPCC account]].

Your HPCC guest account will be valid for ''one year''. You will receive an email as soon as your account expires. Your data (model, source code, simulations, etc.) will be removed from the HPCC if you don't extend your account (by email to [mailto:csdms@colorado.edu CSDMS@colorado.edu]). Unfortunately, we have to charge a fee if data needs to be recovered after an account expires.
== Access ==
Once you have an account you can access the CSDMS HPCC with any secure-shell (SSH) application (primarily ssh, scp, sftp) from workstations located in the CU Internet domain (*.colorado.edu) or from workstations connected to the colorado.edu domain through a virtual private network (VPN) connection. A VPN account will automatically be created for users outside the colorado.edu domain.
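For example, using the head node name from the Hardware Summary table, a login and a file transfer look like the sketch below; <your_username> and the file name are placeholders:

 # log in to the head node
 ssh <your_username>@blanca.colorado.edu
 # copy an input file to your home directory on the cluster
 scp input.nc <your_username>@blanca.colorado.edu:~/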

Displaying the graphical desktop of the HPCC master-control node on your personal workstation is possible by tunneling X Windows over SSH. This might require prior installation and configuration of software on your workstation. See the information below on how to operate the graphical desktop for [[#SSH Tunneling X Windows for Mac OSX | Mac]] and for [[#SSH Tunneling X Windows for Windows|Windows]] operating systems.

=== SSH Tunneling X Windows for Mac OSX ===
You will need X11 to tunnel X Windows to a Mac. Fortunately, Mac OSX comes with X11. If you're using an older version of OSX, [http://www.apple.com/downloads/macosx/apple/macosx_updates/x11formacosx.html download X11 from the Apple site].

Open X11, select '''Applications''' and then '''Terminal'''. In the terminal type:
 ssh -Y beach.colorado.edu -l <your_username>
Enter your password and that's it. You can now test the tunneling by, for example, typing ''matlab''.
=== SSH Tunneling X Windows for Windows ===
Install [http://www.straightrunning.com/XmingNotes/ Xming] on your Windows machine.

''Needs more info''