How To IRF

Overview

We have taken the following steps to convert the gc2d glacier model to a CSDMS component:

Convert source code to one of the supported languages
Test converted code
Create Init, Run, and Finalize functions
More testing

See below for an example of what the python code that runs gc2d might look like.

Convert Source Code

To be CSDMS compliant, source code must be in one of the supported languages. Currently, the following languages are supported: fortran, c, c++, java, and python. Since the Kessler gc2d glacier model is written as a Matlab m-file, the source code was converted to python.

Why python?

The MATLAB source code could have been converted to any of the supported languages but we chose python with the Numpy/SciPy packages. The principal reasons were:

Similar syntax
Similar functionality
- Matrix operations
- Extended libraries (FFT, convolution, image processing, stats, etc.)

Because the gc2d code was clearly written and didn't use any advanced matlab functionality (SimuLink, for example) the initial conversion was fairly straightforward. To convert the several thousand lines of code of gc2d took only a couple of days. Keep in mind that this was done by someone unfamiliar with the model, and had not done this type of conversion before (although I was familiar with both languages).

Why not another language (like c, say)?

gc2d could certainly have been converted to another language like c. In fact, maybe that would have been better. I don't know. Some advantages are: c is more widely used and so more readable by the community the compiled code should run faster has fewer bells and whistles and so may more easily link with other languages Of course there would be some disadvantages. Functions for matrix operations would have to be written More specialized library functions would have to be written Depending on what the matlab source code looks like, these disadvantages could be minor annoyances or deal breakers.

Test Code Conversion

To test the conversion of the code to python, we would run the original code and write output variable grids to files for each time step. This was also done with the converted python code and the results compared. If the files were the same, the code passed! Unfortunately, things weren't quite so easy.

What constitutes "the same"?

To quantify differences in output, we used the maximum difference of corresponding grids. Certainly, if this value was zero then the grids were considered the same. What generally happened though was that initial differences were on the order of 10-14 but then increased to order 1 after hundreds of time steps. These errors never got out of control and were generally limited to just one or two grid nodes.

Small errors can be attributed to round off error, but it isn't clear that these small errors should compound to produce the larger observed errors.

Unit testing gc2d

To try to track down the source of these errors, we broke the gc2d code into small units. Each unit was a dozen or so lines that did a specific job. Each unit was then run with a range of input data. Input data included random grids as well as real input data. Conversion errors could be quickly isolated to particular units.

Although this proved to be useful, we were reluctant to do this. Our goal from the outset was to only convert the legacy code to python but otherwise only minimally rewrite it. We wanted to do this for two reasons: The new code now looks considerably different from the submitted code This is potentially a lot more work

Create CSDMS Interface

For starters, we assume that the CSDMS is similar to ESMF and OpenMI. Basically, this means that each component must have Initialize, Run, and Finalize functions. OpenMI calls the Run function GetValues. For gc2d, I've created the following member functions:

setup: Read an input file and initialize variables
run_for: Run gc2d for some amount of time
get_var: Get a grid of some variable
destroy: Free resources used for gc2d

These functions can now easily be placed into a CCA framework.

Example Code

We want the python code to look something like this: <geshi lang=Python>

   import component
   from unum.units import *

   # Create an instance of the component
   glacier_model = component.gc2d

   # Read input files, initialize vars, etc.
   glacier_model.setup ( 'Animas_200.mat' )

   # Run the model for some period of time
   glacier_model.run_for( 1.*YR )

   # Get modeled ice thickness in meters
   glacier_t = glacier_model.get_var( 'Thickness' , units=M )

   # Free resources
   glacier_model.destroy()

</geshi>

Example 2: sedflux

   gint
   csdms_sample_prog( void )
   { 
       CSDMSComp* c      = NULL;
       double*    z_vals = NULL;
       double*    d_vals = NULL;
       gint*      here   = NULL;
       double     now    = 100.;

       // Create a new CSDMS component
       c = csdms_comp_new ( );

       // Set IRF functions to make the general CSDMS component a sedflux component 
       csdms_comp_set_irf ( c , sed_init_func , sed_run_func , sed_finalize_func );

       // Initialize the model
       csdms_comp_init    ( c , "sedflux_init_file" );

       // Run the model until time=now and get elevation values at here
       csdms_comp_get_val ( c , "Elevation"  , here , now      , &z_vals );

       // Get grain size values again at time=now
       csdms_comp_get_val ( c , "Grain Size" , here , now      , &d_vals );

       // Advance the model from now to now+5 and get elevation values
       csdms_comp_get_val ( c , "Elevation"  , here , now + 5. , &z_vals );

       // Clean up the model
       csdms_comp_finalize( c );

       // Destroy the component
       c = csdms_comp_destroy( c );

       return EXIT_SUCCESS;
   }

</geshi>

The functions sed_init_func, sed_run_func, and sed_finalize_func are provided by the sedflux shared library. As such, when compiling the above sample code, libsedflux must be linked to. These functions are written by the sedflux developer.

Init function

In this example, the sedflux initialize function opens and initialization file ("sedflux_init_file") that provides sedflux with the necessary information to begin the model. This information includes: bathymetry data, sediment characteristics, river discharge, and process constants.

Run function

The run function advances the sedflux state from it's current time to a future time and calculates specified values at that time and place.