BMI: Difference between revisions

From CSDMS
m (Reformatted page.)
(Added a summary of BMI.)
= The CSDMS Basic Model Interface (version 1.0) =


In order to simplify conversion of an existing model to a reusable, plug-and-play model component, CSDMS has developed a simple interface called the '''''Basic Model Interface''''' or '''BMI''' that model developers are asked to implement. Recall that in this context an '''''interface''''' is a named set of functions with prescribed function names, argument types and return types.  The BMI functions make the model '''''self-describing''''' and fully '''''controllable''''' by a modeling framework.
Development of scientific modeling software increasingly requires
the coupling of multiple, independently developed models.
Component-based software engineering enables the integration of
plug-and-play components, but significant additional challenges
must be addressed in any specific domain in order to produce a
usable development and simulation environment that also encourages
contributions and adoption by entire communities. In this paper we
describe the challenges in creating a coupling environment for
Earth-surface process modeling and the innovative approach that
we have developed to address them within the Community Surface
Dynamics Modeling System.


By design, the BMI functions are straightforward to implement in any of the languages supported by CSDMS, which include C, C++, Fortran (all versions), Java and Python.  Even though some of these languages are object-oriented and support user-defined types, the BMI functions use only simple (universal) data types.
== Links ==


Also by design, the BMI functions are '''''noninvasive'''''.  A BMI-compliant model does not make any calls to CSDMS components or tools and is not modified to use CSDMS data structures. BMI therefore introduces no dependencies into a model and the model can still be used in a "stand-alone" manner.
* A description of the [http://bmi-spec.readthedocs.io/en/latest/ BMI specification]: Go here for a detailed description of the latest version of BMI.
 
* BMI on [https://github.com/csdms/bmi GitHub]: Go here to contribute to BMI, ask a BMI-related question, or submit an issue.
Any model that provides the BMI functions can be easily converted to a CSDMS plug-and-play component that has a CSDMS [[CMI_Description | '''Component Model Interface''']] or '''CMI'''.  This conversion/wrapping process is done by CSDMS staff, but BMI-enabled models basically just "drop in" to the system.  The BMI functions are called by the CMI, by the framework and by service components.  It is not necessary for a developer to learn anything about the CMI unless they're just [[CMI_Description | '''''curious''''']].
* The original [http://www.sciencedirect.com/science/article/pii/S0098300412001252 BMI article in Computers & Geosciences]
 
Any model that provides the BMI functions should also be straightforward to ingest as a component into other component-based modeling frameworks.  For example, all model coupling frameworks use Model Control Functions very similar to those described below, so providing them helps get a model ready for '''''plug-and-play'''''.
 
Once a BMI-enabled model has been wrapped by CSDMS staff to become a CSDMS component, it automatically gains many '''''new capabilities'''''.  This includes the ability to be coupled to other models even if their (1) programming language, (2) variable names, (3) variable units, (4) time-stepping scheme or (5) computational grid is different.  It also gains (1) the ability to write output variables to standardized NetCDF files, (2) a "tabbed-dialog" graphical user interface (GUI), (3) a standardized HTML help page and (4) the ability to run within the [http://csdms.colorado.edu/wiki/WMT_information '''CSDMS Web Modeling Tool (WMT)'''].
 
Note that "long_var_name" (long variable name) appears as an argument to several of the BMI functions below.  This refers to a standardized variable name from the [[CSDMS_Standard_Names | '''CSDMS Standard Names''']].  You '''do not''' change the variable names that you currently use within your model; the standard names are too long to be used within your model code.  Instead, you find a matching CSDMS Standard Name for each variable in your model and then write your BMI functions to accept the standard names and map them to your model's internal names.  This provides a standard way for callers to ask about your model's internal variables.
 
The use of standard names makes it possible for the framework to '''''automatically''''' connect "user components" to "provider components" without user intervention.  The framework can also use metadata associated with the "long variable name" (stored in a Model Metadata File) to determine the degree to which a variable from a provider matches the needs of the user.
 
If the model developer provides a simple "GUI XML" file and a template/example of the model's ''configuration file'', then CSDMS tools can automatically build a tabbed-dialog GUI for the model that appears in the [http://csdms.colorado.edu/wiki/WMT_information '''WMT'''].  Examples of (1) a GUI XML file, (2) a standardized HTML help page and (3) a Model Metadata File will be provided here soon.
 
The CMI wrapping does not have a significant impact on performance.  This is due to the use of [https://computation.llnl.gov/casc/components/#page=home '''Babel'''] for language interoperability and the fact that CSDMS components pass values '''''by reference''''' instead of '''''by copy''''' whenever possible.
 
Additional information on the design of the CSDMS framework can be found in [http://www.sciencedirect.com/science/article/pii/S0098300412001252 '''Peckham et al. (2012)'''].
 
Examples of BMI bindings are provided for [https://github.com/csdms/bmi-c C], [https://github.com/csdms/bmi-cxx C++], [https://github.com/csdms/bmi-python Python], and [https://github.com/csdms/bmi-f90 Fortran].
 
The source code for a complete Python example that demonstrates how to wrap a component is available [[media:Water_Tank_Archive.zip | '''here''']]. This is the water tank example presented at the CSDMS annual meeting.
 
== Model Control Functions ==
 
<syntaxhighlight lang=java>
void initialize (in string config_file)
void update (in double dt) //  Advance model variables by time interval, dt (dt=-1 means use model time step)
void finalize ()
void run_model (in string config_file) //  Do a complete model run. Not needed for CMI.
</syntaxhighlight>
 
These BMI functions are critical to plug-and-play modeling because they allow a calling component to bypass a model's own time loop.  They also provide the caller with '''''fine-grained control''''' over the model, similar to a TV remote control.
 
The '''''initialize()''''' function accepts a string argument that gives the name (and path) of its "main input file", called a '''''configuration file'''''.  This function should perform all tasks that are to take place before entering the model's time loop.  Models should be refactored, if necessary, to read their inputs (which could include filenames for other input files) from a configuration file (a text file).  CSDMS does not impose any constraint on how configuration files are formatted, but a "template" of your model's config file (with placeholder values) is used when the CSDMS-provided GUI creates a config file for your model.
 
The '''''update()''''' function accepts a time step argument, "dt".  If (dt == -1), then the model should use its own (internal) timestep;  otherwise it should use the value provided.  This function should perform all tasks that take place during one pass through the model's time loop, which typically includes incrementing all of the model's state variables.  It does not contain the time loop itself.  If the model's state variables don't change in time, then they can be computed by the initialize() function and update() can just return without doing anything.
 
The '''''finalize()''''' function should perform all tasks that take place after exiting the model's time loop.  This typically includes deallocating memory, closing files and printing reports.
 
The '''''run_model()''''' function is not needed by CSDMS but provides a simple method to run the model in "stand-alone mode". (It is often used by the developer; it is basically the model's "main".)  It would simply call "initialize()", run a time loop whose body only calls "update()" and then, once the loop has finished, call "finalize()".
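The four control functions described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the <code>TankModel</code> class, its draining rule, the helper <code>write_example_config()</code> and the "name = value" configuration format are all hypothetical (CSDMS does not prescribe a configuration file format), and none of it is taken from the CSDMS water tank example.

```python
import os
import tempfile


class TankModel:
    """Hypothetical model: a tank that drains at a rate proportional to its depth."""

    def initialize(self, config_file):
        # Assumed config format (not prescribed by CSDMS): "name = value" lines.
        settings = {}
        with open(config_file) as f:
            for line in f:
                if "=" in line:
                    key, value = line.split("=", 1)
                    settings[key.strip()] = float(value)
        self.depth = settings["depth"]
        self.rate = settings["rate"]
        self.dt = settings["dt"]  # the model's own (internal) time step
        self.end_time = settings["end_time"]
        self.time = 0.0

    def update(self, dt=-1.0):
        # One pass through the time loop; dt == -1 selects the model's own step.
        step = self.dt if dt == -1 else dt
        self.depth -= self.rate * self.depth * step
        self.time += step

    def finalize(self):
        # Tasks after the time loop; this sketch only prints a short report.
        print("final depth %.3f at time %.1f" % (self.depth, self.time))

    def run_model(self, config_file):
        # Stand-alone mode: the time loop lives here, not in update().
        self.initialize(config_file)
        while self.time < self.end_time:
            self.update(-1.0)
        self.finalize()


def write_example_config(path):
    # Hypothetical helper that writes a config file in the assumed format.
    with open(path, "w") as f:
        f.write("depth = 10.0\nrate = 0.5\ndt = 1.0\nend_time = 3.0\n")


if __name__ == "__main__":
    config = os.path.join(tempfile.mkdtemp(), "tank.cfg")
    write_example_config(config)
    TankModel().run_model(config)
```

A calling framework would skip <code>run_model()</code> and call <code>initialize()</code>, <code>update()</code> and <code>finalize()</code> itself, which is what lets it interleave the updates of several coupled models.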
 
== Model Information Functions ==
 
<syntaxhighlight lang=sidl>
array<string> get_input_var_names()
array<string> get_output_var_names()
string get_attribute( in string att_name ) // (for model_name, grid_type, time_step_type, etc.)
</syntaxhighlight>
 
These BMI functions are called by the CSDMS framework in order to determine what input variables each model component needs and what output variables it can provide to other components.
 
The '''''get_input_var_names()''''' function returns a string array of the model's ''input variable'' names as "long variable names" from the [[CSDMS_Standard_Names | '''CSDMS Standard Names''']]. See the notes at the top of this page.

The '''''get_output_var_names()''''' function returns a string array of the model's ''output variable'' names from the [[CSDMS_Standard_Names | '''CSDMS Standard Names''']]. See the notes at the top of this page.
 
The '''''get_attribute()''''' function returns a '''''static attribute''''' (i.e. an attribute that does not change from one model application to the next) of the model (as a string) when passed any attribute name from the following list:
* model_name
* version      (e.g. 2.0.1)
* author_name
* grid_type
* time_step_type
* step_method  (explicit, implicit, semi_implicit, iterative)
 
For the "grid_type" attribute (see ''Grid Information Functions'' below), the allowed return values are:
* uniform_grid
* rectilinear_grid
* structured_grid
* unstructured_grid
* none
 
For the "time_step_type" attribute, the allowed return values are:
* fixed      (Timestep size is fixed for all time and is used by all grid cells.)
* adaptive    (Timestep varies in time, but is used by all grid cells.)
* des        (Timestep size varies in both space and time.  See below.)
* none        (State variables do not vary in time.)
 
Note that DES ([http://en.wikipedia.org/wiki/Discrete_event_simulation Discrete Event Simulation]) models allow each grid cell to have its own, adaptive time step.
 
The "grid_type" attribute is used by the framework to automatically perform spatial regridding when coupled models use different grids.
 
The "time_step_type" attribute and BMI functions like "get_time_step()" below are used by the framework to automatically accommodate time step differences between coupled models.
 
For time-stepping models ("time_step_type" other than "none"), the "step_method" attribute is used to distinguish between "explicit" and "implicit" numerical solution schemes.  Some "models" &mdash; like root finders and "successive over relaxation" (SOR) solvers &mdash; involve iterations as opposed to "time steps".  They would return a "time_step_type" attribute of "none" and a "step_method" attribute of "iterative".  Note that their "update()" function still gives the caller fine-grained control.
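A minimal Python sketch of the three information functions might look like the following. The class name, the attribute values and the helper <code>check_attributes()</code> are hypothetical, and the variable names only imitate the style of the CSDMS Standard Names; they have not been checked against the official list.

```python
ALLOWED_GRID_TYPES = {
    "uniform_grid", "rectilinear_grid", "structured_grid",
    "unstructured_grid", "none",
}
ALLOWED_TIME_STEP_TYPES = {"fixed", "adaptive", "des", "none"}


class TankModelInfo:
    """Hypothetical information functions for a simple tank model."""

    # Illustrative names only; not verified against the CSDMS Standard Names.
    _input_var_names = ["tank_inflow_water__volume_flow_rate"]
    _output_var_names = ["tank_water__depth"]

    # Static attributes: they do not change from one model run to the next.
    _attributes = {
        "model_name": "TankModel",
        "version": "0.1.0",
        "author_name": "Example Author",
        "grid_type": "none",
        "time_step_type": "fixed",
        "step_method": "explicit",
    }

    def get_input_var_names(self):
        return list(self._input_var_names)

    def get_output_var_names(self):
        return list(self._output_var_names)

    def get_attribute(self, att_name):
        try:
            return self._attributes[att_name]
        except KeyError:
            raise ValueError("unknown attribute: " + att_name) from None


def check_attributes(model):
    # Hypothetical helper: confirm the attributes use the allowed values.
    return (model.get_attribute("grid_type") in ALLOWED_GRID_TYPES
            and model.get_attribute("time_step_type") in ALLOWED_TIME_STEP_TYPES)
```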
 
== Variable Information Functions ==
 
<syntaxhighlight lang=sidl>
string get_var_type( in string long_var_name ) // ( returns type_string, e.g. ‘double’)
string get_var_units( in string long_var_name ) // ( returns unit_string, e.g. ‘meters’ )
int get_var_rank( in string long_var_name ) // ( returns array rank or 0 for scalar)
string get_var_name( in string long_var_name ) // ( returns model’s internal, short name )
double get_time_step() // (returns the model’s current timestep;  adaptive or fixed.)
string get_time_units() // (returns unit string for model time, e.g. ‘seconds’, ‘years’)
double get_start_time()
double get_current_time()
double get_end_time()
</syntaxhighlight>
 
These BMI functions are called by the CSDMS framework to obtain information about a particular input or output variable.  They must accommodate any variable that is returned by the BMI functions ''get_input_var_names()'' or ''get_output_var_names()''. Based on this information, the framework can apply type or unit conversion when necessary.
 
Note that "long_var_name" refers to a standardized variable name from the [[CSDMS_Standard_Names | '''CSDMS Standard Names''']]. See the notes at the top.
 
The '''''get_var_name()''''' function is not used by CSDMS but often makes it easier to implement the BMI functions that have a "long_var_name" argument.
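One way to implement these functions is to keep a single lookup table keyed by the long variable name, so that "get_var_name()" and the other functions all read from the same place. The sketch below assumes a hypothetical tank model; the long names, short names and time values are invented for illustration.

```python
class TankVariableInfo:
    """Hypothetical variable information functions for a tank model."""

    # long_var_name -> (internal short name, type, units, rank).
    # The long names are illustrative, not taken from the official list.
    _variables = {
        "tank_water__depth": ("d", "float64", "m", 0),
        "tank_inflow_water__volume_flow_rate": ("Q_in", "float64", "m3 s-1", 0),
        "tank_floor__elevation": ("z", "float64", "m", 2),
    }

    def __init__(self):
        self._dt = 1.0
        self._start_time = 0.0
        self._end_time = 10.0
        self._time = 0.0

    def _lookup(self, long_var_name):
        try:
            return self._variables[long_var_name]
        except KeyError:
            raise ValueError("unknown variable: " + long_var_name) from None

    def get_var_name(self, long_var_name):
        return self._lookup(long_var_name)[0]

    def get_var_type(self, long_var_name):
        return self._lookup(long_var_name)[1]

    def get_var_units(self, long_var_name):
        return self._lookup(long_var_name)[2]

    def get_var_rank(self, long_var_name):
        return self._lookup(long_var_name)[3]  # 0 for a scalar

    def get_time_step(self):
        return self._dt

    def get_time_units(self):
        return "seconds"

    def get_start_time(self):
        return self._start_time

    def get_current_time(self):
        return self._time

    def get_end_time(self):
        return self._end_time
```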
 
For the '''''get_var_units()''''' and '''''get_time_units()''''' functions, standard unit names (in lower case) should be provided, such as "meters" or "feet".  Standard abbreviations, like "m" for "meters" and "mi" for "miles", are also supported.  For variables with "compound units", the primitive unit names or abbreviations are separated by a single space character and any exponent other than 1 is placed immediately after its unit name, as in "m s-1" for velocity, "W m-2" for an energy flux, or "km2" for an area.  CSDMS uses the [http://www.unidata.ucar.edu/software/udunits/ '''UDUNITS'''] standard from Unidata.
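To make the compound-unit convention concrete, the following Python sketch splits a unit string into (name, exponent) pairs. It is not part of UDUNITS or BMI; the function name <code>parse_units()</code> is hypothetical and it handles only the simple "name plus optional integer exponent" form described above.

```python
import re

# A unit token is letters followed by an optional signed integer exponent.
_UNIT_TOKEN = re.compile(r"^([A-Za-z]+)(-?\d+)?$")


def parse_units(unit_string):
    """Split a unit string such as "m s-1" into (name, exponent) pairs."""
    factors = []
    for token in unit_string.split():
        match = _UNIT_TOKEN.match(token)
        if match is None:
            raise ValueError("cannot parse unit token: %r" % token)
        name, exponent = match.groups()
        # A missing exponent means an exponent of 1.
        factors.append((name, int(exponent) if exponent else 1))
    return factors
```

For example, <code>parse_units("W m-2")</code> yields the pairs ("W", 1) and ("m", -2).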
 
For the '''''get_var_type()''''' function, the returned data type should be a string from the first or last column of the following table.
 
{|
! align=left width=250 | &nbsp; BMI datatype
! align=left width=150 | C datatype
! align=left width=150 | NumPy datatype
|-
| &nbsp; BMI_CHAR
| <tt>char</tt>
| int8
|-
| &nbsp; BMI_UNSIGNED_CHAR
| <tt>unsigned char</tt>
| uint8
|-
| &nbsp; BMI_INT
| <tt>signed int</tt>
| int16
|-
| &nbsp; BMI_LONG
| <tt>signed long int</tt>
| int32
|-
| &nbsp; BMI_UNSIGNED_INT
| <tt>unsigned int</tt>
| uint16
|-
| &nbsp; BMI_UNSIGNED_LONG
| <tt>unsigned long int</tt>
| uint32
|-
| &nbsp; BMI_FLOAT
| <tt>float</tt>
| float32
|-
| &nbsp; BMI_DOUBLE
| <tt>double</tt>
| float64
|-
|}
 
== Variable Getter and Setter Functions ==
 
<syntaxhighlight lang=sidl>
double get_0d_double( in string long_var_name )
array<double> get_1d_double( in string long_var_name  )
array<double,2> get_2d_double( in string long_var_name )
array<double> get_2d_double_at_indices( in string long_var_name, in array<int> indices )
 
void set_0d_double( in string long_var_name, in double scalar )
void set_1d_double( in string long_var_name, in array<double> array)
void set_2d_double( in string long_var_name, in array<double,2> array)
void set_2d_double_at_indices( in string long_var_name, in array<int> indices, in array<double> array )
</syntaxhighlight>
 
A '''''getter''''' is a function that your model provides that other entities can call in order to get a variable from your model's "state" (often as a reference).  Often a model's state variables are changing with each time step, so getters are called to get current values.
 
A '''''setter''''' is a function that your model provides that other entities can call in order to change/overwrite a variable in your model's state. A setter may impose restrictions on how a state variable can be changed or check the new data for validity.
 
There are different getter and setter functions for scalars (0d), 1D arrays (1d), 2D arrays (2d) and 3D arrays (3d); the 3D functions are not listed above but follow the same naming pattern.  Providing a separate function for each rank simplifies implementation, since most of the programming languages supported by CSDMS are statically rather than dynamically typed.  (''However, other approaches are possible and may also be supported later.'')
 
Although not listed above, BMI functions to get and set integer data are also supported.  They have names like: "get_2d_int()" instead of "get_2d_double()".
 
There is no problem if a model uses arrays with a dimension greater than 3.  In that case, BMI functions with names like "get_4d_double()" would simply be provided, following the same naming pattern.
 
The BMI functions '''''get_2d_double_at_indices()''''' and '''''set_2d_double_at_indices()''''' allow a (possibly much smaller) subset of values to be obtained from (or set into) an array.  This can dramatically reduce the amount of data that is passed, which can be important when components are coupled across a network. 
 
Note that "long_var_name" refers to a standardized variable name from the [[CSDMS_Standard_Names | '''CSDMS Standard Names''']]. See the notes at the top.
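The getters and setters might be sketched in Python as below. The class, the variable names and the validity check are hypothetical. The sketch also assumes that the "indices" argument holds indices into the flattened (row-major) 2D array, which the function signatures above do not state.

```python
class TankState:
    """Hypothetical model state with a scalar and a 2D array."""

    def __init__(self):
        # Keyed by illustrative long variable names.
        self._values = {
            "tank_water__depth": 2.5,
            "tank_floor__elevation": [[0.0, 1.0, 2.0],
                                      [3.0, 4.0, 5.0]],
        }

    def get_0d_double(self, long_var_name):
        return self._values[long_var_name]

    def get_2d_double(self, long_var_name):
        # Returns a reference to the model's own array, not a copy.
        return self._values[long_var_name]

    def get_2d_double_at_indices(self, long_var_name, indices):
        # Assumption: indices refer to the flattened, row-major array.
        grid = self._values[long_var_name]
        n_cols = len(grid[0])
        return [grid[i // n_cols][i % n_cols] for i in indices]

    def set_0d_double(self, long_var_name, scalar):
        # A setter may check new data for validity before storing it.
        if long_var_name == "tank_water__depth" and scalar < 0:
            raise ValueError("depth must be non-negative")
        self._values[long_var_name] = float(scalar)

    def set_2d_double_at_indices(self, long_var_name, indices, array):
        grid = self._values[long_var_name]
        n_cols = len(grid[0])
        for i, value in zip(indices, array):
            grid[i // n_cols][i % n_cols] = float(value)
```

Only the values at the requested indices cross the interface, which is the point of the "at_indices" variants when the full array is large.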
 
== Grid Information Functions ==
 
Different models often use different computational grids.  CSDMS needs a complete and standardized description of a model's grid in order to automatically accommodate differences between model grids (via regridding) when coupled models share data.  There are four supported "grid types" (which cover all possibilities) and a '''''different set of BMI functions''''' is required for each. The BMI functions for each type of grid are listed in four separate sections below.
 
The BMI function call '''''get_attribute( "grid_type" )''''' should return one of the following strings, which correspond to the four supported "grid types".
 
* uniform_grid      (for uniform rectilinear)
* rectilinear_grid
* structured_grid
* unstructured_grid
 
CSDMS uses the term "grid" to encompass any type of "computational grid".  See: [http://en.wikipedia.org/wiki/Regular_grid '''Regular grid'''] and [http://en.wikipedia.org/wiki/Unstructured_grid '''Unstructured grid'''].
 
If a grid corresponds to a region on the surface of the Earth or some other planet, then it needs to be [http://en.wikipedia.org/wiki/Georeference '''georeferenced'''].  This generally means specifying a [http://en.wikipedia.org/wiki/Reference_ellipsoid '''reference ellipsoid'''], [http://en.wikipedia.org/wiki/Datum_(geodesy) '''datum'''] and possibly a [http://en.wikipedia.org/wiki/Map_projection '''map projection'''] in the Model Metadata File.  See:  [[CSN_Metadata_Names | '''CSDMS Metadata Names''']].
 
The BMI functions below return grid descriptions that are compatible with [http://earthsystemcog.org/projects/esmp/ '''ESMP'''], the new Python interface for the ESMF regridding tool.  CSDMS uses this tool for spatial regridding, when needed.
 
The BMI functions below are also closely aligned with the grid types supported by [http://www.vtk.org/ '''VTK'''], as described in the [http://www.vtk.org/VTK/img/file-formats.pdf '''VTK File Formats'''] document.
 
The [https://groups.google.com/group/ugrid-interoperability '''Ugrid Interoperability Group'''] is working on a standard method for describing and storing unstructured grids.  It is expected to be compatible with NetCDF files.  Both Deltares and ESMF are involved in this effort.
 
Note that "uniform rectilinear", "rectilinear" and "structured grid" all have the topology of a 2D array (or perhaps 3D).  These 3 grid types each have a "get_grid_shape()" function that returns the dimensions of this 2D array.
 
=== Uniform Rectilinear ===
 
[[Image:mesh_uniform_rectilinear.png|300px|wrap]]
<syntaxhighlight lang=sidl>
array<double, 1> get_grid_spacing (in string long_var_name)
array<double, 1> get_grid_lower_left_corner (in string long_var_name)
array<int, 1> get_grid_shape (in string long_var_name)
</syntaxhighlight>
 
Only 6 numbers are needed to define a 2D uniform grid: the number of rows and the number of columns, the grid spacing in each of the two coordinate directions, and the 2 coordinates of the lower-left corner with respect to some reference coordinate system.  Similarly, only 9 numbers are needed to define a 3D uniform grid.
 
In a 2D uniform grid, every grid cell (or element) is a rectangle and all cells have the same dimensions.  If the dimensions are equal, then the grid is a [http://en.wikipedia.org/wiki/Square_tiling '''tiling of squares'''].
 
Each of these functions returns information about each dimension of a grid. The dimensions are ordered with "ij" indexing (as opposed to "xy"). For example, the get_grid_shape() function for the above grid would return the array [4, 5]. If there were a third dimension, the length of the z dimension would be listed first.  This same convention is used in NumPy.  Note that the grid shape is the number of '''nodes''' in the coordinate directions and not the number of cells or elements.  It is possible for grid values to be associated with the nodes or with the cells.
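As a worked example of the "ij" convention, the Python sketch below rebuilds the node coordinates of a 2D uniform grid from the three arrays these functions return. The function names are hypothetical. It assumes that the spacing and the lower-left corner are ordered the same way as the shape (row direction first) and that the row coordinate increases away from the lower-left corner.

```python
def uniform_grid_node_coordinates(shape, spacing, lower_left):
    """Return (y, x) node coordinates of a 2D uniform grid, in "ij" order."""
    n_rows, n_cols = shape   # number of nodes, not cells
    dy, dx = spacing         # assumed to be in the same "ij" order
    y0, x0 = lower_left
    y = [y0 + i * dy for i in range(n_rows)]
    x = [x0 + j * dx for j in range(n_cols)]
    return y, x


def cell_count(shape):
    """Number of cells in a 2D grid whose shape counts nodes."""
    n_rows, n_cols = shape
    return (n_rows - 1) * (n_cols - 1)
```

For a grid of shape [4, 5], <code>cell_count([4, 5])</code> gives 12 cells, since 4 by 5 nodes enclose 3 by 4 cells.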
 
=== Rectilinear ===
[[Image:mesh_rectilinear.png|300px|wrap]]
<syntaxhighlight lang=sidl>
array<double, 1> get_grid_x (in string long_var_name)
array<double, 1> get_grid_y (in string long_var_name)
array<double, 1> get_grid_z (in string long_var_name)
array<int, 1> get_grid_shape (in string long_var_name)
</syntaxhighlight>
 
In a 2D rectilinear grid, every grid cell (or element) is a rectangle but different cells can have different dimensions.  All cells in the same row have the same grid spacing in the y direction and all cells in the same column have the same grid spacing in the x direction.  Grid spacings can be computed as the difference of successive x or y values.
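The difference of successive coordinate values can be computed as in this short Python sketch. The function names are hypothetical and the coordinates in the usage note are made up for illustration.

```python
def grid_spacings(coords):
    """Return the differences between successive coordinate values."""
    return [b - a for a, b in zip(coords, coords[1:])]


def is_uniform(coords):
    """True if all spacings are equal, i.e. the axis is uniformly spaced."""
    spacings = grid_spacings(coords)
    return all(s == spacings[0] for s in spacings)
```

For x values of 0.0, 1.0, 3.0 and 7.0 the spacings are 1.0, 2.0 and 4.0, so this axis is rectilinear but not uniform.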
 
Some river plume models use a logarithmic rectilinear grid where the grid spacing increases with distance from the river mouth.
 
Some infiltration and groundwater models use a rectilinear grid where the grid spacing increases with distance below the land surface.
 
=== Structured Grid ===
[[Image:mesh_structured.png|300px|wrap]]
<syntaxhighlight lang=sidl>
array<double, 1> get_grid_x (in string long_var_name)
array<double, 1> get_grid_y (in string long_var_name)
array<int, 1> get_grid_shape (in string long_var_name)
</syntaxhighlight>
 
The cells in this type of grid are always 4-sided polygons, or "quadrilaterals".
 
These grids are sometimes called '''''logically rectangular''''' since they have the topology of a 2D or 3D array.
 
Note that [http://en.wikipedia.org/wiki/Curvilinear_coordinates '''curvilinear grids'''] (including [http://en.wikipedia.org/wiki/Orthogonal_coordinates '''orthogonal curvilinear grids''']) are a special case of this grid type.
 
=== Unstructured Grid ===
[[Image:mesh_unstructured.png|300px|wrap]]
<syntaxhighlight lang=sidl>
array<double, 1> get_grid_x (in string long_var_name)
array<double, 1> get_grid_y (in string long_var_name)
array<int, 1> get_grid_connectivity (in string long_var_name)
array<int, 1> get_grid_offset (in string long_var_name)
</syntaxhighlight>
 
This is the most general grid type and can be used for any type of grid.  However, most grids that consist of 4-sided polygons can be represented using one of the other grid types.  This grid type must be used if the grid consists of any elements or '''''cells''''' which do not have four sides.  This includes any grid of triangles (e.g. [http://en.wikipedia.org/wiki/Delaunay_triangulation '''Delaunay triangles''']) and a [http://en.wikipedia.org/wiki/Voronoi_tessellation '''Voronoi tessellation'''].
 
Note that a grid of [http://en.wikipedia.org/wiki/Triangle_tiling '''equilateral triangles'''], while in some sense "structured", would need to be represented as an unstructured grid.  The same is true for a grid of [http://en.wikipedia.org/wiki/Hexagonal_tiling '''hexagons'''].
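The text above does not spell out how the connectivity and offset arrays fit together. The Python sketch below assumes a VTK-like convention: "connectivity" is a flat list of node indices for all cells, and each entry of "offset" is the position in that list just past the end of the corresponding cell. Both the convention and the function name <code>split_cells()</code> are assumptions made for illustration.

```python
def split_cells(connectivity, offset):
    """Split a flat connectivity array into one node-index list per cell.

    Assumes offset[k] is the index just past the last node of cell k.
    """
    cells = []
    start = 0
    for end in offset:
        cells.append(connectivity[start:end])
        start = end
    return cells


# Five nodes forming one quadrilateral and one triangle (illustrative values).
x = [0.0, 1.0, 0.0, 1.0, 2.0]
y = [0.0, 0.0, 1.0, 1.0, 0.5]
connectivity = [0, 1, 3, 2, 1, 4, 3]
offset = [4, 7]
```

Under this assumed convention, cells with different numbers of sides can share one pair of arrays, which is why triangles, hexagons and Voronoi cells all fit this grid type.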

Revision as of 17:41, 23 January 2017
