Requirements for Code Contributors

The CSDMS project is happy to accept open-source code contributions from the modeling community in any programming language and in whatever form they currently exist. One of our key goals is to create an inventory of available models. We use an online questionnaire to collect basic information about open-source models, and we make this information available to anyone who visits our website at csdms.colorado.edu. We can also serve as a repository for model source code, but in many cases our website instead redirects visitors to another site, either one maintained by the model developer or a source code repository such as SourceForge, JavaForge or Google Code. Online source code repositories (or project hosting sites) like these are free and provide developers with a number of useful tools for managing collaborative software development projects. CSDMS does not aim to compete with the services that these repositories provide (e.g. version control, issue tracking, wikis, online chat and web hosting).

Another key goal of the CSDMS project is to create a collection of open-source, earth-science modeling components that are designed so that they are relatively easy to reuse in new modeling projects. CSDMS has studied this problem and has examined a number of different technologies for addressing it. We have learned that there are certain fundamental design principles that are common to all of these model-coupling technologies. That is, there is a certain minimum amount of code refactoring that is necessary in order for a model to be usable as a "plug-and-play" component. This is encouraging, because CSDMS does not want to recommend any particular course of action or protocol to our code contributors unless there are sound reasons for doing so (such as broad applicability). The recommendations that we give here may be viewed as a distillation of what is common among the various approaches to model coupling.

Code must be in a Babel-supported language

The programming languages currently supported by Babel are C, Fortran (77, 95, 2003), C++, Java and Python. We realize that a significant number of models are also written in proprietary, array-based languages like MATLAB and IDL (Interactive Data Language). CSDMS has identified, and greatly extended, an alpha version of an open-source tool called I2PY that converts IDL to Python. This tool maps array-based IDL functions and procedures to similar ones (with similar performance) in a Python module called NumPy (Numerical Python), and maps IDL plotting commands to similar ones in the Matplotlib module. I2PY is written in Python and is built upon standard Unix-based parsing tools such as yacc and lex. CSDMS hopes to leverage the I2PY conversion tool into a new tool, perhaps called M2PY, that can similarly convert MATLAB code to Python.

Note that, in general, converting source code from one language to another is a tedious, time-consuming and error-prone process. It is not something that can be fully automated, even when the two languages are similar and conversion tools are used. For two languages that are quite different, the conversion process is even more complicated. CSDMS simply does not have the resources to do this. More importantly, however, converting source code to a new language leads to a situation where it is difficult for CSDMS to incorporate bug fixes and ongoing improvements to the model by its developer. Ideally, primary responsibility for a model (i.e. "problem ownership") should remain with the developer or team that created it, and they should be encouraged to use whatever language they prefer and are most productive in. However, by requiring relatively simple and one-time changes to how the source code is modularized, CSDMS gains the ability to apply an automated wrapping process to the latest stable version of the model and begin using it as a component.

Code must compile with a CSDMS-supported, open-source compiler

Supporting more than one compiler for each of the Babel-supported languages would require more resources than CSDMS has available with regard to testing and technical support. However, it is usually fairly straightforward for code developers to modify their own code so that it will compile with an open-source compiler like gcc. CSDMS intends to provide online tips and other resources to assist developers with this process.

Refactor source code to have an "IRF interface"

One "universal truth" of component-based programming is that in order for a model to be used as a component in another model, its interface must allow complete control to be handed to an external caller. Most earth-science models have to be initialized in some manner and then use time stepping or another form of stepping in order to compute a result. While time-stepping models are the most familiar, many other problems such as root-finding and relaxation methods employ some type of iteration or stepping. For maximum plug-and-play flexibility, it is necessary to make the actions that take place during a single step directly accessible to a caller.

To see why this is so, consider two time-stepping models. Suppose that Model A melts snow, routes the runoff to a lake, and increases the depth of the lake, while Model B computes lake-level lowering due to evaporative loss. Each model initializes the lake depth, has its own time loop and changes the lake depth. If each of these models is written in the traditional manner, then combining them into a single model means that whatever happens inside the time loop of Model A must be pasted into Model B's time loop or vice versa. There can only be one time loop. This illustrates, in its simplest form, a very common problem that is encountered when linking models.

Now imagine that we restructure the source code of both models slightly so that they each have their own Initialize(), Run_Step() and Finalize() subroutines. The Initialize() routine contains all of the code that came before the time loop in the original model, the Run_Step() routine contains the code that was inside the time loop (and returns all updated variables) and the Finalize() routine contains the code that came after the time loop. Now suppose that we write one additional routine, perhaps called Run_Model() or Main(), which simply calls Initialize(), starts a time loop that calls Run_Step() at each step, and then calls Finalize(). Calling Run_Model() reproduces the functionality of the original model, so we have made a fairly simple, one-time change to our two models and retained the ability to use them in "stand-alone mode." Future enhancements to the model simply insert new code into this new set of four subroutines.

However, this simple change means that, in effect, we have converted each model into an object with a standard set of four member functions or methods. Now, it is trivial to write a new model that combines the computations of Models A and B. This new model first calls the Initialize() methods of Models A and B, then starts a time loop that calls the Run_Step() methods of Models A and B at each step, and finally calls the Finalize() methods of Models A and B. For models written in an object-oriented language, these four subroutines would be methods of a class, but for other languages, like Fortran, it is enough to simply break the model into these subroutines. Object-oriented programming concepts are reviewed in the next section.
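The lake example above can be sketched in Python as follows. This is only an illustration of the pattern, not CSDMS code: the class names, the method spellings (lower-case initialize(), run_step(), finalize(), run_model()) and all numerical values are assumptions made for the example.

```python
class SnowmeltModel:
    """Hypothetical Model A: snowmelt runoff raises the lake level."""

    def initialize(self, depth=2.0, inflow=0.03):
        self.depth = depth    # lake depth [m] (assumed starting value)
        self.inflow = inflow  # depth gained per time step [m] (assumed)

    def run_step(self):
        # Everything that used to sit inside the model's own time loop.
        self.depth += self.inflow
        return self.depth

    def finalize(self):
        pass  # nothing to clean up in this sketch

    def run_model(self, n_steps):
        # Stand-alone mode: reproduces the original, monolithic model.
        self.initialize()
        for _ in range(n_steps):
            self.run_step()
        self.finalize()
        return self.depth


class EvaporationModel:
    """Hypothetical Model B: evaporation lowers the lake level."""

    def initialize(self, depth=2.0, loss=0.01):
        self.depth = depth  # lake depth [m] (assumed starting value)
        self.loss = loss    # depth lost per time step [m] (assumed)

    def run_step(self):
        self.depth -= self.loss
        return self.depth

    def finalize(self):
        pass

    def run_model(self, n_steps):
        self.initialize()
        for _ in range(n_steps):
            self.run_step()
        self.finalize()
        return self.depth


def run_coupled(n_steps, depth=2.0):
    """A new model with a single time loop that drives both components."""
    model_a, model_b = SnowmeltModel(), EvaporationModel()
    model_a.initialize(depth=depth)
    model_b.initialize(depth=depth)
    for _ in range(n_steps):
        model_b.depth = model_a.run_step()  # hand A's updated depth to B
        model_a.depth = model_b.run_step()  # hand B's updated depth back to A
    model_a.finalize()
    model_b.finalize()
    return model_a.depth
```

Neither class had to be edited to build run_coupled(); the caller owns the only time loop and decides the order in which the components step.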

For lack of a better term, we refer to this Initialize(), Run_Step(), Finalize() pattern as an "IRF interface". This basic idea could be taken further by adding a Test() method. It is also helpful to have two additional methods, perhaps called Get_Input_Exchange_Item_List() and Get_Output_Exchange_Item_List(), that a caller can use to query what type of data the model is able to use as input or compute as output. While simple, these changes allow a caller to have fine-grained control over our model, and therefore use it in clever ways as part of something bigger. In essence, this set of methods is like a handheld remote control for our model. Compiling models in this form as shared objects (.so, Unix) or dynamically-linked libraries (.dll, Windows) is one way that they can then be used as plug-ins.
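As a rough illustration of how a caller might use the two query methods, the sketch below checks which outputs of one component can feed the inputs of another. The class names, the item names and the matching_items() helper are hypothetical, and matching by plain string name is a simplifying assumption.

```python
class SnowmeltModel:
    """Hypothetical component that turns snow into runoff."""

    def get_input_exchange_item_list(self):
        return ["air_temperature", "snow_depth"]

    def get_output_exchange_item_list(self):
        return ["runoff"]


class LakeModel:
    """Hypothetical component that tracks lake depth."""

    def get_input_exchange_item_list(self):
        return ["runoff", "evaporation_rate"]

    def get_output_exchange_item_list(self):
        return ["lake_depth"]


def matching_items(provider, consumer):
    """Return the provider's outputs that the consumer accepts as inputs."""
    accepted = set(consumer.get_input_exchange_item_list())
    return [name for name in provider.get_output_exchange_item_list()
            if name in accepted]
```

Here matching_items(SnowmeltModel(), LakeModel()) yields the single item "runoff", while the reverse pairing yields nothing, so a caller can discover valid connections without reading either model's source code.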

Provide complete descriptions of input and output "exchange items"

More complex models may involve large sets of input and output variables, which are referred to in OpenMI as "exchange items." For each input and output exchange item, we require that a few attributes be provided, such as the item's name, units and description. We also require a description of the computational grid (e.g. XY corner coordinates for every computational cell). CSDMS staff plans to develop an XML schema that can be used to provide this information in a standardized format. XML files of this type will be used by automated wrapping tools that convert IRF-form models into OpenMI-compliant components. CSDMS is currently developing these wrapping tools.
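Since the CSDMS XML schema has not yet been defined, the following is only a guess at what such metadata might look like and how a wrapping tool could read it. The element and attribute names (model, exchange_item, role, name, units, description) are invented for this sketch; only the attributes themselves (name, units, description) come from the requirement above.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata layout; not an official CSDMS or OpenMI schema.
METADATA = """\
<model name="LakeModel">
  <exchange_item role="input" name="runoff" units="m3/s">
    <description>Inflow from upstream snowmelt</description>
  </exchange_item>
  <exchange_item role="output" name="lake_depth" units="m">
    <description>Mean lake depth</description>
  </exchange_item>
</model>
"""


def read_exchange_items(xml_text):
    """Parse the metadata into a list of dictionaries, one per item."""
    root = ET.fromstring(xml_text)
    items = []
    for elem in root.findall("exchange_item"):
        items.append({
            "role": elem.get("role"),
            "name": elem.get("name"),
            "units": elem.get("units"),
            "description": elem.findtext("description", default="").strip(),
        })
    return items
```

Keeping the metadata in a separate file like this means the model source code does not have to change when a description or unit label is corrected.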

Include suitable testing procedures and data

It would be ideal if every model submission not only had an IRF interface but also included one or more "self-tests" in the form of a member function, such as the Test() method mentioned earlier. One of these self-tests could simply be a "sanity check" that operates on trivial input data (perhaps even hard-coded). When analytic solutions are available for a particular model, these make excellent self-tests because they can be used to check the accuracy and stability of any numerical methods that are used.
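A self-test based on an analytic solution might look like the sketch below. The model is a made-up example (a lake that drains as dh/dt = -k h, solved with the explicit Euler method), and the parameter values and tolerance are assumptions chosen only to make the comparison with the exact solution h(t) = h0 exp(-k t) concrete.

```python
import math


class DrainingLakeModel:
    """Hypothetical model: lake depth decays as dh/dt = -k * h."""

    def initialize(self, depth=1.0, k=0.1, dt=0.01):
        self.depth = depth  # lake depth [m]
        self.k = k          # drainage coefficient [1/s]
        self.dt = dt        # time step [s]
        self.time = 0.0     # elapsed model time [s]

    def run_step(self):
        # Explicit (forward) Euler update of the depth.
        self.depth -= self.k * self.depth * self.dt
        self.time += self.dt
        return self.depth

    def finalize(self):
        pass  # nothing to clean up in this sketch

    def test(self, n_steps=1000, tolerance=1e-3):
        """Self-test: compare the numerical result with the exact solution."""
        self.initialize()
        initial_depth = self.depth
        for _ in range(n_steps):
            self.run_step()
        exact = initial_depth * math.exp(-self.k * self.time)
        self.finalize()
        return abs(self.depth - exact) <= tolerance
```

Because the test is a member function with hard-coded inputs, anyone who receives the model can call test() immediately after building it, without preparing any input files.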

Include a user's guide or at least basic documentation

There is no substitute for good documentation. While good documentation is typically difficult to write, it results in a net time savings for the model developer (by preventing technical support questions) and leads to the model being adopted more rapidly.

Specify what type of open-source license applies to your source code

The CSDMS project is focused on open-source software, but there are now many different open-source license types to choose from, each differing in the details of how others may use your code. The CSDMS Integration Office needs to know which license applies in order to respect your intellectual property rights. Rosen (2004), a good book that is available online and is itself open source, explains open-source licensing in detail. CSDMS requires that contributions have an open-source license type that is compliant with the standard set forth by the Open Source Initiative (OSI).

Use standard or generic file formats whenever possible for input and output

XML and INI are two examples of widely used file formats that are flexible and standardized. You can learn more about XML in the section of this handbook titled "What is XML?".
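As one illustration of why a generic format helps, an INI-style input file can be read with Python's standard configparser module, with no custom parsing code. The section names, keys and values below are hypothetical and do not describe any CSDMS file layout.

```python
import configparser

# Hypothetical model input file in INI format.
CONFIG_TEXT = """\
[model]
name = LakeModel
n_steps = 100

[lake]
initial_depth = 2.0
units = m
"""


def read_settings(text):
    """Read model settings from INI-formatted text into a dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "name": parser.get("model", "name"),
        "n_steps": parser.getint("model", "n_steps"),
        "initial_depth": parser.getfloat("lake", "initial_depth"),
        "units": parser.get("lake", "units"),
    }
```

A model's Initialize() routine could call a reader like this, so that users change settings by editing a text file rather than the source code.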

Apply a CSDMS automated wrapping tool

Apply a CSDMS automated wrapping tool to the IRF version of your model (and its XML metadata) to create a CSDMS-compliant component. CSDMS is currently developing these wrapping tools. Our approach is to use the OpenMI interface standard within a CCA-compliant model-coupling framework such as Ccaffeine. OpenMI and Ccaffeine are explained in subsequent sections.