Data Components

Introduction

As with models, data come in many flavors (different spatial and temporal resolutions, grid types, and file formats), and, as with models, these differences pose significant hurdles when analyzing data or bringing them into a modeling framework. Given the growing interest in using real-world geospatial data with models, and the explosion of high-resolution datasets, this problem is pressing.

CSDMS therefore uses the Basic Model Interface (BMI) as a common language that allows models to communicate with data as well as with other models. Applied to data, the BMI acts as a hub whose spokes connect to the many data formats used in the earth sciences.
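As a rough illustration of the hub-and-spoke idea, the sketch below wraps two differently formatted in-memory datasets behind the same small set of BMI-style method names, so that one consumer function can read either. The class names, variable names, units, and data are hypothetical, only a handful of BMI functions are shown, and plain Python lists stand in for the NumPy arrays a real BMI implementation would use.

```python
import csv
import io


class CsvTemperature:
    """Hypothetical spoke: temperatures stored as CSV text."""

    def __init__(self, text):
        self._text = text
        self._values = []

    def initialize(self, config_file=None):
        # Parse the CSV once; a real component would read a config file here.
        reader = csv.DictReader(io.StringIO(self._text))
        self._values = [float(row["temperature"]) for row in reader]

    def get_output_var_names(self):
        return ("air__temperature",)

    def get_var_units(self, name):
        return "degC"

    def get_value(self, name, dest):
        # Copy the values into the caller-supplied buffer.
        dest[:] = self._values
        return dest

    def finalize(self):
        self._values = []


class GridElevation:
    """Hypothetical spoke: elevations stored as a 2D grid (list of rows)."""

    def __init__(self, grid):
        self._grid = grid

    def initialize(self, config_file=None):
        pass  # Nothing to load: the grid is already in memory.

    def get_output_var_names(self):
        return ("land_surface__elevation",)

    def get_var_units(self, name):
        return "m"

    def get_value(self, name, dest):
        # Values are handed back as a flattened, row-major sequence.
        dest[:] = [v for row in self._grid for v in row]
        return dest

    def finalize(self):
        pass


def summarize(component):
    """Consumer that knows only the shared BMI-style interface."""
    component.initialize()
    summary = {}
    for name in component.get_output_var_names():
        values = component.get_value(name, [])
        summary[name] = (min(values), max(values), component.get_var_units(name))
    component.finalize()
    return summary
```

Because `summarize` calls only the shared method names, supporting another data format in this sketch would mean writing one more wrapper class rather than changing the consumer.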

For a detailed description of the design and use of data components, please see Gan et al. (2024): https://doi.org/10.5194/gmd-17-2165-2024.

Available Data Components

CSDMS makes Data Components available to the community. The 8 Data Components described in the CSDMS repository are listed below.

| Program | Description | Developer | Download | PyMT |
| --- | --- | --- | --- | --- |
| DbSEABED Data Component | A CSDMS data component used to download the marine substrate datasets from the dbSEABED system. | Gan, Tian | download | |
| ERA5 Data Component | A CSDMS data component used to download the ECMWF Reanalysis v5 (ERA5) datasets. | Gan, Tian | download | |
| GeoTiff Data Component (a.k.a. GeoTiff, bmi-geotiff) | A CSDMS data component for accessing data and metadata from a GeoTIFF file, through either a local filepath or a remote URL. | Piper, Mark | download | yes |
| GridMET Data Component (a.k.a. gridMET, gridmet_bmi) | A CSDMS data component for fetching and caching gridMET meteorological data. | McDonald, Rich | download | yes |
| NWIS Data Component | A CSDMS data component used to download the National Water Information System (NWIS) time series datasets. | Gan, Tian | download | |
| ROMS Data Component | A CSDMS data component used to access the Regional Ocean Modeling System (ROMS) datasets. | Gan, Tian | download | |
| SoilGrids Data Component | A CSDMS data component used to download the soil property datasets from the SoilGrids system. | Gan, Tian | download | |
| Topography Data Component (a.k.a. Topography, bmi-topography) | A CSDMS data component used to fetch and cache NASA Shuttle Radar Topography Mission (SRTM) and JAXA Advanced Land Observing Satellite (ALOS) land elevation data using the OpenTopography REST API. | Piper, Mark | download | yes |


Data Components are an element of the CSDMS Workbench, an integrated system of software tools, technologies, and standards for building and coupling models.
If you want to add a Data Component to the list above, please fill out the form.

Contribute Data Components

We encourage the community to develop new Data Components. Please follow the steps below and contact us if you need support.

Generally, a Data Component includes two elements: the BMI component and the Babelized component. The BMI component is a Python package that downloads the datasets and wraps them with BMI functions. The Babelized component is another Python package that converts the BMI component into a plug-and-play component for a specific modeling framework (e.g., pymt). The figure shows the contents of these two components and the relationships between them; the Topography Data Component serves as the example for the implementation steps below.

Fig. Elements of an example Data Component


Step 1: Implement the BMI component.

  • Implement the Application Programming Interface (API) to download the datasets (e.g., Topography class in topography.py).
  • Implement the Command Line Interface (CLI) to allow downloading datasets through shell commands (e.g., cli.py).
  • Create a Python class to wrap the dataset with the BMI functions (e.g., BmiTopography class in bmi.py).
  • If an API for the dataset already exists, we suggest using it and implementing mainly the Python class that wraps the datasets with the BMI functions.
  • Examples: bmi-topography, bmi_wavewatch3
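The three pieces of Step 1 can be sketched as follows. This is a minimal, hypothetical illustration and not the bmi-topography source: `Elevation`, `BmiElevation`, and `main` are invented names, the "download" is replaced by synthetic values so the code runs without network access, `initialize` takes a dict instead of a configuration file path, and Python lists stand in for NumPy arrays. The method names on the wrapper follow the BMI specification, but only a few of its functions are shown.

```python
import argparse


class Elevation:
    """Hypothetical data API (plays the role of Topography in topography.py)."""

    def __init__(self, south, north, west, east, spacing=1.0):
        if south >= north or west >= east:
            raise ValueError("bounding box needs south < north and west < east")
        self.bbox = (south, north, west, east)
        self.spacing = spacing
        self.data = None

    def fetch(self):
        # Placeholder for a real download: fill the grid with synthetic values
        # (row index + column index) so the sketch needs no network access.
        south, north, west, east = self.bbox
        n_rows = int((north - south) / self.spacing) + 1
        n_cols = int((east - west) / self.spacing) + 1
        self.data = [[float(r + c) for c in range(n_cols)] for r in range(n_rows)]
        return self.data


class BmiElevation:
    """Hypothetical BMI wrapper (plays the role of BmiTopography in bmi.py)."""

    _var_name = "land_surface__elevation"

    def __init__(self):
        self._source = None

    def initialize(self, config):
        # Assumption: a dict of settings; a real component reads a config file.
        self._source = Elevation(**config)
        self._source.fetch()

    def get_component_name(self):
        return "Hypothetical elevation data component"

    def get_output_var_names(self):
        return (self._var_name,)

    def get_var_units(self, name):
        return "m"

    def get_grid_shape(self, grid, shape):
        # Write (rows, columns) into the caller-supplied buffer.
        shape[:] = [len(self._source.data), len(self._source.data[0])]
        return shape

    def get_value(self, name, dest):
        # Copy the flattened, row-major values into the caller-supplied buffer.
        dest[:] = [v for row in self._source.data for v in row]
        return dest

    def finalize(self):
        self._source = None


def main(argv=None):
    """Hypothetical CLI (plays the role of cli.py): fetch data from the shell."""
    parser = argparse.ArgumentParser(description="Fetch synthetic elevation data")
    for side in ("south", "north", "west", "east"):
        parser.add_argument(f"--{side}", type=float, required=True)
    args = parser.parse_args(argv)
    data = Elevation(args.south, args.north, args.west, args.east).fetch()
    print(f"fetched {len(data)} x {len(data[0])} values")
    return 0
```

Keeping the data API separate from the BMI wrapper means the wrapper stays thin: when an API for a dataset already exists, only a class like `BmiElevation` would need to be written.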


Step 2: Implement the Babelized component.


Step 3: Create documentation.


Step 4: Create a conda package.


Step 5: Code review.

  • We recommend having a code review for the Data Component.
  • Please contact us to request a code review if needed.


Step 6: Share the Data Component.

  • If you want to add your Data Component to the list above, please fill out the form. This will help the community members discover and use it.
  • Example: Topography Data Component