DataComponents: Difference between revisions

From CSDMS
No edit summary
m (Add link to Tian's paper)
 
(8 intermediate revisions by one other user not shown)
Line 3: Line 3:
As with models, data comes in many different flavors—different spatial and temporal resolutions, different grid types, different file formats—and, as with models, these differences pose significant hurdles when trying to analyze or bring data into a modeling framework. Given the growing interest in using real-world geospatial data with models, and the explosion of high-resolution datasets, this problem is pressing.  
As with models, data comes in many different flavors—different spatial and temporal resolutions, different grid types, different file formats—and, as with models, these differences pose significant hurdles when trying to analyze or bring data into a modeling framework. Given the growing interest in using real-world geospatial data with models, and the explosion of high-resolution datasets, this problem is pressing.  


Therefore, CSDMS developed a common language, by using the [[BMI|BMI]], that allows models to seamlessly communicate with data as well as with other models. Applied to data, the BMI acts as a common hub that connects spokes to the many data formats within the earth sciences.  
Therefore, CSDMS developed a common language, by using the [[BMI|BMI]], that allows models to seamlessly communicate with data as well as with other models. Applied to data, the BMI acts as a common hub that connects spokes to the many data formats within the earth sciences.


==Available data components==
For a detailed description of the design and use of data components, please see Gan ''et al.'' (2024): https://doi.org/10.5194/gmd-17-2165-2024.
CSDMS makes data components available for the community. These, '''{{#ask:[[Model:+]] [[Source code availability::Through web repository||Through CSDMS repository]] [[ModelDomain::Terrestrial||Coastal||Marine||Hydrology||Carbonates and Biogenics‎||Climate||Geodynamic]][[Source code availability::Through CSDMS repository||Through web repository]] [[DataComponent::Yes]]
 
==Available Data Components==
CSDMS makes Data Components available for the community. These, '''{{#ask:[[Model:+]] [[Source code availability::Through web repository||Through CSDMS repository]] [[ModelDomain::Terrestrial||Coastal||Marine||Hydrology||Carbonates and Biogenics‎||Climate||Geodynamic]][[Source code availability::Through CSDMS repository||Through web repository]] [[DataComponent::Yes]]
| format=count}}''' are described in the [[Model_download_portal|CSDMS repository]] and are listed below.
| format=count}}''' are described in the [[Model_download_portal|CSDMS repository]] and are listed below.


Line 25: Line 27:
<br>
<br>
Data Components are an element of the [[Workbench|CSDMS Workbench]], an integrated system of software tools, technologies, and standards for building and coupling models.<br>
Data Components are an element of the [[Workbench|CSDMS Workbench]], an integrated system of software tools, technologies, and standards for building and coupling models.<br>
If you want to add a data component to the list above, please fill out the [[Form:Module_questionnaire|form]].   
If you want to add a Data Component to the list above, please fill out the [[Form:Module_questionnaire|form]].   
<br><br>
<br>
 
==Contribute Data Components==
'''We encourage the community to develop new Data Components. Please follow the instructions below and [https://csdms.github.io/help-desk/ contact us] if you need any support.'''
 
Generally, a Data Component includes two elements: the BMI component and the Babelized component. The BMI component is a Python package that downloads the datasets and wraps them with BMI functions. The Babelized component is another Python package that converts the BMI component into a plug-and-play component for a specific modeling framework (e.g., pymt). The figure shows the contents of these two components and the relationship between them. The steps below use the Topography Data Component as the example.
 
[[File:Dc example.png|center|700px|Fig. Elements of an example Data Component]]
<br>
 
'''Step 1: Implement the BMI component.'''
* Implement the Application Programming Interface (API) to download the datasets (e.g., Topography class in [https://github.com/csdms/bmi-topography/blob/main/bmi_topography/topography.py topography.py]).
* Implement the Command Line Interface (CLI) to allow downloading datasets through shell commands (e.g., [https://github.com/csdms/bmi-topography/blob/main/bmi_topography/cli.py cli.py]).
* Create a Python class to wrap the dataset with the BMI functions (e.g., BmiTopography class in [https://github.com/csdms/bmi-topography/blob/main/bmi_topography/bmi.py bmi.py]).
* If an API is already available, we suggest using it and mainly implementing the Python class that wraps the datasets with the BMI functions.
* Examples: [https://github.com/csdms/bmi-topography bmi-topography], [https://github.com/csdms/bmi-wavewatch3 bmi_wavewatch3]
<br>
'''Step 2: Implement the Babelized component.'''
* Run the babelizer over the BMI component to create a Python package that can be imported into pymt.
* Learn more about the babelizer from its [https://babelizer.readthedocs.io/ documentation] and about pymt from its [https://pymt.readthedocs.io/ documentation].
* Examples: [https://github.com/pymt-lab/pymt_topography pymt_topography], [https://github.com/pymt-lab/pymt_wavewatch3 pymt_wavewatch3]
<br>
'''Step 3: Create documentation.'''
* We recommend creating documentation for the BMI and Babelized components. It may include a README file, [https://www.sphinx-doc.org/en/master/ Sphinx documentation], and [https://jupyter-notebook.readthedocs.io/en/latest/ Jupyter notebook] tutorials.
* Examples: [https://github.com/csdms/bmi_era5 bmi_era5], [https://github.com/pymt-lab/pymt_era5 pymt_era5] (see README file, ‘docs’ folder, and ‘notebooks’ folder)
<br>
'''Step 4: Create a conda package.'''
* We recommend creating a conda-forge package for the BMI and Babelized components.
* The conda-forge documentation on [https://conda-forge.org/docs/maintainer/adding_pkgs.html contributing packages] is essential reading.
* Examples: [https://github.com/conda-forge/bmi-topography-feedstock bmi-topography-feedstock], [https://github.com/conda-forge/pymt_topography-feedstock pymt_topography-feedstock]
<br>
'''Step 5: Code review.'''
* We recommend having a code review for the Data Component.
* Please [https://csdms.github.io/help-desk/ contact us] to request a code review if needed. 
<br>
'''Step 6: Share the Data Component.'''
* If you want to add your Data Component to the list above, please fill out the [[Form:Module_questionnaire|form]]. This will help the community members discover and use it.
* Example: [https://csdms.colorado.edu/wiki/Model:Topography_Data_Component Topography Data Component]

Latest revision as of 14:27, 25 March 2024

Data Components

Introduction

As with models, data comes in many different flavors—different spatial and temporal resolutions, different grid types, different file formats—and, as with models, these differences pose significant hurdles when trying to analyze or bring data into a modeling framework. Given the growing interest in using real-world geospatial data with models, and the explosion of high-resolution datasets, this problem is pressing.

Therefore, CSDMS developed a common language, by using the BMI, that allows models to seamlessly communicate with data as well as with other models. Applied to data, the BMI acts as a common hub that connects spokes to the many data formats within the earth sciences.
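The hub idea can be sketched in a few lines of Python. This is a toy illustration, not CSDMS code: the two classes, the variable names "rainfall_rate" and "water_storage", and the driver function are all hypothetical. Only the method names (initialize, update, finalize, get_value, set_value) follow the BMI. Because the dataset and the model expose the same functions, one driver loop can serve both.

```python
class ToyRainfallData:
    """Hypothetical data component: serves values from a stored time series."""

    def initialize(self, config_file=None):
        # Stand-in for data that a real component would read or download.
        self._series = [0.0, 2.5, 1.0]
        self._step = 0

    def update(self):
        self._step += 1  # move to the next record in the series

    def get_value(self, name, dest):
        # BMI-style getter: copy the current value into a caller-owned buffer.
        dest[0] = self._series[self._step]
        return dest

    def finalize(self):
        self._series = None


class ToyBucketModel:
    """Hypothetical model: accumulates the rainfall it is given."""

    def initialize(self, config_file=None):
        self._storage = 0.0
        self._rain = 0.0

    def set_value(self, name, src):
        self._rain = src[0]

    def update(self):
        self._storage += self._rain

    def get_value(self, name, dest):
        dest[0] = self._storage
        return dest

    def finalize(self):
        pass


def run_coupled(data, model, n_steps):
    """Drive a data component and a model through the same interface."""
    data.initialize()
    model.initialize()
    buffer = [0.0]
    for _ in range(n_steps):
        data.get_value("rainfall_rate", buffer)   # read from the dataset
        model.set_value("rainfall_rate", buffer)  # hand it to the model
        model.update()
        data.update()
    result = [0.0]
    model.get_value("water_storage", result)
    data.finalize()
    model.finalize()
    return result[0]
```

The driver never needs to know whether a component computes its values or reads them from a file, which is the property the hub-and-spoke description relies on.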

For a detailed description of the design and use of data components, please see Gan et al. (2024): https://doi.org/10.5194/gmd-17-2165-2024.

Available Data Components

CSDMS makes Data Components available to the community. The 8 available components are described in the CSDMS repository and are listed below.

Program | Description | Developer | Download | PyMT
DbSEABED Data Component | A CSDMS data component used to download the marine substrates datasets from the dbSEABED system. | Gan, Tian | redirect download |
ERA5 Data Component | A CSDMS data component used to download the ECMWF Reanalysis v5 (ERA5) datasets. | Gan, Tian | download |
GeoTiff Data Component (a.k.a. GeoTiff, bmi-geotiff) | A CSDMS data component for accessing data and metadata from a GeoTIFF file, through either a local filepath or a remote URL. | Piper, Mark | redirect download | Yes
GridMET Data Component (a.k.a. gridMET, gridmet_bmi) | A CSDMS data component for fetching and caching gridMET meteorological data. | McDonald, Rich | redirect download | Yes
NWIS Data Component | A CSDMS data component used to download the National Water Information System (NWIS) time series datasets. | Gan, Tian | redirect download |
ROMS Data Component | A CSDMS data component used to access the Regional Ocean Modeling System (ROMS) datasets. | Gan, Tian | redirect download |
SoilGrids Data Component | A CSDMS data component used to download the soil property datasets from the SoilGrids system. | Gan, Tian | redirect download |
Topography Data Component (a.k.a. Topography, bmi-topography) | A CSDMS data component used to fetch and cache NASA Shuttle Radar Topography Mission (SRTM) and JAXA Advanced Land Observing Satellite (ALOS) land elevation data using the OpenTopography REST API. | Piper, Mark | redirect download | Yes


Data Components are an element of the CSDMS Workbench, an integrated system of software tools, technologies, and standards for building and coupling models.
If you want to add a Data Component to the list above, please fill out the form.

Contribute Data Components

We encourage the community to develop new Data Components. Please follow the instructions below and contact us if you need any support.

Generally, a Data Component includes two elements: the BMI component and the Babelized component. The BMI component is a Python package that downloads the datasets and wraps them with BMI functions. The Babelized component is another Python package that converts the BMI component into a plug-and-play component for a specific modeling framework (e.g., pymt). The figure shows the contents of these two components and the relationship between them. The steps below use the Topography Data Component as the example.

Fig. Elements of an example Data Component


Step 1: Implement the BMI component.

  • Implement the Application Programming Interface (API) to download the datasets (e.g., Topography class in topography.py).
  • Implement the Command Line Interface (CLI) to allow downloading datasets through shell commands (e.g., cli.py).
  • Create a Python class to wrap the dataset with the BMI functions (e.g., BmiTopography class in bmi.py).
  • If an API is already available, we suggest using it and mainly implementing the Python class that wraps the datasets with the BMI functions.
  • Examples: bmi-topography, bmi_wavewatch3
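
The three pieces of Step 1 (a download API, a CLI, and a BMI wrapper class) can be sketched as below. This is a minimal, offline sketch under stated assumptions, not the bmi-topography implementation: fetch_elevation, BmiToyElevation, and main are hypothetical names, the "download" synthesizes a small grid instead of calling a remote service, and initialize accepts a dict where the BMI specifies a configuration file path. A real component would typically subclass the Bmi abstract base class from the bmipy package and implement the full set of BMI functions.

```python
import argparse


def fetch_elevation(south, north, west, east):
    """Hypothetical download API.

    A real API would request the bounding box from a remote service and
    cache the result; a tiny 3 x 4 grid is synthesized here instead.
    """
    n_rows, n_cols = 3, 4
    return [[float(r * n_cols + c) for c in range(n_cols)] for r in range(n_rows)]


class BmiToyElevation:
    """Hypothetical wrapper exposing the dataset through BMI-style functions."""

    _var_name = "land_surface__elevation"

    def initialize(self, config=None):
        # Assumption: a dict stands in for the BMI configuration file.
        bbox = config or {"south": 36.0, "north": 38.0, "west": -120.0, "east": -118.0}
        self._grid = fetch_elevation(**bbox)

    def update(self):
        pass  # a static dataset has nothing to advance

    def finalize(self):
        self._grid = None

    def get_component_name(self):
        return "Toy elevation data component"

    def get_output_var_names(self):
        return (self._var_name,)

    def get_var_units(self, name):
        return "m"

    def get_var_grid(self, name):
        return 0  # single grid, identified by 0

    def get_grid_shape(self, grid, shape):
        shape[:] = [len(self._grid), len(self._grid[0])]
        return shape

    def get_value(self, name, dest):
        if name != self._var_name:
            raise KeyError(name)
        # Copy the flattened grid into the caller-owned buffer.
        dest[:] = [value for row in self._grid for value in row]
        return dest


def main(argv=None):
    """Hypothetical CLI: fetch a bounding box from the shell."""
    parser = argparse.ArgumentParser(description="Fetch toy elevation data.")
    for side in ("south", "north", "west", "east"):
        parser.add_argument(f"--{side}", type=float, required=True)
    args = parser.parse_args(argv)
    grid = fetch_elevation(args.south, args.north, args.west, args.east)
    print(f"fetched {len(grid)} x {len(grid[0])} values")
    return 0
```

Keeping the download function separate from the wrapper class means the same function backs both the CLI and the BMI component, which mirrors the split between topography.py, cli.py, and bmi.py named in the list above.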


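Step 2: Implement the Babelized component.

  • Run the babelizer over the BMI component to create a Python package that can be imported into pymt.
  • Learn more about the babelizer and about pymt from their documentation.
  • Examples: pymt_topography, pymt_wavewatch3


Step 3: Create documentation.

  • We recommend creating documentation for the BMI and Babelized components. It may include a README file, Sphinx documentation, and Jupyter notebook tutorials.
  • Examples: bmi_era5, pymt_era5 (see README file, ‘docs’ folder, and ‘notebooks’ folder)


Step 4: Create a conda package.

  • We recommend creating a conda-forge package for the BMI and Babelized components.
  • The conda-forge documentation on contributing packages is essential reading.
  • Examples: bmi-topography-feedstock, pymt_topography-feedstock
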

Step 5: Code review.

  • We recommend having a code review for the Data Component.
  • Please contact us to request a code review if needed.


Step 6: Share the Data Component.

  • If you want to add your Data Component to the list above, please fill out the form. This will help the community members discover and use it.
  • Example: Topography Data Component