Querying the CSDMS model repository

Semantic MediaWiki (SMW) is the knowledge management system used on the CSDMS website. SMW has an API with several actions, allowing users to add, edit, and query information. Here, we'll focus on the ask action of the Ask API, using it to query metadata from the CSDMS model metadata repository.

The base URL for any call to the SMW API on the CSDMS website is https://csdms.colorado.edu/csdms_wiki/api.php.


Query syntax

The ask action supports one parameter, query, which takes a urlencoded string. The query is written in the SMW query language. A query consists of a series of conditions, which describe the search. Conditions are built from properties and values. For example, the condition

[[Programming language::C]]

would select all models whose Programming language property has the value C. Note that the colons :: in the condition are literal in the query language, and cannot be urlencoded. Spaces, however, should be encoded with %20 or +, while brackets [] may optionally be encoded.

Try this condition in a query:

https://csdms.colorado.edu/csdms_wiki/api.php?action=ask&query=[[Programming+language::C]]&format=json

The results of a query are returned in the format named by the format parameter, which is JSON in this example. To view a query result in pretty-printed form, change the value of the format parameter to jsonfm.
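The following is a minimal sketch of building and sending the same request from Python with only the standard library. API_URL, build_ask_url and ask are hypothetical helper names and are not part of any CSDMS package. The encoding follows the rules above: spaces become %20, + becomes %2B, and colons, brackets and pipes stay literal.

```python
import json
import urllib.parse
import urllib.request

# Base URL of the SMW API on the CSDMS website.
API_URL = "https://csdms.colorado.edu/csdms_wiki/api.php"


def build_ask_url(query, fmt="json"):
    """Return an Ask API URL for an SMW query string (hypothetical helper).

    urllib.parse.quote turns spaces into %20 and '+' into %2B; colons,
    brackets and pipes are kept literal, as in the URLs on this page.
    """
    params = {"action": "ask", "query": query, "format": fmt}
    encoded = urllib.parse.urlencode(params, safe="[]:|", quote_via=urllib.parse.quote)
    return API_URL + "?" + encoded


def ask(query):
    """Send a query and return the decoded JSON response (requires network access)."""
    with urllib.request.urlopen(build_ask_url(query, fmt="json")) as response:
        return json.load(response)


if __name__ == "__main__":
    # Prints the same URL as the "List all models written in C" example.
    print(build_ask_url("[[Programming language::C]]"))
```

Passing fmt="jsonfm" to build_ask_url produces the pretty-printed variant. That variant is handy in a browser, but json.load cannot parse it.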


Properties

Properties are the basic data type of SMW. They consist of a name and a value, both of which are case-sensitive.

A defined set of properties is added to each model by the CSDMS WikiSysop. For example, Programming language is a property of models in the CSDMS model metadata repository.

Note: I'd like a query that returns all the properties of a model, but I haven't figured out how to write one. It's on my list of unanswered questions below. In lieu of a programmatic query, I've been reading the model's wiki source; for example, the Wikitext for HydroTrend.


Categories

Categories are tags added to a page by the CSDMS WikiSysop to aid in classification. Like properties, categories can be queried. For example, the condition

[[Category:Terrestrial]]

will list all terrestrial models in the CSDMS model metadata repository. Unlike a property condition, a category condition uses only one colon : to separate Category from the category name.

Each model also has its own page, whose name begins with Model:. Search for a particular model by name:

[[Model:HydroTrend]]

The model name is case-sensitive; e.g., Model:hydrotrend wouldn't match a model. Here's this condition in a query:

https://csdms.colorado.edu/csdms_wiki/api.php?action=ask&query=[[Model:HydroTrend]]&format=json
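Category and model-name conditions are plain strings, so they are easy to assemble in code. The sketch below uses hypothetical helpers, category_condition and model_condition. Joining several categories with || assumes SMW's disjunction syntax, which is described under Combining conditions below.

```python
import urllib.parse


def category_condition(*categories):
    """Build a category condition; several categories are OR-ed with '||'."""
    return "[[Category:" + "||".join(categories) + "]]"


def model_condition(name):
    """Select a single model page by its (case-sensitive) name."""
    return "[[Model:" + name + "]]"


if __name__ == "__main__":
    query = category_condition("Terrestrial", "Coastal", "Carbonates and Biogenics")
    print(query)
    # Spaces are percent-encoded; colons, brackets and pipes stay literal.
    print(urllib.parse.quote(query, safe="[]:|"))
    print(model_condition("HydroTrend"))
```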

Model keywords

Model keywords are defined not by SMW or the CSDMS WikiSysop, but by the developer of a model, so they may vary inconsistently from model to model. For example, the condition

[[Model keywords::basin]]

can be used to find all models that contain the keyword basin. Use this condition in a query:

https://csdms.colorado.edu/csdms_wiki/api.php?action=ask&query=[[Model+keywords::basin]]&format=jsonfm
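With format=json, the response can be reduced to a list of matching page titles. This is a sketch, and result_titles is a hypothetical helper. It assumes the usual SMW layout, in which matches sit under query.results as an object keyed by page title. An empty match set may arrive as an empty list instead. It also assumes MediaWiki API errors arrive under an error key.

```python
def result_titles(response):
    """Return the sorted page titles in a decoded Ask API JSON response.

    Assumes matches live under response["query"]["results"], keyed by page
    title; an empty match set may be serialized as [] rather than {}.
    """
    if "error" in response:
        # MediaWiki API errors carry a human-readable "info" message.
        raise RuntimeError(response["error"].get("info", "Ask API error"))
    results = response.get("query", {}).get("results") or {}
    return sorted(results)
```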

Advanced queries

The Ask API supports a number of advanced query options.

Limiting the displayed results

By default, only the first 10 matches to a query are returned. To raise this limit, set the limit display property to a larger number. For example, applying this to the earlier condition

[[Programming language::C]]|limit=10000

shows that there are (at the time of writing this article) actually 100 models written in C. Note the use of the pipe character | to set off the display property from the condition. Here's the query:

https://csdms.colorado.edu/csdms_wiki/api.php?action=ask&query=[[Programming+language::C]]|limit=10000&format=jsonfm
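Instead of one very large limit, results can also be fetched page by page with limit and offset. The offset property appears in the examples further down. The wiki may also cap limit on the server side. This is a sketch under two assumptions. First, SMW signals a further page with a top-level query-continue-offset value. Second, fetch_all_titles and the injected fetch callable are hypothetical names. Injecting fetch keeps the paging logic testable without network access.

```python
def fetch_all_titles(condition, fetch, page_size=100):
    """Collect every matching page title by paging with limit and offset.

    `fetch` is any callable that takes an SMW query string and returns the
    decoded JSON response (for example, one built on urllib.request).
    Assumes the API reports the next page's start in "query-continue-offset".
    """
    titles = []
    offset = 0
    while True:
        response = fetch(f"{condition}|limit={page_size}|offset={offset}")
        results = response.get("query", {}).get("results") or {}
        titles.extend(results)  # keys of the results object are page titles
        next_offset = response.get("query-continue-offset")
        if next_offset is None or int(next_offset) <= offset:
            # No further pages (or an offset that does not advance): stop.
            return titles
        offset = int(next_offset)
```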

Combining conditions

Conditions listed in sequence are combined with a logical AND. For example, the two conditions

[[Programming language::C++]]
[[Last name::Tucker]]

can be combined into a single query with:

https://csdms.colorado.edu/csdms_wiki/api.php?action=ask&query=[[Programming+language::C%2B%2B]][[Last+name::Tucker]]&format=jsonfm

Note that the spaces in the property names need to be urlencoded (here, with +), as do the plus signs in C++ (here, with %2B), because an unencoded + would be decoded as a space!

A condition can also accept multiple values, combined with a logical OR using the double pipe || operator. For example, to list models written in either Fortran 77 or Fortran 90, use the condition

[[Programming language::Fortran77||Fortran90]]

In a query, this is:

https://csdms.colorado.edu/csdms_wiki/api.php?action=ask&query=[[Programming+language::Fortran77||Fortran90]]|limit=10000&format=jsonfm

See Help:Selecting_pages for more examples of disjunctions and of comparisons in conditions.
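The AND and OR rules above can be captured in two small string helpers. In this sketch, property_condition and all_of are hypothetical names. Percent-encoding the combined query with urllib.parse.quote turns each + in C++ into %2B, as the note above requires.

```python
import urllib.parse


def property_condition(name, *values):
    """[[Name::v1||v2...]]: several values are combined with a logical OR."""
    return "[[" + name + "::" + "||".join(values) + "]]"


def all_of(*conditions):
    """Conditions written one after another are combined with a logical AND."""
    return "".join(conditions)


if __name__ == "__main__":
    query = all_of(
        property_condition("Programming language", "C++"),
        property_condition("Last name", "Tucker"),
    )
    print(query)
    # quote() encodes each '+' in C++ as %2B, so it is not decoded as a space.
    print(urllib.parse.quote(query, safe="[]:|"))
    print(property_condition("Programming language", "Fortran77", "Fortran90"))
```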

Specifying additional data

Additional data can be returned with a query result by specifying additional properties in the query string. Precede each additional property with the pipe and question mark characters |?. For example, to find all models written by the user with the last name "Hutton", and also include, if available, the DOI and the source code repository for each model found, use the query string:

[[Last+name::Hutton]]|?DOI+model|?Source+web+address

The API call is:

https://csdms.colorado.edu/csdms_wiki/api.php?action=ask&query=[[Last+name::Hutton]]|?DOI+model|?Source+web+address&format=jsonfm

See Help:Inline_queries for more information on building query strings with several properties.
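The |? syntax and the extra data it returns can be handled as shown in the sketch below, which uses hypothetical helpers with_printouts and printout_table. It assumes each result in the JSON response carries a printouts object keyed by property label. Each property maps to a list of values, which may be empty when a model has no value for it.

```python
def with_printouts(condition, *properties):
    """Append |?Property for each extra property to return with the results."""
    return condition + "".join("|?" + prop for prop in properties)


def printout_table(response, properties):
    """Map each result's page title to {property: list of values}.

    Assumes each result carries a "printouts" object keyed by property label,
    with a (possibly empty) list of values per property.
    """
    results = response.get("query", {}).get("results") or {}
    table = {}
    for title, page in results.items():
        printouts = page.get("printouts", {})
        table[title] = {prop: printouts.get(prop, []) for prop in properties}
    return table
```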

Testing queries

Test queries with the Special:Ask page on the CSDMS portal:

https://csdms.colorado.edu/wiki/Special:Ask

In addition to running queries interactively, the Special:Ask page shows the raw query string, which can be helpful for building new queries programmatically.


Examples of queries

Here are some examples of queries into the CSDMS model repository. Each description is followed by its query URL.

List all models created by the user with the last name Tucker:
https://csdms.colorado.edu/csdms_wiki/api.php?action=ask&query=[[Last+name::Tucker]]&format=json

List all models written in C:
https://csdms.colorado.edu/csdms_wiki/api.php?action=ask&query=[[Programming%20language::C]]&format=json

Really list all models written in C:
https://csdms.colorado.edu/csdms_wiki/api.php?action=ask&query=[[Programming%20language::C]]|limit=10000&format=json

List all models from user Tucker written in C:
https://csdms.colorado.edu/csdms_wiki/api.php?action=ask&query=[[Last+name::Tucker]][[Programming+language::C]]&format=json

List the first three models written by user Hutton:
https://csdms.colorado.edu/csdms_wiki/api.php?action=ask&query=[[Last+name::Hutton]]|limit=3&format=json

List five models written in C, starting at item 20 from the full list:
https://csdms.colorado.edu/csdms_wiki/api.php?action=ask&query=[[Programming+language::C]]|limit=5|offset=20&format=json

Search for models written in a nonexistent programming language to see an error message:
https://csdms.colorado.edu/csdms_wiki/api.php?action=ask&query=[[Programming+language::xxyyzz]]&format=jsonfm

Find terrestrial models:
https://csdms.colorado.edu/csdms_wiki/api.php?action=ask&query=[[Category:Terrestrial]]&format=jsonfm

Locate a model by name:
https://csdms.colorado.edu/csdms_wiki/api.php?action=ask&query=[[Model:HydroTrend]]&format=json

List all models written in Fortran 77 or Fortran 90:
https://csdms.colorado.edu/csdms_wiki/api.php?action=ask&query=[[Programming%20language::Fortran77||Fortran90]]|limit=10000&format=json

Find all models written by user Hutton, including (if available) the DOI and the source code repository for each model:
https://csdms.colorado.edu/csdms_wiki/api.php?action=ask&query=[[Last+name::Hutton]]|?DOI+model|?Source+web+address&format=jsonfm

List all models using categories (deprecated technique):
https://csdms.colorado.edu/csdms_wiki/api.php?action=ask&query=[[Category:Terrestrial||Coastal||Marine||Hydrology||Carbonates and Biogenics||Climate]]|limit=10000&format=jsonfm

Python examples of these queries (and others) can be found in the GitHub repository https://github.com/csdms/ask-api-examples.


Unanswered questions

  1. How does one get a list of all model properties used in the CSDMS wiki?
  2. How can one show the data for all the properties of a particular model?


Additional references