CSN Overview

  CSDMS Standard Names — Overview


Basic Rules

  • This section provides some basic rules but many additional rules and naming patterns are given in other sections as explained below.

  • Every standard name has an object part that describes a particular object and a quantity part that describes a particular attribute of that object that can be quantified with a number. Numerous templates, patterns and rules for constructing object names and quantity names are provided on the CSDMS Quantity Templates and CSDMS Object Templates pages. Quantity names are sometimes constructed using one of the CSDMS Process Names.

  • A standard name may have an optional operation prefix that ends with "_of". See the CSDMS Operation Templates page for more information.

  • Standard names consist of lower-case letters and digits. They contain no blank spaces. Underscores are the only non-alphanumeric character that is allowed in a standard name. All hyphens are converted to underscores.

  • A single underscore is used between separate words in either object names or quantity names. A double underscore is used between the object part and the quantity part of the name. (But this rule has not yet been used in the examples.)

  • Many CSDMS Standard Names contain a person's last name. If the last name ends with the letter "s" — as in Burgers, Gibbs, Jones, Reynolds, Shields and Stokes — then it is retained. However, a possessive "s" is never added to the name, so we would use "newton" vs. "newtons" in a standard name.

  • Approved acronyms may be included in standard names, but they are usually spelled out explicitly as in "counterclockwise" instead of "ccw". Standard symbols for the chemical elements (but lower-case, like "h" and "c") can be used in naming quantities like "bond_angle" that involve multiple atoms in a molecule. See Attributes of Molecules on the CSDMS Quantity Templates page. Other possible acronyms are: stp = standard temperature and pressure, toa = top of atmosphere (used in CF). The acronym "wrt" = "with respect to" is used in some operation templates.

  • As explained at the top of the CSDMS Process Names page, the "ing" ending on process names such as "shearing" and "melting" is often dropped for quantities like "shear_stress" and "melt_rate" that use the Process_name + Quantity Pattern. However, the "ing" ending may be retained when the same word is used in a quantity like "melting_point_temperature" (vs. "melt_temperature").

  • The rightmost word in an object name is the "base object" to which the quantity applies. If the rightmost word in a quantity name is a quantity suffix, then the base quantity is the last two words in the quantity name. Otherwise, the base quantity is the last word.

  • In general, the words in an object name or quantity name are ordered left to right from the general to the specific. Removing words from the left will then often result in another valid object or quantity name. For example, each name in the sequence "conductivity" => "hydraulic_conductivity" => "saturated_hydraulic_conductivity" => "effective_saturated_hydraulic_conductivity" can be obtained from the one that follows it by removing the leftmost word.

  • Some cases will most likely require new rules to avoid ambiguity. For example, should we use:

  channel_bed_shear_stress
  channel_bed_water_shear_stress
  channel_water_at_bed_shear_stress or
  channel_water_bed_shear_stress?

  Similarly, should we use:

  axial_tilt_angle or axis_tilt_angle?
  channel_water_flow_speed or channel_water_speed?
  forest_area_fraction or forested_area_fraction?
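
  The character and separator rules above can be checked mechanically. The following is a minimal Python sketch, not an official CSDMS tool. The function names (normalize, is_valid, split_name, base_object, generalizations) are hypothetical, and the sketch assumes that the double-underscore separator between the object part and the quantity part is in use, which the examples on this page do not yet do. It does not treat operation prefixes ending in "_of" specially, and it does not identify quantity suffixes.

```python
from __future__ import annotations

import re

# One or more words of lower-case letters/digits, joined by single underscores.
_PART = r"[a-z0-9]+(?:_[a-z0-9]+)*"
# Object part, double underscore, quantity part (assumed separator rule).
_NAME = re.compile(rf"^({_PART})__({_PART})$")


def normalize(raw: str) -> str:
    """Lower-case one name part; convert blanks and hyphens to underscores."""
    return re.sub(r"[\s\-]+", "_", raw.strip().lower())


def is_valid(name: str) -> bool:
    """True if the name has exactly one object part and one quantity part."""
    return _NAME.match(name) is not None


def split_name(name: str) -> tuple[str, str]:
    """Return (object_part, quantity_part); raise ValueError if malformed."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not a well-formed standard name: {name!r}")
    return match.group(1), match.group(2)


def base_object(object_part: str) -> str:
    """The rightmost word of the object part is the base object."""
    return object_part.rsplit("_", 1)[-1]


def generalizations(part: str) -> list[str]:
    """Drop words from the left one at a time, longest name first."""
    words = part.split("_")
    return ["_".join(words[i:]) for i in range(len(words))]


# Usage with names taken from the examples on this page.
name = normalize("Channel bed") + "__" + normalize("shear-stress")
print(name)                      # channel_bed__shear_stress
print(split_name(name))          # ('channel_bed', 'shear_stress')
print(base_object("channel_bed"))  # bed
print(generalizations("saturated_hydraulic_conductivity"))
```

  The last call lists "saturated_hydraulic_conductivity", "hydraulic_conductivity" and "conductivity", mirroring the rule that removing words from the left often yields another valid name. Whether each shortened name is actually valid still has to be checked against the templates.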


Background

  • The Semantic Web concept and movement recognizes that the already transformative World Wide Web will become even more powerful if it is extended beyond an interconnected set of human-readable documents to a set of machine-readable documents that are able to capture and convey knowledge.

  • The field of semantics is concerned with the study of meaning, and ontology is essentially concerned with capturing and organizing knowledge. In computer science, an ontology is a system that attempts to capture and organize knowledge in a particular domain (in machine-readable form), as understood by experts in that domain or subject area.

  • There are a variety of concepts that fall under the banner of "semantics and ontology" that are used to address specific issues in the development of "intelligent software". Some of these are:

  controlled vocabulary
  corpus
  crosswalk
  data dictionary
  lingua franca
  master dictionary
  nomenclature
  ontology
  preferred label
  standard names
  taxonomy
  typology
  unique identifier

  • While there are subtle differences between the items in this list, they can be divided into two broad groups. The terms controlled vocabulary, crosswalk, lingua franca, nomenclature, preferred label and standard names are all closely related and have the fairly simple, linear structure of a list or lookup table. They are used primarily to map a term used in one setting to an equivalent term in another setting. Relationships between entries in the list are not of primary interest. The main interest is knowing whether two terms refer to the same object.

  • By contrast, the terms ontology, master dictionary, taxonomy and typology represent efforts to capture relationships between entries (objects). They attempt to organize the objects into a hierarchy, which may include nested classes (sub- and super-classes). The connectedness or closeness of objects is also of interest. Because of this, they have the potential to capture knowledge, which is broadly concerned with relationships and the degree to which objects are similar. Their fundamental structure is that of a graph (nodes connected by lines) instead of a list.

  • These two broad groups of tools are used to address three main software use cases, namely

  Semantic mediation and matching
  Discovery of related information
  Capture and archiving of domain knowledge

  The last two of these require tools from the second, more complex group.
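
  The structural difference between the two groups can be made concrete with a small Python sketch. Everything in it is a hypothetical illustration: the local terms ("tau_b", "bed_stress", "K_sat"), the parent links and the function names are assumptions made for this example and are not taken from any actual vocabulary or ontology. A crosswalk is modeled as a flat lookup table, and a taxonomy is modeled as a graph of parent links.

```python
from __future__ import annotations

# Group 1: a crosswalk is a flat lookup table from one vocabulary to another.
# (Hypothetical local terms mapped to names used as examples on this page.)
crosswalk = {
    "tau_b": "channel_bed_shear_stress",
    "bed_stress": "channel_bed_shear_stress",
    "K_sat": "saturated_hydraulic_conductivity",
}


def same_concept(term_a: str, term_b: str) -> bool:
    """Semantic matching: do two local terms map to the same standard term?"""
    a, b = crosswalk.get(term_a), crosswalk.get(term_b)
    return a is not None and a == b


# Group 2: a taxonomy is a graph; here each node points to its parent class.
parent = {
    "saturated_hydraulic_conductivity": "hydraulic_conductivity",
    "hydraulic_conductivity": "conductivity",
    "thermal_conductivity": "conductivity",
}


def ancestors(term: str) -> list[str]:
    """Walk parent links from a term up to the root of its hierarchy."""
    chain = []
    while term in parent:
        term = parent[term]
        chain.append(term)
    return chain


def related(term_a: str, term_b: str) -> bool:
    """Discovery: two terms are related if they share any class."""
    set_a = {term_a, *ancestors(term_a)}
    set_b = {term_b, *ancestors(term_b)}
    return not set_a.isdisjoint(set_b)


print(same_concept("tau_b", "bed_stress"))
print(ancestors("saturated_hydraulic_conductivity"))
print(related("saturated_hydraulic_conductivity", "thermal_conductivity"))
```

  The lookup table can only answer whether two terms refer to the same thing. The graph can also answer how two different terms are connected, which is what discovery and knowledge capture need.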


Purpose

  • The CSDMS semantic "use case" is one of automated semantic mediation, matching or reconciliation. While our focus is on a "lingua franca", our standard names are often built from a hierarchical set of concepts and may eventually be used to construct a type of ontology.

  • The CSDMS plug-and-play modeling system requires a set of standard names for input and output variables (quantities) in order to automatically determine whether an input variable in one model (or database) is equivalent to (or compatible with) an output variable in another model (or database) for the purpose of coupling the two resources (as user and provider). There is no need or requirement for these standard names to be used within a model, and they are too long to be used in this way. However, CSDMS requires model contributors to implement the BMI (Basic Model Interface), and this includes mapping each of the model's input and output variables to a CSDMS Standard Name. In addition, contributors provide a Model Metadata File (MMF) that (1) specifies how each standard name is used within the model (e.g. units, how measured, etc.) and (2) describes other key attributes of the model that must be known to facilitate coupling to other models. See CSDMS BMI for more information.

  • Our focus is more on identifying general rules and patterns for consistent construction of standard names (i.e. a systematic naming scheme) that span the geosciences and less on creating an exhaustive list of names, which comes later. We have identified numerous patterns and templates that cover a broad range of needs and these are listed and discussed in the subsequent sections of this document. This includes numerous Object Templates, Quantity Templates and Operation Templates.

  • RDF (Resource Description Framework) is built around an "object + attribute + value" concept. Our "object + quantity" names follow a similar pattern and are used to retrieve values from a model or database. The word "attribute" is a more general term than "quantity"; the latter is essentially a type of attribute that can be described with numbers and has units.

  • As with CF Standard Names, units are not given as part of the name. However, in CF Standard Names, a certain SI unit is often implied by the name. Also, the CF Standard Names allow inclusion of assumptions in the name, such as "_assuming_clear_sky". In CSDMS Standard Names, we use the name as a "key" or "index" to access not only the associated values but also the associated metadata that provides the units, set of assumptions, datum, how measured, etc. Including all assumptions and similar details in the standard name limits the number of matches that are likely to be found during the discovery process or when trying to couple models. It also discourages a complete listing of the relevant assumptions. Metadata (including assumptions) can be used to distinguish between exact and approximate matches, and this information can be presented to users when desirable.

  • Guidelines for construction of CF Standard Names can be found at: CF Standard Name Guidelines. The rules for CSDMS Standard Names being developed here are meant to be more general, more rigorously defined and less ambiguous. As of 5/3/12, there are 2134 CF Standard Names, but the number of distinct patterns reflected in this set is much smaller. Some of them already conform to the patterns and templates of the CSDMS Standard Names and these will be favored (or assimilated) whenever possible. However, CSDMS plans to provide a lookup table that maps each CF Convention Standard Name to a CSDMS Standard Name.
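
  The matching step described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions: the Variable class, the match function, the metadata fields and the unit strings are hypothetical and are not part of the BMI, the Model Metadata File format or any CSDMS tool. The standard name serves as the key, and the metadata decides whether a match is exact or approximate.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Variable:
    """A model variable: standard name as key, everything else as metadata."""
    standard_name: str
    units: str
    assumptions: frozenset = field(default_factory=frozenset)


def match(needed: Variable, offered: list) -> list:
    """Pair a needed input with offered outputs that share its standard name.

    Returns (variable, kind) tuples, where kind is "exact" when units and
    assumptions agree and "approximate" otherwise.
    """
    results = []
    for candidate in offered:
        if candidate.standard_name != needed.standard_name:
            continue  # different key: not the same quantity at all
        same_metadata = (candidate.units == needed.units
                         and candidate.assumptions == needed.assumptions)
        results.append((candidate, "exact" if same_metadata else "approximate"))
    return results


# Hypothetical input of one model and outputs of another.
needed = Variable("channel_water_flow_speed", "m s-1")
offered = [
    Variable("channel_water_flow_speed", "m s-1"),
    Variable("channel_water_flow_speed", "cm s-1"),
    Variable("channel_bed_shear_stress", "Pa"),
]
for var, kind in match(needed, offered):
    print(var.units, kind)
```

  Because units and assumptions live in the metadata rather than in the name, the second output is still found and reported as an approximate match instead of being missed entirely.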