CSN Overview

  CSDMS Standard Names — Overview


Basic Rules

  • This section provides some basic rules, but many additional rules and naming patterns are given in other sections, as explained below.

  • Every standard name has an object part that describes a particular object and a quantity part that describes a particular measurable attribute of that object that has units. Numerous templates, patterns and rules for constructing object names and quantity names are provided on the CSDMS Quantity Templates and CSDMS Object Templates pages. Quantity names are sometimes constructed using one of the CSDMS Process Names.

  • A standard name may have an optional operation prefix that ends with "_of". See the CSDMS Operation Templates page for more information.

  • Standard names consist of lower-case letters and digits. They contain no blank spaces. Underscores are the only non-alphanumeric character that is allowed in a standard name. All hyphens are converted to underscores.

  • A single underscore is used between separate words in either object names or quantity names. (But this rule has not yet been used in the examples.) A double underscore is used between the object part and the quantity part of the name.

  • Many CSDMS Standard Names contain a person's last name. If the last name ends with the letter "s" — as in Burgers, Gibbs, Jones, Reynolds, Shields and Stokes — then it is retained. However, a possessive "s" is never added to the name, so we would use "newton" vs. "newtons" in a standard name.

  • Approved acronyms may be included in standard names, but they are usually spelled out explicitly as in "counterclockwise" instead of "ccw". Standard symbols for the chemical elements (but lower-case, like "h" and "c") can be used in naming quantities like "bond_angle" that involve multiple atoms in a molecule. See Attributes of Molecules on the CSDMS Quantity Templates page.

  • As explained at the top of the CSDMS Process Names page, the "ing" ending on process names such as "shearing" and "melting" is often dropped for quantities like "shear_stress" and "melt_rate" that use the Process_name + Quantity Pattern. However, the "ing" ending may be retained when the same word is used in a quantity like "melting_point_temperature" (vs. "melt_temperature").
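  The syntax rules above can be checked mechanically. The following is a minimal sketch in Python, not an official CSDMS tool. The function names split_standard_name and normalize are hypothetical, the example name in the usage note is made up for illustration, and lower-casing in normalize is an assumption (the rules only state that hyphens become underscores). The optional operation prefix is not treated specially; it simply passes as ordinary words.

```python
import re

# One word: lower-case letters and digits only.
_WORD = r"[a-z0-9]+"
# A part (object or quantity): words joined by single underscores.
_PART = rf"{_WORD}(?:_{_WORD})*"
# Full name: object part, double underscore, quantity part.
_NAME = re.compile(rf"(?P<object>{_PART})__(?P<quantity>{_PART})")


def split_standard_name(name):
    """Return (object_part, quantity_part), or None if the syntax is invalid."""
    match = _NAME.fullmatch(name)
    if match is None:
        return None
    return match.group("object"), match.group("quantity")


def normalize(raw):
    """Convert hyphens to underscores; lower-casing is an added assumption."""
    return raw.strip().lower().replace("-", "_")
```

  For example, split_standard_name("glacier_ice__melt_rate") yields the pair ("glacier_ice", "melt_rate"), while a name with upper-case letters or without a double underscore is rejected.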


Background

  • The Semantic Web concept and movement recognizes that the already transformative World Wide Web will become even more powerful if it is extended beyond an interconnected set of human readable documents to a set of machine readable documents that are able to capture and convey knowledge.

  • The field of semantics is concerned with the study of meaning, and ontology is essentially concerned with capturing and organizing knowledge. In computer science, an ontology is a system that attempts to capture and organize knowledge in a particular domain (in machine readable form), as understood by experts in that domain or subject area.

  • There are a variety of concepts that fall under the banner of "semantics and ontology" that are used to address specific issues in the development of "intelligent software". Some of these are:

  controlled vocabulary
  corpus
  crosswalk
  data dictionary
  lingua franca
  master dictionary
  nomenclature
  ontology
  preferred label
  standard names
  taxonomy
  typology
  unique identifier

  • While there are subtle differences between the items in this list, they can be divided into two broad groups. The terms controlled vocabulary, crosswalk, lingua franca, nomenclature, preferred label and standard names are all closely related and have the fairly simple, linear structure of a list or lookup table. They are used primarily to map a term used in one setting to an equivalent term in another setting. Relationships between entries in the list are not of primary interest. The main interest is knowing whether two terms refer to the same object.

  • By contrast, the terms ontology, master dictionary, taxonomy and typology represent efforts to capture relationships between entries (objects). They attempt to organize the objects into a hierarchy, which may include nested classes (sub- and super-classes). The connectedness or closeness of objects is also of interest. Because of this, they have the potential to capture knowledge, which is broadly concerned with relationships and the degree to which objects are similar. Their fundamental structure is that of a graph (nodes connected by lines) instead of a list.

  • These two broad groups of tools are used to address three main software use cases, namely:

  Semantic mediation and matching
  Discovery of related information
  Capture and archiving of domain knowledge

  The last two of these require tools from the second, more complex group.
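  The structural difference between the two groups described above (a flat lookup table versus a graph of relationships) can be illustrated with a short Python sketch. Every term and name below is a hypothetical placeholder chosen for illustration, not an entry from any real vocabulary or ontology.

```python
# Group 1: a crosswalk is a flat lookup table from local terms to shared terms.
# All entries are hypothetical placeholders.
crosswalk = {
    "model_a_temp": "air__temperature",
    "model_b_t_air": "air__temperature",
    "model_a_precip": "atmosphere_water__precipitation_rate",
}


def same_concept(term_1, term_2):
    """True if both local terms map to the same entry in the lookup table."""
    shared = crosswalk.get(term_1)
    return shared is not None and shared == crosswalk.get(term_2)


# Group 2: a taxonomy stores relationships, here as child -> parent edges.
parent_of = {
    "sand": "sediment",
    "silt": "sediment",
    "sediment": "material",
    "water": "material",
}


def ancestors(term):
    """Walk up the hierarchy and return every super-class of the term."""
    chain = []
    while term in parent_of:
        term = parent_of[term]
        chain.append(term)
    return chain


def related(term_1, term_2):
    """True if the two terms share at least one super-class."""
    return bool(set(ancestors(term_1)) & set(ancestors(term_2)))
```

  The lookup table can only answer whether two terms refer to the same thing, which is enough for semantic mediation and matching. The graph can also answer how two terms are related, which is what discovery and knowledge capture need.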


Purpose

  • The CSDMS semantic "use case" is one of automated semantic mediation, matching or reconciliation. While our focus is on a "lingua franca", our standard names are often built from a hierarchical set of concepts and may eventually be used to construct a type of ontology.

  • The CSDMS plug-and-play modeling system requires a set of standard names for input and output variables (quantities) in order to automatically determine whether an input variable in one model (or database) is equivalent to (or compatible with) an output variable in another model (or database) for the purpose of coupling the two resources (as user and provider). There is no need or requirement for these standard names to be used within a model, and they are too long to be used in this way. However, CSDMS requires model contributors to implement the BMI (Basic Model Interface), and this includes mapping each of the model's input and output variables to a CSDMS Standard Name. In addition, contributors provide a Model Metadata File (MMF) that (1) specifies how each standard name is used within the model (e.g. units, how measured, etc.) and (2) describes other key attributes of the model that must be known to facilitate coupling to other models. See CSDMS BMI for more information.

  • Our focus is more on identifying general rules and patterns for consistent construction of standard names (i.e. a systematic naming scheme) that span the geosciences and less on creating an exhaustive list of names, which comes later. We have identified numerous patterns and templates that cover a broad range of needs and these are listed and discussed in the subsequent sections of this document. This includes numerous Object Templates, Quantity Templates and Operation Templates.

  • Each standard name must have an object part and a quantity part, with adjectives and modifiers (as prefixes) being used to help avoid ambiguity and identify a specific object and associated quantity. It may also have an optional operation prefix.

  • RDF (Resource Description Framework) is built around an "object + attribute + value" concept. Our "object + quantity" names follow a similar pattern and are used to retrieve values from a model or database. The word "attribute" is a more general term than "quantity"; the latter is essentially a type of attribute that can be described with numbers and has units.

  • As with CF Standard Names, units are not given as part of the name. However, in CF Standard Names, a certain SI unit is often implied by the name. Also, the CF Standard Names allow inclusion of assumptions in the name, such as "_assuming_clear_sky". In CSDMS Standard Names, we use the name as a "key" or "index" to access not only the associated values but also the associated metadata that provides the units, set of assumptions, datum, how measured, etc. If all assumptions, etc. are included in the standard name, it limits the number of matches that are likely to be found during the discovery process or when trying to couple models. It also discourages a complete listing of the relevant assumptions. Metadata (including assumptions) can be used to distinguish between exact and approximate matches, and this information can be presented to users when desirable.

  • Guidelines for construction of CF Standard Names can be found at: CF Standard Name Guidelines. The rules for CSDMS Standard Names being developed here are meant to be more general, more rigorously defined and less ambiguous. As of 5/3/12, there are 2134 CF Standard Names, but the number of distinct patterns reflected in this set is much smaller. Some of them already conform to the patterns and templates of the CSDMS Standard Names and these will be favored (or assimilated) whenever possible. However, CSDMS plans to provide a lookup table that maps each CF Convention Standard Name to a CSDMS Standard Name.

  • The CF Standard Names promote the importance of specifying a surface. We could use an "object-at-location_object" pattern with "_at_" as another reserved word for this purpose. Possible examples are:
  air_at_toa_temperature
  channel_water_at_bed_pressure
  channel_water_at_bed_temperature
  earth_atmosphere_at_land_surface_pressure
  water_at_land_surface_infiltration_rate
  
The CF Standard Names use the abbreviation "toa" for top of atmosphere.
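
  The matching role described in this section, where a standard name acts as a key into metadata and the metadata separates exact from approximate matches, can be sketched in Python as follows. This is an illustration under stated assumptions, not the CSDMS implementation: the function classify_matches, the record layout, the standard names, the units and the assumption strings are all hypothetical.

```python
# Hypothetical metadata records keyed by standard name. Names, units and
# assumption strings are illustrative only.
provider_outputs = {
    "air__temperature": {"units": "K", "assumptions": set()},
    "atmosphere_water__precipitation_rate": {
        "units": "m s-1",
        "assumptions": {"liquid_equivalent", "spatially_uniform"},
    },
}

user_inputs = {
    "air__temperature": {"units": "K", "assumptions": set()},
    "atmosphere_water__precipitation_rate": {
        "units": "m s-1",
        "assumptions": {"liquid_equivalent"},
    },
    "land_surface__elevation": {"units": "m", "assumptions": set()},
}


def classify_matches(inputs, outputs):
    """Pair inputs with outputs by standard name and grade each match."""
    report = {}
    for name, wanted in inputs.items():
        offered = outputs.get(name)
        if offered is None:
            report[name] = "no match"
        elif wanted == offered:
            report[name] = "exact"
        else:
            # Same name, but the metadata (here, the assumptions) differs.
            report[name] = "approximate"
    return report
```

  Because the assumptions live in the metadata rather than in the name, the two precipitation variables are still paired by name, and the difference is reported as an approximate match instead of being lost as a failed lookup.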