CSN Overview: Difference between revisions

From CSDMS
(deleted bad link)
 
(19 intermediate revisions by one other user not shown)
Line 1: Line 1:
=   '''[[CSDMS_Standard_Names | CSDMS Standard Names]]  — Overview''' =
:<br/>
<!-- ============================================= -->
== {{ Bar Heading| text=Basic Rules}} ==
:
:
* This section provides some basic rules but many additional rules and naming patterns are given in other sections as explained below. <br/> <br/>
* Every standard name has an '''object''' part that describes a particular object and a '''quantity''' part that describes a particular measurable attribute of that object that has units.  Numerous templates, patterns and rules for constructing object names and quantity names are provided on the [[CSN_Quantity_Templates | '''CSDMS Quantity Templates''']] and  [[CSN_Object_Templates | '''CSDMS Object Templates''']] pages. Quantity names are sometimes constructed using one of the [[CSN_Process_Names | '''CSDMS Process Names''']]. <br/> <br/>
* A standard name may have an optional '''operation''' prefix that ends with "_of". See the [[CSN_Operation_Templates | '''CSDMS Operation Templates''']] page for more information. <br/> <br/>
* Standard names consist of lower-case letters and digits.  They contain no blank spaces. Underscores are the only non-alphanumeric character that is allowed in a standard name.  All hyphens are converted to underscores. <br/> <br/>
* A single underscore is used between separate words in either object names or quantity names.  (But this rule has not yet been used in the examples.) A double underscore is used between the object part and the quantity part of the name. <br/> <br/>
* There are several short '''''reserved words''''' such as "of", "in", "on" (or and?), "at" and "to" that play specific roles to deal with issues that are described in [[CSN_Object_Templates | '''CSDMS Object Templates''']], [[CSN_Quantity_Templates | '''CSDMS Quantity Templates''']] and [[CSN_Operation_Templates | '''CSDMS Operation Templates''']]. <br/> <br/>
* Many CSDMS Standard Names contain a person's last name.  If the last name ends with the letter "s" &mdash; as in [http://en.wikipedia.org/wiki/Johannes_Martinus_Burgers Burgers], [http://en.wikipedia.org/wiki/J._Willard_Gibbs Gibbs], [http://en.wikipedia.org/wiki/Robert_Clark_Jones Jones], [http://en.wikipedia.org/wiki/Osborne_Reynolds Reynolds], [http://en.wikipedia.org/wiki/Shields_parameter Shields] and [http://en.wikipedia.org/wiki/George_Gabriel_Stokes Stokes] &mdash; then it is retained. However, a possessive "s" is never added to the name, so we would use "newton" vs. "newtons" in a standard name. <br/> <br/>
* Approved acronyms may be included in standard names, but they are usually spelled out explicitly as in "counterclockwise" instead of "ccw".  Standard symbols for the chemical elements (but lower-case, like "h" and "c") can be used in naming quantities like "bond_angle" that involve multiple atoms in a molecule.  See Attributes of Molecules on the [[CSN_Quantity_Templates | '''CSDMS Quantity Templates''']] page. Other possible acronyms are: stp = standard temperature and pressure, toa = top of atmosphere (used in CF).  The acronym "wrt" = "with respect to" is used in some operation templates. <br/> <br/>
* As explained at the top of the [[CSN_Process_Names | '''CSDMS Process Names''']] page, the "ing" ending on process names such as "shearing" and "melting" is often dropped for quantities like "shear_stress" and "melt_rate" that use the '''Process_name + Quantity Pattern'''.  However, the "ing" ending may be retained when the same word is used in a quantity like "melting_point_temperature" (vs. "melt_temperature").
<br/>
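The character and underscore rules above can be sketched as a small Python check. This is only an illustration, not part of any CSDMS tooling. The function names `normalize_part`, `make_standard_name` and `split_standard_name` are hypothetical, the object "glacier ice" is a made-up example, and turning blank spaces into underscores is an assumption (the rules only state that names contain no blanks and that hyphens become underscores). Operation prefixes and reserved words are not handled.

```python
import re

# One part of a name: lower-case letters and digits, with single
# underscores between words.
_WORDS = r"[a-z0-9]+(?:_[a-z0-9]+)*"
# Object part, double underscore, quantity part.
_NAME_RE = re.compile(rf"({_WORDS})__({_WORDS})")


def normalize_part(text):
    """Lower-case a phrase; hyphens and blanks become single underscores.

    Converting blanks is an assumption made for this sketch.
    """
    return re.sub(r"[-\s]+", "_", text.strip().lower())


def make_standard_name(object_phrase, quantity_phrase):
    """Join an object part and a quantity part with a double underscore."""
    return f"{normalize_part(object_phrase)}__{normalize_part(quantity_phrase)}"


def split_standard_name(name):
    """Return (object_part, quantity_part); raise ValueError if malformed."""
    match = _NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a well-formed standard name: {name!r}")
    return match.group(1), match.group(2)


name = make_standard_name("Glacier ice", "melting-point temperature")
object_part, quantity_part = split_standard_name(name)
```

A name with no double underscore, more than one double underscore, or any upper-case letter is rejected by `split_standard_name`.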
<!-- ============================================= -->
== {{ Bar Heading| text=Background}} ==
:
:
Line 58: Line 35:
<br/>
&nbsp; The last two of these require tools from the second, more complex group.
* See: [http://en.wikipedia.org/wiki/Lingua_franca Lingua franca], [http://en.wikipedia.org/wiki/Nomenclature Nomenclature], [http://en.wikipedia.org/wiki/Semantic_translation Semantic translation] and [http://en.wikipedia.org/wiki/Text_corpus Text corpus]. <br/> <br/>
* The vocabulary of a well-educated person contains on the order of 50,000 to 100,000 words.  See this [http://news.bbc.co.uk/2/hi/uk_news/magazine/8013859.stm BBC article].


<br/>
<!-- ============================================= -->
== {{ Bar Heading| text=Purpose}} ==
:
:
* The CSDMS semantic "use case" is one of automated semantic mediation, matching or reconciliation. While our focus is on a "lingua franca", our standard names are often built from a hierarchical set of concepts and may eventually be used to construct a type of ontology. <br/> <br/>


* The CSDMS plug-and-play modeling system requires a set of '''''standard names''''' for input and output variables (quantities) in order to ''automatically'' determine whether an input variable in one model (or database) is equivalent to (or compatible with) an output variable in another model (or database) for the purpose of coupling the two resources (as user and provider). There is no need or requirement for these standard names to be used '''within''' a model, and they are too long to be used in this way.  However, CSDMS requires model contributors to implement the BMI (Basic Model Interface), and this includes mapping each of the model's input and output variables to a CSDMS Standard Name.  In addition, contributors provide a Model Metadata File (MMF) that (1) specifies how each standard name is used within the model (e.g. units, how measured, etc.) and (2) describes other key attributes of the model that must be known to facilitate coupling to other models.  See CSDMS BMI for more information. #######  <br/> <br/>
* The CSDMS plug-and-play modeling system requires a set of '''''standard names''''' for input and output variables (quantities) in order to ''automatically'' determine whether an input variable in one model (or database) is equivalent to (or compatible with) an output variable in another model (or database) for the purpose of coupling the two resources (as user and provider). There is no need or requirement for these standard names to be used '''within''' a model, and they are too long to be used in this way.  However, CSDMS requires model contributors to implement the BMI (Basic Model Interface), and this includes mapping each of the model's input and output variables to a CSDMS Standard Name.  In addition, contributors provide a Model Metadata File (MMF) that (1) specifies how each standard name is used within the model (e.g. units, assumptions, etc.) and (2) describes other key attributes of the model that must be known to facilitate coupling to other models.  See [[BMI_Description | '''CSDMS Basic Model Interface''']] for more information. <br/> <br/>


* Our focus is more on identifying general rules and patterns for consistent construction of standard names (i.e. a systematic naming scheme) that span the geosciences and less on creating an exhaustive list of names, which comes later. We have identified numerous '''patterns''' and '''templates''' that cover a broad range of needs and these are listed and discussed in the subsequent sections of this document.  This includes numerous [[CSN_Object_Templates | '''Object Templates''']],  [[CSN_Quantity_Templates | '''Quantity Templates''']] and [[CSN_Operation_Templates | '''Operation Templates''']]. <br/> <br/>


* Each standard name must have an '''object''' part and a '''quantity''' part, with adjectives and modifiers (as prefixes) being used to help avoid ambiguity and identify a specific object and associated quantity. It may also have an optional '''operation prefix'''. <br/> <br/>
* RDF (Resource Description Framework) is built around an "object + attribute + value" concept.  Our "object + quantity" names follow a similar pattern and are used to retrieve the values from a model or database .  The word "attribute" is a more general term than "quantity";  the latter is essentially a type of attribute that can be described with numbers and has units. <br/> <br/>
 
* RDF (Resource Description Framework) is built around an "object + attribute + value" concept.  Our "object + quantity" names follow a similar pattern and are used to retrieve values from a model or database .  The word "attribute" is a more general term than "quantity";  the latter is essentially a type of attribute that can be described with numbers and has units. <br/> <br/>


* Units are not given as part of the name, as with CF Standard Names. However, in CF Standard Names, a certain SI unit is often implied by the name.  Also, the CF Standard Names allow inclusion of assumptions in the name, such as "_assuming_clear_sky".  In CSDMS Standard Names, we use the name as a "key" or "index" to access not only the associated values but associated metadata that provides the units, set of assumptions, datum, how measured, etc.  If all assumptions, etc. are included in the standard name, it limits the number of matches that are likely to be found during the discovery process or when trying to couple models.  It also discourages a complete listing of the relevant assumptions. Metadata (including assumptions) can be used to distinguish between exact and approximate matches, and this information can be presented to users when desirable. <br/> <br/>


* Guidelines for construction of CF Standard Names can be found at: [http://cf-pcmdi.llnl.gov/documents/cf-standard-names/guidelines '''CF Standard Name Guidelines'''].  The rules for CSDMS Standard Names being developed here are meant to be more general, more rigorously defined and less ambiguous. As of 5/3/12, there are 2134 CF Standard Names, but the number of distinct patterns reflected in this set is much, much smaller.  Some of them already conform to the patterns and templates of the CSDMS Standard Names and these will favored (or assimilated) whenever possible.  However, CSDMS plans to provide a lookup table that maps each CF Convention Standard Name to a CSDMS Standard Name. <br/> <br/>
* Guidelines for construction of CF Standard Names can be found at CF Standard Name Guidelines.  The rules for CSDMS Standard Names being developed here are meant to be more general, more rigorously defined and less ambiguous. As of 5/3/12, there are 2134 CF Standard Names, but the number of distinct patterns reflected in this set is much, much smaller.  Some of them already conform to the patterns and templates of the CSDMS Standard Names and these will be favored (or assimilated) whenever possible.  However, CSDMS plans to provide a lookup table that maps each CF Convention Standard Name to a CSDMS Standard Name. <br/> <br/>


:
:
:
:
:
:

Latest revision as of 07:49, 20 July 2016

  CSDMS Standard Names — Overview

Background

  • The Semantic Web concept and movement recognizes that the already transformative World Wide Web will become even more powerful if it is extended beyond an interconnected set of human-readable documents to a set of machine-readable documents that are able to capture and convey knowledge.

  • The field of semantics is concerned with the study of meaning, and ontology is essentially concerned with capturing and organizing knowledge. In computer science, an ontology is a system that attempts to capture and organize knowledge in a particular domain (in machine-readable form), as understood by experts in that domain or subject area.

  • There are a variety of concepts that fall under the banner of "semantics and ontology" that are used to address specific issues in the development of "intelligent software". Some of these are:

  controlled vocabulary
  corpus
  crosswalk
  data dictionary
  lingua franca
  master dictionary
  nomenclature
  ontology
  preferred label
  standard names
  taxonomy
  typology
  unique identifier

  • While there are subtle differences between the items in this list, they can be divided into two broad groups. The terms controlled vocabulary, crosswalk, lingua franca, nomenclature, preferred label and standard names are all closely related and have the fairly simple, linear structure of a list or lookup table. They are used primarily to map a term used in one setting to an equivalent term in another setting. Relationships between entries in the list are not of primary interest. The main interest is knowing whether two terms refer to the same object.

  • By contrast, the terms ontology, master dictionary, taxonomy and typology represent efforts to capture relationships between entries (objects). They attempt to organize the objects into a hierarchy, which may include nested classes (sub- and super-classes). The connectedness or closeness of objects is also of interest. Because of this, they have the potential to capture knowledge, which is broadly concerned with relationships and the degree to which objects are similar. Their fundamental structure is that of a graph (nodes connected by lines) instead of a list.

  • These two broad groups of tools are used to address three main software use cases, namely

  Semantic mediation and matching
  Discovery of related information
  Capture and archiving of domain knowledge

  The last two of these require tools from the second, more complex group.

  • The vocabulary of a well-educated person contains on the order of 50,000 to 100,000 words. See this BBC article.
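
The difference between the two groups of tools described above (a flat list or lookup table versus a graph of relationships) can be sketched in a few lines of Python. Every term, mapping and function name below is hypothetical and chosen only for illustration; none of them comes from a CSDMS vocabulary.

```python
# Group 1: a crosswalk is a flat lookup table. It only answers the
# question "do these two terms refer to the same thing?".
crosswalk = {
    "precip": "precipitation_rate",
    "rainrate": "precipitation_rate",
    "temp": "temperature",
}


def same_concept(term_a, term_b):
    """True if both terms map to the same entry in the lookup table."""
    mapped_a = crosswalk.get(term_a)
    return mapped_a is not None and mapped_a == crosswalk.get(term_b)


# Group 2: a taxonomy is a graph. Here each node points to its superclass,
# so relationships and closeness between entries can be queried.
parent = {
    "rain": "precipitation",
    "snow": "precipitation",
    "precipitation": "water_flux",
    "evaporation": "water_flux",
}


def ancestors(term):
    """Return the chain of superclasses above a term, nearest first."""
    chain = []
    while term in parent:
        term = parent[term]
        chain.append(term)
    return chain


def nearest_common_ancestor(term_a, term_b):
    """Return the closest class that contains both terms, or None."""
    seen = {term_a, *ancestors(term_a)}
    for node in [term_b, *ancestors(term_b)]:
        if node in seen:
            return node
    return None
```

The lookup table supports semantic mediation and matching, while queries such as `nearest_common_ancestor` are only possible with the graph structure, which is why the other two use cases need the second group.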


Purpose

  • The CSDMS semantic "use case" is one of automated semantic mediation, matching or reconciliation. While our focus is on a "lingua franca", our standard names are often built from a hierarchical set of concepts and may eventually be used to construct a type of ontology.

  • The CSDMS plug-and-play modeling system requires a set of standard names for input and output variables (quantities) in order to automatically determine whether an input variable in one model (or database) is equivalent to (or compatible with) an output variable in another model (or database) for the purpose of coupling the two resources (as user and provider). There is no need or requirement for these standard names to be used within a model, and they are too long to be used in this way. However, CSDMS requires model contributors to implement the BMI (Basic Model Interface), and this includes mapping each of the model's input and output variables to a CSDMS Standard Name. In addition, contributors provide a Model Metadata File (MMF) that (1) specifies how each standard name is used within the model (e.g. units, assumptions, etc.) and (2) describes other key attributes of the model that must be known to facilitate coupling to other models. See CSDMS Basic Model Interface for more information.

  • Our focus is more on identifying general rules and patterns for consistent construction of standard names (i.e. a systematic naming scheme) that span the geosciences and less on creating an exhaustive list of names, which comes later. We have identified numerous patterns and templates that cover a broad range of needs and these are listed and discussed in the subsequent sections of this document. This includes numerous Object Templates, Quantity Templates and Operation Templates.

  • RDF (Resource Description Framework) is built around an "object + attribute + value" concept. Our "object + quantity" names follow a similar pattern and are used to retrieve the values from a model or database. The word "attribute" is a more general term than "quantity"; the latter is essentially a type of attribute that can be described with numbers and has units.

  • As with CF Standard Names, units are not given as part of the name. However, in CF Standard Names, a certain SI unit is often implied by the name. Also, the CF Standard Names allow inclusion of assumptions in the name, such as "_assuming_clear_sky". In CSDMS Standard Names, we use the name as a "key" or "index" to access not only the associated values but also the associated metadata that provides the units, set of assumptions, datum, how the quantity was measured, etc. If all assumptions, etc. are included in the standard name, it limits the number of matches that are likely to be found during the discovery process or when trying to couple models. It also discourages a complete listing of the relevant assumptions. Metadata (including assumptions) can be used to distinguish between exact and approximate matches, and this information can be presented to users when desirable.

  • Guidelines for construction of CF Standard Names can be found at CF Standard Name Guidelines. The rules for CSDMS Standard Names being developed here are meant to be more general, more rigorously defined and less ambiguous. As of 5/3/12, there are 2134 CF Standard Names, but the number of distinct patterns reflected in this set is much, much smaller. Some of them already conform to the patterns and templates of the CSDMS Standard Names and these will be favored (or assimilated) whenever possible. However, CSDMS plans to provide a lookup table that maps each CF Convention Standard Name to a CSDMS Standard Name.
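
The idea of using the standard name as a key, with units and assumptions kept in metadata, can be sketched as follows. This is a minimal illustration under stated assumptions, not the BMI or the Model Metadata File format: the `VariableInfo` class, the `match_variables` function, the example standard names and the "steady_flow" assumption label are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableInfo:
    """Hypothetical metadata stored alongside a standard name."""
    units: str
    assumptions: frozenset = frozenset()


def match_variables(provider_outputs, user_inputs):
    """Pair a provider's outputs with a user's inputs by standard name.

    The name alone decides whether two variables match; the metadata
    then grades the match as "exact" or "approximate".
    """
    matches = {}
    for name, wanted in user_inputs.items():
        offered = provider_outputs.get(name)
        if offered is None:
            continue  # the provider does not offer this variable
        matches[name] = "exact" if offered == wanted else "approximate"
    return matches


# Illustrative names only; they are not taken from an approved list.
provider = {
    "channel_water__volume_flow_rate": VariableInfo(
        "m3 s-1", frozenset({"steady_flow"})
    ),
    "channel_water__depth": VariableInfo("m"),
}
user = {
    "channel_water__volume_flow_rate": VariableInfo("m3 s-1"),
    "channel_water__depth": VariableInfo("m"),
    "land_surface__elevation": VariableInfo("m"),
}

matches = match_variables(provider, user)
```

Because the assumption is stored in metadata rather than in the name, the flow-rate variables are still found as a match, and the differing assumptions can be reported to the user as an approximate match.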