CSN Overview: Difference between revisions
From CSDMS
Revision as of 10:29, 15 August 2012
CSDMS Standard Names — Overview
Background
- The Semantic Web concept and movement recognizes that the already transformative World Wide Web will become even more powerful if it is extended beyond an interconnected set of human readable documents to a set of machine readable documents that are able to capture and convey knowledge.  
 
- The field of semantics is concerned with the study of meaning, and ontology is essentially concerned with capturing and organizing knowledge. In computer science, an ontology is a system that captures and organizes the knowledge of a particular domain in machine-readable form, as understood by experts in that domain or subject area.
 
- A variety of concepts that fall under the banner of "semantics and ontology" are used to address specific issues in the development of "intelligent software". Some of these are:
  controlled vocabulary 
  corpus 
  crosswalk 
  data dictionary 
  lingua franca 
  master dictionary 
  nomenclature 
  ontology 
  preferred label 
  standard names 
  taxonomy 
  typology 
  unique identifier 
- While there are subtle differences between the items in this list, most of them can be divided into two broad groups. The terms controlled vocabulary, crosswalk, lingua franca, nomenclature, preferred label and standard names are all closely related and have the fairly simple, linear structure of a list or lookup table. They are used primarily to map a term used in one setting to an equivalent term in another setting. Relationships between entries in the list are not of primary interest; the main interest is knowing whether two terms refer to the same object.
 
- By contrast, the terms ontology, master dictionary, taxonomy and typology represent efforts to capture relationships between entries (objects).  They attempt to organize the objects into a hierarchy, which may include nested classes (sub- and super-classes). The connectedness or closeness of objects is also of interest.  Because of this, they have the potential to capture knowledge, which is broadly concerned with relationships and the degree to which objects are similar.  Their fundamental structure is that of a graph (nodes connected by lines) instead of a list.  
 
- These two broad groups of tools are used to address three main software use cases, namely:
  Semantic mediation and matching
  Discovery of related information
  Capture and archiving of domain knowledge
- The last two of these use cases require tools from the second, more complex group.
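The difference between the two groups can be sketched in a few lines of Python. This is a minimal illustration, not part of any CSDMS tool: the crosswalk entries and the small is-a hierarchy are hypothetical. The list-style structure can only answer "do these two terms mean the same thing?", while the graph-style structure can also answer "how are these terms related?".

```python
# List-style structure: a crosswalk mapping one vocabulary's terms to another's.
# The model-specific terms on the left are hypothetical.
crosswalk = {
    "T_air": "air_temperature",
    "Qs": "sediment_discharge",
}

# Graph-style structure: each term points to its broader class (an is-a edge).
# This small hierarchy is also a made-up example.
broader = {
    "river": "channel",
    "canal": "channel",
    "valley_glacier": "glacier",
    "glacier": "ice_mass",
}


def same_object(term_a: str, term_b: str) -> bool:
    """List question: do the two terms map to the same standard term?"""
    return crosswalk.get(term_a, term_a) == crosswalk.get(term_b, term_b)


def ancestors(term: str) -> list:
    """Graph question: which broader classes contain this term?"""
    chain = []
    while term in broader:
        term = broader[term]
        chain.append(term)
    return chain


def related(term_a: str, term_b: str) -> bool:
    """Two terms are related if they share a class somewhere up the hierarchy."""
    return bool({term_a, *ancestors(term_a)} & {term_b, *ancestors(term_b)})


print(same_object("T_air", "air_temperature"))  # True
print(related("river", "canal"))                # True: both are channels
print(related("river", "valley_glacier"))       # False
```

The lookup table stays flat and cheap to query, but it has no notion of "closeness"; that only appears once the entries are connected as a graph.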
Purpose
- The CSDMS semantic "use case" is one of automated semantic mediation, matching or reconciliation. While our focus is on a "lingua franca", our standard names are often built from a hierarchical set of concepts and may eventually be used to construct a type of ontology. 
 
- The CSDMS plug-and-play modeling system requires a set of standard names for input and output variables (quantities) in order to automatically determine whether an input variable in one model (or database) is equivalent to (or compatible with) an output variable in another model (or database), so that the two resources can be coupled (as user and provider). Models are not required to use these standard names internally, and the names are too long to be practical for that purpose. However, CSDMS requires model contributors to implement the BMI (Basic Model Interface), which includes mapping each of the model's input and output variables to a CSDMS Standard Name. In addition, contributors provide a Model Metadata File (MMF) that (1) specifies how each standard name is used within the model (e.g. units, how measured, etc.) and (2) describes other key attributes of the model that must be known to facilitate coupling to other models. See CSDMS BMI for more information.
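The matching step can be sketched as follows. `ToyModel` and `find_providers` are hypothetical; their method names echo the BMI functions `get_input_var_names` and `get_output_var_names`, but this is not the BMI itself, and the models and internal variable names are invented. The point is that two models using different internal names (`p` and `pb`) can still be matched, because each maps its own name to the same standard name.

```python
class ToyModel:
    """Hypothetical model exposing BMI-style variable-name queries.

    The dicts map each internal variable name to a standard name, which is
    the mapping a contributor supplies when implementing the interface.
    """

    def __init__(self, name, inputs, outputs):
        self.name = name
        self._inputs = inputs    # internal name -> standard name
        self._outputs = outputs  # internal name -> standard name

    def get_input_var_names(self):
        return tuple(self._inputs.values())

    def get_output_var_names(self):
        return tuple(self._outputs.values())


def find_providers(user, candidates):
    """For each standard name the user needs, list the models that provide it."""
    return {
        needed: [m.name for m in candidates if needed in m.get_output_var_names()]
        for needed in user.get_input_var_names()
    }


hydraulics = ToyModel(
    "hydraulics",
    inputs={},
    outputs={"p": "channel_water_at_bed_pressure",
             "Tw": "channel_water_at_bed_temperature"},
)
sediment = ToyModel(
    "sediment",
    inputs={"pb": "channel_water_at_bed_pressure",
            "f": "water_at_land_surface_infiltration_rate"},
    outputs={},
)
# Pressure is found in "hydraulics"; infiltration rate has no provider yet.
print(find_providers(sediment, [hydraulics]))
```

Matching on standard names alone only says the variables are the same quantity; checking units and assumptions is the job of the accompanying metadata.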
 
- Our focus is more on identifying general rules and patterns for consistent construction of standard names (i.e. a systematic naming scheme) that span the geosciences, and less on creating an exhaustive list of names, which will come later. We have identified numerous patterns and templates that cover a broad range of needs; these are listed and discussed in the subsequent sections of this document and include numerous Object Templates, Quantity Templates and Operation Templates.
 
- Each standard name must have an object part and a quantity part, with adjectives and modifiers (as prefixes) being used to help avoid ambiguity and identify a specific object and associated quantity. It may also have an optional operation prefix. 
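One way to honor the "object + quantity + optional operation" rule in software is to keep the parts separate until the name is rendered as a string. This is a sketch under stated assumptions: the parts are joined with underscores and an operation prefix is attached with "_of_"; the exact conventions are set by the template pages, and the `StandardName` class and the "time_derivative" operation are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StandardName:
    """Hypothetical container for the parts of a standard name."""
    obj: str                         # object part, e.g. "channel_water"
    quantity: str                    # quantity part, e.g. "temperature"
    operation: Optional[str] = None  # optional operation, e.g. "time_derivative"

    def __post_init__(self):
        # The object and quantity parts are both required.
        if not self.obj or not self.quantity:
            raise ValueError("a standard name needs an object and a quantity")

    def __str__(self):
        base = f"{self.obj}_{self.quantity}"
        # Assumed convention: the operation is prepended and joined with "_of_".
        return f"{self.operation}_of_{base}" if self.operation else base


print(StandardName("channel_water", "temperature"))
# channel_water_temperature
print(StandardName("channel_water", "temperature", "time_derivative"))
# time_derivative_of_channel_water_temperature
```

Storing the parts separately means the object/quantity boundary never has to be guessed back out of the underscore-joined string.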
 
- RDF (Resource Description Framework) is built around statements (triples) of the form "subject + predicate + object", which can be read as "object + attribute + value". Our "object + quantity" names follow a similar pattern and are used to retrieve values from a model or database. The word "attribute" is a more general term than "quantity"; the latter is essentially a type of attribute that can be described with numbers and has units.
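The analogy with RDF can be made concrete with a plain list of (object, quantity, value) statements, where the object and quantity together act as the key for retrieving a value. This is a minimal sketch: the `lookup` helper is hypothetical and the stored values are made up.

```python
# Each statement is an (object, quantity, value) triple, analogous to an RDF
# (subject, predicate, object) statement. The values are made up.
statements = [
    ("channel_water", "temperature", 12.5),
    ("channel_water", "depth", 1.8),
    ("air", "temperature", 20.1),
]


def lookup(obj, quantity, triples=statements):
    """Return the value stated for (obj, quantity), or None if there is none."""
    for subject, attribute, value in triples:
        if subject == obj and attribute == quantity:
            return value
    return None


print(lookup("channel_water", "temperature"))  # 12.5
print(lookup("air", "depth"))                  # None
```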
 
- As with CF Standard Names, units are not given as part of the name. However, in CF Standard Names a certain SI unit is often implied by the name, and CF Standard Names also allow assumptions to be included in the name, such as "_assuming_clear_sky". In CSDMS Standard Names, we use the name as a "key" or "index" to access not only the associated values but also associated metadata that provides the units, set of assumptions, datum, how measured, etc. Including all assumptions in the standard name would limit the number of matches likely to be found during the discovery process or when trying to couple models, and it would also discourage a complete listing of the relevant assumptions. Metadata (including assumptions) can instead be used to distinguish between exact and approximate matches, and this information can be presented to users when desirable.
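Using the name as a key into metadata can be sketched like this. `VariableInfo`, `match_quality` and the name `land_surface_radiation_flux` are hypothetical, and unit conversion is assumed to be handled elsewhere. A CF-style suffix such as "_assuming_clear_sky" becomes an entry in the metadata instead of part of the name, so a user who does not state that assumption still finds the variable, flagged as an approximate match.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VariableInfo:
    """Hypothetical metadata attached to one use of a standard name."""
    units: str
    assumptions: frozenset = field(default_factory=frozenset)


# The standard name is the key; units and assumptions live in the metadata.
# The name and metadata here are illustrative only.
catalog = {
    "land_surface_radiation_flux": VariableInfo(
        units="W m-2", assumptions=frozenset({"clear_sky"})
    ),
}


def match_quality(name, needed, offered=catalog):
    """Classify a request as "exact", "approximate" or None (no match).

    Names must agree for any match; assumptions decide exact vs approximate.
    Units are not compared here (conversion is assumed to happen elsewhere).
    """
    info = offered.get(name)
    if info is None:
        return None
    return "exact" if info.assumptions == needed.assumptions else "approximate"


want = VariableInfo(units="W m-2")
print(match_quality("land_surface_radiation_flux", want))  # approximate
```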
 
- Guidelines for construction of CF Standard Names can be found at: CF Standard Name Guidelines. The rules for CSDMS Standard Names being developed here are meant to be more general, more rigorously defined and less ambiguous. As of 5/3/12, there are 2134 CF Standard Names, and these will be favored (or assimilated) whenever they conform to the patterns and templates of the CSDMS Standard Names.
 
- There are several short reserved words, such as "of", "in" and "to", that play specific roles in resolving issues described in CSDMS Object Templates, CSDMS Quantity Templates and CSDMS Operation Templates.
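Because reserved words have fixed roles, a name can be split at them mechanically before the pieces are interpreted. This is a minimal sketch that uses only the three reserved words listed above (the actual set may be larger); the function name and the example name "time_derivative_of_channel_water_temperature" are hypothetical.

```python
import re

# The three reserved words named above; the full CSDMS set may be larger.
RESERVED_WORDS = ("of", "in", "to")
# Match a reserved word only when it is a whole underscore-delimited token.
_SPLIT = re.compile(r"_(" + "|".join(RESERVED_WORDS) + r")_")


def split_on_reserved(name):
    """Split a name at reserved words, keeping each reserved word as its own item."""
    return _SPLIT.split(name)


print(split_on_reserved("time_derivative_of_channel_water_temperature"))
# ['time_derivative', 'of', 'channel_water_temperature']
```

Requiring underscores on both sides keeps ordinary words that merely contain a reserved word, such as "into", from being split.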
 
- The CF Standard Names promote the importance of specifying a surface. We could use an "object-at-location_object" pattern with "_at_" as another reserved word for this purpose. Possible examples (using the CF Standard Names abbreviation "toa" for top of atmosphere) are:
  air_at_toa_temperature
  channel_water_at_bed_pressure
  channel_water_at_bed_temperature
  earth_atmosphere_at_land_surface_pressure
  water_at_land_surface_infiltration_rate
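Names built with the "_at_" pattern can be taken apart again once the quantity is known. This sketch assumes a small, illustrative list of known quantities and treats the longest one that ends the name as the quantity; everything before the first "_at_" is taken as the object and the rest as the location. The function name `parse_at_name` is hypothetical.

```python
KNOWN_QUANTITIES = ("temperature", "pressure", "infiltration_rate")  # illustrative subset


def parse_at_name(name, quantities=KNOWN_QUANTITIES):
    """Split an "object_at_location_quantity" name into its three parts."""
    # Take the longest known quantity that ends the name.
    for q in sorted(quantities, key=len, reverse=True):
        if name.endswith("_" + q):
            head = name[: -(len(q) + 1)]
            break
    else:
        raise ValueError(f"no known quantity at the end of {name!r}")
    # Everything before the first "_at_" is the object; the rest is the location.
    obj, sep, location = head.partition("_at_")
    if not sep:
        raise ValueError(f"no '_at_' in {name!r}")
    return obj, location, q


for example in ("air_at_toa_temperature", "water_at_land_surface_infiltration_rate"):
    print(parse_at_name(example))
# ('air', 'toa', 'temperature')
# ('water', 'land_surface', 'infiltration_rate')
```

The need for a list of known quantities shows why single underscores alone cannot mark where the object ends; a reserved word such as "_at_" only resolves the object/location boundary.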
