CSN Basic Rules

From CSDMS
Revision as of 14:59, 7 July 2015 by Peckhams (talk | contribs) (→‎  CSDMS Standard Names — Basic Rules)

  CSDMS Standard Names — Basic Rules

  • This section provides some basic rules but many additional rules and naming patterns are given in other sections as explained below.
  • Every standard name has an object part that describes a particular object and a quantity part that describes a particular attribute of that object that can be quantified with a number. A large collection of examples can be viewed on the Examples page. Numerous templates, patterns and rules for constructing object names and quantity names are provided on the CSDMS Quantity Templates and CSDMS Object Templates pages. Quantity names are sometimes constructed using one of the CSDMS Process Names.
  • A standard name may have an optional operation prefix applied to its quantity name part. The prefix always ends with the reserved word "_of" and creates a new quantity from an existing quantity. See the CSDMS Operation Templates page for more information.
  • Standard names consist of lower-case letters and digits and contain no blank spaces. As of February 2015, only three non-alphanumeric characters are allowed in a standard name: underscores, hyphens and tildes. Each has a distinct purpose, as explained below.
  • A single underscore is used to delimit separate words in a standard name. In the object part of the name, underscores separate objects and sub-objects, and in most cases can be read as "has a" or "could have a".
  • A double underscore is used between the object part and the quantity part of the name. This serves as a unique delimiter between the object and quantity parts and also helps with alphabetization of objects and sub-objects.
  • Hyphens (as of July 23, 2014) are used in the following ways. (1) To indicate that the words in a multi-word object name refer to a single object, as in "water_carbon-dioxide__solubility". This allows the object name to be parsed (on underscores) into multiple objects (often one being within or part of another). (2) To indicate that a set of words should be bundled into one concept or adjective, as in "channel_water__volume-per-length_flow_rate" or "air__mass-per-volume_density". Note that "per" is a reserved word.
  • Tildes (as of February 2015) are used to distinguish nouns and adjectives in the object (including sub-object) names that occur in the object part of a name. A new rule replaces any object name of the form "adjective-adjective-noun" with the form "noun~adjective~adjective". Object names must now begin with a noun and may be followed by any number of adjectives, separated by tilde characters. This rule leads to better alphabetization and logical ordering. For example, "incoming-longwave-radiation", "alaskan-black-bear" and "suspended-sediment" become "radiation~incoming~longwave", "bear~alaskan~black" and "sediment~suspended".
  • The ISO International System of Quantities (ISQ, ISO 80000) defines many different terms and phrases such as quantity, base quantity, derived quantity, quantity dimension and kind of quantity. For example, the seven base quantities are: length, mass, time, electric current, thermodynamic temperature, amount of substance and luminous intensity.
  • The rightmost word (possibly hyphenated) in an object name is called the root object. This is the object to which the quantity applies, or on which the quantity is measured. Preceding object names typically indicate "container objects" and are used to establish context.
  • The rightmost word (possibly hyphenated) in a "quantity name" is called the root quantity. There are a limited number of root quantities (roughly 100-150) that are carefully chosen to unambiguously indicate the type of quantity. Additional words in the quantity name add meaning to identify a specific, unique quantity of that type. The phrase root quantity is used here to avoid conflict with base quantity, as defined by ISO 80000. Also in keeping with ISO 80000, many root quantities are the same kind of quantity, despite having different definitions. Examples include: length, width, distance, radius, diameter, perimeter, amplitude, wavelength (of the kind "length"); height, depth, thickness, elevation, altitude, level (of the kind "height"); fee, price, income (of the kind "currency"); duration, age, period, time (of the kind "time"); force, weight (of the kind "force"); pressure, stress (of the kind "pressure"); angle, latitude, longitude (of the kind "angle"). Other root quantities with specific meanings are: speed, slope, mass, temperature, energy, power, count (for non-negative integer quantities), index (for statistical "measures"), constant (for mathematical and physical constants), number (for dimensionless numbers), coefficient (for multiplicative factors), exponent, parameter, capacity, mole, density, frequency, wavenumber, charge, voltage, current, conductivity, resistance, albedo, reflectance, absorptance, transmittance, viscosity, vorticity, area and volume. Note that the terms fraction, ratio, rate (per unit time) and flux (per unit area and time) may be used in the construction of root quantity names, as in "volume-flux" and "volume-flow-rate".
  • Note: "Quantity suffixes" have been deprecated almost completely, but "time_step" is an exception. If the rightmost word in a quantity name is a quantity suffix (e.g. step) then the last two words are the root quantity (e.g. time_step). See the CSDMS Quantity Templates for an explanation of "quantity suffix".
  • Many CSDMS Standard Names contain a person's last name. If the last name ends with the letter "s" (as in Burgers, Gibbs, Huygens, Jones, Potts, Reynolds, Shields and Stokes), then that "s" is retained. However, a possessive "s" is never added to the name, so a standard name would use "newton" rather than "newtons".
  • Acronyms and abbreviations are sometimes used in standard names, but are generally avoided for clarity. Note that "leq" is currently used as an abbreviation for "liquid-equivalent" and "x-section" is used instead of "cross-section". Standard symbols for the chemical elements (but lower-case, like "h" and "c") can be used in naming quantities like "bond_angle" that involve multiple atoms in a molecule. See Attributes of Molecules on the CSDMS Quantity Templates page.
  • Numbers may be used as part of an object name or in adjectives. Examples include "cesium~133" and "air_light~550-nm-wavelength__refraction_index". In the second example, "550-nm-wavelength" would be preferable to "yellow".
  • As explained at the top of the CSDMS Process Names page, the "ing" ending on process names such as "shearing" and "melting" is often dropped for quantities like "shear_stress" and "melt_rate" that use the Process_name + Quantity Pattern. However, the "ing" ending may be retained when the same word is used in a quantity like "melting_point_temperature" (vs. "melt_temperature").
  • Word order in object names. Starting with a base object, adjectives are added to the right after a tilde character (as of February 2015) in an effort to construct an unambiguous and easily understood object name. The addition of each adjective produces a more restrictive or specific name from the previous object name. For example:
bear
bear~black
bear~black~alaskan
 
spider
spider~black-widow
However, object names may contain either a single object name or multiple object names. In the Part of Another Object Pattern, there is generally some sort of "containment" and the separate object names (with their adjectives, separated by tildes) are ordered from the general to the specific (or superset to subset), left to right.
In addition, some quantities — like concentration, partial pressure and solubility — require specifying multiple objects. The last two object names in the object part should be the two (or more) required objects in such cases. Each of these quantities has a template that explains how words are ordered. For example, the "kinetic_friction_coefficient" associated with two objects that are in contact (e.g. rubber and pavement) doesn't imply an ordering, so the ordering is alphabetical in order to avoid multiple names for the same thing.
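The word-order rules for object names can be sketched in a few lines of Python. This is only an illustration of the delimiters described above: the function names (parse_object_part, add_adjective, join_unordered_objects) are hypothetical and are not part of any CSDMS software.

```python
# Hypothetical helpers; none of these names come from CSDMS software.

def parse_object_part(object_part):
    """Split an object part into (noun, adjectives) pairs, left to right.

    Underscores separate object names; tildes separate a noun from the
    adjectives that follow it. Hyphenated words are kept whole.
    """
    objects = []
    for object_name in object_part.split("_"):
        noun, *adjectives = object_name.split("~")
        objects.append((noun, adjectives))
    return objects


def add_adjective(object_name, adjective):
    """Append an adjective on the right, producing a more specific name."""
    return object_name + "~" + adjective


def join_unordered_objects(first, second):
    """Order two object names alphabetically when no ordering is implied."""
    return "_".join(sorted([first, second]))


print(parse_object_part("air_light~550-nm-wavelength"))
# [('air', []), ('light', ['550-nm-wavelength'])]
print(add_adjective("bear~black", "alaskan"))        # bear~black~alaskan
print(join_unordered_objects("rubber", "pavement"))  # pavement_rubber
```

Sorting the pair is one simple way to guarantee a single name for two objects in contact; quantities that do imply an ordering would instead follow their own template.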
  • Alphabetization. It is easier to find standard names that refer to the same object if there is some alphabetical ordering. The left to right "containment" rule in the object part supports this, as does the new rule (see above) that uses the tilde character to add adjectives after nouns. The leftmost object name often refers to a domain or medium such as atmosphere, land, lithosphere, sea or soil.
  • Parsability. While standard variable names are used primarily for semantic matching, which does not require any parsing, CSDMS recognizes the many advantages of being able to automatically parse a standard name (e.g. with a small Python program) and deconstruct it into its various parts. One advantage is that it will then be easier to map the names to other formats or lists of names or to build an ontology from them. Another advantage is that a "smart framework" can then use subsets of names (typically by removing words from the left-hand side) to find potentially valid but inexact matches and present them to users. All of the CSDMS name construction rules attempt to honor this parsability. This is sometimes achieved through the use of special delimiters or reserved words like "__" and "_of_" or through the ability to distinguish nouns (sub-objects in an object name) from the adjectives that act on them. These same rules allow the names to be parsed visually by the people who use them. For example, the word "of" is used as a verbal delimiter in spoken math.
  • Word order in quantity names. Starting with a root quantity (which could end with a quantity suffix), adjectives are added to the left in an effort to construct an unambiguous and easily understood quantity name. The addition of each new word (or words) produces a more restrictive or specific name from the previous name. For example:
conductivity
hydraulic_conductivity
saturated_hydraulic_conductivity   (which uses the "Saturated Quantity Rule")
effective_saturated_hydraulic_conductivity
The order in which adjectives/modifiers are added to the left may not always be clear, but in this example "hydraulic_conductivity" and "saturated_hydraulic_conductivity" are two fundamental quantities that would be used in a groundwater model and "effective" could be applied to either of them to indicate application at a given scale. Note also that "saturated" could have been applied to "soil", the associated object, but in models "saturated_hydraulic_conductivity" is a fundamental quantity. In addition, names starting with "saturated_soil" would be alphabetically separated from those starting with "soil".
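As a rough sketch of the parsability idea, the delimiters "__" and "_of_" are enough to take a name apart, and dropping words from the left of a quantity name walks back through the progressively more general names listed above. All function names here are hypothetical, the name "soil__effective_saturated_hydraulic_conductivity" and the quantity part "time_derivative_of_temperature" are used only as illustrations, and treating everything before the last "_of_" as the operation prefix is an assumption.

```python
# Hypothetical helpers; none of these names come from CSDMS software.
QUANTITY_SUFFIXES = {"step"}  # "time_step" is the exception noted earlier


def split_standard_name(name):
    """Split a standard name into (object part, quantity part)."""
    object_part, separator, quantity_part = name.partition("__")
    if not separator or "__" in quantity_part:
        raise ValueError(f"expected exactly one double underscore in {name!r}")
    return object_part, quantity_part


def split_operation(quantity_part):
    """Separate an optional operation prefix from the quantity name.

    Assumption: everything up to the last "_of_" is the operation prefix.
    """
    prefix, separator, quantity_name = quantity_part.rpartition("_of_")
    operation = prefix + "_of" if separator else ""
    return operation, quantity_name


def root_quantity(quantity_name):
    """Return the rightmost word, or the last two if it ends in a suffix."""
    words = quantity_name.split("_")
    if len(words) > 1 and words[-1] in QUANTITY_SUFFIXES:
        return "_".join(words[-2:])
    return words[-1]


def generalizations(quantity_name):
    """List names from most to least specific by dropping left-hand words."""
    words = quantity_name.split("_")
    root_length = len(root_quantity(quantity_name).split("_"))
    return ["_".join(words[i:]) for i in range(len(words) - root_length + 1)]


object_part, quantity_part = split_standard_name(
    "soil__effective_saturated_hydraulic_conductivity"
)
for candidate in generalizations(quantity_part):
    print(object_part + "__" + candidate)

print(split_operation("time_derivative_of_temperature"))
# ('time_derivative_of', 'temperature')
```

The loop prints four candidate names, ending with "soil__conductivity". A matching framework could offer such candidates as inexact matches, though whether a given shortened name is actually a valid standard name would still need to be checked against the list of names.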
  • Remove Objects from Quantity Names Rule. There are many quantity names in common use that include an object in the name, such as "water_content" or "liquid_water_equivalent". In such cases a standard name is constructed so that the named object is moved into the object part of the name. This has many advantages, one of which is that it allows a commonly used quantity concept to be used more generally. For example, "liquid_equivalent_precipitation" (without the word "water") is a quantity name that can be used for water in Earth's atmosphere or for methane in Titan's atmosphere. Similarly, the quantity name "relative saturation" is general and makes no reference to a particular substance/object, while "relative humidity" is only valid for water, even though it doesn't include the word water explicitly.
  • Object vs. Adjective Rule. There are many cases where an adjective refers directly to a specific object. Examples include:
atmospheric, atmosphere:  mars_atmosphere__thickness
axial, axis:              earth_axis__tilt_angle
basal, base:              glacier_bottom__shear_stress
orbital, orbit:           earth_orbit__eccentricity
refractive, refraction:   air_light~550-nm-wavelength__refraction_index
sectional, section:       channel_x-section__area
solar, sun:               earth-to-sun_line__distance    (vs. earth_to_sun_distance)
Instead of using this type of adjective in a quantity name, the corresponding object name is used (as in the examples above), usually within the Part of Another Object Pattern. This will sometimes result in an instance of the Process_name + Quantity Pattern since process names are nouns/objects. (As in "air_light~550-nm-wavelength__refraction_index" above.)
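The adjective and object pairs listed above could be used to flag quantity names that still contain such an adjective. The following is a minimal sketch: the table copies only the pairs given in this section, the function name find_object_adjectives is hypothetical, and the sample quantity names "refractive_index" and "orbital_eccentricity" are illustrations rather than actual standard names.

```python
# Pairs taken from the list above; the lookup itself is a hypothetical helper.
ADJECTIVE_TO_OBJECT = {
    "atmospheric": "atmosphere",
    "axial": "axis",
    "basal": "base",
    "orbital": "orbit",
    "refractive": "refraction",
    "sectional": "section",
    "solar": "sun",
}


def find_object_adjectives(quantity_name):
    """Return (adjective, suggested object) pairs found in a quantity name."""
    # Treat hyphens like underscores so bundled words are also inspected.
    words = quantity_name.replace("-", "_").split("_")
    return [
        (word, ADJECTIVE_TO_OBJECT[word])
        for word in words
        if word in ADJECTIVE_TO_OBJECT
    ]


print(find_object_adjectives("refractive_index"))
# [('refractive', 'refraction')]
print(find_object_adjectives("refraction_index"))
# []
```

A hit only suggests that the corresponding object name may belong in the name instead; deciding where that object goes still depends on the relevant pattern, such as the Part of Another Object Pattern.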
  • State of Matter Rule. For some standard names it is important to clarify the relevant (or assumed) state of matter. See: State of matter. In such cases, placing an adjective like "gas" or "liquid" before the object name (e.g. "liquid_nitrogen") would disrupt alphabetical grouping. To preserve alphabetical grouping, words like "vapor", "liquid", "ice" or "solid" are used with a preceding tilde, as in: "carbon-dioxide~gas", "carbon-dioxide~ice", "nitrogen~liquid", "water~vapor" and "water~liquid". For quantities that do not depend on the state/phase of matter, like "temperature", this extra word to indicate the phase is not needed. Whenever the words "gas", "vapor", "liquid", "ice" and "solid" are preceded by a tilde, they are interpreted as indicating the phase of the substance before the tilde.
  • Patterns and rules for constructing the object name part of a CSDMS Standard Name are provided at the top of the CSDMS Object Templates page.
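Two of the rules in this list lend themselves to simple mechanical checks: the allowed character set with its single double underscore, and the State of Matter Rule. The sketch below is illustrative only. The function names check_standard_name and phase_of are hypothetical, and the checks cover just these two rules, not the full set of templates.

```python
import re

# Lower-case letters, digits, underscores, hyphens and tildes only.
ALLOWED_CHARACTERS = re.compile(r"[a-z0-9_~-]+")
PHASE_WORDS = {"gas", "vapor", "liquid", "ice", "solid"}


def check_standard_name(name):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if not ALLOWED_CHARACTERS.fullmatch(name):
        problems.append("characters other than a-z, 0-9, '_', '-' and '~'")
    if name.count("__") != 1 or "___" in name:
        problems.append("object and quantity parts need one double underscore")
    return problems


def phase_of(object_name):
    """Return (substance, phase) for one object name; phase may be None."""
    noun, *adjectives = object_name.split("~")
    for adjective in adjectives:
        if adjective in PHASE_WORDS:
            return noun, adjective
    return noun, None


print(check_standard_name("channel_water__volume-per-length_flow_rate"))  # []
print(check_standard_name("channel_water_depth"))  # one problem reported
print(phase_of("carbon-dioxide~ice"))  # ('carbon-dioxide', 'ice')
print(phase_of("nitrogen"))            # ('nitrogen', None)
```

Passing these checks does not make a name a valid standard name; it only rules out the most basic formatting problems before the object and quantity templates are applied.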