CSN Basic Rules: Difference between revisions
From CSDMS
						
Revision as of 16:30, 15 February 2015
CSDMS Standard Names — Basic Rules
- This section provides some basic rules but many additional rules and naming patterns are given in other sections as explained below.
- Every standard name has an object part that describes a particular object and a quantity part that describes a particular attribute of that object that can be quantified with a number. A large collection of examples can be viewed on the Examples page. Numerous templates, patterns and rules for constructing object names and quantity names are provided on the CSDMS Quantity Templates and CSDMS Object Templates pages. Quantity names are sometimes constructed using one of the CSDMS Process Names.
- A standard name may have an optional operation prefix applied to its quantity name part that always ends with the reserved word "_of". This creates a new quantity from an existing quantity. See the CSDMS Operation Templates page for more information.
- Standard names consist of lower-case letters and digits. They contain no blank spaces. As of February 2015, only three non-alphanumeric characters are allowed in a standard name: underscores, hyphens and tildes. Each has a distinct purpose, as explained below.
- A single underscore is used to delimit separate words in a standard name. In the object part of the name, underscores separate objects and sub-objects, and in most cases can be read as "has a" or "could have a".
- A double underscore is used between the object part and the quantity part of the name. This serves as a unique delimiter between the object and quantity parts and also helps with alphabetization of objects and sub-objects.
- Hyphens (as of July 23, 2014) are used in the following ways. (1) To indicate that the words in a multi-word object name refer to a single object, as in "water_carbon-dioxide__solubility". This allows the object name to be parsed (on underscores) into multiple objects (often one being within or part of another). (2) To indicate that a set of words should be bundled into one concept or adjective, as in "channel_water__volume-per-length_flow_rate" or "air__mass-per-volume_density". Note that "per" is a reserved word.
- Tildes (as of February 2015) are used to distinguish nouns and adjectives in the object (including sub-object) names that occur in the object part of a name. A new rule replaces any object name of the form "adjective-adjective-noun" with the form "noun~adjective~adjective". Object names must now begin with a noun and may be followed by any number of adjectives, separated by tilde characters. This rule leads to better alphabetization and logical ordering. For example, "incoming-longwave-radiation", "alaskan-black-bear" and "suspended-sediment" become "radiation~incoming~longwave", "bear~alaskan~black" and "sediment~suspended".
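The delimiter rules above can be illustrated with a small parsing sketch. This is not an official CSDMS tool: the function name `parse_standard_name` and the returned dictionary layout are hypothetical, and the code assumes that a name contains exactly one double underscore.

```python
import re

# Allowed characters per the rules above: lower-case letters, digits,
# underscores, hyphens and tildes.
VALID_NAME = re.compile(r"[a-z0-9_~-]+")


def parse_standard_name(name):
    """Split a standard name into its object names and its quantity part."""
    if not VALID_NAME.fullmatch(name):
        raise ValueError(f"invalid characters in {name!r}")
    # Assumption: exactly one double underscore separates object and quantity.
    parts = name.split("__")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected one '__' delimiter in {name!r}")
    object_part, quantity_part = parts
    objects = []
    for object_name in object_part.split("_"):
        # The noun comes first; any adjectives follow, separated by tildes.
        noun, *adjectives = object_name.split("~")
        objects.append({"noun": noun, "adjectives": adjectives})
    return {"objects": objects, "quantity": quantity_part}


parsed = parse_standard_name("water_carbon-dioxide__solubility")
print(parsed["objects"])   # two objects: water and carbon-dioxide
print(parsed["quantity"])  # solubility
```

Because hyphens bundle words into one object or adjective, splitting on single underscores and then on tildes is enough to recover the objects; hyphenated words such as "carbon-dioxide" stay intact.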
- The rightmost word in an object name is called the base object to which the quantity applies. Similarly, the rightmost word (in most cases) in a "quantity name" is called the base quantity. Note: "Quantity suffixes" have mostly been deprecated, but "time_step" is an exception. If the rightmost word in a quantity name is a quantity suffix (e.g. step) then the last two words are the base quantity (e.g. time_step). See the CSDMS Quantity Templates for an explanation of "quantity suffix".
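As a rough sketch of the base quantity rule, the function below returns the rightmost word of a quantity name, or the last two words when the name ends in a quantity suffix. The function name `base_quantity` is hypothetical, and the suffix set contains only "step" because "time_step" is the one exception mentioned above; the "in most cases" caveat means this will not be right for every quantity name.

```python
# Assumption: "step" is the only quantity suffix handled here, since
# "time_step" is the one exception the rule above mentions.
QUANTITY_SUFFIXES = {"step"}


def base_quantity(quantity_name):
    """Return the base quantity of a quantity name (in most cases)."""
    words = quantity_name.split("_")
    if words[-1] in QUANTITY_SUFFIXES and len(words) >= 2:
        # A trailing quantity suffix belongs with the word before it.
        return "_".join(words[-2:])
    return words[-1]


print(base_quantity("saturated_hydraulic_conductivity"))
print(base_quantity("time_step"))
```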
- There are several short reserved words such as "as", "at", "in", "of", "on" (or "and"?), "or", "per", "to" and "vs". These are used within patterns that deal with various issues as described in the CSDMS Object Templates, CSDMS Quantity Templates and CSDMS Operation Templates. The words "reference" and "standard" may also be reserved. See the Reference Quantities template.
- Many CSDMS Standard Names contain a person's last name. If the last name ends with the letter "s" — as in Burgers, Gibbs, Huygens, Jones, Potts, Reynolds, Shields and Stokes — then it is retained. However, a possessive "s" is never added to the name, so we would use "newton" vs. "newtons" in a standard name.
- Acronyms and abbreviations are sometimes included in standard names, but they are usually spelled out explicitly. Note that "leq" is currently used as an abbreviation for "liquid-equivalent" and "x-section" is used instead of "cross-section". Standard symbols for the chemical elements (but lower-case, like "h" and "c") can be used in naming quantities like "bond_angle" that involve multiple atoms in a molecule. See Attributes of Molecules on the CSDMS Quantity Templates page.
- Numbers may be used as part of an object name or in adjectives. Examples include "cesium~133" and "air_light~550-nm-wavelength__refraction_index". In the second example, "550-nm-wavelength" would be preferable to "yellow".
- As explained at the top of the CSDMS Process Names page, the "ing" ending on process names such as "shearing" and "melting" is often dropped for quantities like "shear_stress" and "melt_rate" that use the Process_name + Quantity Pattern. However, the "ing" ending may be retained when the same word is used in a quantity like "melting_point_temperature" (vs. "melt_temperature").
- Word order in object names. Starting with a base object, adjectives are added to the right after a tilde character (as of February 2015) in an effort to construct an unambiguous and easily understood object name. The addition of each adjective produces a more restrictive or specific name from the previous object name. For example:
    bear
    bear~black
    bear~black~alaskan
    spider
    spider~black-widow
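The way each added adjective narrows an object name can be sketched as follows. The helper name `specialize` is hypothetical; it only joins strings with tildes and makes no attempt to check that the words are valid nouns or adjectives.

```python
def specialize(noun, adjectives):
    """Return the chain of object names built by appending adjectives in turn."""
    names = [noun]
    for adjective in adjectives:
        # Each adjective goes to the right of the previous name, after a tilde.
        names.append(names[-1] + "~" + adjective)
    return names


for name in specialize("bear", ["black", "alaskan"]):
    print(name)
```

Since every name in the chain starts with the same noun, an alphabetical listing keeps them next to one another.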
- However, object names may contain either a single object name or multiple object names. In the Part of Another Object Pattern, there is generally some sort of "containment" and the separate object names (with their adjectives, separated by tildes) are ordered from the general to the specific (or superset to subset), left to right.
- In addition, some quantities — like concentration, partial pressure and solubility — require specifying multiple objects. The last two object names in the object part should be the two (or more) required objects in such cases. Each of these quantities has a template that explains how words are ordered. For example, the "kinetic_friction_coefficient" associated with two objects that are in contact (e.g. rubber and pavement) doesn't imply an ordering, so the ordering is alphabetical in order to avoid multiple names for the same thing.
- Alphabetization. It is easier to find standard names that refer to the same object if there is some alphabetical ordering. The left to right "containment" rule in the object part supports this, as does the new rule (see above) that uses the tilde character to add adjectives after nouns. The leftmost object name often refers to a domain or medium such as atmosphere, land, lithosphere, sea or soil.
- Parsability. While standard variable names are used primarily for semantic matching, which does not require any parsing, CSDMS recognizes the many advantages of being able to automatically parse a standard name (e.g. with a small Python program) and deconstruct it into its various parts. One advantage is that it will then be easier to map the names to other formats or lists of names or to build an ontology from them. Another advantage is that a "smart framework" can then use subsets of names (typically by removing words from the left-hand side) to find potentially valid but inexact matches and present them to users. All of the CSDMS name construction rules attempt to honor this parsability. This is sometimes achieved through the use of special delimiters or reserved words like "__" and "_of_" or through the ability to distinguish nouns (sub-objects in an object name) from the adjectives that act on them. These same rules allow the names to be parsed visually by the people who use them. For example, the word "of" is used as a verbal delimiter in spoken math.
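One way a framework might generate inexact match candidates is sketched below. This is only one reading of "removing words from the left-hand side": the hypothetical function `left_trimmed_candidates` drops leading object names one at a time and leaves the quantity part untouched. It assumes the name contains exactly one double underscore.

```python
def left_trimmed_candidates(name):
    """List shorter names made by dropping object names from the left."""
    # Assumption: exactly one "__" separates the object and quantity parts.
    object_part, quantity_part = name.split("__")
    objects = object_part.split("_")
    return [
        "_".join(objects[i:]) + "__" + quantity_part
        for i in range(1, len(objects))
    ]


print(left_trimmed_candidates("channel_water__volume-per-length_flow_rate"))
```

A name with a single object yields an empty list, since removing its only object would leave no object part.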
- Word order in quantity names. Starting with a base quantity (which could end with a quantity suffix), adjectives are added to the left in an effort to construct an unambiguous and easily understood quantity name. The addition of each new word (or words) produces a more restrictive or specific name from the previous name. For example:
    conductivity
    hydraulic_conductivity
    saturated_hydraulic_conductivity (which uses the "Saturated Quantity Rule")
    effective_saturated_hydraulic_conductivity
- The order in which adjectives/modifiers are added to the left may not always be clear, but in this example "hydraulic_conductivity" and "saturated_hydraulic_conductivity" are two fundamental quantities that would be used in a groundwater model and "effective" could be applied to either of them to indicate application at a given scale. Note also that "saturated" could have been applied to "soil", the associated object, but in models "saturated_hydraulic_conductivity" is a fundamental quantity. In addition, names starting with "saturated_soil" would be alphabetically separated from those starting with "soil".
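Because modifiers are added on the left, dropping leading words from a quantity name walks back toward the base quantity. The sketch below shows this with a hypothetical helper, `quantity_generalizations`. It splits on single underscores only and does not handle quantity suffixes such as "time_step", so not every string it returns is guaranteed to be a meaningful quantity name.

```python
def quantity_generalizations(quantity_name):
    """Return the name followed by progressively more general versions of it."""
    words = quantity_name.split("_")
    # Drop one leading modifier at a time; the last entry is the base quantity.
    return ["_".join(words[i:]) for i in range(len(words))]


for name in quantity_generalizations("effective_saturated_hydraulic_conductivity"):
    print(name)
```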
- Remove Objects from Quantity Names Rule. There are many quantity names in common use that include an object in the name, such as "water_content" or "liquid_water_equivalent". In such cases a standard name is constructed so that the named object is moved into the object part of the name. This has many advantages, one of which is that it allows a commonly used quantity concept to be used more generally. For example, "liquid_equivalent_precipitation" (without the word "water") is a quantity name that can be used for water in Earth's atmosphere or for methane in Titan's atmosphere. Similarly, the quantity name "relative saturation" is general and makes no reference to a particular substance/object, while "relative humidity" is only valid for water, even though it doesn't include the word water explicitly.
- Object vs. Adjective Rule. There are many cases where an adjective refers directly to a specific object. Examples include:
    atmospheric, atmosphere: mars_atmosphere__thickness
    axial, axis: earth_axis__tilt_angle
    basal, base: glacier_bottom__shear_stress
    orbital, orbit: earth_orbit__eccentricity
    refractive, refraction: air_light~550-nm-wavelength__refraction_index
    sectional, section: channel_x-section__area
    solar, sun: earth-to-sun_line__distance (vs. earth_to_sun_distance)
- Instead of using this type of adjective in a quantity name, the corresponding object name is used (as in the examples above), usually within the Part of Another Object Pattern. This will sometimes result in an instance of the Process_name + Quantity Pattern since process names are nouns/objects. (As in "air_light~550-nm-wavelength__refraction_index" above.)
- State of Matter Rule. For some standard names it is important to clarify the relevant (or assumed) state of matter. See: State of matter. In such cases, placing an adjective like "gas" or "liquid" before the object name (e.g. "liquid_nitrogen") would disrupt alphabetical grouping. To preserve alphabetical grouping, words like "gas", "vapor", "liquid", "ice" or "solid" are used with a preceding tilde, as in: "carbon-dioxide~gas", "carbon-dioxide~ice", "nitrogen~liquid", "water~vapor" and "water~liquid". For quantities that do not depend on the state/phase of matter, like "temperature", this extra word to indicate the phase is not needed. Whenever the words "gas", "vapor", "liquid", "ice" and "solid" are preceded by a tilde, they are interpreted as indicating the phase of the substance before the tilde.
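A minimal sketch of how the phase might be read from an object name follows. The function name `split_phase` is hypothetical, and the set of phase words is taken directly from the rule above; the code assumes the phase word, if present, appears as one of the tilde-separated adjectives.

```python
# Phase words listed in the State of Matter Rule.
PHASE_WORDS = {"gas", "vapor", "liquid", "ice", "solid"}


def split_phase(object_name):
    """Return (noun, phase) for an object name; phase is None if not given."""
    noun, *adjectives = object_name.split("~")
    phases = [word for word in adjectives if word in PHASE_WORDS]
    phase = phases[0] if phases else None
    return noun, phase


print(split_phase("carbon-dioxide~gas"))
print(split_phase("water~vapor"))
```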
- Patterns and rules for constructing the quantity name part of a CSDMS Standard Name are provided at the top of the CSDMS Quantity Templates page. Also see the CSDMS Process Names and CSDMS Operation Templates pages.
- Patterns and rules for constructing the object name part of a CSDMS Standard Name are provided at the top of the CSDMS Object Templates page.
