...
Names can differ in nomenclatural status, for example whether the name was validly published under the code (if not, the status is nom. inval.). A name that was not validly published can later be validly published. The details of the name stay the same, but the name now has a different nomenclatural status. At first it seems best to treat these as separate, linked names. However, this may cause problems during integration: if a provider does not supply the nomenclatural status, is its data for the validly published name or the invalid one? It was therefore decided to treat these names as the same name and to pool all nomenclatural status values for it; it is then up to the consumer or viewer of the name to determine its status. These name instances therefore result in a single name, even though two nomenclatural acts led to it.
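The pooling decision can be illustrated with a minimal sketch. The class `ConsensusName`, the function `pool_status`, the author "Smith" and the label "validly published" are all hypothetical names chosen for illustration; they are not taken from the actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ConsensusName:
    """Hypothetical consensus record; field names are illustrative only."""
    full_name: str
    statuses: Set[str] = field(default_factory=set)


def pool_status(consensus: ConsensusName, provider_status: Optional[str]) -> None:
    """Add a provider's nomenclatural status to the pooled set, if one was given."""
    if provider_status:
        consensus.statuses.add(provider_status)


# Three provider records for the same name: one invalid publication,
# one with no status supplied, one later valid publication.
name = ConsensusName("Aus bus Smith")
for status in ["nom. inval.", None, "validly published"]:
    pool_status(name, status)

print(sorted(name.statuses))
```

A record with no status simply attaches to the one consensus name without forcing a choice between the valid and invalid publication, which is the ambiguity the pooling is meant to avoid.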
Integration By SQL
A simpler and much faster approach to building an initial set of integrated names is to use SQL queries.
The idea is based on the fact that most names are distinct (about 98%). It is therefore much more efficient to generate a "backbone" of names from these distinct names than to iterate through all names, perform a match for each only to discover there are no matches, and then insert the name as a new consensus name.
This approach works with the most complete names first, on the theory that a name with less detail will match multiple names with more detail.
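As a rough sketch of the set-based idea, the following uses Python's built-in `sqlite3` module to build a backbone with a single `INSERT ... SELECT DISTINCT` over the distinct-name fields this section lists. The table names `ProviderName` and `ConsensusName` and the sample rows are assumptions for illustration; the real schema and the actual queries may differ.

```python
import sqlite3

# Distinct-name fields from this section; the table names are hypothetical.
KEY = "Canonical, Rank, Authors, Year, Genus, Species, GoverningCode"

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE ProviderName (Id INTEGER PRIMARY KEY, {KEY})")
conn.execute(f"CREATE TABLE ConsensusName (Id INTEGER PRIMARY KEY, {KEY})")

# Illustrative provider data: two identical 'cus' records and one order.
provider_rows = [
    ("cus", "var.", None, None, "Aus", "bus", "ICBN"),
    ("cus", "var.", None, None, "Aus", "bus", "ICBN"),
    ("Lecanorales", "order", "Nannf.", None, None, None, "ICBN"),
]
conn.executemany(
    f"INSERT INTO ProviderName ({KEY}) VALUES (?, ?, ?, ?, ?, ?, ?)",
    provider_rows,
)

# One set-based statement instead of a match-then-insert loop per name:
# every distinct combination of the key fields becomes a consensus name.
conn.execute(
    f"INSERT INTO ConsensusName ({KEY}) SELECT DISTINCT {KEY} FROM ProviderName"
)

backbone = conn.execute(
    f"SELECT {KEY} FROM ConsensusName ORDER BY Canonical"
).fetchall()
print(len(provider_rows), "provider names ->", len(backbone), "consensus names")
```

In SQLite, `SELECT DISTINCT` treats NULL values as equal to each other, so records that share the same empty fields collapse into one row here; other database engines should be checked for the same behaviour.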
The fields that are used for defining a distinct name are:
- Canonical
- Rank
- Authors
- Year
- Genus
- Species
- GoverningCode
Genus and Species are not fields of a name, but are calculated fields based on parent concepts. The reason for including these fields is to ensure that any subgeneric or subspecific name matches other subgeneric or subspecific names that do not have exactly the same parent hierarchy.
For example:
Name 1: Aus bus var. cus
- Aus, genus
  - bus, species
    - cus, variety

Name 2: Aus bus xus var. cus
- Aus, genus
  - bus, species
    - xus, subspecies
      - cus, variety
The fields for these two names will be:
Name | Canonical | Rank | Authors | Year | Genus | Species | Governing Code |
---|---|---|---|---|---|---|---|
1 | cus | var. | | | Aus | bus | ICBN |
2 | cus | var. | | | Aus | bus | ICBN |
So according to these fields, the two names match even though the direct parents of the two 'cus' names are different, which is the correct result.
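The calculation of Genus and Species from the parent hierarchy could look something like the sketch below. The lineage representation and the helpers `ancestor_at_rank` and `match_key` are hypothetical, and the choice to look only at ancestors (not the name itself) is an assumption; the source does not say how the calculated fields are filled for genus or species names.

```python
from typing import List, Optional, Tuple

# (canonical, rank) pairs from the top of the hierarchy down to the name itself.
Lineage = List[Tuple[str, str]]


def ancestor_at_rank(lineage: Lineage, rank: str) -> Optional[str]:
    """Return the canonical of the ancestor with the given rank, if any."""
    for canonical, ancestor_rank in lineage[:-1]:  # assumption: skip the name itself
        if ancestor_rank == rank:
            return canonical
    return None


def match_key(lineage: Lineage, code: str) -> tuple:
    """Build the distinct-name key; Authors and Year are empty in this example."""
    canonical, rank = lineage[-1]
    return (
        canonical,
        rank,
        None,  # Authors
        None,  # Year
        ancestor_at_rank(lineage, "genus"),
        ancestor_at_rank(lineage, "species"),
        code,
    )


name1 = [("Aus", "genus"), ("bus", "species"), ("cus", "variety")]
name2 = [("Aus", "genus"), ("bus", "species"), ("xus", "subspecies"), ("cus", "variety")]

key1 = match_key(name1, "ICBN")
key2 = match_key(name2, "ICBN")
print(key1 == key2)
```

Because the subspecies 'xus' is never part of the key, the two lineages produce the same key, mirroring the table above.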
Another example:
Name 1: Lecanorales Nannf., order
- Ascomycetes, class
  - Lecanorales Nannf., order

Name 2: Lecanorales, order
- Ascomycetes, class
  - Lecanoromycetidae, subclass
    - Lecanorales, order
Name | Canonical | Rank | Authors | Year | Genus | Species | Governing Code |
---|---|---|---|---|---|---|---|
1 | Lecanorales | order | Nannf. | | | | ICBN |
2 | Lecanorales | order | | | | | ICBN |
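Again, the names will match even though their parent names are defined to be different: Name 2 carries no authors, so as the less detailed name it matches the more detailed Name 1. Again this is correct.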
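One way to express "a name with less detail matches names with more detail" in SQL is to treat an empty field on the incoming name as a wildcard. The sketch below, again using Python's `sqlite3` module, is an assumption about how such a match might be written, not the queries actually used; `ConsensusName`, `MATCH_SQL` and `find_matches` are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ConsensusName (Id INTEGER PRIMARY KEY, Canonical, Rank, "
    "Authors, Year, Genus, Species, GoverningCode)"
)
# The most complete name is loaded first (Name 1 from the example above).
conn.execute(
    "INSERT INTO ConsensusName (Canonical, Rank, Authors, GoverningCode) "
    "VALUES ('Lecanorales', 'order', 'Nannf.', 'ICBN')"
)

# Assumption: a NULL field on the incoming name matches any stored value.
MATCH_SQL = """
SELECT Id FROM ConsensusName
WHERE Canonical = :Canonical
  AND Rank = :Rank
  AND GoverningCode = :GoverningCode
  AND (:Authors IS NULL OR Authors = :Authors)
  AND (:Year IS NULL OR Year = :Year)
  AND (:Genus IS NULL OR Genus = :Genus)
  AND (:Species IS NULL OR Species = :Species)
"""


def find_matches(connection: sqlite3.Connection, name: dict) -> list:
    """Return the Ids of consensus names compatible with the incoming name."""
    return [row[0] for row in connection.execute(MATCH_SQL, name)]


# Name 2 from the example: same canonical and rank, no authors.
name2 = {
    "Canonical": "Lecanorales", "Rank": "order", "Authors": None,
    "Year": None, "Genus": None, "Species": None, "GoverningCode": "ICBN",
}
# A made-up conflicting author, to show that real differences still block a match.
other = dict(name2, Authors="Someone Else")

print(find_matches(conn, name2))
print(find_matches(conn, other))
```

The parent hierarchy (class versus subclass) plays no part in the query, which is why the differing parents in the example do not prevent the match.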
Generating Consensus Records
...