NZOR Data Integration

Workflow

[Image: workflow diagram. Labels: Importer; Provider; NZOR; Harvester; Submitted CSV dataset; Mapping to Integration Dataset Format; Integration Dataset; Matching / Integrator; Multi-threaded, grid computing environment; Integration Result Dataset; Results merged back into core NZOR data; Results returned to submitter.]

UML Diagram

[Image: UML diagram]

NZOR Integration Fields


There are two types of matching: simple and structured. Simple matching applies when only the Name text is provided, possibly with a few other fields. Structured matching applies when a set of detailed fields is provided.
For Simple matching, the RequiredForMatching fields need to be derived, parsed or calculated from the fields that are provided.
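
As an illustration of deriving structured fields from a name string, the following is a minimal sketch only. The field names (`genus`, `species_epithet`, `authors`), the `derive_fields` function and the regular expression are assumptions made for this example; they are not the actual RequiredForMatching field list or the parser NZOR uses. The pattern handles only a capitalised first word, an optional lower-case epithet, and trailing authorship; infraspecific ranks, hybrids and names above genus are not handled.

```python
import re
from typing import Optional, TypedDict


class DerivedFields(TypedDict):
    """Hypothetical subset of fields derived from a simple name string."""
    full_name: str
    genus: Optional[str]
    species_epithet: Optional[str]
    authors: Optional[str]


# Assumed pattern: capitalised genus, optional lower-case epithet,
# and anything left over is treated as the authorship string.
_NAME_PATTERN = re.compile(
    r"^(?P<genus>[A-Z][a-z-]+)"
    r"(?:\s+(?P<epithet>[a-z-]+))?"
    r"(?:\s+(?P<authors>.+))?$"
)


def derive_fields(name_text: str) -> DerivedFields:
    """Derive matching fields from the raw name text of a simple record."""
    # Collapse repeated whitespace so the pattern sees single spaces.
    cleaned = " ".join(name_text.split())
    match = _NAME_PATTERN.match(cleaned)
    if match is None:
        # Nothing could be parsed; keep the text so it can still be matched as a whole.
        return {"full_name": cleaned, "genus": None,
                "species_epithet": None, "authors": None}
    return {
        "full_name": cleaned,
        "genus": match.group("genus"),
        "species_epithet": match.group("epithet"),
        "authors": match.group("authors"),
    }
```

Calling `derive_fields("Poa  anceps G.Forst.")` with this sketch yields the genus `Poa`, the epithet `anceps` and the authorship `G.Forst.`; a string that does not fit the pattern keeps only its cleaned full name.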
Simple:

...


The highest ranked taxon name in a provider DataSet must attach either to a Kingdom name or to a defined attachment point for that provider dataset.
For example, a provider dataset may contain the names for a particular family, say Compositae. An attachment point needs to be defined for this name (Compositae). This attachment point name MUST then be provided in the dataset so that it is possible to determine how to attach all subordinate names.
It is possible to have multiple attachment points for different parts of the taxonomic classification hierarchy. A Default attachment point must also be defined for default actions, such as placing a name of unknown parentage.
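
The attachment point rules above can be sketched as a small lookup. This is a hedged illustration, not NZOR's implementation: the `AttachmentPoints` class, its fields, and the identifiers used in the usage note (`nzor:asterales`, `nzor:unplaced`) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AttachmentPoints:
    """Attachment points for one provider dataset (hypothetical structure)."""

    # Where names of unknown parentage are placed.
    default_parent_id: str
    # Maps an attachment point name (e.g. "Compositae") to the
    # consensus name it attaches to.
    by_name: Dict[str, str] = field(default_factory=dict)

    def resolve_parent(self, full_name: str,
                       provider_parent_id: Optional[str]) -> str:
        """Return the parent a provider name should attach to."""
        # A parent supplied within the dataset always wins.
        if provider_parent_id is not None:
            return provider_parent_id
        # Otherwise use a defined attachment point for this name,
        # falling back to the default attachment point.
        return self.by_name.get(full_name, self.default_parent_id)
```

With `AttachmentPoints("nzor:unplaced", {"Compositae": "nzor:asterales"})`, the name Compositae resolves to its defined attachment point, a name with a parent in the dataset keeps that parent, and any other parentless name is placed under the default.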

Matching


Diagram:
[Image: matching diagram]

Matching Components:
Parent
The parent of a name defines where the name fits within a scientific classification, e.g. the Genus in which a particular species is placed. For example, in "Poa anceps" the genus Poa is the parent name of the species "anceps". This is a valuable property for the matching process.
Some types of names do not have a classification, e.g. Vernacular (common) names.
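
One way the parent can help matching is by narrowing a set of candidate names. The sketch below assumes a hypothetical `NameRecord` type and `filter_by_parent` function; it illustrates the idea only and does not describe the actual NZOR matching component.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class NameRecord:
    """Hypothetical minimal name record used for matching."""
    name_id: str
    canonical: str
    # None for names without a classification, e.g. vernacular names.
    parent_id: Optional[str]


def filter_by_parent(candidates: Iterable[NameRecord],
                     parent_id: Optional[str]) -> List[NameRecord]:
    """Keep only candidates that share the given parent.

    When the name being matched has no parent (parent_id is None),
    the parent cannot discriminate, so all candidates are kept.
    """
    if parent_id is None:
        return list(candidates)
    return [c for c in candidates if c.parent_id == parent_id]
```

For example, two candidate records that both carry the epithet "anceps" but sit under different genera are reduced to one when the parent of the incoming name is known.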

...

  1. Insert/update all References (as names and concepts rely on these)
  2. Update Provider References to point to the relevant Consensus Reference
  3. Refresh Consensus Reference data from all Provider records for modified references
  4. Insert/update all Names (concepts rely on these)
  5. Update Provider Names to point to the relevant Consensus Name
  6. Refresh Consensus Name data from all Provider records for modified Names
  7. Insert/update all Concepts (not relationships, as a relationship relies on both of its Concepts already existing)
  8. Update Provider Concepts to point to the relevant Consensus Concept
  9. Refresh Consensus Concept data from all Provider records for modified Concepts
  10. Insert/update all ConceptRelationships for each modified Concept, from all provider ConceptRelationship records
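
The ordering of the steps above follows the dependencies between entities: References first, then Names, then Concepts, with ConceptRelationships last. A minimal sketch of that ordering follows; `build_plan`, `run_plan` and the step labels are hypothetical names for illustration, and the real work of each step is left to a caller-supplied function.

```python
from typing import Callable, List

# Entities in dependency order: Names and Concepts rely on References,
# and Concepts rely on Names.
ENTITY_ORDER = ("Reference", "Name", "Concept")


def build_plan() -> List[str]:
    """Build the ordered list of integration step labels."""
    plan: List[str] = []
    for entity in ENTITY_ORDER:
        plan.append(f"insert/update {entity}s")
        plan.append(f"point provider {entity}s at consensus {entity}")
        plan.append(f"refresh consensus {entity} data")
    # Relationships come last because both Concepts must already exist.
    plan.append("insert/update ConceptRelationships")
    return plan


def run_plan(plan: List[str], execute: Callable[[str], None]) -> None:
    """Run each step strictly in order using the supplied executor."""
    for step in plan:
        execute(step)
```

Passing `print` as the executor, as in `run_plan(build_plan(), print)`, lists the ten steps in the order they would run.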

 

Technical Platform


Approaches:

...