...

The NZOR system has a number of linked components:

  • A community of Data Providers who manage taxonomic data in their own database systems

  • A data harvesting system managed by the NZOR system for regularly collecting data from data providers

  • An integration engine which creates a single dataset from all data provider records and reconciles data which may overlap in content

  • A data publishing system which transforms the single dataset into a form optimised for end users and exposes it over the internet as a set of web services

  • An administration module for managing data provider content, data validation processes and a number of other activities

  • A community of data consumers who use the NZOR web services to support local database systems, generally through the maintenance of a local cache of NZOR data

...

Harvesting

The NZOR system harvests taxonomic data from multiple data providers. Each data provider maintains a taxonomic database, which may take a variety of forms, data structures, and software platforms. Data elements within each provider's system are mapped to a common data standard, the NZOR Provider Schema, which is based on the TDWG Taxonomic Concept Transfer Schema (TCS). The standardised data elements are then made available for automatic harvesting over the internet using a generic harvesting protocol, the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). The combination of the mapping and the harvesting interface is called a ‘wrapper’.
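As a rough illustration of the harvesting step, the sketch below builds an OAI-PMH `ListRecords` request and reads record identifiers and the resumption token from a response. The verb, the `metadataPrefix`, `from` and `resumptionToken` arguments, and the XML namespace come from the OAI-PMH 2.0 protocol. The base URL, the metadata prefix `example_prefix`, and the sample response are assumptions made for illustration only and do not describe an actual NZOR provider wrapper.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical response, trimmed to the elements read below.
SAMPLE_RESPONSE = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    "<ListRecords>"
    "<record><header><identifier>example:name:1</identifier>"
    "<datestamp>2011-01-01</datestamp></header></record>"
    "<record><header><identifier>example:name:2</identifier>"
    "<datestamp>2011-01-02</datestamp></header></record>"
    "<resumptionToken>page-2</resumptionToken>"
    "</ListRecords>"
    "</OAI-PMH>"
)


def build_list_records_url(base_url, metadata_prefix, from_date=None,
                           resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # A resumption token is an exclusive argument: it replaces the others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            # Selective harvesting: only records changed since this date.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.iterfind(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


if __name__ == "__main__":
    # The base URL is a placeholder, not a real provider endpoint.
    print(build_list_records_url("https://provider.example.org/oai",
                                 "example_prefix", from_date="2011-01-01"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

A harvester would typically repeat the request with the returned resumption token until no token is returned, which is how OAI-PMH pages through large result sets.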

...

The integration process may be summarised:

  1. Discover reconciliation groups of the same name across all providers (equivalent name strings, taking into account spelling variations)

  2. Create/update simple majority consensus nomenclatural records linked to groups and create a persistent NZOR name GUID

  3. Discover equivalent published concepts delivered by multiple providers

  4. Create/update NZOR concept records linked to a persistent NZOR concept GUID

  5. Create simple majority concept relationship records for the preferred name and the parent name

  6. Create/update the single NZOR consensus taxonomic view from endorsed provider concepts

  7. Break any deadlocks by following the endorsed concepts of the preferred provider for a defined taxonomic group

  8. Track changes to NZOR taxonomy over time
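The grouping, consensus, and deadlock-breaking steps above can be sketched as follows. This is a minimal illustration and not the NZOR integration engine: the record fields (`provider`, `name`, `authors`), the function names, and the sample values are hypothetical. Name matching is reduced to case and whitespace normalisation, a stand-in for the handling of spelling variations, and "simple majority" is read here as "the most frequent value wins".

```python
from collections import Counter, defaultdict


def normalise(name):
    """Lower-case and collapse whitespace (a stand-in for real name matching)."""
    return " ".join(name.lower().split())


def group_names(records):
    """Step 1: group provider records whose normalised name strings match."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalise(rec["name"])].append(rec)
    return dict(groups)


def majority_value(records, field, preferred_provider=None):
    """Steps 2 and 7: pick the most frequent value of a field.

    On a tie (deadlock), fall back to the preferred provider's value.
    Returns None if the deadlock cannot be broken.
    """
    ranked = Counter(rec[field] for rec in records).most_common()
    top_value, top_count = ranked[0]
    tied = [value for value, count in ranked if count == top_count]
    if len(tied) == 1:
        return top_value
    for rec in records:
        if rec["provider"] == preferred_provider and rec[field] in tied:
            return rec[field]
    return None


if __name__ == "__main__":
    # Hypothetical provider records; the author strings are placeholders.
    records = [
        {"provider": "A", "name": "Agathis australis", "authors": "Author A"},
        {"provider": "B", "name": "agathis  australis", "authors": "Author A"},
        {"provider": "C", "name": "Agathis australis", "authors": "Author B"},
    ]
    for key, group in group_names(records).items():
        print(key, majority_value(group, "authors"))
```

Returning `None` for an unresolved tie keeps the deadlock visible to the caller instead of silently choosing a value.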

Publication

The single NZOR dataset of integrated records is transformed into a form that is optimised and indexed (using Lucene) for querying. In addition, Taxa Match algorithms are employed to parse organism names and optimise searching on them. End users may submit queries on the dataset through a set of standard web services. NZOR is designed to provide web services in both RESTful and SOAP forms. The result of a query conforms to the NZOR consumer schema and may be represented in a number of formats, e.g. XML or JSON.
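To illustrate how one query result can be delivered in more than one representation, the sketch below serialises the same record as JSON and as XML. The record and its field names (`nameId`, `fullName`, `rank`) and the root element `Name` are invented for this example. They are not taken from the NZOR consumer schema.

```python
import json
import xml.etree.ElementTree as ET


def to_json(record):
    """Serialise a flat result record as a JSON object."""
    return json.dumps(record, sort_keys=True)


def to_xml(record, root_tag="Name"):
    """Serialise the same flat record as XML, one child element per field."""
    root = ET.Element(root_tag)
    for key, value in record.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical record; the identifier is a placeholder, not a real GUID.
    record = {
        "nameId": "hypothetical-guid-0001",
        "fullName": "Agathis australis",
        "rank": "species",
    }
    print(to_json(record))
    print(to_xml(record))
```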

...

NZOR provides data which conforms to a standard form, the NZOR consumer schema. An NZOR dataset contains:

    1. metadata on the data providers

    2. publications relevant to taxonomic names

    3. scientific names and their nomenclatural details, and vernacular (common) names

    4. taxonomic concepts, by which we mean the use of a name in a publication together with the relationships that publications assert between taxa, for example that one taxon is the parent of another or is a synonym of another

    5. properties of a taxon, in particular the biostatus, by which we mean information on the presence or absence of the taxon in a defined geographical region
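The five kinds of information listed above could be modelled along the following lines. This is only a sketch of the relationships between the parts: every class and field name is an assumption chosen for readability, and none is taken from the NZOR consumer schema itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Provider:
    id: str
    name: str


@dataclass
class Publication:
    id: str
    citation: str


@dataclass
class Name:
    id: str
    full_name: str
    kind: str  # "scientific" or "vernacular"


@dataclass
class Relationship:
    kind: str  # e.g. "has_parent" or "is_synonym_of"
    target_concept_id: str


@dataclass
class Concept:
    """The use of a name in a publication, plus the relationships it asserts."""
    id: str
    name_id: str
    publication_id: str
    relationships: List[Relationship] = field(default_factory=list)


@dataclass
class Biostatus:
    """Presence or absence of a taxon in a defined geographical region."""
    name_id: str
    region: str
    present: bool


def parent_concept_id(concept: Concept) -> Optional[str]:
    """Return the id of the parent concept, if the concept asserts one."""
    for rel in concept.relationships:
        if rel.kind == "has_parent":
            return rel.target_concept_id
    return None


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    genus = Concept(id="c1", name_id="n1", publication_id="p1")
    species = Concept(id="c2", name_id="n2", publication_id="p1",
                      relationships=[Relationship("has_parent", "c1")])
    print(parent_concept_id(species), parent_concept_id(genus))
```

Keeping names and concepts as separate records reflects the distinction drawn in the list: a name carries nomenclatural details, while a concept records how a particular publication uses that name.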