Interoperability of Specimen Data

In cases where a collection publishes a snapshot of specimen records that are stored and managed within the home institution (snapshot data set), specimen records are periodically refreshed under the direction of the collection managers. Even if the collection is managed directly within the portal (live data set), importing data files is a common task. Because a wide diversity of methods and tools are used to store and manage specimen data, establishing effective methods and protocols that maintain up-to-date representations of the data sets within the portal is a challenge. To address this challenge, several methods are employed to ensure maximum interoperability of data. Each method below has its own strengths and weaknesses, and no single procedure is appropriate for all collection situations. While the methods employ different transfer protocols, they all follow the same framework and workflow: 1) data is loaded into a temporary specimen table within the portal database; 2) standard cleaning and data verification scripts are triggered; 3) a custom stored procedure performs cleaning and data transformation tasks tailored to that specific collection; 4) records are activated by transferring them to the central occurrence table; and 5) collection statistics are updated.
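The five-step workflow above can be sketched as follows. This is a minimal, self-contained illustration only: the function names, table structures, and cleaning rules are hypothetical stand-ins, not Symbiota's actual schema or code.

```python
# Minimal sketch of the five-step import workflow described above.
# All names and cleaning logic are illustrative, not Symbiota's actual code.

def run_standard_cleaning(records):
    # 2) Standard cleaning/verification: trim whitespace, drop empty values.
    return [{k: v.strip() for k, v in r.items() if v and v.strip()} for r in records]

def run_collection_procedure(records):
    # 3) Collection-specific transform, e.g. normalize catalog number case.
    for r in records:
        if "catalogNumber" in r:
            r["catalogNumber"] = r["catalogNumber"].upper()
    return records

def import_snapshot(records, occurrence_table, stats):
    temp_table = list(records)                    # 1) stage in a temporary specimen table
    temp_table = run_standard_cleaning(temp_table)
    temp_table = run_collection_procedure(temp_table)
    occurrence_table.extend(temp_table)           # 4) activate: move to central occurrence table
    stats["recordCount"] = len(occurrence_table)  # 5) update collection statistics
    return occurrence_table

occurrences, stats = [], {}
import_snapshot([{"catalogNumber": " asu-001 ", "locality": ""}], occurrences, stats)
print(occurrences, stats)
```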

A password-protected user interface gives collection managers the ability to update their data on demand. Setup of this service is usually performed by a portal administrator, who has the knowledge needed to establish the initial field mapping and write the custom stored procedure that cleans incoming records. In cases where the source and Symbiota databases differ in character set definitions, the import methods can translate between UTF-8 and ISO-8859 character sets. For detailed procedures on establishing upload protocols for a specific collection, see the Specimen Upload Procedure page.
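The character set translation mentioned above amounts to decoding bytes using the source system's declared charset and re-encoding them for the portal. A generic Python illustration (not the portal's actual import code):

```python
# Generic illustration of ISO-8859-1 -> UTF-8 translation, as an import
# routine might perform it; this is not Symbiota's actual code.
latin1_bytes = "México, Oaxaca".encode("iso-8859-1")  # bytes as exported by the source system
text = latin1_bytes.decode("iso-8859-1")              # decode using the declared source charset
utf8_bytes = text.encode("utf-8")                     # re-encode for a UTF-8 portal database
print(utf8_bytes.decode("utf-8"))
```

Decoding with the wrong source charset is what produces mojibake such as "MÃ©xico", so the mapping setup must record which encoding the source database actually uses.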

  • CSV Upload – The collection data manager can perform on-demand uploads of flat text data files extracted from the source database. The setup procedure involves mapping source fields to Symbiota fields. While source column names do not have to match Symbiota field names, data types and definitions need to comply with the Darwin Core standard.
    • The most common upload method, simply because this tends to be the easiest and best-supported way to extract data from a local system
    • Server configurations limit the size of the file that can be uploaded. The default PHP upload limit is typically set at 2MB, which can be increased by the server administrator. Compressing the data file into a zip archive is recommended, since it significantly reduces the upload size. Data can also be broken up into several files that are uploaded separately.
    • Works particularly well for refreshing data sets by uploading only the subset of new and recently modified records
  • Darwin Core Archive (IPT) – One of the more widely endorsed and preferred protocols for data exchange of natural history collections. Supports the ingestion of determination history and images when those extensions are included.
    • Preferred method to use when an IPT provider is available
    • Darwin Core compliant fields are automatically mapped by default. The mapping can be adjusted and saved as needed.
    • If a URL to the source DwC-Archive file is given, the data will be streamed into the system. A web service is available for automated imports.
    • If DwC-Archive data packets are ingested manually (e.g. by selecting a file on the local computer), server upload restrictions can limit the size of the data file that can be uploaded.
  • Direct Transfer – Database-to-database transfer from a source MySQL or MariaDB database to the Symbiota database. The setup procedure involves configuring access variables and mapping source fields to Symbiota fields.
    • The Symbiota database must have read access through firewalls to the source database
  • DiGIR Provider – an older data exchange protocol that is no longer well supported, but is still offered as an option
    • A DiGIR provider must be established and host the source collection, with fields correctly mapped to the Darwin Core standard
    • Slower transfer method than the DwC-Archive methods
  • Alternative Methods
    • SQL Stored Procedure – transfer from a source schema to the Symbiota database located on the same MySQL database server. The fastest transfer method, and it can be automated.
    • System Script – transfer from a MySQL source to a Symbiota database located on a different server. A fast transfer method that can also be automated, though firewall restrictions are commonly an issue.
  • Live Data Management – Specimen record data is managed directly within Symbiota data portal – source and portal data are the same
    • No need to perform regular data updates; the portal always has the most up-to-date representation of the record data
    • Particularly well suited for collections with little or no technical support
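As a concrete companion to the compression recommendation in the CSV Upload section, a snapshot extract can be written to CSV and zipped before upload to stay well below a typical 2MB PHP upload limit. A generic Python sketch (file names and field values are illustrative):

```python
import csv
import io
import zipfile

# Write a small snapshot extract to CSV in memory (illustrative records).
rows = [
    {"catalogNumber": "ASU-0001", "scientificName": "Quercus gambelii"},
    {"catalogNumber": "ASU-0002", "scientificName": "Pinus ponderosa"},
]
csv_buf = io.StringIO()
writer = csv.DictWriter(csv_buf, fieldnames=["catalogNumber", "scientificName"])
writer.writeheader()
writer.writerows(rows)

# Compress the CSV into a zip archive; text data typically shrinks
# substantially under DEFLATE, reducing the upload size.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("snapshot.csv", csv_buf.getvalue())

print(len(csv_buf.getvalue()), "bytes of CSV packed into zip archive")
```

For very large data sets, the same extract can instead be split across several smaller CSV files and uploaded separately, as noted above.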