Loading Specimen Data

The Symbiota data schema is strongly aligned with the Darwin Core data exchange standard. Data that comply with any version of Darwin Core can be loaded into a data portal with little effort, though compliance is not a strict requirement. See the Compatible Symbiota Fields document to evaluate the compatibility of your data structure. The Symbiota Spreadsheet Template (Excel version) may also be useful for configuring your dataset for import, and the Darwin Core Quick Reference Guide provides further definitions of the Darwin Core data exchange fields. Field names do not have to match those used within Symbiota, since the built-in schema mapping utility is used to map each incoming field to the correct Symbiota field. The import tools can resolve many of the more common data mapping issues, so it’s best to contact the data portal administrator for a full evaluation of your data structure.
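Conceptually, schema mapping renames each local column to its Darwin Core term. The sketch below only illustrates that idea and is not Symbiota's import code; the local column names, the sample row, and the `LOCAL_TO_DWC` dictionary are hypothetical, while the target names (`catalogNumber`, `recordedBy`, `eventDate`, `scientificName`, `country`) are real Darwin Core terms.

```python
import csv
import io

# Hypothetical mapping from a collection's local column names to Darwin Core
# terms. In a portal, this mapping is set up in the import tool, not in code.
LOCAL_TO_DWC = {
    "Cat No": "catalogNumber",
    "Collector": "recordedBy",
    "Coll Date": "eventDate",
    "Taxon": "scientificName",
    "Country": "country",
}


def map_headers(rows, mapping):
    """Rename each row's keys via `mapping`; columns without a mapping keep their name."""
    return [{mapping.get(key, key): value for key, value in row.items()} for row in rows]


if __name__ == "__main__":
    # Hypothetical sample export with local column names.
    sample = "Cat No,Collector,Coll Date,Taxon\nASU0001,J. Smith,1998-05-14,Quercus turbinella\n"
    rows = list(csv.DictReader(io.StringIO(sample)))
    print(map_headers(rows, LOCAL_TO_DWC))
    # → [{'catalogNumber': 'ASU0001', 'recordedBy': 'J. Smith', 'eventDate': '1998-05-14', 'scientificName': 'Quercus turbinella'}]
```

Unmapped columns are passed through unchanged rather than dropped, so nothing is silently lost before the data reaches the portal's own mapping step.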

Natural history collections can supply specimen data using any of the following methods:

  1. Spreadsheet (CSV or tab-delimited): Since most database applications can export flat data files, this is one of the more common import methods. Depending on server settings, importing large collections (over 50,000 records) can be problematic; compressing the data into a zip file reduces its size and makes it more portable. This method can also be used to refresh only the new and recently modified records within a large collection.
  2. IPT or Darwin Core Archive: This is the preferred transfer method for large datasets. Identification and image extensions can be included.
  3. DiGIR provider: This method can be slow for large datasets, and DiGIR providers typically do not include image and annotation information.
  4. Direct read-only connection: Requires a database host, login, password, and an SQL statement that flattens specimen records into the appropriate fields. Firewall restrictions often prevent the use of this option.
  5. Other available data loading methods are described on the Data Interoperability page.
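For the spreadsheet method, a refresh of only new and recently modified records can be prepared as a zipped CSV. The sketch below assumes each record carries a Darwin Core `modified` date in ISO format; the `RECORDS` list, the column set, and the `export_modified_since` helper are hypothetical, and a real export would query the collection database instead.

```python
import csv
import io
import zipfile
from datetime import date

# Hypothetical records; a real export would query the collection database.
RECORDS = [
    {"catalogNumber": "ASU0001", "scientificName": "Quercus turbinella", "modified": "2023-01-10"},
    {"catalogNumber": "ASU0002", "scientificName": "Pinus edulis", "modified": "2024-03-02"},
]
FIELDS = ["catalogNumber", "scientificName", "modified"]


def export_modified_since(records, since, target, csv_name="occurrences.csv"):
    """Write records modified on or after `since` as a zipped CSV.

    `target` may be a file path or a writable binary file object.
    Returns the number of records exported.
    """
    recent = [r for r in records if date.fromisoformat(r["modified"]) >= since]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(recent)
    # ZIP_DEFLATED compresses the CSV, which keeps large uploads smaller.
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(csv_name, buffer.getvalue())
    return len(recent)
```

For example, `export_modified_since(RECORDS, date(2024, 1, 1), "occurrences_update.zip")` writes a zip containing only ASU0002 and returns 1.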

Regular updates: Once the data format has been evaluated and the data initially loaded, portal managers can usually set up a method for collection managers to perform regular updates of a collection through a password-protected web interface.

Annotation information: If available, the full determination history can be included. This is usually supplied as a separate data file or source. Required fields are: determiner, determination date, taxon identification, and the unique identifier for the specimen record (specimen primary key). Other useful information, when available, includes the treatment used and notes.
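A determination history file is essentially one row per determination, linked to its specimen by the primary key. The sketch below checks the required fields before writing the file; `specimen_pk` is a hypothetical column name, the other columns borrow Darwin Core identification terms (`identifiedBy`, `dateIdentified`, `scientificName`, `identificationRemarks` for notes), and the exact column names a portal expects should be confirmed with its administrator.

```python
import csv
import io

# specimen_pk is a hypothetical name for the specimen's primary key;
# the other columns borrow Darwin Core identification terms.
FIELDS = ["specimen_pk", "identifiedBy", "dateIdentified", "scientificName", "identificationRemarks"]
REQUIRED = ["specimen_pk", "identifiedBy", "dateIdentified", "scientificName"]


def determinations_to_csv(rows):
    """Check required fields, then return the determination history as CSV text."""
    for i, row in enumerate(rows):
        missing = [field for field in REQUIRED if not row.get(field)]
        if missing:
            raise ValueError(f"determination {i} is missing required fields: {missing}")
    out = io.StringIO()
    # restval="" turns absent optional fields (such as notes) into empty cells.
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Two hypothetical determinations of the same specimen (same primary key).
history = [
    {"specimen_pk": "1001", "identifiedBy": "A. Jones", "dateIdentified": "1999-02-01",
     "scientificName": "Quercus undulata"},
    {"specimen_pk": "1001", "identifiedBy": "B. Lee", "dateIdentified": "2015-06-20",
     "scientificName": "Quercus turbinella", "identificationRemarks": "Annotated during revision"},
]
print(determinations_to_csv(history))
```

Validating before writing means an incomplete annotation is caught on the collection side rather than during import.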

Images: Are the images online, and are their URLs stable? If so, the specimen records can be linked to these existing images. If data is supplied as a spreadsheet, the URLs to the JPG images can be placed in a field labeled associatedMedia; multiple images per specimen can be separated by semicolons. If data is supplied as a Darwin Core Archive, image URLs can be added as an image extension file. Alternatively, image URLs can be supplied in a separate spreadsheet that contains the catalog number or the specimen’s primary key (unique identifier for the specimen record). If the images are not yet online, image loading details will have to be worked out with the portal managers.
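Building the associatedMedia value for a spreadsheet row amounts to joining the image URLs with semicolons. The sketch below shows that step and its reverse; the catalog number and the `images.example.org` URLs are placeholders, and `join_media`/`split_media` are hypothetical helper names.

```python
import csv
import io


def join_media(urls):
    """Combine image URLs into a single associatedMedia value, separated by semicolons."""
    return ";".join(u.strip() for u in urls if u.strip())


def split_media(value):
    """Split an associatedMedia value back into individual URLs."""
    return [u.strip() for u in value.split(";") if u.strip()]


# Hypothetical specimen with two images hosted at placeholder URLs.
row = {
    "catalogNumber": "ASU0001",
    "associatedMedia": join_media([
        "https://images.example.org/ASU0001_a.jpg",
        "https://images.example.org/ASU0001_b.jpg",
    ]),
}
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["catalogNumber", "associatedMedia"])
writer.writeheader()
writer.writerow(row)
print(out.getvalue())
```

Because the spreadsheet itself is comma-delimited, the semicolon-separated URLs stay together in one cell without needing quotes.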

When possible, include answers to the following questions:

  • How is your data stored (e.g., MySQL, Oracle, MS Access, FileMaker, Excel)?
  • Do you have a field used to tag specimen records as needing locality protection? See the Rare Species Protection page for more information on safeguarding rare and threatened species.
  • What is the character encoding (character set) of your data (e.g., ISO-8859-1, UTF-8, Windows-1252)?
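If the encoding is uncertain, a file can be converted to UTF-8 before upload. The sketch below is a heuristic under stated assumptions, not a portal feature: it keeps bytes that already decode as UTF-8 and otherwise falls back to Windows-1252 and then ISO-8859-1 (Python's `cp1252` and `latin-1` codecs). It cannot reliably tell those two legacy encodings apart, so knowing the actual character set of your export remains the better option. `to_utf8` and `convert_file` are hypothetical helper names.

```python
from pathlib import Path


def to_utf8(data, fallbacks=("cp1252", "latin-1")):
    """Return `data` as UTF-8 bytes, guessing the source encoding.

    Heuristic only: bytes that already decode as UTF-8 are returned unchanged;
    otherwise each fallback encoding is tried in order.
    """
    try:
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        pass
    for encoding in fallbacks:
        try:
            return data.decode(encoding).encode("utf-8")
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


def convert_file(src, dst):
    """Write a UTF-8 copy of a text export, preserving its line endings."""
    # Working on bytes avoids newline translation that text-mode I/O would apply.
    Path(dst).write_bytes(to_utf8(Path(src).read_bytes()))
```

For example, the Windows-1252 bytes `b"Jim\xe9nez"` are not valid UTF-8, so they are decoded as Windows-1252 and re-encoded as the UTF-8 form of "Jiménez".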