Starting a Symbiota Collection

Becoming a data provider is easy. It involves providing basic information about your collection to a portal manager, deciding whether to set up a “live collection” or a “snapshot collection,” and deciding whether to have the Symbiota portal serve your data to global aggregators (e.g., iDigBio, GBIF).

The Symbiota data schema is strongly aligned with the Darwin Core data exchange standard. Data compliant with any version of Darwin Core can be easily loaded into a data portal, though compliance is not a strict requirement. See the Compatible Symbiota Fields document to evaluate the compatibility of your data structure. The Symbiota Spreadsheet Template (Excel version) may also be useful for configuring your dataset for import, and the Darwin Core Quick Reference Guide gives further definitions of the Darwin Core data exchange fields. Field names do not have to match those used within Symbiota, since the built-in schema mapping utility ensures that your data map to the correct fields. The import tools can resolve many of the more common data mapping issues; even so, it is best to contact the data portal administrator to fully evaluate the compatibility of your data structure.
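Before contacting the portal administrator, it can help to check which of your own column names already have a Darwin Core equivalent. The Python sketch below only illustrates that idea; the actual mapping is done in Symbiota's web-based schema mapping utility. The local column names in `LOCAL_TO_DWC` are hypothetical placeholders for your own headers, while the right-hand values are standard Darwin Core terms.

```python
import csv
import io

# Hypothetical local column names (left) mapped to Darwin Core terms (right).
# Replace the left-hand side with the headers your database actually exports.
LOCAL_TO_DWC = {
    "Cat No": "catalogNumber",
    "Collector": "recordedBy",
    "Coll Number": "recordNumber",
    "Date Collected": "eventDate",
    "Taxon": "scientificName",
    "Country": "country",
    "State": "stateProvince",
    "Locality": "locality",
    "Lat": "decimalLatitude",
    "Long": "decimalLongitude",
}


def map_headers(headers):
    """Return (mapped, unmapped): a Darwin Core term per known header, plus headers with no match."""
    mapped = {}
    unmapped = []
    for header in headers:
        term = LOCAL_TO_DWC.get(header.strip())
        if term is None:
            unmapped.append(header)
        else:
            mapped[header] = term
    return mapped, unmapped


def remap_rows(csv_text):
    """Yield each CSV row as a dict keyed by Darwin Core term, dropping unmapped columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    mapped, _ = map_headers(reader.fieldnames)
    for row in reader:
        yield {mapped[col]: value for col, value in row.items() if col in mapped}
```

Calling `map_headers` on your header row lists the columns that would need a manual mapping decision (for example, a free-text "Habitat notes" column), which is a useful starting point for the conversation with the portal administrator.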

There are four general categories of people connected to Symbiota portals.

  • Core developers who provide the development and software support for all the portals.
  • Portal managers and power users who offer front-end support and added-value functions such as checklists and taxonomic tables. Data providers and end users interact mostly with portal managers. Power users are particularly important for identifying bugs and helping developers create new functionality.
  • Data providers, the largest group of Symbiota users, are primarily undergraduate students and volunteers managed by collection curators. There are currently over 1,000 data providers for the 40+ portals. Data providers also offer critical feedback that has helped create efficient workflows for transcribing labels and providing images. Input from core developers, portal managers, power users, and data providers has led to a 50 percent reduction in digitization cost per specimen.
  • End users include researchers and educators who integrate the data and resources provided into their research and teaching, and land managers who use the data to guide restoration efforts and species conservation planning and to track invasive species.

Natural history collections can supply specimen data using one of the following methods:

  1. Spreadsheet (CSV or tab-delimited): Since most database applications are capable of exporting flat data files, this is one of the more common import methods. Depending on server settings, importing large collections (over 50,000 records) can be problematic. However, compressing the data into a zip file makes the data more portable. Furthermore, this method can be used to refresh only the new and most recently modified records within a large collection.
  2. IPT or Darwin Core Archive: This is the preferred transfer method for large datasets. Identification and image extensions can be included.
  3. Direct read-only connection: Requires a database host, login, password, and an SQL statement that flattens specimen records into the appropriate fields. Firewall access issues often make this option prohibitive.
  4. Other available data loading methods are described on the Data Interoperability page.
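Both the spreadsheet and the direct-connection methods depend on flattening specimen records into one row per specimen. The Python sketch below shows what such a flattening SQL statement might look like, then writes the result to a zipped CSV file. The table and column names (`specimen`, `collector`, `taxon`, `last_modified`) are hypothetical, and SQLite is used only so the example runs on its own; your statement would target your own database. The `since` parameter illustrates refreshing only new and recently modified records, assuming your database records a modification date.

```python
import csv
import io
import sqlite3
import zipfile

# Hypothetical normalized schema: collectors and taxa are stored in their own tables.
SCHEMA = """
CREATE TABLE collector (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE taxon (id INTEGER PRIMARY KEY, sciname TEXT);
CREATE TABLE specimen (
    id INTEGER PRIMARY KEY,
    catalog_no TEXT,
    collector_id INTEGER REFERENCES collector(id),
    taxon_id INTEGER REFERENCES taxon(id),
    locality TEXT,
    last_modified TEXT  -- ISO 8601 date, so string comparison orders correctly
);
"""

# Flattening statement: one row per specimen, columns aliased to Darwin Core terms.
FLATTEN_SQL = """
SELECT s.id            AS id,              -- specimen primary key
       s.catalog_no    AS catalogNumber,
       c.name          AS recordedBy,
       t.sciname       AS scientificName,
       s.locality      AS locality,
       s.last_modified AS modified
FROM specimen s
LEFT JOIN collector c ON c.id = s.collector_id
LEFT JOIN taxon t ON t.id = s.taxon_id
WHERE s.last_modified >= ?
ORDER BY s.id
"""


def export_zip(conn, since=""):
    """Run the flattening query and return a zip archive (bytes) containing one CSV file.

    The default since="" exports every record with a modification date; a later
    date exports only records added or modified on or after that date.
    """
    cur = conn.execute(FLATTEN_SQL, (since,))
    text = io.StringIO()
    writer = csv.writer(text)
    writer.writerow([d[0] for d in cur.description])  # header row from the SQL aliases
    writer.writerows(cur.fetchall())
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("occurrences.csv", text.getvalue())  # str is stored as UTF-8
    return buf.getvalue()


def demo_connection():
    """Build an in-memory database with two sample specimens (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO collector VALUES (?, ?)", (1, "J. Smith"))
    conn.executemany("INSERT INTO taxon VALUES (?, ?)", [(1, "Quercus emoryi"), (2, "Pinus edulis")])
    conn.executemany(
        "INSERT INTO specimen VALUES (?, ?, ?, ?, ?, ?)",
        [(1, "ASU0001", 1, 1, "Pine Canyon", "2023-01-10"),
         (2, "ASU0002", 1, 2, "Mesa Top", "2024-06-02")],
    )
    return conn
```

Aliasing columns to Darwin Core names in the SQL itself keeps the exported headers self-describing, which reduces the amount of manual field mapping needed at import time.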

Regular updates: Once the data format is evaluated and initially loaded, portal managers can usually set up methods for collection managers to perform regular updates of a collection via a password protected web interface.

Annotation information: If available, the full determination history can be included. This is usually supplied as a separate data file or source. Required fields include: determiner, determination date, taxon identification, and the unique identifier for the specimen record (specimen primary key). Other useful information, when available, includes the treatment used and any notes.
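A determination history file could be assembled as sketched below in Python. The column names are assumptions: `identifiedBy`, `dateIdentified`, `scientificName`, `identificationReferences`, and `identificationRemarks` are Darwin Core terms chosen here for the determiner, date, taxon, treatment used, and notes, and `coreid` is a hypothetical name for the column holding the specimen primary key. Confirm the expected layout with your portal manager.

```python
import csv
import io

# Assumed column layout for a separate determination history file.
DETERMINATION_FIELDS = [
    "coreid",                    # specimen primary key, links back to the specimen record
    "identifiedBy",              # determiner
    "dateIdentified",            # determination date
    "scientificName",            # taxon identification
    "identificationReferences",  # treatment used (optional)
    "identificationRemarks",     # notes (optional)
]
REQUIRED = {"coreid", "identifiedBy", "dateIdentified", "scientificName"}


def write_determinations(records):
    """Return CSV text for the determination file.

    Raises ValueError if a record lacks any required field; optional fields
    that are absent are written as empty cells.
    """
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=DETERMINATION_FIELDS, restval="")
    writer.writeheader()
    for rec in records:
        missing = sorted(f for f in REQUIRED if not rec.get(f))
        if missing:
            raise ValueError(f"record {rec!r} is missing {missing}")
        writer.writerow(rec)
    return out.getvalue()
```

Each annotation becomes its own row, so a specimen with three determinations appears three times with the same `coreid`.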

Images: Are the images online, and are the URLs stable? If so, the specimen records can be linked to these existing images. If data is being supplied as a spreadsheet, the URLs of the JPG images can be placed in a field labeled associatedMedia. Multiple images per specimen can be separated by semicolons. If data is being supplied as a Darwin Core archive, image URLs can be added as an image extension file. Alternatively, image URLs can be supplied in a spreadsheet that contains the catalog number or the specimen’s primary key (the unique ID for the specimen record). If the images are not yet online, image loading details will have to be worked out with the portal managers.
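The two spreadsheet options for linking existing images can be sketched in Python as follows. `associated_media` builds a semicolon-separated associatedMedia value for one specimen; `image_link_csv` builds the alternative image spreadsheet keyed by catalog number. The column names `catalogNumber` and `url` in that second file, and the example URLs, are assumptions for illustration only.

```python
import csv
import io


def associated_media(urls):
    """Join one specimen's image URLs into a single associatedMedia value, separated by semicolons."""
    cleaned = [u.strip() for u in urls if u and u.strip()]
    for u in cleaned:
        if ";" in u:
            raise ValueError(f"URL contains the separator character: {u}")
        if not u.lower().startswith(("http://", "https://")):
            raise ValueError(f"not a web URL: {u}")
    return ";".join(cleaned)


def image_link_csv(images_by_catalog):
    """Alternative layout: one row per image, keyed by catalog number (column names assumed)."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["catalogNumber", "url"])
    for catalog_number, urls in images_by_catalog.items():
        for url in urls:
            writer.writerow([catalog_number, url])
    return out.getvalue()
```

The URL check is deliberately simple; it catches local file paths that would not resolve for the portal, but it does not confirm that a URL is stable or reachable.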

When possible, include information for the following questions:

  • How is your data stored (e.g., MySQL, Oracle, MS Access, FileMaker, Excel)?
  • Do you have a field used to tag specimen records as needing locality protection? See the Rare Species Protection page for more information on safeguarding rare and threatened species.
  • What is the character encoding (character set) of your data (e.g., ISO-8859-1, UTF-8, Windows-1252)?
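If you are unsure of your data's character encoding, a rough check like the Python sketch below can help before you talk to the portal manager. It is a heuristic, not a reliable detector: it tries UTF-8 first because UTF-8 rejects most non-UTF-8 byte sequences, then Windows-1252, and finally falls back to ISO-8859-1, which accepts any byte sequence and so cannot prove the file really is Latin-1. Converting to UTF-8 afterward is an optional step, not a stated Symbiota requirement.

```python
def guess_encoding(raw: bytes) -> str:
    """Return the first candidate encoding that decodes `raw` without error."""
    for enc in ("utf-8", "windows-1252"):
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            pass
    # ISO-8859-1 maps every byte to a character, so it never fails; treat it as a last resort.
    return "iso-8859-1"


def to_utf8(raw: bytes) -> bytes:
    """Re-encode data as UTF-8 using the guessed source encoding."""
    enc = guess_encoding(raw)
    # "utf-8-sig" also strips a leading byte-order mark if one is present.
    text = raw.decode("utf-8-sig" if enc == "utf-8" else enc)
    return text.encode("utf-8")
```

Locality names with accented characters (for example "Peña Blanca") are a good test case, since they are where encoding mismatches usually show up as garbled text after import.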