Publishing Occurrence Data to iDigBio

A Darwin Core Archive (DwC-A) is a data standard that is commonly used to package species occurrence data into a single, self-contained dataset (http://en.wikipedia.org/wiki/Darwin_Core_Archive). Publishing data as a DwC-A data package is a convenient way to get your data out to researchers and data aggregators such as iDigBio and GBIF. Along with the core occurrence records, a DwC-A created through a Symbiota instance will by default include determination history and specimen image data extensions. Locality details for species that appear on the portal's rare and sensitive species lists are protected. An RSS feed listing the DwC-A packages along with their publishing details allows external projects to programmatically monitor when data packages are refreshed or new collections are made available. Visit the sitemap of any Symbiota portal to view a list of DwC-A packages published through that resource.
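
To make the core-plus-extensions layout described above concrete, the following Python sketch lists the tables declared in an archive. It assumes the archive is a zip file with a meta.xml descriptor at its root, as laid out in the Darwin Core text guidelines; the function name describe_archive is illustrative and is not part of Symbiota or iDigBio.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by the meta.xml descriptor (Darwin Core text guidelines)
DWC_TEXT_NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}


def describe_archive(archive):
    """Return the core and extension tables declared in a DwC-A's meta.xml.

    `archive` is a file path or a binary file-like object holding the zip file.
    Assumes meta.xml sits at the root of the archive.
    """
    with zipfile.ZipFile(archive) as zf:
        root = ET.fromstring(zf.read("meta.xml"))

    def table(element):
        # Each <core>/<extension> names its data file under <files><location>
        location = element.find("dwc:files/dwc:location", DWC_TEXT_NS)
        return {
            "rowType": element.get("rowType"),
            "file": location.text if location is not None else None,
        }

    core = root.find("dwc:core", DWC_TEXT_NS)
    extensions = root.findall("dwc:extension", DWC_TEXT_NS)
    return {
        "core": table(core) if core is not None else None,
        "extensions": [table(ext) for ext in extensions],
    }
```

Calling describe_archive on a downloaded package returns one entry for the core table and one per extension, so a quick look at the rowType values shows whether the determination and image extensions were included.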

Individual collection managers can follow the steps below to publish or refresh their DwC-A data package. Portal managers can publish DwC-A packages for multiple collections as a batch process by using the management tools associated with the Darwin Core Archive Publishing page that is available from the sitemap of any portal. 

  1. Email iDigBio (data@idigbio.org) to request that your collection be added to the iDigBio ingestion queue. Include a link to your collection profile page (e.g. http://swbiodiversity.org/seinet/collections/misc/collprofiles.php?collid=1) so that they have the necessary information for your collection and access to your data.
  2. Log in and go to the main management menu for your collection. The easiest path to this menu is to click My Profile, open the Specimen Management tab, and then select your collection name. If you don’t see your collection name, you may not have the necessary administrative permissions to manage your collection and you will need to contact your portal administrator.
  3. Review the metadata associated with your collection. In addition to your institution/collection codes and contact information, make sure to review the data usage license and GUID source settings (see below). Contact your portal manager if you are unclear about any of the available fields.
    • Data Usage License: Many portals have predefined options that follow the Creative Commons licenses, though that can vary by portal instance.
    • GUID source: This is the field that will serve as the persistent globally unique identifier for specimen records published out of the portal. Symbiota offers the following options:
      1. Occurrence ID: This option is for collections that create their own GUID using another protocol or manage their data in an external system and map the incoming GUID to the occurrenceID field at time of import. If you manage your central data within an up-to-date Specify database, this is the option you should use since the imported data should automatically contain a Specify-generated UUID. Darwin Core definition: http://rs.tdwg.org/dwc/terms/index.htm#occurrenceID
      2. Catalog Number: You can use this option if your catalog number is configured as a globally unique identifier. See the Darwin Core occurrenceID definition above.
      3. Symbiota generated UUID: If you are managing your data within the Symbiota instance (e.g. the portal dataset is your central database), this option is the easiest and most reliable method for assigning robust GUIDs to your specimen records. Even if your catalog number is configured as a unique identifier (e.g. institutionCode:collectionCode:catalogNumber format type), this is a more robust option when managing data within the portal. If you are only using the Symbiota portal to publish a snapshot of your data that is managed in another database system (e.g. Specify), then selecting this option is probably a bad idea.
  4. Go to “Darwin Core Archive Publishing” from within your management menu and click the Create Darwin Core Archive button to build a new archive. iDigBio will periodically monitor the RSS feed that is associated with the DwC-A library within the portal. You should periodically refresh this archive to ensure the data within iDigBio remains current. Contact your portal manager to request automatic updates of your archive on a regular schedule.
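
Before building the archive, it can help to sanity-check the identifiers that the GUID source chosen in step 3 will publish. The Python sketch below is one possible check, not a Symbiota feature: it assumes you have exported the relevant column (occurrenceID or catalog number) as a list of strings, and the names is_uuid and guid_report are illustrative. A value that is not a UUID is not necessarily wrong, since other globally unique formats are acceptable; blanks and duplicates are the real problems.

```python
import uuid
from collections import Counter


def is_uuid(value):
    """True if `value` parses as a UUID (a urn:uuid: prefix is accepted)."""
    try:
        uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    return True


def guid_report(identifiers):
    """Summarise blank, duplicated, and non-UUID values in a list of identifiers."""
    # Treat None as blank and ignore surrounding whitespace
    cleaned = [(value or "").strip() for value in identifiers]
    counts = Counter(value for value in cleaned if value)
    return {
        "blank": sum(1 for value in cleaned if not value),
        "duplicates": sorted(v for v, n in counts.items() if n > 1),
        "non_uuid": sorted(v for v in counts if not is_uuid(v)),
    }


# A freshly minted random (version 4) UUID, shown only to illustrate the format;
# how a given portal generates its own UUIDs is not assumed here.
example_guid = str(uuid.uuid4())
```

If the report shows blanks or duplicates in the column you planned to use, that column is not yet suitable as a GUID source and one of the other options may be a better fit.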