Darwin Core Archive Publishing
A Darwin Core Archive (DwC-A) is a data standard commonly used to package species occurrence data into a single, self-contained dataset (http://en.wikipedia.org/wiki/Darwin_Core_Archive). Publishing data as a DwC-A package is a convenient way to get your data out to researchers and to data aggregators such as iDigBio and GBIF. Along with the core occurrence records, a DwC-A created through a Symbiota instance includes, by default, determination history and specimen image data as extensions. Locality details for species that appear on the rare and sensitive species lists are protected. An RSS feed that lists the DwC-A packages along with their publishing details allows external projects to programmatically monitor when data packages are refreshed or new collections are made available. Visit the sitemap of any Symbiota portal to view a list of the DwC-A packages published through that resource.
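As an illustration of how an external project might monitor the RSS feed described above, here is a minimal sketch in Python. It assumes the feed is plain RSS 2.0 in which each item carries title, link, and pubDate elements; a real portal feed may include additional elements. The sample feed, its URLs, and the function names list_archives and refreshed_since are hypothetical and are not part of Symbiota.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical sample feed (assumption): two published archives.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example portal DwC-A feed</title>
    <item>
      <title>Example Herbarium</title>
      <link>https://portal.example.org/archives/EX_DwC-A.zip</link>
      <pubDate>Tue, 03 Mar 2020 10:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Example Lichen Collection</title>
      <link>https://portal.example.org/archives/EXL_DwC-A.zip</link>
      <pubDate>Wed, 15 Jan 2020 08:30:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def list_archives(rss_text):
    """Parse RSS text and return one dict per <item> element."""
    root = ET.fromstring(rss_text)
    archives = []
    for item in root.iter("item"):
        pub_date = item.findtext("pubDate")
        archives.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # pubDate uses the RFC 822 date format; items without one get None.
            "published": parsedate_to_datetime(pub_date) if pub_date else None,
        })
    return archives


def refreshed_since(archives, cutoff):
    """Keep only the archives published after the cutoff datetime."""
    return [a for a in archives if a["published"] and a["published"] > cutoff]


if __name__ == "__main__":
    cutoff = datetime(2020, 2, 1, tzinfo=timezone.utc)
    for archive in refreshed_since(list_archives(SAMPLE_FEED), cutoff):
        print(archive["title"], archive["link"], archive["published"].isoformat())
```

In practice the feed text would be downloaded from the portal rather than embedded, and the cutoff would be the time of the last successful harvest.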
Individual collection managers can follow the steps below to publish or refresh their DwC-A data package. Portal managers can publish DwC-A packages for multiple collections as a batch process by using the management tools associated with the Darwin Core Archive Publishing page that is available from the sitemap of any portal. Note that the portal manager will need to make arrangements with iDigBio (http://idigbio.org) or GBIF (http://www.gbif.org) to register their collection and arrange for automated data harvesting.
- Log in and go to the main management menu for your collection. The easiest path to this menu is to click My Profile, open the Specimen Management tab, and then select your collection name. If you don’t see your collection name, you may not have the administrative permissions needed to manage your collection and should contact your portal administrator.
- Review the metadata associated with your collection. In addition to your institution/collection codes and contact information, make sure to review the data usage license and GUID source (see below). Contact your portal manager if you are unclear about any of the available fields.
  - Data Usage License: Many portals offer predefined options that follow the Creative Commons licenses, though the available choices can vary by portal instance.
  - GUID source: This field serves as the persistent globally unique identifier for specimen records published out of the portal. Within Symbiota, the following options are available.
    - Occurrence ID: This option is for collections that create their own GUIDs using another protocol, or that manage their data in an external system and map the incoming GUID to the occurrenceID field at the time of import. If you manage your central data within an up-to-date Specify database, this is the option you should use, since the imported data should automatically contain a Specify-generated UUID. Darwin Core definition: http://rs.tdwg.org/dwc/terms/index.htm#occurrenceID
    - Catalog Number: You can use this option if your catalog number is configured as a globally unique identifier. See the Darwin Core occurrenceID definition above.
    - Symbiota generated UUID: If you are managing your data within the Symbiota instance (i.e. the portal dataset is your central database), this option is the easiest and most reliable method for assigning robust GUIDs to your specimen records. Even if your catalog number is configured as a unique identifier (e.g. the institutionCode:collectionCode:catalogNumber format), this is the more robust option when managing data within the portal. If you are only using the Symbiota portal to publish a snapshot of data that is managed in another database system (e.g. Specify), then selecting this option is probably a bad idea.
- Select Darwin Core Archive Publishing from your management menu. If a DwC-A was previously published, information will be displayed describing the details of the archive.
- Review and modify the default publishing options: include determination history, include image URLs, and redact sensitive locality data.
- Select the Create Darwin Core Archive button to build a new archive.
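Once an archive has been built, it can be worth checking that it contains the expected extensions and that the chosen GUID source produced unique identifiers. The following is a minimal sketch of such a check in Python, not a Symbiota tool. It assumes the archive follows the standard DwC-A text layout (a meta.xml file at the root of the zip, using the http://rs.tdwg.org/dwc/text/ namespace) and that the core data file is comma-delimited UTF-8. The file names, the sample records, and the function names describe_archive and duplicate_occurrence_ids are hypothetical.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

DWC_TEXT_NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}
OCCURRENCE_ID = "http://rs.tdwg.org/dwc/terms/occurrenceID"


def build_sample_archive():
    """Build a tiny in-memory archive (hypothetical data) for the demo."""
    meta = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence" ignoreHeaderLines="1">
    <files><location>occurrences.csv</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/catalogNumber"/>
  </core>
  <extension rowType="http://rs.tdwg.org/dwc/terms/Identification" ignoreHeaderLines="1">
    <files><location>identifications.csv</location></files>
    <coreid index="0"/>
  </extension>
</archive>"""
    occurrences = (
        "id,occurrenceID,catalogNumber\n"
        "1,urn:uuid:aaa,EX-0001\n"
        "2,urn:uuid:bbb,EX-0002\n"
        "3,urn:uuid:aaa,EX-0003\n"  # deliberate duplicate GUID
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("meta.xml", meta)
        zf.writestr("occurrences.csv", occurrences)
        zf.writestr("identifications.csv", "coreid,scientificName\n1,Quercus alba\n")
    return buffer


def describe_archive(archive_file):
    """Return (core, extensions), each summarized by rowType and data file."""
    with zipfile.ZipFile(archive_file) as zf:
        root = ET.fromstring(zf.read("meta.xml"))

    def summarize(node):
        location = node.findtext("dwc:files/dwc:location", namespaces=DWC_TEXT_NS)
        return {"rowType": node.get("rowType"), "file": location}

    core = summarize(root.find("dwc:core", DWC_TEXT_NS))
    extensions = [summarize(e) for e in root.findall("dwc:extension", DWC_TEXT_NS)]
    return core, extensions


def duplicate_occurrence_ids(archive_file):
    """Return the sorted occurrenceID values that occur more than once in the core file."""
    with zipfile.ZipFile(archive_file) as zf:
        core = ET.fromstring(zf.read("meta.xml")).find("dwc:core", DWC_TEXT_NS)
        location = core.findtext("dwc:files/dwc:location", namespaces=DWC_TEXT_NS)
        # Column position of occurrenceID as declared in meta.xml.
        index = next(
            int(field.get("index"))
            for field in core.findall("dwc:field", DWC_TEXT_NS)
            if field.get("term") == OCCURRENCE_ID
        )
        header_lines = int(core.get("ignoreHeaderLines", "0"))
        text = zf.read(location).decode("utf-8")  # assumes UTF-8 encoding
    rows = list(csv.reader(io.StringIO(text)))[header_lines:]
    counts = Counter(row[index] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


if __name__ == "__main__":
    sample = build_sample_archive()
    core, extensions = describe_archive(sample)
    print("core:", core)
    print("extensions:", extensions)
    print("duplicate occurrenceIDs:", duplicate_occurrence_ids(sample))
```

To check a real package, pass the path of the downloaded zip file to the two functions instead of the in-memory sample. A production check would also honor the delimiter and encoding attributes declared in meta.xml rather than assuming them.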