Digitization Workflows

The following documentation was developed for iDigBio’s BioDigiCon 2022 workshop on Digitization Workflows Using Symbiota Portals. Here we provide a basic visual overview of several sample digitization workflows, related protocols produced by the TCNs, and documentation pages for the tools used by these TCNs.

For detailed instructions on how to use the many digitization tools available in Symbiota portals, visit the Symbiota Docs.

Sample Workflows

1. SERNEC TCN

The SERNEC Thematic Collections Network (TCN) was a large collaboration of 99 collections in the Southeast U.S., led by Zack Murrell at Appalachian State University. These collections digitized herbarium specimens using the SERNEC portal. SERNEC is now a consortium of over 200 herbaria (more information).

Workflow for digitizing herbarium specimens for the SERNEC TCN in the SERNEC portal.
  1. Concurrently image specimens and enter skeletal data into portal
    1. In this step, skeletal data (i.e., only a few fields, such as scientific name, county, and state) are added to the data portal as the specimen is being imaged.
    2. TCN Protocol: SERNEC specimen imaging protocols
    3. Tutorial: Skeletal data entry tutorial & video
    4. Video: Getting images into a Symbiota portal
  2. Upload and link images via CyVerse (external data store).
    1. SERNEC collections contracted with CyVerse to host their web-ready jpg images. CyVerse ingested the jpgs and created URLs to those images. Symbiota then developed a tool that harvested the CyVerse URLs and automatically linked them to records in the SERNEC portal.
    2. TCN Protocol: SERNEC image uploading protocols
    3. Tutorial: Batch uploading images in Symbiota portals
  3. Transcribe labels in SERNEC portal or Notes from Nature (external crowdsourcing platform).
    1. Data entry (transcription) was conducted by technicians or staff in home databases or in the SERNEC portal, or by volunteers using the Notes from Nature crowdsourcing platform.
    2. Tutorial: Transcribing labels in a Symbiota portal
    3. TCN Protocol: Uploading data into Notes from Nature
    4. Tutorial: Exporting data from a Symbiota portal
    5. Tutorial: Importing data into a Symbiota portal
  4. Georeference in SERNEC.
    1. Georeferencing was largely accomplished using in-portal georeferencing tools including GEOLocate, batch georeferencing, and the Google Maps plugin.
    2. Tutorial: Georeferencing in Symbiota portals
    3. Tutorial: Batch georeferencing tool
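
To illustrate what "skeletal data" means in step 1 above, the following is a minimal Python sketch that reduces fuller specimen records to a handful of fields and writes them as CSV. The column names are Darwin Core terms chosen for illustration, and the specimen values are made up; the columns a particular portal expects are an assumption to check against its own upload settings.

```python
import csv
import io

# Darwin Core terms chosen for this sketch. The columns a given portal
# expects are an assumption and should be checked before uploading.
SKELETAL_FIELDS = ["catalogNumber", "scientificName", "stateProvince", "county"]


def skeletal_csv(records):
    """Return CSV text containing only the skeletal fields of each record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=SKELETAL_FIELDS,
        extrasaction="ignore",  # silently drop any non-skeletal fields
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


# Hypothetical specimen: the catalog number and locality are invented.
example = [{
    "catalogNumber": "EXAMPLE-0000001",
    "scientificName": "Quercus alba",
    "stateProvince": "North Carolina",
    "county": "Watauga",
    "habitat": "not a skeletal field, so it is left out",
}]
print(skeletal_csv(example))
```

The remaining fields are filled in later, during label transcription (step 3).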

2. California Phenology (CAP) TCN

Led by California Polytechnic State University’s Jenn Yost, the California Phenology Network is a more recent TCN of 28 herbaria. This consortium largely digitizes and publishes its data in the CCH2 portal. More information about this project can be found on their website.

Workflow for digitizing herbarium specimens for the CAP TCN in the CCH2 portal.
  1. Image specimens
    1. In this workflow, specimens are barcoded and imaged without interfacing with the portal.
    2. TCN Protocol: CAP specimen imaging protocols
  2. Upload and link images via CyVerse (external data store).
    1. CAP also worked with the CyVerse data store to host web-ready jpgs of their images.
    2. TCN Protocol: CAP image uploading protocols
    3. Tutorial: Batch uploading images in Symbiota portals
  3. Transcribe labels in CCH2 portal or Notes from Nature (external crowdsourcing platform).
    1. Data entry (transcription) was conducted by technicians or staff in home databases or in the CCH2 portal, or by volunteers using the Notes from Nature crowdsourcing platform.
    2. Tutorial: Transcribing labels in a Symbiota portal
    3. Tutorial: Exporting data from a Symbiota portal
    4. Tutorial: Importing data into a Symbiota portal
  4. Georeference in CCH2 and GEOLocate CoGe.
    1. Georeferencing was largely accomplished using in-portal georeferencing tools including GEOLocate, batch georeferencing, and the Google Maps plugin.
    2. Tutorial: Georeferencing in Symbiota portals
    3. Tutorial: Batch georeferencing tool
    4. TCN Protocol: Georeferencing training course
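
Step 2 of both workflows above depends on matching externally hosted image URLs to specimen records. The sketch below shows one way such a match could work in Python: it reads a catalog number from the start of each image file name and pairs it with the URL. This is not the Symbiota harvesting tool itself. The file-name pattern and the example URLs are assumptions; each collection defines its own barcode format.

```python
import re
from posixpath import basename
from urllib.parse import urlparse

# Assumed barcode pattern: letters, an optional hyphen, then digits at the
# start of the file name. Real collections define their own pattern.
CATALOG_PATTERN = re.compile(r"^([A-Za-z]+-?\d+)")


def map_urls_to_catalog_numbers(urls):
    """Pair each image URL with the catalog number found in its file name.

    Returns (linked, unmatched): linked is a list of
    (catalog_number, url) tuples, unmatched is a list of URLs whose file
    name did not start with a recognizable catalog number.
    """
    linked, unmatched = [], []
    for url in urls:
        filename = basename(urlparse(url).path)
        match = CATALOG_PATTERN.match(filename)
        if match:
            linked.append((match.group(1), url))
        else:
            unmatched.append(url)
    return linked, unmatched


# Hypothetical URLs on a placeholder host.
linked, unmatched = map_urls_to_catalog_numbers([
    "https://example.org/images/ABC0001234.jpg",
    "https://example.org/images/ABC0001234_1.jpg",
    "https://example.org/images/label.jpg",
])
print(linked)
print(unmatched)
```

Keeping the unmatched URLs in a separate list makes it easy to review files that were misnamed before they are uploaded.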

3. Big Bee’s Image Library

The “Big Bee” TCN is led by Katja Seltmann at the University of California, Santa Barbara. The workflow outlined below specifically describes how this TCN integrates images with specimen records in the Bee Library. Because Big Bee is an actively funded TCN (NSF Award #2102006 and others), TCN protocols for this workflow are still in development. More information about the Big Bee TCN can be found on their website.

  1. Image specimens. 
    1. In this workflow, 2D and 3D images of select specimens are acquired. The goal is to capture broad taxonomic representation across the collections of participating institutions.
    2. TCN Protocol: Macropod Video Tutorials
    3. TCN Protocol: How to Focus Stack and 3D Model Using Photogrammetry
  2. During imaging, add “tags” to image file names.
    1. After image capture, image files are renamed to include unique specimen identifiers (e.g. catalog numbers) followed by shorthand “tags” according to a predetermined vocabulary, e.g. “UCSB-IZC00009199_had_lbs_3x.JPG” where “had” and “lbs” indicate “habitus, dorsal view” and “labels”, respectively.
  3. Ingest images to CMS and make them web-accessible.
    1. In this workflow, the preferred method of making images web-accessible varies by institution. For example, images may be hosted online by ASU, via an institutionally managed DAMS, or through an external hosting service.
    2. Once images are ingested, a stored procedure is implemented on the backend of the Symbiota portal so that any tags embedded in the image file names are routinely and automatically indexed as part of the image’s metadata. Multiple tags may be included per image. Contact the Symbiota Support Hub for assistance with setting up stored procedures.
    3. Tutorial: Tagging images
    4. Tutorial: Batch uploading images in Symbiota portals
    5. Video: Getting images into a Symbiota portal
    6. Tutorial: Ingesting images into your portal
  4. Search for images in the portal to retrieve specimen images according to tags.
    1. Once tagged images are imported into the portal and processed by the stored procedure, they can be searched for according to their tags in the portal’s Image Search form. Test this functionality using the “Image Tag” field in the Bee Library’s image search form:
      https://library.big-bee.net/portal/imagelib/search.php 
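
The file-naming convention in this workflow can be illustrated with a short Python sketch that splits a file name into its specimen identifier and tags, then builds a simple tag-to-specimen index. This is an illustration of the convention only, not the portal's stored procedure. The vocabulary holds just the two tags defined in the example above; any other token (such as "3x") is reported as unrecognized rather than interpreted, and the function names are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Only "had" and "lbs" are defined in the text above. A real vocabulary
# would be agreed on by the project before imaging begins.
TAG_VOCABULARY = {
    "had": "habitus, dorsal view",
    "lbs": "labels",
}


def parse_image_name(filename):
    """Split a file name into (catalog_number, known_tags, unknown_tokens)."""
    stem = PurePosixPath(filename).stem  # drop the extension
    catalog_number, *tokens = stem.split("_")
    known = [t for t in tokens if t in TAG_VOCABULARY]
    unknown = [t for t in tokens if t not in TAG_VOCABULARY]
    return catalog_number, known, unknown


def build_tag_index(filenames):
    """Map each recognized tag to the catalog numbers of images carrying it."""
    index = defaultdict(list)
    for name in filenames:
        catalog_number, known, _ = parse_image_name(name)
        for tag in known:
            index[tag].append(catalog_number)
    return dict(index)


print(parse_image_name("UCSB-IZC00009199_had_lbs_3x.JPG"))
print(build_tag_index(["UCSB-IZC00009199_had_lbs_3x.JPG"]))
```

Separating unrecognized tokens from known tags makes typos in file names visible before the images are ingested.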

4. New Acquisitions at ASU’s BioKIC

The Biodiversity Knowledge Integration Center (BioKIC) at Arizona State University (ASU) uses Symbiota extensively in all aspects of its collections management and digitization workflows. This example explains how ASU’s Vertebrate Collections integrate the Consortium of Small Vertebrate Collections (CSVColl) portal into their workflow for the routine digitization of new acquisitions.

Workflow for digitizing new acquisitions in the ASU Vertebrate Collections in the CSVColl portal.
  1. Image specimens.
    1. Image all views of specimen (dorsal, ventral, lateral) in the field, if possible. Use the same camera for each trip when possible.
    2. If desired, specimens and/or their labels may be imaged post-curation, and these files can be imported into the portal. ASU’s collections host all images on ASU servers.
  2. Enter field data into the portal.
    1. A new occurrence record is created by entering all relevant field data into the portal. This information is captured through direct data entry (e.g. direct transcription from specimen labels and field notes) or by uploading a spreadsheet of correctly formatted field data into the portal.
    2. Tutorial: Transcribing labels in a Symbiota portal
    3. Tutorial: Importing data into a Symbiota portal
  3. Print specimen labels directly from portal.
    1. Once the occurrence record is populated, specimen labels can be generated, formatted, and printed directly from the CSVColl portal.
    2. Tutorial: Label customization and printing
  4. Place labels with prepared specimen(s).
    1. Labels are placed with dry and fluid-preserved specimens and specimen lots for storage in the collection.
    2. Accession documentation (permits, donation paperwork, etc.) is managed externally from the CSVColl portal.
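
Because the field-data step above relies on a "correctly formatted" spreadsheet, it can help to check a file before uploading it. The Python sketch below is one possible pre-upload check, not a Symbiota feature. The required columns are Darwin Core terms picked for illustration and the sample row is invented; a collection would substitute the columns its own upload profile expects.

```python
import csv
import io

# Assumed set of required columns (Darwin Core terms), for illustration only.
REQUIRED_COLUMNS = ("catalogNumber", "scientificName", "eventDate", "recordedBy")


def check_field_data(csv_text, required=REQUIRED_COLUMNS):
    """Return a list of problems found in the CSV text (empty if none).

    Line numbers assume one record per line, with the header on line 1.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in required if c not in header]
    if problems:
        return problems  # no point checking rows without the columns
    for line_number, row in enumerate(reader, start=2):
        for column in required:
            # Short rows yield None for missing cells, so fall back to "".
            if not (row[column] or "").strip():
                problems.append(f"line {line_number}: empty {column}")
    return problems


# Hypothetical field data with one blank scientific name.
sample = (
    "catalogNumber,scientificName,eventDate,recordedBy\n"
    "EX-1,Sceloporus magister,2022-05-01,A. Collector\n"
    "EX-2,,2022-05-01,A. Collector\n"
)
print(check_field_data(sample))
```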

5. NEON Sample Ingestion

The National Ecological Observatory Network (NEON) employs Symbiota to manage a large volume and diversity of samples processed at ASU’s BioKIC facility. This example illustrates the intake process for new samples using the NEON Biorepository Data Portal, which features several highly customized tools to facilitate NEON’s high-throughput digitization workflow.

  1. Acquire and check-in new samples.
    1. Inventory of new samples from a NEON collecting site is received at ASU for archiving. Samples are checked in and data are ingested into the NEON Data Portal using two custom Symbiota modules.
    2. NEON Docs: NEON Sample Types
    3. NEON Docs: FAQ & Getting Started
  2. Use NEON API to harvest NEON sample data and create new occurrence records.
    1. More information about the NEON API can be provided by NEON’s bioinformatics staff.
  3. Assign unique identifiers to samples using custom Symbiota modules.
    1. IGSNs (globally unique identifiers) are assigned, registered, and associated with samples in the NEON Data Portal via SESAR’s API.
  4. Enhance new specimen and sample records using Symbiota.
    1. Samples are stored with new labels and associated with digital images and ancillary measurements, all using NEON’s Symbiota portal.

Workshop Recording

Digitization Workflows in Symbiota-based Portals
with Katie Pearson & Lindsay Walker
[Recording]

Additional Resources

Many TCNs have integrated Symbiota portals into their digitization and data publishing workflows. Additional TCN protocols can be found on their respective websites.

Acknowledgements

Thank you to Dakota Rowsey (ASU/BioKIC), Katja Seltmann (Big Bee TCN), and Kelsey Yule (ASU/NEON) for contributing workflow documentation featured on this page.

If you have developed a novel digitization workflow that integrates Symbiota portals and tools, please consider contributing to this page by contacting the Symbiota Support Hub.