Sharing Biodiversity Data

Mary Barkworth

Natural history museums and herbaria exist to share knowledge. In previous centuries, this was accomplished through displays, publications, and presentations. Today, they are adding digital technology to their knowledge-sharing strategies, not just by providing digital versions of their displays, publications, and talks (although these are important) but by sharing the data associated with their specimens. These data include information on when, where, and by whom a specimen was collected. Sharing such information in a standardized manner makes it possible to use it in ways not envisioned by the original collector, for instance to determine whether flowering time at a locality has changed in the last 100 years, or whether plants at different elevations differ in how much their flowering time has changed. Such analyses are more reliable when they are based on a large sample size. By sharing the data associated with their specimens, natural history collections are making it much easier to conduct such analyses.

This lesson is part of a proposed collection of educational modules about sharing biodiversity data, such as those associated with collections, and the standards that need to be followed when doing so to maximize the value of the data. The goal of the collection is to help people learn how to collect, provide, and share high-quality biodiversity data.

Each lesson within a module will contain some questions for use in evaluating one’s understanding of the material. There will also be some questions relating to the whole module. The correct answers, or a selection of appropriate responses, will be provided at the end of the lesson or module. The modules are not part of a formal online educational offering. The long-term goal is to convert them into such offerings and include some form of certification but, for now, they are simply self-help modules for use by anyone interested. If you have questions or comments, please send them to me; I shall try to respond to questions in a timely fashion and will use comments to improve the existing offerings.

The modules are based on the use of Symbiota software and Darwin Core (DwC) fields. There are other programs for capturing and displaying biodiversity data, but Symbiota is widely used, offers networks for all kinds of eukaryotic organisms, and, importantly, is software with which I am familiar.

There will be no set order in which to complete the modules because people will be coming at data sharing from different perspectives and different backgrounds. If one module relies on information in another, this will be stated at the start. The focus is on Symbiota but Symbiota can interact with many other systems and we encourage anyone interested in developing a new connection or development to contribute a module explaining what the connection/development offers and how to use it.

The list of modules below is tentative. Some of the modules will be developed by, or in collaboration with, others. Anyone interested in helping to develop (or test) them should contact me.

Mary Barkworth


Module 1 – Providing data, Pt. 1. General

  1. The importance of standards
  2. Databases compared to spreadsheets
  3. Relational databases
  4. Connecting information from different sources
  5. Unique identifiers
     • What are they?
     • Why are they needed?
     • How are they generated?
  6. Linking standardized records

Module 2 – Providing data, Pt. 2. For collectors

  1. Field notes
  2. Purpose
  3. Importance
  4. Fields
  5. Who
  6. When
  7. Where
  8. Individualizing
  9. Media files
  10. Templates
  11. Using Symbiota for field data
  12. Generating labels

Module 3 – Providing data, Pt. 3. Existing specimens

  1. Upload or Direct Data Entry (DDE)
  2. DDE with Symbiota, Pt. 1
  3. Built-in aids (use with caution)
  4. The specimen
  5. Who
  6. When
  7. What
  8. Name problems
  9. Reference
  10. Where
  11. Ecology
  12. Description
  13. Notes (Occurrence remarks)
  14. Phenology/Life stage
  15. Sampling technique
  16. Curatorial fields

Bulk Upload

There are at least two situations in which collections may wish to upload several records to a Symbiota network rather than enter each one individually:

  1. They have their own or an institutional database that they wish to maintain, possibly because it has more of the functionality that they need for maintaining their records, or
  2. They have several records in a spreadsheet or local database that they wish to upload even though they intend to switch to (or continue using) Direct Data Entry.

Module 4 – Data cleaning in Symbiota

  1. Searching
  2. Standard
  3. Maps
  4. Data cleaning tools
  5. Duplicate discovery


Module 5 – Georeferencing

  1. Overview of geographic reference systems
  2. Latitude and longitude
  3. Datums
  4. Uncertainty
  5. Technology: the good and the less good
  6. Elevation

Module 6 – Sharing via Symbiota

  1. iDigBio
  2. GBIF
  3. Other aggregators

Module 7 – Collaborative Data Acquisition and Improvement

  1. Notes from Nature
  2. GEOLocate

Module 8 – Using data and other resources in Symbiota portals

  1. Permissions
  2. Acknowledgements
  3. Caveats
  4. Data always need cleaning. If you clean data, please inform the collection managers so that all may benefit.
  5. Searching for dates: Use the format YYYY-MM-DD (year-month-day) even if you are interested only in the year of collection, because the Year field is not automatically completed. For example, when searching for collections made from 1800 through 1900, search from 1800-01-01 to 1900-12-31.
  6. Calculating phylogenetic density (if funding for implementation is obtained).
  7. Using the phenology calculator (if funding for implementation is obtained).
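
The date-search advice in the Module 8 list above can be illustrated with a short Python sketch. It is not part of Symbiota and makes no calls to a portal; the function name `year_range_to_search_dates` is hypothetical. It only builds the pair of year-month-day strings that one would enter in the date fields of a search form when interested in whole years.

```python
from datetime import date


def year_range_to_search_dates(start_year: int, end_year: int) -> tuple[str, str]:
    """Return (first_day, last_day) strings in year-month-day form.

    Hypothetical helper for illustration only: it expands a range of
    whole years into the full dates needed for a date search.
    """
    if start_year > end_year:
        raise ValueError("start_year must not be later than end_year")
    first_day = date(start_year, 1, 1)    # January 1 of the first year
    last_day = date(end_year, 12, 31)     # December 31 of the last year
    # isoformat() yields zero-padded year-month-day text, e.g. 1800-01-01
    return first_day.isoformat(), last_day.isoformat()


if __name__ == "__main__":
    start, end = year_range_to_search_dates(1800, 1900)
    print(f"Search from {start} to {end}")
```

For a single year, pass the same value twice, for example `year_range_to_search_dates(1950, 1950)`.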