Biodiversity Data Mobilization

Mary Barkworth, January 5, 2018

What does Biodiversity Data Mobilization mean?

Let’s start by looking at the meaning of each of the words in the title.

Biodiversity is short for biological diversity, but it is a word that means different things to different people. To some it means the different kinds of organisms, such as different species, genera, or families, in an area or habitat. To others it means the variability in the abilities of organisms: some fly, some swim, some slither, some just stay in one place all or almost all their life. It can also mean the variability within a species or even within a gene across species.

Examples of Biodiversity Surveys

  • A survey of the different kinds of plants in an area. The “kinds” might mean species but it could also be for different forms of plants such as grasses, herbaceous flowering plants, shrubs, and trees.
  • A record of how different plants of the same species react to drought stress.
  • A listing of the kinds of insects caught in a bowl of water.
  • A DNA analysis of a sample of water or soil. In these cases, no distinct organisms are seen directly but their presence is inferred from the DNA signatures.

In each of these activities, what is being recorded is the presence of different kinds of biological organisms, in other words, biodiversity.

Despite the many different uses of the word “biodiversity”, it is usually easy to tell how it is being used from the context. Just remain alert to the fact that, although it always refers to organisms, it may do so at different levels of biological organization.

Data refers to a bit of information. Technically, one bit of information would be a datum but most people treat data as both singular (one bit of information) and plural (two or more bits of information). A single bit of information usually has little value; it is more valuable to have lots of bits of information on a topic, that is, lots of data on a topic.

Before starting to gather data on a topic it is important to think through why the data are needed. This will help ensure that the data really are useful. This aspect will be addressed in more detail in another presentation but it is important to bear in mind. For example, if one wants to know the height of children at different ages, it is important to record also whether the child is male or female because male and female children have different growth patterns.

Examples

  • Datum – the number of eggs laid by one hen on one day. Data – the number of eggs laid on each of several days by that hen or the number of eggs laid by each of several hens on several different days. The first set of data would show how the egg production of the studied hen varied over the period studied; the second set would make it possible to compare the egg production of different hens over that period.
  • Datum – the birth weight of a baby. Data – the birth weight of several babies, perhaps all those born at different locations. Examination of the birthweight data from different locations would help a region’s health services identify where there seem to be problems. The data are useful; the datum is not. One needs to be able to compare one baby’s birthweight with that of other babies to know if it is average, underweight, or overweight.
  • Datum – the weight of insects caught in a malaise trap over a one-week period. Data – the weight of insects caught each week in a malaise trap and/or in several different malaise traps. Malaise traps catch flying insects, many of which pollinate fruit crops and/or are eaten by birds; the birds may also be pollinators but may be more important for keeping down insects we consider harmful. Knowing the weight of insects caught in one week is not useful; knowing the weight of insects caught each week of the year, particularly if recorded for several years, can be used to detect problems, which is the first step to identifying the cause of a problem. Again, the data are useful; the datum is not.
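The malaise trap example can be sketched in a few lines of Python. This is only an illustration: the weekly weights are invented, and the function name `unusual_weeks` and its "less than half the mean" rule are assumptions chosen to keep the example short, not a recommended statistical method.

```python
from statistics import mean

# Hypothetical weekly catch weights (grams) from one malaise trap.
weekly_catch_g = [12.0, 15.5, 14.0, 3.5, 13.0, 16.0]


def unusual_weeks(weights, fraction=0.5):
    """Return 1-based week numbers whose catch is below `fraction` of the mean."""
    threshold = fraction * mean(weights)
    return [week for week, w in enumerate(weights, start=1) if w < threshold]


# A single datum: there is nothing to compare it with.
datum = weekly_catch_g[3]

# The full series: each week can be compared with the others.
low_weeks = unusual_weeks(weekly_catch_g)

print(f"Single datum: {datum} g")
print(f"Weeks with an unusually low catch: {low_weeks}")
```

On its own, the value 3.5 g says nothing. Only when it is placed beside the other weeks does it stand out as a possible problem worth investigating.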

Mobilization means making data available for use by others. In the past, this might mean publishing the data in a print document that could be read by many people. Today, mobilization usually means making something available on the web. Print documents (and PDFs) have limited value: anyone wanting to make use of the data they contain (the data, not the conclusions) must re-enter the data into a file (often a CSV file) that a computer can read. But, to make the data truly mobile, they then need to be published to a freely accessible web site. Once mobilized, data can be downloaded and imported for use in other analyses, possibly in combination with similar data from other sources or with other kinds of data, so that they can be used to create new insights.
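As a rough illustration of why machine-readable files matter, the sketch below reads two small CSV tables and combines them. Everything in it is hypothetical: the records, the column names (`species`, `locality`, `year`), and the source labels are invented, and the CSV text is embedded in the script so that it runs without downloading anything.

```python
import csv
import io

# Hypothetical CSV content standing in for files downloaded from two sources.
SOURCE_A = """species,locality,year
Tragopogon dubius,Site 1,2015
Taraxacum officinale,Site 1,2016
"""
SOURCE_B = """species,locality,year
Taraxacum officinale,Site 2,2017
Tragopogon dubius,Site 2,2017
"""


def read_records(csv_text, source_name):
    """Parse CSV text into dictionaries, tagging each row with its source."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["source"] = source_name  # keep track of where each record came from
        rows.append(row)
    return rows


# Combine the records from both sources into one list.
combined = read_records(SOURCE_A, "source A") + read_records(SOURCE_B, "source B")

# Ask a question neither source could answer alone:
# at which localities has each species been recorded?
localities_by_species = {}
for record in combined:
    localities_by_species.setdefault(record["species"], set()).add(record["locality"])

for species in sorted(localities_by_species):
    print(species, sorted(localities_by_species[species]))
```

Keeping the `source` value on every record also makes it easier to credit each data provider later.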

Examples

There are many web sites that provide access to mobilized biodiversity data. Only a few are mentioned here.

  • Biodiversity Heritage Library (https://www.biodiversitylibrary.org/). “The Biodiversity Heritage Library improves research methodology by collaboratively making biodiversity literature openly available to the world as part of a global biodiversity community”.
  • Tropicos (http://www.tropicos.org/) and IPNI (http://www.ipni.org/) provide information about vascular plant names: when, where, and by whom they were published, and their status. Tropicos also shows how a name has been used in various works and links to relevant pages in BHL. Index Fungorum (www.indexfungorum.org) provides nomenclatural information for fungi. Nomenclatural resources for animals are being developed and maintained by specialist groups.
  • The Global Biodiversity Information Facility (GBIF) exists to provide “free and open access to biodiversity data”. The data come from natural history collections and specialist observer groups around the world. On Jan 4, 2018, GBIF provided access to 965,808,221 records from 1143 different institutions. The date is important because new records are added daily. The records can be searched, mapped, and downloaded. Many natural history collections provide data via other websites and/or their own website, but providing data to GBIF makes it easier for researchers to develop a global picture of species distributions while increasing the exposure of an institution, and its contributors, to other researchers.
  • Symbiota software (Symbiota.org) is used by over 200 institutions to share specimen data via themed networks, e.g., Mycoportal (http://mycoportal.org) and SCAN (http://scan-bugs.org). Some networks are associated with multiple URLs (e.g., SEINet, accessible via http://Intermountainbiota.org, http://ngpherbaria.org, http://swbiodiversity.org, and other portals). Symbiota also enables connected institutions to share their records with other aggregators, such as GBIF and iDigBio (https://www.idigbio.org/), as well as with other themed networks. For example, institutions contributing to OpenHerbarium can enable Mycoportal to access their fungal records. Both GBIF and Symbiota enable data downloads. The difference is that Symbiota, unlike GBIF, is software, not a data access portal. Portals to Symbiota networks offer more tools for data visualization than GBIF.

Visualization is not mentioned in the title of this presentation. It means presenting data in a way that is visually appealing and informative. Data tables and csv files are not, in themselves, inspiring nor do they, by themselves, help provide insight into the data. Statistics were developed to help provide insight. They are an important means of communication among specialists but the most effective way to engage people with data involves visualization – presenting data in a visually attractive and easily understood manner. Visualization is an art worth cultivating.

Examples

  • Any video posted by the Gapminder Organization is worth watching. Try starting with The Joy of Stats (https://www.gapminder.org/videos/the-joy-of-stats/).
  • Search your local newspaper and the web (try “data are beautiful”) for examples of data visualization. This will show that data visualization is a critical element in many presentations, including newspaper accounts.
  • Symbiota networks enable visitors to see images of a species, view distribution maps of 1-5 species, develop checklists for a specified locality, develop illustrated checklists for use in teaching and training field crews, and overlay distributional information on environmental layers. Some of these abilities are better developed in some Symbiota networks than others. Check them out in Mycoportal.

Responsibilities

There are responsibilities associated with using mobilized data. One is that the source of the data be credited. This is very important. Obtaining data and enabling its mobilization requires work at many different levels. Much of this work is not fully funded. Providing full and complete citation of data used in any publication, including blog posts, will help generate support for data mobilization. Record the web site, the date of your download, and the search terms used. Some sources, such as GBIF, provide a DOI for all downloads. Others will probably do so in future.

Why should individuals mobilize data they have acquired? It will draw attention to their work, attention that may lead to new, productive interactions. It will also help them document the value of their work to their funding agencies, whether an employer, a government agency, or a non-government agency, and it will help those funding sources show the value of what they are funding. An additional reason is that many of the top-ranked journals require authors to make their data available. As one step towards making it easier for others to develop a comprehensive view of your contributions, register for an ORCID and use it in all your research activities (see http://ORCID.org). ORCIDs will become increasingly valuable as more databases enable their use.

Examples

  • The citation below uses the format recommended by GBIF for a download made on December 8, 2016. The link takes a user to information on the search terms used and, importantly, on which institutions and organizations provided the records the download contains.

GBIF.org (8th December 2016) GBIF Occurrence Download http://doi.org/10.15468/dl.mqfuje

  • Symbiota does not, as yet, provide a DOI for downloads. The suggested citation format is:

OpenHerbarium.org (4 January 2018). Somaliland Checklist. http://openherbarium.org/portal/checklists/checklist.php?cl=6&pid=4.

Intermountainbiota.org (4 January 2018). Taxon page for Taraxacum officinale. http://intermountainbiota.org/portal/taxa/index.php?taxon=Taraxacum%20officinale
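The citation pattern suggested above (web site, date of access, title of what was viewed or downloaded, URL) can be assembled consistently with a small helper. The sketch below is only one way of doing so: the function name `format_citation` and its parameters are invented for this illustration and are not part of Symbiota or GBIF.

```python
from datetime import date

# Month names are spelled out here so the output does not depend on locale settings.
MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def format_citation(site, accessed, title, url):
    """Build a citation of the form 'Site (D Month YYYY). Title. URL'."""
    when = f"{accessed.day} {MONTHS[accessed.month - 1]} {accessed.year}"
    return f"{site} ({when}). {title}. {url}"


citation = format_citation(
    site="OpenHerbarium.org",
    accessed=date(2018, 1, 4),
    title="Somaliland Checklist",
    url="http://openherbarium.org/portal/checklists/checklist.php?cl=6&pid=4",
)
print(citation)
```

Recording the access date as a real date value, rather than typing it by hand, makes it harder to forget and keeps the format the same across citations.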

Discussion questions

Biodiversity

What different kinds of biological organism are represented in your classroom? The area where you eat? Your food? The market or food stores where you shop most often?

What different kinds of organism are present in the human gut? Which is the most abundant kind? How do they affect human health?

Data

Data need to be gathered with a purpose in mind. Discuss the data needed to answer the following questions. You may wish to narrow each question by, for example, making it refer to a particular locality or time.

  1. How many people have a cell phone?
  2. Are there more insects walking across a 1 metre square plot of soil at noon than at 8am?
  3. How many 10yr old children can read books written in their own language?
  4. How many entering college students have used a computer-based spreadsheet?
  5. How does the weight of a goat change as it grows older?

Mobilization

What data sources on the web do you use most frequently? Explain how you use them.

What data would you like to find on the web that do not appear to be there at present? How would you use the data if they were available?

Visualization

Find at least 10 examples of data visualization in different resources (consider newspapers, text books, advertisements as well as the web). Select the three you consider best and explain what you like about them.

Find about five papers in your field that present data and discuss whether the presentation could be improved by more, or different, visualizations. Do not just say yes, visualization would make it easier for others to understand the data; explain HOW it should be visualized. At this point, do not worry about how to do what you propose. Think about what would make for a more effective presentation. Hint: Start by asking why the data were presented? What point was the author trying to make?

Responsibilities

List some sets of data that interest you. Explain why you find them interesting and how you think they should be cited.