The following content was created in conjunction with preparing a BIFA-GBIF proposal for Pakistani herbaria.
Author: Mary E. Barkworth (ORCID 0000-0001-9785-1538)
Email address: firstname.lastname@example.org
Affiliation: Intermountain Herbarium, Utah State University, 5305 Old Main Hill, Logan, Utah, U.S.A. 84322-5035
Abstract: Science today is distinguished by the ability to draw on massive amounts of data and collaboration. This is most evident in fields that can rely on mechanical data capture, but it is also true in other fields – including systematic botany. Today’s taxonomists must learn to participate in this new knowledge landscape. Doing so will magnify the impact of their work, reduce the amount of time spent copying existing information, encourage better practices in data recording, and promote collaboration. I shall demonstrate how Symbiota, open source software currently used by over 400 natural history collections, can help collections, collectors, and scientists in many disciplines benefit from and contribute to the new landscape while assisting in publication of well-documented scientific papers and generation of outreach resources. In so doing, they will reinforce the importance of taxonomy, taxonomists, and natural history collections to biodiversity research. The talk will be illustrated with examples from OpenHerbarium (http://Openherbarium.org). It will also show how bidirectional links between KeyBase (http://keybase.rbg.vic.gov.au/) and OpenHerbarium can provide access to readily updated taxonomic and floristic treatments.
Today’s world is changing rapidly. This is true of everyday life; it is also true of floristic and systematic botany. As practitioners and educators, we must take advantage of these changes, incorporating and introducing our students to those that are both beneficial and affordable. Fortunately, adopting new practices often becomes both easier and less expensive with time. I discuss the benefits of mobilizing and visualizing biodiversity data, a rapidly developing field that is providing new insights and understanding concerning the distribution of the world’s biodiversity. It is also a field that is relatively easy and inexpensive to integrate into one’s research activities. Doing so increases the impact of one’s work as a scientist, educator, and provider of reliable information while improving understanding of a significant change affecting our lives, the practice of sharing data.
Before discussing how Data Mobilization and Visualization can increase the value of an individual’s work and an institution’s resources, it is useful to consider what the phrase means. Mobilization means making data available in a form that makes reuse easy. Printed tables and tables in Word documents or PDFs are not mobile. Mobile data are data in a form that can be read by computers. They must also adhere to consistent standards so that it is evident how to compare them. The advantages of data sharing for individuals include more citations and possibly new collaborations, but data sharing also makes possible insights that could not be obtained without it.
Visualization means displaying data in ways that make them easier to interpret. Graphs are a simple form of visualization but today data visualization has become an art form. There are excellent, and informative, examples in the videos on Gapminder (2018+) (see, for example, https://www.gapminder.org/videos/hans-rosling-ted-2006-debunking-myths-about-the-third-world/). Typical visualizations of biodiversity data include distribution maps, phenology charts, and maps of biodiversity density.
Data mobilization and visualization are not magic pills. Good science still requires clear research questions, good research design, meticulous data acquisition, appropriate data analyses, and thoughtful conclusions. What data mobilization provides is greater use, and hence greater impact; generation of new insights by analysis of similar data from multiple sources; and new collaborations. In this paper, I show how Symbiota (2018+; Gries et al. 2014), open source software for integrating biodiversity resources, can benefit both individual botanists and herbaria. The paper draws on a) OpenHerbarium (2018+), a developing Symbiota-based herbarium network; b) SEINet (Intermountainbiota 2018+), a mature Symbiota-based network of over 260 North American herbaria accessible via multiple portals; c) papers by Pakistani botanists; and d) online resources that can aid plant and fungal taxonomists.
Symbiota (Gries et al. 2014) is an open source database management system that, by integrating resources from multiple individuals and institutions, enables rapid generation of information that, prior to data mobilization, would have taken years to develop. There are 40 Symbiota networks in existence today, organized around a taxonomic group and/or region. Together they provide access to over 37 million records from more than 760 collections.
What does Symbiota make possible? One can search a Symbiota network for two species, see a map showing how the two differ in their distribution, and view a taxon page for each species that contains descriptions, images, and a link to the map. These resources still take time to develop, but it is time that is measured in months or years, not decades.
Symbiota enables specimen records to be downloaded for use in further analyses. For example, downloading the data for Balsamorhiza sagittata and Wyethia amplexicaulis and examining them in Excel (2016) reveals that the two species have a similar elevation range (61-3116m for B. sagittata; 259-3637m for W. amplexicaulis) and that B. sagittata is collected about one month earlier than W. amplexicaulis (Mar 3 – Aug 31 versus Apr 2 – Sep 12). One could also use the downloaded data to show collection date by year or species density by county, or to generate checklists. Prior to data sharing, all such operations would have been much more time-consuming and more costly. All that one is expected to do is credit the source, either the network (see SEINet 2018) or, for checklists, the original publication or developer and, for images of living plants, the photographers. Most images in a Symbiota network have a Creative Commons BY-SA-NC license (Creative Commons 2018+). The specimen records are usually placed in the public domain but individual herbaria should be acknowledged if some of their records are of particular significance.
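The kind of summary described above does not require a spreadsheet program. The following is a minimal sketch in Python, assuming the download is a CSV file whose columns use the Darwin Core names scientificName, minimumElevationInMeters, and eventDate (an assumption about the export, which should be checked against the actual file). The sample records are invented for illustration and are not SEINet data.

```python
import csv
import io
from datetime import date

# Hypothetical records for illustration only; these are NOT SEINet data.
# The column names are an assumption about the downloaded file.
SAMPLE = """scientificName,minimumElevationInMeters,eventDate
Species A,100,1990-05-01
Species A,2500,2005-03-15
Species A,,2001-06-01
Species B,300,1988-07-20
Species B,3100,2010-04-10
"""


def summarize(rows):
    """Return {name: (min_elev, max_elev, earliest, latest)}.

    earliest and latest are (month, day) pairs, so that collection
    dates are compared within the season, regardless of year.
    """
    summary = {}
    for row in rows:
        name = row["scientificName"].strip()
        elev_text = row["minimumElevationInMeters"].strip()
        date_text = row["eventDate"].strip()
        if not name or not elev_text or not date_text:
            continue  # skip records lacking any of the three values
        elev = float(elev_text)
        collected = date.fromisoformat(date_text)  # expects YYYY-MM-DD
        month_day = (collected.month, collected.day)
        if name not in summary:
            summary[name] = [elev, elev, month_day, month_day]
        else:
            entry = summary[name]
            entry[0] = min(entry[0], elev)
            entry[1] = max(entry[1], elev)
            entry[2] = min(entry[2], month_day)
            entry[3] = max(entry[3], month_day)
    return {name: tuple(values) for name, values in summary.items()}


summary = summarize(csv.DictReader(io.StringIO(SAMPLE)))
for name, (low, high, first, last) in sorted(summary.items()):
    print(f"{name}: {low:.0f}-{high:.0f} m, collected {first} to {last}")
```

To use it with a real download, one would replace io.StringIO(SAMPLE) with an opened file. Records with missing values are skipped here; a real analysis might prefer to report them.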
OpenHerbarium is a network that has been established for herbaria without access to a national herbarium network, particularly those in southwest Asia and Africa. The server is based in the US but, once there is sufficient representation from a country, it would be easy to set up a home page that would list first the herbaria from that country but draw from all the records in the network, including those from other countries. In the remainder of this paper, I explore how individual researchers and herbaria can benefit from using OpenHerbarium.
A product of many floristic studies is publication of a checklist of the species found in a region (e.g., Fazal et al. 2010 for Haripur; Khan et al. 2015 for Kotli). Such checklists are usually published in printed reports or papers. They may include pictures but usually do not. The data in them can be used by others but anyone wishing to do so has to retype it. Let’s look at how using OpenHerbarium can increase the impact of such work, using the two checklists cited.
Notice that, when the checklist is based on a published paper, the paper is cited at the top of the checklist. For some purposes it will be necessary to read the original paper. What added value does placing a checklist in OpenHerbarium (OH) offer? First, the taxa can be viewed by family, alphabetically by species, or as a series of images. In addition, clicking on any of the names will bring up a taxon page. The image view and taxon page links are not helpful at present because, so far, few images and descriptions have been added. With assistance, they can quickly become valuable tools. Descriptions and images can be added by anyone with appropriate permissions. Once added to OH, they become available to all the tools in OH, including the checklist and taxon page tools.
If voucher specimens for a checklist are in a herbarium that shares its records with OH, changes in their identification can be discovered by the checklist manager and either accepted or rejected pending further study. Descriptive information can be added as notes, but it would be better to add such information to OH’s morphological database because this is the first step to creating an interactive, multiaccess keying resource for all checklists in the system.
It is easy to download the names in a checklist for comparison with lists from other regions. That makes it possible to calculate the taxonomic similarity among regions, to determine whether regions on one side of a river are more similar to each other than to those directly across the river, or to compare the number of woody species or members of a particular family or clade in different regions. Using GIS tools, the area covered by a given checklist could be calculated and used to compare the number of species per hectare among regions. This would give additional meaning to statements that an area is highly diverse. The presence of a checklist online also helps stimulate interest in finding new species within an area. Such discoveries would be a positive impact of the paper, not a criticism of the original work.
Another advantage comes from establishing “parent-child” relationships among checklists. The Haripur and Kotli checklists have been set up as “children” of the checklist for Pakistan. They could also be children of their respective provincial checklists. All species in a child checklist are automatically added to their parental checklists. One should, of course, also review the result to ensure that the additions are not simply the result of adding a synonym. One could also create a master checklist of medicinal plants of Pakistan by creating an empty Pakistan Medicinal Plants checklist then adding data from various checklists that state whether a plant has medicinal value. For more information on creating and using research checklists, see Barkworth (2018a).
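As one illustration of comparing downloaded lists, the sketch below computes the Jaccard index (shared names divided by all names in either list). The Jaccard index is only one of several similarity measures and is my choice for the example, not something built into OpenHerbarium. The two lists are invented and are not taken from the checklists cited above.

```python
def normalize(name):
    """Collapse extra spaces and ignore case so that trivial
    differences in typing do not count as different names."""
    return " ".join(name.split()).lower()


def jaccard(list_a, list_b):
    """Jaccard similarity: names in both lists / names in either list."""
    set_a = {normalize(n) for n in list_a}
    set_b = {normalize(n) for n in list_b}
    union = set_a | set_b
    if not union:
        return 0.0  # two empty lists: similarity is reported as zero
    return len(set_a & set_b) / len(union)


# Invented lists for two hypothetical regions (illustration only).
region_x = ["Pinus roxburghii", "Olea ferruginea",
            "Dodonaea viscosa", "Acacia modesta"]
region_y = ["Olea ferruginea", "dodonaea  viscosa", "Quercus incana"]

print(f"Similarity: {jaccard(region_x, region_y):.2f}")
```

The comparison is only as good as the names: two lists that use different synonyms for the same species will appear less similar than they are, so names should be reconciled against a common reference first.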
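The need to review additions for synonyms can be made concrete with a short sketch. It is not Symbiota’s implementation; it simply imitates the behavior described above with Python sets and a hypothetical synonym table. The synonym pair is the Eugenia jambolana / Syzygium cumini example discussed later in this paper; the other names and both lists are invented for illustration.

```python
def add_child_to_parent(parent, child, synonyms):
    """Add every name in the child checklist to the parent checklist.

    Returns a list of (added_name, accepted_name) pairs for names whose
    accepted name was already in the parent, i.e. additions that may
    only be synonyms and should be reviewed by the checklist manager.
    """
    flagged = []
    for name in sorted(child):
        accepted = synonyms.get(name, name)
        if name not in parent and accepted != name and accepted in parent:
            flagged.append((name, accepted))
        parent.add(name)
    return flagged


# Hypothetical synonym table: synonym -> accepted name.
synonyms = {"Eugenia jambolana": "Syzygium cumini"}

# Invented parent and child checklists (illustration only).
parent = {"Syzygium cumini", "Dodonaea viscosa"}
child = {"Eugenia jambolana", "Olea ferruginea"}

to_review = add_child_to_parent(parent, child, synonyms)
print(f"Parent now has {len(parent)} names; review: {to_review}")
```

The sketch only catches a synonym being added when its accepted name is already present; the reverse case (the accepted name arriving after a synonym) would need a similar check.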
Another use of checklists is in teaching students, field crews, or collectors of medicinal plants. For teaching checklists, the teacher develops a list of species students need to be able to recognize. This list is then uploaded as a csv file (Barkworth 2018b). The students can then review the species by looking at their images or, better yet, use the “Flashcard Quiz” tool (listed under “Games”). This shows them images for species on the list in a random order and asks them to identify the species. If there are several images for a species in the system, all of them will be used in the Flashcard Quiz tool. Studying images does not replace studying the plants themselves; it provides another tool for students that requires little investment of time by the teacher once good images have been uploaded. Another “game” is “Drop leaf”, in which the user is asked to guess which species has been named by seeing which letters occur where in its name. This helps students who have trouble learning to spell scientific names.
The ability to store and integrate images of living plants into its tools is a major reason why Symbiota websites are popular with a wide range of users. Images can be contributed by anyone with the appropriate permission; contributors need not be professional botanists but must know their plants or work with someone who does and, preferably, be willing to prepare voucher specimens. Many of the images in SEINet have been contributed by people who lack a formal botanical background but have become familiar with the plants around them. The images could also be used to develop offline resources for teaching medicinal plant collectors the distinguishing features of similar species having different commercial value.
In Symbiota networks, the images should be made available with no more restrictions than a Creative Commons (2018+) BY-SA-NC license (Credit the author, share with the same license, do not use for commercial purposes). To reduce download times, the files should be no more than 1MB in size. This may mean cropping and reducing the resolution of an image before uploading it. This also means that most are not of sufficiently high resolution for use in print publications. Contact information for photographers who contribute directly to OH is included with their images. This enables anyone interested in higher resolution versions to ask about their availability and cost. For more information on contributing images see Barkworth (2018c).
Plant collecting and specimen preparation are time-consuming but essential activities for floristic botanists. Here too, the tools in OH, and those some colleagues and I are planning to develop, can help. The tools help generate high quality, informative labels. The biggest impediment at present is that Symbiota is designed for online data entry. Even in countries with reliable, high speed internet access this can be frustrating. My colleagues and I are working to develop an easily installed, offline database that can be used to store collection information and generate labels. It must also be able, when connected, to download new and updated plant names (with their authors) and, as new countries are added, administrative regions. This database will also make it easy to generate the files needed for uploading specimen records to OH. If the specimens are deposited in a herbarium that contributes records to OH, it will also be easy to add them to the herbarium’s records (see below) once they have been assigned a CatalogNumber.
Informative labels start with high quality field notes so how does using a database aid in their production? There are two major reasons. One is that, after entering data into a label preparation database a few times, people tend to record more complete field notes. Entering information from old specimens also helps them appreciate the difference between labels with minimal information and those with lots. Another reason is that, for effective data sharing, it is often necessary to supply more information than is initially apparent. For example, consider latitude and longitude information. For sharing, these need to be provided in decimal format (30.5 deg, not 30deg 30min) AND accompanied by information on the datum used and the uncertainty. The datum refers to the model of the earth used when determining the latitude and longitude. GPS units, by default, are set to WGS84 because that datum is both recent and universal. The third piece of information that needs to be included is the uncertainty associated with the lat/lon data. This uncertainty comes from the effects of such things as air pressure and terrain on the value reported. It is not the number reported by a GPS unit. With the GPS units most of us can afford, the uncertainty is at least 20m. If one makes a single recording for multiple specimens in an area, the latitude and longitude should be recorded at the center of the area and the uncertainty as the distance, in meters, to the most distant plant collected. Lat/lon data without statements of uncertainty and the datum used are not suitable for use in applications such as ecological modeling. Records with very large uncertainties, such as the 293295m associated with the 2 degree grid references in the Flora of Pakistan, would also be rejected. It is best to check before leaving for field work that one’s GPS is set to report latitude and longitude in decimal format, that it is using WGS84 as the datum, and that it is using metres or kilometres as the unit of length. This will make it easy to transfer one’s field notes to OH and the offline Symbiota database.
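The two calculations mentioned above, converting degrees and minutes to decimal degrees and expressing uncertainty as the distance from the recorded point to the most distant plant, can be sketched in Python as follows. The distance uses the haversine formula on a sphere, which is an approximation rather than a calculation on the WGS84 ellipsoid, and the field names in the example record are Darwin Core terms that I assume, without having confirmed it here, are appropriate for an upload file. The coordinates are invented.

```python
import math

# Mean Earth radius in metres. Using a sphere is an approximation;
# it is not the WGS84 ellipsoid itself.
EARTH_RADIUS_M = 6371008.8


def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees/minutes/seconds to decimal degrees.
    South and west are returned as negative values."""
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in metres between two points
    given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def uncertainty_m(center, plants):
    """Distance in metres from the recorded center to the most distant
    plant; center and plants are (latitude, longitude) pairs."""
    return max(distance_m(center[0], center[1], lat, lon)
               for lat, lon in plants)


# Invented coordinates for one collecting site (illustration only).
center = (dms_to_decimal(30, 30, hemisphere="N"), 70.25)
plants = [(30.5002, 70.2501), (30.4995, 70.2497)]

record = {
    "decimalLatitude": center[0],
    "decimalLongitude": center[1],
    "geodeticDatum": "WGS84",
    "coordinateUncertaintyInMeters": math.ceil(uncertainty_m(center, plants)),
}
print(record)
```

The value computed here covers only the spread of the plants around the recorded point. The uncertainty of the GPS reading itself still has to be taken into account when deciding what value to report.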
Another area where reliable data sharing benefits from more information concerns the scientific name. Here it is the reference used for identification that is important. It tells other users what interpretation of the name was used. Citing the name’s author does not do that. Consider Triticum aestivum. Linnaeus is the person who published the name, but he would have placed many of the specimens we call T. aestivum in different species. Telling people that you are following the concepts used in the Flora of Pakistan or the Flora of Pakistan modified to reflect the taxonomy reflected in Tropicos, The Plant List or some other source clarifies how you are using the name.
Contributing records to OpenHerbarium
Once collection records are available in OpenHerbarium, they can be consulted by anyone visiting it. For now, the most effective mechanism for adding records to OH is using a spreadsheet for both creating and uploading the records. I have designed a simple one for use in Pakistan (Barkworth 2018d). A colleague has begun to develop an offline database for creating the records and generating an upload file. An appropriately designed database would be better than a spreadsheet because it can incorporate a table of scientific names (with authors and families) and one of administrative regions. This makes it possible for the database to check the spelling of names being entered in these fields and automatically add the appropriate author(s) and families when labels are printed. This saves the time required to retype information that is already available in multiple sources. Our goal is to have an easily installed database that can be used to print labels, generate an upload file, and download new and modified scientific names and administrative regions.
If the database states that it cannot find a scientific name, there are two possibilities. One is that there is a typing error or an error in the reference being used. In some cases, changes to the International Code of Nomenclature for algae, fungi and plants (McNeill et al. 2012) may have led to changes in spelling or authorship of a name since publication of the reference being used for identification. The other possibility is that the name is not yet in OH’s database. To check the reason, one can consult the International Plant Names Index (IPNI 2018+) or Tropicos (2018+) for vascular plant names, Tropicos (2018+) for bryophyte names, and Index Fungorum (IF 2018+) for fungal names. If the name needs to be added to the OH database, please let me know. OpenHerbarium will also add the family name based on the most recent recommendations of the Angiosperm Phylogeny Group (The Angiosperm Phylogeny Group 2016). This can be frustrating for those of us familiar with a different system but, as scientists, we need to incorporate the results of new research into our thinking.
OpenHerbarium has another nomenclatural feature that, with some modification of Symbiota, will be very helpful. Consider the name Eugenia jambolana Lam. According to the Plant List and the Taxonomic Name Resolution Service, its accepted name is Syzygium cumini (L.) Skeels. Tropicos shows that most references now use Syzygium cumini although one used S. jambolana (Lam.) DC. None of these sources tells one how the two genera are distinguished. A Google search found the paper that provides the answer: Schmid (1972). Adding the full citation for such an article to the Taxonomic Tree in OpenHerbarium will save others from having to look for the information. At present such notes are only visible to people with permission to change the taxonomic tree but sharing such information would help us all answer questions about why a name has been changed. For that reason, it is on the list of desirable improvements for Symbiota. All it needs is funding and developers. We shall also ensure that credit can be given to people who provide such information because it does take time.
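To show what checking a name against a names table might look like, here is a minimal sketch in Python. It is not the database under development, whose design I do not describe here. The table holds only two names mentioned in this paper, and the function name look_up and the similarity cutoff are choices made for the illustration.

```python
import difflib

# A tiny, illustrative names table: name -> (author, family).
# A real table would hold many thousands of names.
NAMES = {
    "Triticum aestivum": ("L.", "Poaceae"),
    "Syzygium cumini": ("(L.) Skeels", "Myrtaceae"),
}


def look_up(name, names=NAMES):
    """Return the author and family for a name, or close matches
    that might be the intended spelling if the name is not found."""
    cleaned = " ".join(name.split())  # remove stray spaces
    if cleaned in names:
        author, family = names[cleaned]
        return {"status": "found", "name": cleaned,
                "author": author, "family": family}
    suggestions = difflib.get_close_matches(
        cleaned, list(names), n=3, cutoff=0.8)
    return {"status": "not found", "name": cleaned,
            "suggestions": suggestions}


print(look_up("Triticum  aestivum"))   # extra space is tolerated
print(look_up("Triticum aestivium"))   # misspelling: suggestion offered
print(look_up("Abies pindrow"))        # not in this small table
```

A name that is not found and has no close match is not necessarily wrong; it may simply be missing from the table.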
Benefits for Herbaria
So far, my focus has been on how using OpenHerbarium as a resource for sharing information can benefit individuals. This section discusses the benefits for herbaria.
Herbaria exist to share information about plant, fungal, and algal diversity. At the time they were founded, in the 1500s, the idea of pressing and drying plants so they could be studied in winter was a major technological innovation. It soon led to the establishment of herbaria in association with medical schools and the exchange of specimens among distant herbaria, but one still had to visit a herbarium to find out what it had. Later, Index herbariorum (Thiers 2018+) provided basic information about the world’s research herbaria.
Today, Index herbariorum is still the first place botanists go to find out about herbaria in a region but they are more likely to go to the Global Biodiversity Information Facility (GBIF; 2018+) to find out which herbaria have specimens from the areas they are interested in. At present, no Pakistani herbaria are providing data to GBIF. To make the importance of Pakistan’s herbaria evident it is critical that a) they register with Index herbariorum or, if registered, ensure that the information presented is up to date; and b) they start making their records available to GBIF. Registering with and updating an entry in Index herbariorum is free and can be done online. If a herbarium is no longer active as a research herbarium, that information can be incorporated in the Index. What is important is that active herbaria be registered and that their information be current.
The second recommendation, that herbaria make records available to GBIF, is somewhat harder to address, but registering and contributing to OH makes it relatively easy because Symbiota networks make it simple for contributing collections to provide data to GBIF in a suitable format. The one initial expense involved is the purchase of at least one barcode scanner and preprinted, archival barcodes. The barcodes help ensure that each specimen is uniquely identified within a collection. Currently, the cost for archival barcodes from one US supplier is $245 plus shipping for 3000 barcodes (the minimum order) and $715 plus shipping for 20,000 (both prices include a one-time set up charge and incorporation of a logo). Barcode scanners cost about $120 in the US. To register as a data contributor with GBIF, a herbarium should contact that organization.
Of course, the big cost in mobilizing specimen data is the time involved. It is significant, but starting this year is better than starting next year. The Intermountain Herbarium (UTC), which I used to direct, has around 275,000 specimens. We began databasing the specimens in the 1980s. It is still only about 60% databased but it is 60% databased. We began with no additional funding, entering only new specimens from the Intermountain Region, specimens annotated by specialists, and those that enabled us to answer an inquiry. After about five years, we realized that answering inquiries was becoming easier because of the database. Today, all new specimens are databased and imaged, as are all specimens sent on or received back from a loan, and all those about which there is an inquiry. The herbarium does not have specific funding for data mobilization, but we are making progress. Since UTC started contributing records to SEINet, it has received fewer requests for loans than in the past and those that are received are for fewer specimens because people can select the records that interest them. Since we began imaging the specimens, we have also received comments that a specimen seems to be misidentified. We welcome such comments. When we receive them, we check the specimen and, if necessary, correct the record and attach an appropriate annotation to the label.
Today, we are pushing students and other contributors either to enter their records directly into the network or to provide us with data as an appropriate csv file. That makes it possible to provide rapid access to their specimen records and ensures that their specimens do not add to the backlog of specimens needing to be databased. Making the records available rapidly is increasingly important because some journals now require that such information be shared as a condition of publication. Thus, becoming a data contributor is not just beneficial, it is essential to supporting Utah State University’s research activities. We do not attempt to predict when all UTC’s specimens will be databased, georeferenced, and imaged but progress is being made.
We also know that UTC’s specimen records are being consulted more than they would be if they were not accessible online. What we do not know is how often they are consulted. One can obtain Google Analytics for the network but not for an individual collection. When funding is available, Symbiota will be modified to enable tracking usage at the record level and, as a result, provide herbaria and individuals with more informative usage statistics.
Creating an updated flora for Pakistan
The Flora of Pakistan (1970 – present) is a magnificent work that greatly facilitates development of regional checklists. Inevitably, however, many of the taxonomic treatments in the older volumes need updating: new species have been discovered and/or described, taxonomic changes have been made, changes in the International Code of Nomenclature (McNeill et al. 2012) require changes in some names and authors, and new distribution records have been found. Developing new print editions of the flora would not only be horrendously expensive and time-consuming, it is unnecessary. Dr. Zahid Ullah (University of Swat), Sara Wilkinson-Lamb, and I have begun developing a process for producing an updated, online flora. We are starting by reviewing the family limits, aligning them with those of the Angiosperm Phylogeny Group (2016). We then review the nomenclature of the genera and species included in one family, consulting current papers, Tropicos (2018+), and TRNS (2018+). At this point, we review all records for the family in OpenHerbarium and GBIF for Pakistan.
The next step, which we have taken only for the Salicaceae, is to modify the key to genera, where necessary consulting other floras to determine distinguishing features for the new combinations of genera. The resulting keys will then be made available in KeyBase. KeyBase is for making dichotomous keys available. It offers them in three different formats: four panels, bracketed, and indented. More interestingly, it enables filtering the leads shown based on a regional checklist. This feature will become useful as reliable checklists for Pakistan’s provinces are developed. Names in KeyBase can also be linked to taxon pages in OpenHerbarium and vice versa.
Creating an online flora for Pakistan, even with the print flora to draw on, will be time-consuming but it would be a magnificent achievement. The first step, however, is to provide free and easy access to the rapidly increasing information in Pakistan’s herbaria. Participating in development of OpenHerbarium will aid in achieving that goal while creating valuable resources for a wide range of research, education, and public outreach.
The purpose of this paper is to promote specimen digitization in Pakistan, primarily working with curators of the country’s herbaria. Sharing specimen data and checklists online is not only beneficial to individuals and institutions but also feasible and, increasingly, an expected activity. It will become even easier but is relatively simple even now. Moreover, the cost is minimal. There is no charge for registering with and contributing to OpenHerbarium and those involved in development of Symbiota, the software used by OH, are working to reduce the cost of providing data as well as incorporating new features such as the ability to map the distribution of phylogenetic diversity. I have posted some “how to” documents to the web; more are available at Symbiota.org. If you are interested in becoming a contributor, either as an individual or as the person in charge of a herbarium, please email me. If you are interested in enhancing one aspect of the data in OpenHerbarium, for example, databasing records of medicinal plants from Pakistan, let me know so that I can provide information on what is known about effective practises and suggest how to make the information as useful as possible. Similarly, if there is a new tool you would like to see incorporated, let me know. There is also interest in expanding the pool of potential developers. The software is open source and available on GitHub. If anyone is interested in contributing to its development, they should contact me after familiarizing themselves with the program (see GitHub 2018+). All development is funded by grants, so identification of potential funding sources would be helpful. But even without further development, Symbiota and OpenHerbarium can help Pakistani botanists, mycologists, and herbaria bring their work to the attention of the world and become familiar with the new ways of thinking associated with living in a digital era.
Barkworth, M.E. 2018a. Creating research checklists in Symbiota.
Barkworth, M.E. 2018b. Creating teaching checklists in a Symbiota network.
Barkworth, M.E. 2018c. Contributing images to a Symbiota network.
Barkworth, M.E. 2018d. A spreadsheet for providing data to OpenHerbarium designed for use in Pakistan.
Creative Commons. 2018+. Clarifying conditions under which works are made publicly available. https://creativecommons.org.
Fazal, H., N. Ahmad, A. Rashid and S. Farooq. 2010. A checklist of Phanerogamic flora of Haripur Hazara, Khyber Pakhtunkhwa, Pakistan. Pakistan Journal of Botany 42(3): 1511-1522.
Flora of Pakistan. 1970 – present. Currently edited by S.I. Ali and M. Qaiser. University of Karachi, Pakistan.
Gapminder. 2018+. Unveiling the beauty of statistics for a fact-based world view. http://www.gapminder.org.
GitHub. 2018+. The Symbiota Virtual Flora/Fauna project. https://github.com/Symbiota/Symbiota.
Global Biodiversity Information Facility. 2018+. Free and open access to biodiversity data. https://www.gbif.org/.
Gries, C., E.E. Gilbert, and N.M. Franz. 2014. Symbiota – A virtual platform for creating voucher-based biodiversity information communities. Biodiversity Data Journal 2: e1114. doi: 10.3897/BDJ.2.e1114
IPNI. 2018+. International Plant Names Index: database of the names and associated basic bibliographical details of seed plants, ferns and lycophytes. http://www.ipni.org/index.html
KeyBase. 2018+. Teaching old keys new tricks. http://keybase.rbg.vic.gov.au.
Khan, A.M., R. Qureshi, M.F. Qaseen, M. Munir, M. Ilyas and Z. Saqib. 2015. Floristic checklist of District Kotli, Azad Jammu & Kashmir. Pakistan Journal of Botany 47: 1957-1968.
McNeill, J., F.R. Barrie, W.R. Buck, V. Demoulin, W. Greuter, D.L. Hawksworth, P.S. Herendeen, S. Knapp, K. Marhold, J. Prado, W.F. Prud’homme Van Reine, G.F. Smith, J.H. Wiersema, and N.J. Turland. 2012. International Code of Nomenclature for algae, fungi, and plants (Melbourne Code). http://www.iapt-taxon.org/nomen/main.php.
OpenHerbarium. 2018+. A Symbiota-based network for herbaria without a national herbarium network. http://openherbarium.org
Schmid, R. 1972. A resolution of the Eugenia-Syzygium controversy (Myrtaceae). American Journal of Botany 59: 423-436.
SEINet. 2018. Download of data for Balsamorhiza sagittata and Wyethia amplexicaulis from SEINet made on 3 Feb 2018. Available upon request.
Symbiota. 2018+. Symbiota: promoting biocollaboration. http://symbiota.org/docs/.
The Angiosperm Phylogeny Group. 2016. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV. Botanical Journal of the Linnean Society 181: 1–20.
Thiers, B. [2018+]. Index Herbariorum: A global directory of public herbaria and associated staff. New York Botanical Garden’s Virtual Herbarium. http://sweetgum.nybg.org/science/ih/.
TRNS. 2018+. Taxonomic Name Resolution Service: a free utility for correcting and standardizing plant names. http://tnrs.iplantcollaborative.org/
Tropicos. 2018+. Tropicos® was originally created for internal research but has since been made available to the world’s scientific community. http://www.tropicos.org.