Assessing herbarium material with novel molecular techniques reveals a wealth of new data from old treasure troves

Using large-scale genome skimming to build a resilient resource for the future
Hamersley Range, Pilbara, Western Australia PHOTO CREDIT: Stephen van Leeuwen

Herbaria hold extensive collections of curated plant material that serve as important reference specimens for plant identification. Advances in high-throughput, next-generation sequencing (NGS) methods have now made this material accessible to genetic studies as well.

In our study, we conducted a large-scale applied assessment of one such NGS approach – genome skimming – and its ability to recover plastid and ribosomal genome sequences from a broad taxonomic sampling of herbarium material for the Western Australian flora. We sequenced 672 samples covering 21 families, 142 genera, and 530 named and proposed named species, and explored the impact of sample age, DNA concentration and quality, read depth and fragment length on plastid assembly error.

We demonstrate that herbaria are a valuable source of plant material for building a comprehensive DNA sequence database that serves a range of applications, from modernizing plant surveys to improving the resolution of plant phylogenies.

Gastrolobium grandiflorum, Pilbara, Western Australia PHOTO CREDIT: Stephen van Leeuwen

Genome skimming [1] was effective at producing genomic information at large scale. Substantial sequence information on the chloroplast genome was obtained from 96.1% of samples, and complete or near-complete sequences of the nuclear ribosomal RNA gene repeat were obtained from 93.3% of samples.

Eucalyptus kingsmillii, Pilbara, Western Australia PHOTO CREDIT: Stephen van Leeuwen
Grevillea wickhamii, Pilbara, Western Australia PHOTO CREDIT: Stephen van Leeuwen

We extracted sequences for plastid markers rbcL and matK – the core DNA barcode regions – from 96.4% and 93.3% of samples, respectively. Read quality and DNA fragment length had significant effects on sequencing outcomes, and error correction of reads proved essential. Assembly problems were specific to certain taxa with low GC and high repeat content (e.g. Goodenia, Scaevola, Cyperus, Bulbostylis, Fimbristylis), suggesting the influence of biological rather than technical factors. The structure of related genomes was needed to guide the assembly of repeats that exceeded the read length. DNA-based matching proved highly effective, and the efficacy for species identification declined in the following order: total chloroplast DNA >> ribosomal DNA > matK >> rbcL.
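To make the idea of DNA-based matching concrete, here is a minimal sketch in Python. It is not the matching procedure used in the study: it assumes a simple stand-in, Jaccard similarity over shared k-mers, and the function names (`kmers`, `jaccard`, `best_match`), the species labels and the short sequences are all hypothetical, made up for illustration only.

```python
def kmers(seq, k=4):
    """Return the set of all length-k substrings of a DNA string."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(query, references, k=4):
    """Return (name, score) of the reference most similar to the query."""
    q = kmers(query, k)
    scores = {name: jaccard(q, kmers(seq, k))
              for name, seq in references.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Toy, made-up sequences; NOT real barcode data.
references = {
    "species_A": "ATGTCACCACAAACAGAGACTAAAGC",
    "species_B": "ATGTCACCACAAACGGAGACTAAGGC",
    "species_C": "TTGACCGGTTAACCGGTTAAGGCCTT",
}
query = "ATGTCACCACAAACAGAGACTAAAGC"  # identical to species_A
print(best_match(query, references))
```

In this toy setting, species_A and species_B differ at only two positions, so a short marker that happened to miss those positions could not separate them. That is one intuitive reason why a longer region, such as the whole chloroplast genome, can be expected to discriminate better than a single short barcode.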

Ptilotus rotundifolius, Pilbara, Western Australia PHOTO CREDIT: Stephen van Leeuwen

Our success is important because it demonstrates that herbaria can be used as a source of plant material for building a comprehensive DNA sequence database. These data form the basis for developing a molecular identification system for the Western Australian flora. This will enable identification of specimens throughout the year (e.g., outside flowering times), of hard-to-identify species (e.g., those with constrained or reduced morphological characters), and of specimens where only fragments of non-diagnostic material are available. The availability of this technology will modernize plant surveys by reducing constraints on survey effort through moderating sampling timing restrictions and seasonal effects, as well as enabling rapid, verifiable identification. It will also have practical applications in a wide range of ecological contexts using eDNA metabarcoding, such as gut and scat analysis of animals to determine dietary preferences of threatened species and livestock, and checking the integrity of seed collections for seed banking and use in land restoration/revegetation programs. Other potential uses of extensive plastid sequence data, beyond species identification, include improving the resolution of plant phylogenies and studies on the evolution of plastid genome function, including understanding adaptive changes.


1. Straub S, Parks M, Weitemier K, Fishbein M, Cronn R, Liston A (2012) Navigating the tip of the genomic iceberg: Next-generation sequencing for plant systematics. American Journal of Botany 99(2), 349-364.

For full details, please refer to the publication in BMC Plant Methods.

Written by

Paul Nevill

Curtin University, School of Molecular and Life Sciences, ARC Centre for Mine Site Restoration & Trace and Environmental DNA (TrEnD) Lab, Perth, Western Australia

February 4, 2020 
