Numbering Natural History Collections

From SPNHC Wiki
Revision as of 22:12, 15 November 2022 by Erica Krimmel (Talk | contribs) (References)

Statement of Purpose

These links and documents contain information about numbering systems for natural history collections. This content was generated after several email threads on Nhcoll-l indicating it is a common question for institutions.

Contributors

Emily Braker, Genevieve Tocci, Erica Krimmel

Introduction

Numbering systems provide a basic structure within natural history collections to aid in specimen finding, organization, documentation, inventory, and citation. While numbering systems vary widely by institution, their fundamental purpose remains the same: to assign a unique identifier to individual specimens or specimen lots. In the case of catalog numbers, a unique identifier allows a given specimen/lot to be distinguished from others that may share the same taxonomic name, collecting event, locality, or preparation type, and serves as the primary means by which a cataloged item is referenced. Catalog numbers differ from the field numbers or collector numbers assigned by researchers, which can also help distinguish an object but would be the same for any duplicates held at different institutions. As digital data related to natural history collections are increasingly mobilized online, globally unique numbering systems have become an essential complement to and/or component of catalog numbers.

Catalog Numbers

Best Practices

Although many numbering systems exist across institutions, it is generally recommended that catalog numbers be unique, sequential, and avoid prefixes/suffixes when possible (see "Multiple numbers" thread, [Nhcoll-l], 2016) [1]. Such a system creates a one-to-many relationship between a voucher specimen and its derivative preparations, promoting their continued association over time. For example, a single catalog number "3222" applied to a study skin, skeleton, and muscle tissue obtained from the same mammal succinctly documents that all "parts" originated from one individual and that these items can be intuitively assembled despite their potentially separate storage locations and/or storage media. A single number series ensures that a unique identifier is applied to each specimen/lot, reducing opportunities for duplication errors and tracking issues.

Some institutions utilize a near-single number series by adding a prefix or suffix to a catalog number to indicate sub-parts for each cataloged item (e.g., 3222.1, 3222.2, 3222.3 corresponding to a plant and its two frozen tissue samples, or S-3222, OS-3222 for a skin and its associated skull). While it is clear that preparations sharing the same primary integer are related, this system creates needless redundancy and can become cumbersome when multiple derivatives exist. In a database environment, prefixed numbers are treated as unique alphanumeric strings and require extra steps to cross-reference and link separate parts or to create parent/child records pertaining to the voucher specimen. However, a prefix is useful and sometimes necessary for systematic collections housed within the same department in order to distinguish overlapping numbers from multiple ledger series (e.g., Mala-1234, Ent-1234, Herp-1234).
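The extra cross-referencing step mentioned above can be sketched in a few lines of Python. This is only an illustration: the pattern, the function names, and the assumption that prefixes are joined with a hyphen and suffixes with a period are hypothetical, and real collections will need rules that match their own conventions.

```python
import re
from collections import defaultdict

# Hypothetical pattern: optional alphabetic prefix plus hyphen, a primary
# integer, and an optional ".n" suffix (e.g., "OS-3222" or "3222.1").
CATALOG_PATTERN = re.compile(
    r"^(?:(?P<prefix>[A-Za-z]+)-)?(?P<number>\d+)(?:\.(?P<suffix>\d+))?$"
)


def primary_number(catalog_string):
    """Return the primary integer in a prefixed or suffixed catalog number."""
    match = CATALOG_PATTERN.match(catalog_string.strip())
    if match is None:
        raise ValueError(f"Unrecognized catalog number format: {catalog_string!r}")
    return int(match.group("number"))


def group_parts(catalog_strings):
    """Group catalog strings that share the same primary integer."""
    groups = defaultdict(list)
    for value in catalog_strings:
        groups[primary_number(value)].append(value)
    return dict(groups)


parts = ["S-3222", "OS-3222", "3222.1", "3222.2", "3223"]
grouped = group_parts(parts)
print(grouped)
```

Note that this grouping is only meaningful for prefixes that indicate preparation types. Applied to departmental prefixes such as Mala-1234 and Ent-1234, it would wrongly merge unrelated specimens, so the meaning of each prefix has to be known before records are linked.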

Catalog numbering is conceptually different from using barcodes as a way to identify materials (see Barcodes for more information). There is also no consensus within the community on whether a lot or multi-part collection should be assigned a single catalog number or separate numbers. Part of the challenge is that physical objects and digital records pose different problems: a data system needs a way to link related information, whereas with physical objects the question is how to determine whether items are part of the same material.

Institution codes are historically considered integral to catalog numbers, and should always appear coupled with catalog numbers when referencing the specimen in citations, figures, publications, web content, and presentations (e.g., UCM 12341, MCZ 3439, AMNH 32076). Institution codes tend to be 3-4 characters in length. See the Global Registry of Scientific Collections and Index Herbariorum for a list of institution codes worldwide.

The Darwin Core term dwc:catalogNumber can be used to share information about the catalog number assigned to a specimen.

Legacy Numbering Systems

Many museums have inherited a multiple number series system, which is typically an artifact of institutional reorganization, collections or tracking systems that have become obsolete, maintaining separate ledgers for different preparation types (e.g., study skins and corresponding osteological material, or wet vs. dry mollusks), and the creation of subsidiary collections at different points in time (e.g., tissues or teaching/educational specimens). Multiple number series rely on a many-to-one model, where parts belonging to the same specimen are assigned different catalog numbers based on the next-available number maintained in multiple ledgers (e.g., a bird skin that is prepared may be assigned "55,001" in the main catalog series, "23,342" for its partial skeleton in the osteology catalog series, and "5499" for muscle tissue in the cryogenic catalog). Retaining the association takes additional documentation effort, and relating all three parts requires consulting a ledger or database. Under this system, unrelated specimens may share the same number (e.g., spread wing, skeleton, and tissue "5440" may represent 3 different species that do not share the same collecting event), which can result in increased duplication and administrative complexity.

Some institutions may wish to retire multiple numbering systems altogether, or select a specified point in the series after which all specimens will be cataloged under a unified system. Other attempts to rectify a complicated legacy system include recataloging specimens into a single number series. However, it is important to be aware that legacy numbering systems may have been used in research publications, and additionally, recataloging may be unfeasible for larger collections. More often, collections professionals rely on integrated data management systems to resolve varied numbering system issues. Databases can allow for parent-child records that document a voucher specimen and its associated preparations with sub-numbers or unrelated identifiers, and can cross-link associated records. Relational databases are ideal for resolving numbering issues as this format streamlines navigation between specimen records in contrast to a flattened data system.
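As a rough illustration of how a relational structure can tie legacy numbers together, the sketch below uses Python's built-in sqlite3 module with the bird example from above. The table names, column names, and series labels are hypothetical and do not reflect any particular collections management system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One row per voucher specimen (the "parent" record).
    CREATE TABLE specimen (
        specimen_id INTEGER PRIMARY KEY,
        description TEXT
    );
    -- One row per preparation (the "child" records). A number only has to
    -- be unique within its own ledger series.
    CREATE TABLE preparation (
        series TEXT NOT NULL,
        catalog_number INTEGER NOT NULL,
        prep_type TEXT NOT NULL,
        specimen_id INTEGER NOT NULL REFERENCES specimen(specimen_id),
        PRIMARY KEY (series, catalog_number)
    );
    """
)
conn.execute("INSERT INTO specimen VALUES (1, 'bird voucher')")
conn.executemany(
    "INSERT INTO preparation VALUES (?, ?, ?, ?)",
    [
        ("main", 55001, "skin", 1),
        ("osteology", 23342, "partial skeleton", 1),
        ("cryogenic", 5499, "muscle tissue", 1),
    ],
)


def related_parts(connection, series, catalog_number):
    """Return every preparation sharing a specimen with the given number."""
    rows = connection.execute(
        """
        SELECT p2.series, p2.catalog_number, p2.prep_type
        FROM preparation AS p1
        JOIN preparation AS p2 ON p2.specimen_id = p1.specimen_id
        WHERE p1.series = ? AND p1.catalog_number = ?
        ORDER BY p2.series
        """,
        (series, catalog_number),
    )
    return rows.fetchall()


# Starting from the osteology number alone, recover all three parts.
parts = related_parts(conn, "osteology", 23342)
print(parts)
```

Because the series name is part of the key, the same integer can appear in several ledgers without being mistaken for the same specimen.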

The Darwin Core term dwc:otherCatalogNumbers can be used to share information about legacy numbers or other related numbers when trying to manage these legacy systems.
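A minimal sketch of what such a shared record might look like, using the bird example from this section. The helper function and the "series number" labels are hypothetical; the " | " separator follows the Darwin Core recommendation for list-valued terms.

```python
def dwc_record(institution_code, catalog_number, legacy_numbers):
    """Build a flat record using Darwin Core term names as keys."""
    return {
        "institutionCode": institution_code,
        "catalogNumber": str(catalog_number),
        # Legacy or related numbers are concatenated into a single field.
        "otherCatalogNumbers": " | ".join(legacy_numbers),
    }


# "XYZ" is a placeholder institution code, not a real one.
record = dwc_record("XYZ", 55001, ["osteology 23342", "cryogenic 5499"])
print(record["otherCatalogNumbers"])
```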

Lot-based Systems

Lots differ from individually cataloged objects in that a single number is applied to all specimens (i.e., individual organisms) contained within a given lot. Lots may include mixed taxa or a single taxon, but all specimens comprising the lot share the same collecting event data. If lot "40,986" contains 12 fish and one is reidentified, elevated to a type specimen, or somehow distinguished from the remaining 11 specimens (via imaging, genetic sampling, etc.), it is necessary to differentiate this specimen from others in the lot so that the individual is findable. For imaged or genetically sampled specimens, this may involve simply marking the specimen in some way or adding information to the label. Types and redeterminations typically involve removal and recataloging of the specimen by assigning a unique number while retaining the original lot number "40,986" in the specimen record and on the label so that a permanent association is maintained with the specimens from the original collecting event unit. If possible, the new catalog number should be unique and avoid a prefix or suffix for reasons discussed above.
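The recataloging step can be sketched as follows. The record structure, the function name, and the new catalog number 41250 are hypothetical and chosen only for illustration; the point is that the new record keeps a reference to the original lot number.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogRecord:
    catalog_number: int
    specimen_count: int
    other_catalog_numbers: list = field(default_factory=list)
    remarks: list = field(default_factory=list)


def remove_from_lot(lot, new_catalog_number, reason):
    """Split one specimen out of a lot and return its new catalog record."""
    if lot.specimen_count < 2:
        raise ValueError("Lot must contain at least two specimens to split one out")
    lot.specimen_count -= 1
    lot.remarks.append(
        f"1 specimen removed and recataloged as {new_catalog_number} ({reason})"
    )
    return CatalogRecord(
        catalog_number=new_catalog_number,
        specimen_count=1,
        # Retain the original lot number so the association is permanent.
        other_catalog_numbers=[lot.catalog_number],
        remarks=[f"Removed from lot {lot.catalog_number} ({reason})"],
    )


lot = CatalogRecord(catalog_number=40986, specimen_count=12)
type_specimen = remove_from_lot(lot, new_catalog_number=41250, reason="elevated to type")
print(lot.specimen_count, type_specimen.other_catalog_numbers)
```

Recording the change on both records means the link can be followed in either direction, from the lot to the removed specimen and back.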

As new digital options become available there may be additional ways to track parts of lots that are removed and given new numbers, or not given new numbers but elevated to a different status. This can be especially challenging for mixed collections that are not qualified as lots (especially in botanical mixed collections). Currently there is not a consensus on how this is handled, especially for digital data.

Numbers for Specimens Prior to Accession

It is becoming more common for researchers to request numbers for their work or a publication before the material has been officially deposited in the institution it is destined for (see the "Catalogue number requests - paleontology" email thread on Nhcoll-l). While approaches differ across paleontology and other areas of natural history, the general consensus is that a specimen/lot should be deposited, accessioned, and only then assigned a number; numbers should not be given to material still in the hands of the researcher.

Numbers recorded by a researcher or collector, sometimes referred to as field numbers, can be shared using the Darwin Core term dwc:recordNumber.

Accession Numbers

Accessioning is the formal process used to accept and record a specimen as a collection object within an institution. Accessions document the acquisition of the specimen(s) by sanctioned collecting permits or via transfer of title from a donor (individual or organization) and serve as a record of provenance and legal ownership for these objects.

An accession number is assigned to each specimen or group of specimens as part of the accession process. This number is used to relate all documentation and other records to that accession. The accession number also is included in the catalog entry for each specimen in the accession. All correspondence relating to an accession must be identified by the accession number. Accession numbers should be unique and often follow one of two formats:

  • Serial based on the next available number: e.g., 9637, 9638, 9639, 9640…
  • Timestamped running number series: e.g., 2017.1, 2017.2, 2017.3, 2017.4…

Both systems are simple, though the obvious benefit to the timestamped series is that it is a quick and easy accounting of how many accession transactions occurred within a given calendar year. Once formally accessioned, specimens are assigned a catalog number and processed and curated as needed before being stored in the permanent collection.
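Both formats can be sketched in Python as below. The function names are hypothetical, and the sketch assumes year-based numbers are stored as text: if they were stored as decimals, "2017.1" and "2017.10" would become indistinguishable.

```python
def next_serial(existing_numbers):
    """Next accession number in a simple serial series."""
    return max(existing_numbers, default=0) + 1


def next_year_based(existing_numbers, year):
    """Next accession number in a 'year.running number' series."""
    prefix = f"{year}."
    used = [int(v.split(".")[1]) for v in existing_numbers if v.startswith(prefix)]
    return f"{year}.{max(used, default=0) + 1}"


def accessions_in_year(existing_numbers, year):
    """Count accession transactions recorded for a given calendar year."""
    return sum(1 for v in existing_numbers if v.startswith(f"{year}."))


serial = next_serial([9637, 9638, 9639, 9640])
ledger = ["2016.58", "2017.1", "2017.2", "2017.3", "2017.4"]
same_year = next_year_based(ledger, 2017)
new_year = next_year_based(ledger, 2018)
count_2017 = accessions_in_year(ledger, 2017)
print(serial, same_year, new_year, count_2017)
```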

Not all institutions currently assign individual accession numbers to each incoming specimen or lot, although this may not be considered a best practice. Please also refer to Nagoya Protocol Information, relevant information at the Convention on Biological Diversity website, and future wiki pages or additional resources.

Persistent Identifiers

As we increasingly digitize specimens and share digital data about collection objects, Persistent IDentifiers (PIDs) are a complement to traditional catalog and accession numbers, and are especially useful when assigned to digital specimen records. In addition to being persistent, PIDs are globally unique, computer readable, and ideally resolvable.

In the early 2000s, there was initial enthusiasm in the natural history collections community to construct PIDs out of existing catalog numbers by using a format known as the Darwin Core Triplet, which consists of "institutionCode":"collectionCode":"catalogNumber" (e.g., "USNM:Pal:106473"). Although the Darwin Core Triplet format remains popular because of how simple it is to create and how easy it is for humans to interpret, it is no longer recommended because it is not dependably unique and is often implemented idiosyncratically[1].
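A short sketch of the triplet format shows why idiosyncratic implementation is a problem. The function name is hypothetical, and the variant collection code and separator below are invented for illustration rather than taken from real data.

```python
def darwin_core_triplet(institution_code, collection_code, catalog_number, separator=":"):
    """Join the three Darwin Core fields into a triplet string."""
    return separator.join([institution_code, collection_code, str(catalog_number)])


# The example from the text.
triplet = darwin_core_triplet("USNM", "Pal", 106473)

# Hypothetical variants: the same specimen published with a different
# collection code spelling or separator no longer matches as a string.
variants = {
    triplet,
    darwin_core_triplet("USNM", "Paleo", 106473),
    darwin_core_triplet("USNM", "Pal", 106473, separator="-"),
}
print(triplet)
print(len(variants))
```

Each variant is a plausible way to write the same specimen, yet a computer comparing identifiers as strings would treat them as three different objects.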

Today, there are a variety of formats to choose from when assigning PIDs in your collection; you can learn more about them on the Persistent Identifiers wiki page. One of the easiest identifier formats to use is a Universally Unique IDentifier (UUID), more generically known as a Globally Unique IDentifier (GUID), which is a string of 32 hexadecimal digits displayed in five groups separated by hyphens, e.g., "123e4567-e89b-12d3-a456-426614174000". Anyone can acquire UUIDs on demand and for free from online tools such as https://www.uuidgenerator.net. Note that although a UUID is globally unique, it is not resolvable. If you are able to, it is recommended to use a resolvable PID format such as a Digital Object Identifier (DOI) or Archival Resource Key identifier (ARK).
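UUIDs can also be generated locally rather than through a website. The sketch below uses Python's standard uuid module; generating random (version 4) UUIDs is one common choice, not a requirement stated by this page.

```python
import uuid

# Generate a random (version 4) UUID.
new_id = uuid.uuid4()
text = str(new_id)

# Confirm the layout described above: 32 hexadecimal digits in five groups.
group_lengths = [len(group) for group in text.split("-")]
hex_digit_count = len(new_id.hex)

# Parsing an existing string checks that it is well formed; a malformed
# string raises ValueError.
parsed = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")

print(text)
print(group_lengths, hex_digit_count)
```

Whichever tool generates the identifier, it should be stored with the record once and never regenerated, since a new call always produces a different value.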

It is best practice for the institution tasked with physical care of a collection object to assign a PID (or UUID/GUID) to the object. Be sure to consider what the identifier is actually representing: Is it the physical object? The digital catalog record associated with the physical object? The physical object, its digital catalog record, and any derivative media such as photographs of a specimen? What if the physical object is subsampled, e.g., a bird specimen has material taken for genetic sequencing? There is no clear community consensus on how to assign PIDs, although the European DiSSCo project has determined that in their context PIDs will be assigned to identify the "digital representations of physical specimens"[2]. Whatever you decide, it is important to apply identifiers consistently within your own institution, and to document your decision about what the identifier represents.

The Darwin Core term dwc:occurrenceID is most often used to share PIDs, and is often required by data aggregators such as the Global Biodiversity Information Facility. Ongoing work in the TDWG Material Sample Task Group may affect how we use the term dwc:occurrenceID in the near future.

Barcodes

Applying barcodes in a 1D or 2D (e.g., QR code) format to specimens or locations is becoming widespread across collections. Approaches vary both in how barcodes are physically applied and in how they are managed digitally. See the iDigBio guide for more information:

https://www.idigbio.org/wiki/index.php/Specimen_Barcode_and_Labeling_Guide

Links

nhcoll-l Multiple Numbers thread: http://mailman.yale.edu/pipermail/nhcoll-l/2016-March/009178.html

References

  1. Guralnick R, Conlin T, Deck J, Stucky BJ, Cellinese N. 2014. The Trouble with Triplets in Biodiversity Informatics: A Data-Driven Case against Current Identifier Practices. PLoS ONE 9(12): e114069. https://doi.org/10.1371/journal.pone.0114069
  2. Hardisty A, Addink W, Glöckler F, Güntsch A, Islam S, Weiland C. 2021. A choice of persistent identifier schemes for the Distributed System of Scientific Collections (DiSSCo). Research Ideas and Outcomes 7: e67379. https://doi.org/10.3897/rio.7.e67379