Persistent Identifiers

From SPNHC Wiki
Revision as of 19:38, 16 November 2022

Statement of Purpose

This page was originally created to expand on the section on Persistent Identifiers in Numbering Natural History Collections.

Introduction

As we increasingly digitize specimens and share digital data about our collections, Persistent IDentifiers (PIDs) are often assigned to digital specimen records and can also be used to reference other elements of collections data, such as people or taxa. PIDs are foundational elements of data infrastructure because they enable automated and semi-automated linking between concepts[1], and also help make data FAIR, or "findable, accessible, interoperable and reusable" (see FAIR Guiding Principles F1 and A1[2]).

What is a Persistent Identifier?

When we talk about Persistent Identifiers (PIDs) we assume that they are:

  1. Unique. Unlike a catalog number, which may be unique only within a single collection, a PID must be globally unique so that the object it identifies can be referenced unambiguously. Because of this, PIDs must be generated programmatically rather than devised by people.
  2. Persistent. Once assigned to an object, a PID should never change. PIDs also should not be deleted or reassigned, although in some circumstances a PID may refer to an object that no longer exists. "Never" is still relative; the current systems we use to manage PIDs are expected to have a lifespan of anywhere from decades to centuries.
  3. Computer readable. PIDs are designed primarily for use by computers, not humans, although some PID schemes do have components that are meaningful to humans.
  4. Resolvable. Generally, we expect that a PID can be reliably resolved to meaningful information about the identified object, e.g., the PID "http://n2t.net/ark:/65665/3af2b96d2-a8a1-47c5-9895-b0af03b21674" is an actionable URL that redirects to a specimen record on an institutional web portal.
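The uniqueness and persistence properties can be illustrated with a minimal sketch of a hypothetical, in-memory registry. The class name, the https://pid.example.org/ prefix, and the tombstone behavior are illustrative assumptions, not any real PID service. Identifiers are generated by software (a random version 4 UUID), the location an identifier points to can change while the identifier itself never does, and a withdrawn identifier is kept with a note instead of being deleted or reassigned. Real resolvability would additionally require a registration agency and a public resolver.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class PidRecord:
    target: Optional[str]  # URL the PID currently points to; None once withdrawn
    note: str = ""         # e.g. why the object no longer exists


class ToyPidRegistry:
    """Hypothetical in-memory registry: no network resolver, no durable storage."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix  # hypothetical resolver base URL
        self._records: dict[str, PidRecord] = {}

    def mint(self, target: str) -> str:
        # Generated by software, not chosen by a person; random version 4 UUIDs
        # make collisions vanishingly unlikely without central coordination.
        pid = self.prefix + str(uuid.uuid4())
        if pid in self._records:  # defensive: never hand out an existing PID
            raise RuntimeError("identifier collision; mint again")
        self._records[pid] = PidRecord(target)
        return pid

    def update_target(self, pid: str, new_target: str) -> None:
        # The record moved (e.g. to a new web portal); the PID itself stays the same.
        self._lookup(pid).target = new_target

    def withdraw(self, pid: str, note: str) -> None:
        # Keep a tombstone instead of deleting the PID, so it is never reassigned.
        record = self._lookup(pid)
        record.target = None
        record.note = note

    def resolve(self, pid: str) -> PidRecord:
        return self._lookup(pid)

    def _lookup(self, pid: str) -> PidRecord:
        try:
            return self._records[pid]
        except KeyError:
            raise KeyError(f"unknown PID: {pid}") from None


registry = ToyPidRegistry("https://pid.example.org/")
pid = registry.mint("https://collections.example.org/specimens/1")
registry.update_target(pid, "https://portal.example.org/specimens/1")
print(registry.resolve(pid).target)  # https://portal.example.org/specimens/1
```

Keeping the identifier separate from its target is what lets a PID survive a change of web portal: only the stored target is updated, never the identifier that others have already cited.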

A Universally Unique IDentifier (UUID), more generically known as a Globally Unique IDentifier (GUID), is an identifier that is unique, persistent, and computer readable but not resolvable. For example, "af2b96d2-a8a1-47c5-9895-b0af03b21674" is a UUID, while "http://n2t.net/ark:/65665/3af2b96d2-a8a1-47c5-9895-b0af03b21674" is a PID that embeds it. UUIDs/GUIDs have been widely used by natural history collections to identify digital specimen records in the Darwin Core term dwc:occurrenceID because they are easy to generate with online tools such as https://www.uuidgenerator.net. Although UUIDs/GUIDs will continue to be used and can persist as useful identifiers, the community increasingly prefers true PIDs for referencing digital objects such as specimen records. See below for a list of commonly used PID formats and their applications in our domain.
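The practical difference between a bare UUID and an actionable PID can be checked in software. The sketch below is a simplification: it treats a string as a bare UUID if it has the canonical 8-4-4-4-12 hexadecimal form, and as actionable if it is an HTTP(S) URL with a host. The function names are hypothetical, and neither check can tell whether a URL actually resolves.

```python
import re
import uuid
from urllib.parse import urlparse

# Canonical textual form of a UUID: 32 hex digits grouped 8-4-4-4-12.
UUID_FORM = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", re.IGNORECASE
)


def is_bare_uuid(value: str) -> bool:
    """True if the string is only a UUID, with no hint of where to look it up."""
    return UUID_FORM.fullmatch(value.strip()) is not None


def is_actionable(value: str) -> bool:
    """True if the string is an HTTP(S) URL a client could try to dereference.

    This checks the form only; it does not contact the server.
    """
    parts = urlparse(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


occurrence_id = str(uuid.uuid4())
print(is_bare_uuid(occurrence_id), is_actionable(occurrence_id))  # True False
```

The ARK URL quoted in this section passes is_actionable, whereas a dwc:occurrenceID holding only a UUID does not; a system receiving the bare UUID has to know in advance which database to search.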

Truly reliable, long-term resolvability can be difficult to achieve. Registration agencies are the social infrastructure that governs and maintains resolvability for various PID schemes. For example, DataCite is a registration agency that mints Digital Object Identifiers (DOIs). See Hardisty et al. 2021[3] for a thorough discussion of what resolvability means and an example of how the European DiSSCo project evaluated PID options for use by its member institutions.
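On the consuming side, resolving a DOI usually means sending it to the https://doi.org/ resolver and following the HTTP redirect to the current landing page. The sketch below normalizes the common ways a DOI is written (bare, "doi:"-prefixed, or as an older http:// or dx.doi.org link) to one HTTPS URL. It assumes a simplified DOI syntax ("10." plus a numeric prefix, a slash, and a suffix without whitespace), and the function names are hypothetical. landing_page needs network access, and some servers may reject Python's default user agent.

```python
import re
import urllib.request

# Simplified DOI syntax: "10." + numeric registrant prefix + "/" + suffix without whitespace.
# Accepted optional leading forms: "doi:", "http(s)://doi.org/", "http(s)://dx.doi.org/".
DOI_PATTERN = re.compile(
    r"(?:doi:|https?://(?:dx\.)?doi\.org/)?(10\.\d{4,9}/\S+)", re.IGNORECASE
)


def doi_to_url(raw: str) -> str:
    """Normalize a DOI written in any accepted form to an https://doi.org/ URL."""
    match = DOI_PATTERN.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"not a recognizable DOI: {raw!r}")
    return "https://doi.org/" + match.group(1)


def landing_page(url: str, timeout: float = 10.0) -> str:
    """Follow HTTP redirects from the resolver and return the final URL (needs network)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.geturl()


# DOIs cited in the references of this page, written in different styles.
for cited in ["http://doi.org/10.1629/uksg.457", "doi:10.3897/rio.7.e67379", "10.1038/sdata.2016.18"]:
    print(doi_to_url(cited))
```

Storing the normalized form keeps a collection's links consistent even when the same DOI was recorded in several styles.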

Types of Identifiers

PIDs (and other identifiers) can be assigned to different types of objects within the realm of natural history collections. Although we often think of them in relation to digital specimen records (see Numbering Natural History Collections), PIDs are also useful when assigned to people, organizations, taxonomic concepts and names, geographic places, etc. The table below gives examples of the identifier types most commonly used for each kind of object.
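As one example of a PID for people, ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm, so software can catch most typing errors before linking a collector or determiner to the wrong person. The sketch below is an illustration, not ORCID's own code; the function names are hypothetical, and a valid checksum does not prove the iD is actually registered.

```python
import re

# Bare ORCID iD: four hyphen-separated blocks of four; only the last character may be "X".
ORCID_FORM = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")
ORCID_URI_PREFIX = "https://orcid.org/"


def orcid_check_character(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value: str) -> bool:
    """Check the form and checksum of an ORCID iD, bare or as an https://orcid.org/ URI."""
    value = value.strip()
    if value.startswith(ORCID_URI_PREFIX):
        value = value[len(ORCID_URI_PREFIX):]
    if not ORCID_FORM.fullmatch(value):
        return False
    digits = value.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0096"))  # False: one digit mistyped
```

Because the check character depends on every digit and its position, a single mistyped digit or a swap of two adjacent digits changes it.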

Contributors

Erica Krimmel

Links

References

  1. Meadows A, Haak LL, Brown J. 2019. Persistent Identifiers: The Building Blocks of the Research Information Infrastructure. Insights 32(1): 9. https://doi.org/10.1629/uksg.457
  2. Wilkinson M, Dumontier M, Aalbersberg I, et al. 2016. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3: 160018. https://doi.org/10.1038/sdata.2016.18
  3. Hardisty A, Addink W, Glöckler F, Güntsch A, Islam S, Weiland C. 2021. A choice of persistent identifier schemes for the Distributed System of Scientific Collections (DiSSCo). Research Ideas and Outcomes 7: e67379. https://doi.org/10.3897/rio.7.e67379