A Framework of Guidance for Building Good Digital Collections
Objects Principle 4
Objects Principle 4: A good object will be named with a persistent, globally unique identifier that can be resolved to the current address of the object.
An identifier is a name assigned to an object according to a formal standard, an industry convention, or a local system providing a consistent syntax. Good identifiers will at minimum be locally unique, so that resources within the digital collection or repository can be unambiguously distinguished from each other. Global uniqueness can then be achieved through the addition of a globally unique prefix element, such as a code representing the organization.
Locally unique identifiers should be:
- scalable, so that many identifiers can be assigned without danger of running out or duplication;
- consistent, having a construction that can be easily applied over time;
- actionable, or capable of taking one to the object with a single “click” or action; and
- persistent, such that the identifier does not change when the location of the object changes.
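As a rough illustration of these properties, the sketch below mints locally unique identifiers from a zero-padded counter and makes them globally unique by adding an organization prefix. The prefix `exampleorg`, the colon separator, the eight-digit width, and the class name `IdentifierMinter` are all assumptions made for the example; they do not come from any standard.

```python
import itertools


class IdentifierMinter:
    """Hypothetical minter: organization prefix plus a sequential local number."""

    def __init__(self, org_prefix, width=8):
        self.org_prefix = org_prefix        # assumed globally unique element
        self.width = width                  # fixed width keeps the syntax consistent
        self._counter = itertools.count(1)  # a counter does not run out in practice
        self._issued = set()

    def mint(self):
        local_id = str(next(self._counter)).zfill(self.width)
        identifier = f"{self.org_prefix}:{local_id}"
        # Guard against duplicates; a real system would persist this state.
        if identifier in self._issued:
            raise ValueError(f"duplicate identifier: {identifier}")
        self._issued.add(identifier)
        return identifier


minter = IdentifierMinter("exampleorg")
first = minter.mint()   # "exampleorg:00000001"
second = minter.mint()  # "exampleorg:00000002"
```

Because the identifier says nothing about where the object is stored, it can stay the same when the object moves. Making it actionable still requires a resolver.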
In the best of all possible worlds, locally assigned identifiers would conform to known national or international standards. Unfortunately, most standard identifiers point to classes of objects (e.g., the ISBN, which identifies all books in a particular edition), or can only be assigned by particular agencies, or cost a fee to register. For most digital collections, the object identifiers will have to be assigned locally, according to some local scheme. This is not a problem, so long as the scheme is documented and the documentation is accessible.
It is also possible to incorporate standard identifiers into a local naming scheme. For example, in a digital collection of journal articles, the object identifier could consist of a prefix indicating the institution assigning the identifier followed by the SICI for the article.
There is a longstanding controversy over whether identifiers should be “smart” or “dumb,” that is, whether they should carry meaning or not. We feel that neither method is universal best practice and that applications can have good reason to prefer one or the other.
Actionable identifiers for Internet-accessible objects should use name resolvers: software that uses a registry to map from the static persistent identifier to the current location of the object. Although the registry must be updated when an object is moved, this degree of indirection simplifies maintenance, because the location need only be updated once, in a central spot, no matter how many times the identifier occurs in references. Identifier schemes that use name resolvers include PURLs, handles, and ARKs.
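The indirection a name resolver provides can be sketched in a few lines. This is a minimal in-memory model, not the implementation of any particular resolver; the class `NameResolver`, the identifier, and the example URLs are hypothetical.

```python
class NameResolver:
    """Hypothetical in-memory registry mapping identifiers to current locations."""

    def __init__(self):
        self._registry = {}

    def register(self, identifier, location):
        self._registry[identifier] = location

    def move(self, identifier, new_location):
        # Only the registry entry changes; the identifier itself never does.
        if identifier not in self._registry:
            raise KeyError(identifier)
        self._registry[identifier] = new_location

    def resolve(self, identifier):
        return self._registry[identifier]


resolver = NameResolver()
resolver.register("exampleorg:00000001", "http://old.example.org/objects/1.tif")

# Two documents cite the object by identifier, never by location.
citations = ["exampleorg:00000001", "exampleorg:00000001"]

# The object moves: one registry update, and every citation still resolves.
resolver.move("exampleorg:00000001", "http://new.example.org/store/1.tif")
resolved = [resolver.resolve(c) for c in citations]
```

The sketch also shows where persistence can fail: if nobody calls `move` when the object is relocated, the identifier resolves to a dead address.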
PURLs (Persistent URLs) are URLs resolved to true locations by a PURL server. OCLC runs a central PURL server that anyone can use. Alternatively, any organization can download and install the free PURL server application (http://www.purl.org/) and manage its own PURL server locally.
The Corporation for National Research Initiatives (CNRI) developed the Handle System (http://www.handle.net/), a resolver application for persistent identifiers called “handles.” CNRI maintains a global handle registry as well. Organizations wishing to utilize the Handle System must register a namespace with CNRI. As with the PURL server, organizations have the choice of using the resolver at CNRI together with a local Handle application or running their own Handle application locally. The DOI (Digital Object Identifier) is a proprietary implementation of the Handle System (http://www.doi.org/). Use of DOI requires an annual membership fee to the International DOI Foundation to support maintenance of the DOI registry, metadata, and policy framework. Many commercial and open-source digital repository applications, including DigiTool, Fedora, and DSpace, can use the Handle System for object identification. Many electronic publishers, national libraries, and information consortia use DOI.
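A handle consists of a prefix that names the registered namespace and a locally assigned suffix, separated by a slash. A DOI follows the same pattern. The sketch below splits a handle into its two parts and builds a URL for a web proxy. The function names are hypothetical, and the default proxy address `http://hdl.handle.net/` is an assumption about how the resolver is reached; an organization running its own resolver would substitute its own address.

```python
from urllib.parse import quote


def split_handle(handle):
    """Split a handle into (prefix, suffix) at the first slash."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle: {handle!r}")
    return prefix, suffix


def proxy_url(handle, proxy="http://hdl.handle.net/"):
    """Build an actionable URL for a handle via an assumed proxy address."""
    split_handle(handle)  # validate the prefix/suffix shape first
    return proxy + quote(handle, safe="/")


# 10.1000/182 is the DOI assigned to the DOI Handbook.
prefix, suffix = split_handle("10.1000/182")
url = proxy_url("10.1000/182")
```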
The Archival Resource Key (ARK) is a globally unique, actionable identifier scheme developed by the California Digital Library (http://www.cdlib.net/inside/diglib/ark/). CDL also provides an open source utility, NOID, which can be used to generate both ARK and handle identifiers (http://www.cdlib.net/inside/diglib/noid/). NOID can also be set up as a name resolver.
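An ARK embedded in a URL separates the part that may change (the resolver's hostname) from the part meant to persist (the `ark:/` label, a Name Assigning Authority Number, and the assigned name). The sketch below pulls those parts apart. It is a simplified reading of the ARK syntax: the pattern assumes an all-digit NAAN and does not validate the characters in the name, and the function name `parse_ark` and the `mirror.example.net` host are hypothetical.

```python
import re

# Simplified pattern: ark:/<NAAN>/<name><optional qualifier>
ARK_PATTERN = re.compile(
    r"ark:/(?P<naan>[0-9]+)/(?P<name>[^/?#]+)(?P<qualifier>[^?#]*)"
)


def parse_ark(text):
    """Return the resolver, NAAN, name, and qualifier found in an ARK string."""
    match = ARK_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no ARK found in {text!r}")
    return {
        "resolver": text[: match.start()],  # replaceable hostname part
        "naan": match.group("naan"),        # identifies the assigning organization
        "name": match.group("name"),        # identifies the object
        "qualifier": match.group("qualifier"),
    }


original = parse_ark("http://example.org/ark:/12025/654xz321/s3/f8.05v.tiff")
mirrored = parse_ark("http://mirror.example.net/ark:/12025/654xz321")

# The same object is named even though the resolver hostname differs.
same_object = (original["naan"], original["name"]) == (mirrored["naan"], mirrored["name"])
```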
URLs and other Internet identifiers are types of Uniform Resource Identifiers (URI) (http://gbiv.com/protocols/uri/rfc/rfc3986.html). The INFO URI scheme provides a consistent way to represent and reference legacy identifiers so that they can be used by web applications (http://info-uri.info/). Some that have been registered to date include the Library of Congress Control Number (LCCN), PubMed identifier, DDC number, and OCLC WorldCat Control Number. The INFO URI scheme provides a lightweight method of registration that can be used instead of the more formal URN namespace registration process. A small number of legacy identifiers have been registered as URN namespaces, such as ISBN and ISSN.
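To show how a legacy identifier can be expressed as a URI, the sketch below wraps identifiers in the `info:` and `urn:` forms. The helper functions are hypothetical and perform no normalization beyond percent-encoding; each registered namespace defines its own rules for how identifiers must be normalized, and those rules should be checked before minting URIs in practice.

```python
from urllib.parse import quote


def info_uri(namespace, identifier):
    """Form an info URI: info:<namespace>/<identifier>."""
    return f"info:{namespace}/{quote(identifier, safe='/')}"


def urn(namespace, identifier):
    """Form a URN: urn:<namespace>:<identifier>."""
    return f"urn:{namespace}:{identifier}"


lccn = info_uri("lccn", "2002022641")  # Library of Congress Control Number
pmid = info_uri("pmid", "12376099")    # PubMed identifier
isbn = urn("isbn", "0451450523")       # ISBN registered as a URN namespace
```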
Two emerging identifier specifications are XRI (eXtensible Resource Identifier) and IRI (Internationalized Resource Identifier). The IRI is a form of URI that supports internationalization by extending the character set to Unicode characters and by allowing top-to-bottom and right-to-left scanning in addition to left-to-right. The IRI specification is being developed by the W3C (http://www.w3.org/International/O-URL-and-ident.html).
The XRI builds on the IRI to identify resources independent of any specific physical network path, location, or protocol. Interestingly, XRI can be used for people as well as objects, and it can incorporate cross-references, such as an email address or website. The XRI specification is being developed by OASIS (http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xri).
It is important to understand that no identifier scheme or resolver system can guarantee persistence. Regardless of the technology used, for identifiers to remain persistent an institution must take responsibility for both the object and the maintenance of its identifier.
Useful resources on developing an identifier strategy:
- Hans-Werner Hilse and Jochen Kothe, Implementing Persistent Identifiers: Overview of Concepts, Guidelines and Recommendations (2006) http://www.knaw.nl/ecpa/publ/pdf/2732.pdf. Written for the European Commission on Preservation and Access, this report explains the principle of persistent identifiers and helps institutions decide which scheme would best fit their needs.
- Harvard University Library Office for Information Systems, Naming and Repository Services: An Introduction http://hul.harvard.edu/ldi/resources/nrsdrsservice.pdf. Includes a gentle explanation of the importance of good practices in the design of naming services.
- IMS Persistent, Location-Independent, Resource Identifier Implementation Handbook (2001) http://imsglobal.org/implementationhandbook/imsrid_handv1p0.html. Covers the use of URNs for learning objects.
- International DOI Foundation, DOI Handbook (2006) http://www.doi.org/hb.html. Although all about the DOI, includes general explanations of many practical aspects of naming and name resolution.
Last updated: 04/17/2008