- A Framework of Guidance for Building Good Digital Collections
Objects Principle 5: A good object can be authenticated.
Authenticity refers to the degree of confidence a user can have in the integrity and trustworthiness of an object. Authentication is the act of determining that the object conforms to its documented origin, structure, and history, and that the object has not been corrupted or changed in an unauthorized way.
Authenticity does not refer to the accuracy of the content or meaning of the object. As Clifford Lynch observed, "An authentic document may faithfully transmit complete falsehoods." (Authenticity and Integrity in the Digital Environment, http://www.clir.org/pubs/reports/pub92/lynch.html) Nonetheless, research and scholarship rely upon the ability to verify the authenticity of materials in order to use them appropriately. For archives, authenticity is an important component of the evidentiary value of records and has legal significance.
In the non-digital realm, the authenticity of documents is often determined through forensics such as paleography, examination of physical characteristics, and comparison of handwritten signatures. For digital objects, such physical clues do not exist, and documentation becomes correspondingly more important. The user wants to know the origin of the digital object, whether the object has been altered since its creation, and if so, how and by whom. Methods of providing this information include documentation of digital provenance, watermarking, and fixity checking.
The digital provenance of an object is its origin and change history, which can be recorded as metadata (see METADATA). Origin information can be provided internally, often in the file header. Change history is most often recorded externally. The METS schema defines a placeholder section (digiprovMD) for digital provenance, but does not define any metadata elements to use within it. However, the Event Entity in the PREMIS Data Dictionary for Preservation Metadata defines semantic units that document digital provenance (http://www.oclc.org/research/projects/pmwg/premis-final.pdf). An XML schema for the PREMIS event entity can be used as a METS extension schema under digiprovMD (http://www.loc.gov/standards/premis/schemas.html).
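As an illustration of how a PREMIS event might be wrapped in a METS digiprovMD section, the following Python sketch builds such a fragment with the standard library. It rests on several assumptions that are not taken from this document: the PREMIS namespace shown is the one used by version 2 of the PREMIS schema (other versions use different namespaces), the MDTYPE value "PREMIS" and the identifier type "local" are illustrative choices, and the function name, section ID, and event values are hypothetical. The output is not validated against either schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
# Assumption: namespace of the PREMIS version 2 schema; other versions differ.
PREMIS_NS = "info:lc/xmlns/premis-v2"

ET.register_namespace("mets", METS_NS)
ET.register_namespace("premis", PREMIS_NS)


def build_digiprov_section(event_id, event_type, event_datetime, event_detail,
                           section_id="DIGIPROV1"):
    """Build a METS digiprovMD element wrapping one PREMIS event.

    The function name and the default section ID are hypothetical.
    """
    digiprov = ET.Element(f"{{{METS_NS}}}digiprovMD", {"ID": section_id})
    md_wrap = ET.SubElement(digiprov, f"{{{METS_NS}}}mdWrap",
                            {"MDTYPE": "PREMIS"})
    xml_data = ET.SubElement(md_wrap, f"{{{METS_NS}}}xmlData")

    # PREMIS event semantic units: identifier, type, date/time, detail.
    event = ET.SubElement(xml_data, f"{{{PREMIS_NS}}}event")
    identifier = ET.SubElement(event, f"{{{PREMIS_NS}}}eventIdentifier")
    ET.SubElement(identifier,
                  f"{{{PREMIS_NS}}}eventIdentifierType").text = "local"
    ET.SubElement(identifier,
                  f"{{{PREMIS_NS}}}eventIdentifierValue").text = event_id
    ET.SubElement(event, f"{{{PREMIS_NS}}}eventType").text = event_type
    ET.SubElement(event, f"{{{PREMIS_NS}}}eventDateTime").text = event_datetime
    ET.SubElement(event, f"{{{PREMIS_NS}}}eventDetail").text = event_detail
    return digiprov


if __name__ == "__main__":
    section = build_digiprov_section(
        "event-001",                      # hypothetical local identifier
        "fixity check",
        "2008-04-17T10:00:00",
        "Message digest recomputed and compared with the stored value",
    )
    print(ET.tostring(section, encoding="unicode"))
```

Recording each event as a separate digiprovMD section keeps the change history external to the object itself, which matches the observation above that change history is most often recorded externally.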
Digital watermarking is a technique for adding a visible or invisible message to an object. Digital watermarks are most often used to assert copyright or ownership. Although watermarks may provide useful information similar to embedded origin information, they should be viewed cautiously as documentation of authenticity. (See Clifford Lynch, Authenticity and Integrity in the Digital Environment: An Exploratory Analysis of the Central Role of Trust http://www.clir.org/pubs/reports/pub92/lynch.html.)
The fixity of an object can be verified by comparing message digests (often called checksums) generated from the object at different points in time. A message digest is a string created by applying an algorithm called a “hash function” to the bits comprising the object. The message digest is saved and later compared to a message digest newly created by the same algorithm. If the two are the same, the object can be presumed bit-wise unchanged; if they differ, the object has been altered or corrupted.
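The comparison described above can be sketched in a few lines of Python using the standard hashlib module. This is a minimal illustration, not a recommended workflow: the function names are hypothetical, and the choice of SHA-256 as the default algorithm is an assumption (any hash function works, provided the same one is used for both digests).

```python
import hashlib


def digest_bytes(data, algorithm="sha256"):
    """Return the hexadecimal message digest of a byte string."""
    return hashlib.new(algorithm, data).hexdigest()


def digest_file(path, algorithm="sha256", chunk_size=65536):
    """Return the hexadecimal message digest of a file, read in chunks
    so that large objects do not have to fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def is_unchanged(path, stored_digest, algorithm="sha256"):
    """Compare a freshly computed digest with one saved earlier.

    The same algorithm must be used for both digests.
    """
    return digest_file(path, algorithm) == stored_digest.lower()
```

In use, the value returned by `digest_file` would be saved alongside the object (for example in its preservation metadata) when the object is created or ingested, and `is_unchanged` would be called against that stored value at each later fixity check.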
Context can also provide clues to authenticity. A good object will be related to other versions of the object, to other objects within a collection, and to host objects and/or contained objects. The archival profession has done both theoretical and practical work in preserving context and original order in the digital environment.
- CLIR, Authenticity in a Digital Environment (2000) http://www.clir.org/pubs/reports/pub92/contents.html. Although getting dated, some of the essays in this compilation are still among the best on the topic, particularly Clifford Lynch’s.
- DigiCULT, Integrity and Authenticity of Digital Cultural Heritage Objects (2002) http://www.digicult.info/downloads/thematic_issue_1_final.pdf. DigiCULT monitors, discusses, and analyses the impact of new technology on cultural and scientific heritage organizations. This publication gathers an eclectic but interesting set of primarily European perspectives.
- The Long-term Preservation of Authentic Electronic Records: Findings of the InterPARES Project (2005) http://www.interpares.org/book/index.cfm. This report from the first stage of the international InterPARES project focuses on the preservation of the authenticity of records created and/or maintained in databases and document management systems in the course of administrative activities.
- National Library of Australia, PADI (Preserving Access to Digital Information): Authenticity website http://www.nla.gov.au/padi/topics/4.html. Well-maintained webliography of resources.
About message digests and watermarking:
- Fred Mintzer, Jeffrey Lotspiech, and Norishige Morimoto, Safeguarding Digital Library Contents and Users: Digital Watermarking (1997) http://www.dlib.org/dlib/december97/ibm/12lotspiech.html. A good basic explanation with illustrations.
- Richard Entlich, “A Little Bit’ll Do You (In): Checksums to the Rescue,” RLG DigiNews, v. 9, no. 3 (2005) http://www.rlg.org/en/page.php?Page_ID=20666#article3. General introduction to checksums and message digests.
- Wikipedia, MD5 http://en.wikipedia.org/wiki/MD5. Might be more than you want to know about Message-Digest algorithm 5, but maybe not.
- Audrey Novak, Fixity Checks: Checksums, Message Digests and Digital Signatures (2005) http://www.library.yale.edu/iac/DPC/AN_DPC_FixityChecksFinal11.pdf. Best practices from Yale University Library.
Last updated: 04/17/2008