UC Libraries Digital Preservation Program
Preserving UC's Research, Teaching, and Learning
In partnership with the University of California libraries, the California Digital Library established the digital preservation program in 2002. The program ensures long-term access to digital information that supports and results from research, teaching, and learning at UC.
Services
- UC Libraries Digital Preservation Repository: Supports the long-term retention of digital objects for the benefit of the UC libraries and their users.
- CDL Web Archiving Program: Develops tools and standards and engages in research and collaboration to support the preservation of vital web resources. The primary services of this program are:
  - The Web Archiving Service: Enables librarians and scholars to capture and analyze web-based content, and to create publicly accessible web archives.
  - Web Archives: CDL hosts the archives created with its Web Archiving Service. See The California Digital Library and Web Archiving [PDF] for an overview of the archives.
  - The Web-at-Risk: A distributed approach to preserving our political cultural heritage. Funded by a Library of Congress grant and completed in July 2009, this project developed web archiving tools now used by libraries to capture, curate, and preserve collections of web-based government and political information.
Curation Microservices
Developed with CDL programs and partners (e.g., LoC, UMich),
curation micro-services offer an unbundled alternative to
all-in-one repositories that can be expensive to support and
modify (cf. DSpace, Fedora, LOCKSS).
Using native operating system file and web services, we define
minimal conventions to turn a file system into an "object
system" and provide low-barrier tools for full-lifecycle
enrichment (identity, fixity, replication, annotation, etc.) of
objects. For more background see
curation services.
Open specifications and tools.
We welcome feedback on these works in progress.
- Noid (Nice Opaque Identifiers):
Noid provides minting, binding, and resolving services in
support of preservation-ready identifiers. A committed
provider can issue persistent identifiers with help from
identity services of this kind. Software:
download.
- Dflat: Simple File-Based Object Storage:
An object residence, or "digital flat". Common amenities,
such as versions, metadata, annotations, administrivia, and
the occupant itself (as intended by the depositor),
if present, are always found under reserved names. We will
likely have "Dflats" at the ends of Pairtree paths.
- Pairtrees for Collection Storage:
A filesystem convention for holding a collection of
digital object directories. The directory path ending at
an object is formed by taking the identifier and making a
sub-directory for each next pair of characters.
Conversely, one can recover every object and its identifier
simply by "walking" the Pairtree. Software:
download.
- Content Access Node (CAN):
A CAN holds a repository instance, which is a set of
collections (Pairtrees) plus policy configuration files to
govern such things as fixity, replication, indexing, and
annotation, depending on the purpose of the repository.
- CLOP: A Class-Based System for Managing Object Properties:
Allows policy declarations to be attached to files,
versions, objects, and entire repositories.
- Directory Typing with Namaste Tags:
Namaste (NAMe AS TExt) tags are primitive directory-level
metadata exposed directly via filenames. As such, they
greet visitors who request a directory listing with a
glimpse of what the directory holds. Alpha software:
download.
- Reverse Directory Deltas (ReDD):
ReDD is a way to represent differences between two sets of
files, which permits great cost reduction when storing
multiple versions. To optimize access to recent versions,
a chain of ReDD "reverse deltas" stretches backward in
time. We will likely use ReDD for Dflat version
directories.
- Checkm: A Checksum-Based Manifest Format:
Checkm is a general-purpose text-based manifest format
designed to support tools that verify the bit-level
integrity of file groups for such things as content
fixity, replication, import, and export.
- JHOVE2 Architecture for Format-Aware Characterization:
A next-generation framework and application for
format-aware characterization, building on the success of
the original JHOVE
system. JHOVE2 generalizes the process
of characterization to include signature-based
identification, validation, feature extraction, and
policy-based assessment.
- BagIt File Package Format:
A "bag" is a hierarchical file package format suitable for
the exchange of generalized archival content via the
network or hard disk. It has just enough structure to
safely enclose its payload but does not require the
receiver to have any deep knowledge of its internal
semantics. Software:
download.
- N2T: Name-to-Thing Resolver:
N2T is a centralized, scheme-agnostic identifier resolver to
protect URL stability for organizations with web server
hostnames that might change.
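As an illustration of the Pairtree convention described above, here is a minimal sketch in Python. It is not the CDL software. It assumes identifiers are non-empty and contain only filesystem-safe characters, so it skips any character escaping a real implementation would need. The function names and the `OBJECT_DIR` name are hypothetical, chosen only for this example.

```python
import os
import tempfile

# Hypothetical name for the directory that holds the object itself.
# It is longer than two characters, so it can never be mistaken for a pair.
OBJECT_DIR = "obj"


def pair_path(identifier):
    """Split an identifier into successive two-character directory names.

    An odd-length identifier ends with a one-character directory.
    """
    if not identifier:
        raise ValueError("identifier must be non-empty")
    pairs = [identifier[i:i + 2] for i in range(0, len(identifier), 2)]
    return os.path.join(*pairs)


def store(root, identifier):
    """Create the object directory for an identifier and return its path."""
    path = os.path.join(root, pair_path(identifier), OBJECT_DIR)
    os.makedirs(path, exist_ok=True)
    return path


def walk_identifiers(root):
    """Recover every stored identifier by walking the tree.

    Each directory that contains OBJECT_DIR marks the end of a pair path;
    joining the path components back together yields the identifier.
    """
    found = []
    for dirpath, dirnames, _ in os.walk(root):
        if OBJECT_DIR in dirnames:
            rel = os.path.relpath(dirpath, root)
            found.append("".join(rel.split(os.sep)))
            dirnames.remove(OBJECT_DIR)  # do not descend into the object
    return sorted(found)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        store(root, "abcd")
        store(root, "abcdef")
        print(pair_path("abcdef"))
        print(walk_identifiers(root))
```

Because the identifier is encoded entirely in the directory path, no separate index is needed to list a collection: the walk alone recovers both the objects and their names.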
Best Practices and Standards
- Archival Resource Key (ARK):
A naming scheme for preservation-ready identifiers.
[HTML]
- WARC File Format (ISO 28500:2009):
Co-authored by CDL preservation staff, this international
standard specifies a structure for storing and exchanging
resources harvested from the web and elsewhere.
[HTML]
Partners
The CDL's preservation partners include the UC campus libraries and digital library and preservation researchers around the world. Grant funding has come from the Library of Congress National Digital Information Infrastructure and Preservation Program, the Andrew W. Mellon Foundation, and the Institute of Museum and Library Services.
Reports
- UC libraries digital preservation program: Report on aims, overview, and initial priorities. Spring 2004 [DOC]
- Web-based government information: Evaluating solutions for capture, curation, and preservation. [PDF]
- Preserving digital materials: Final report for the IMLS about creating a preservation repository for multi-institution use. [PDF]
- Systemwide strategic directions for libraries and scholarly information: The UC libraries' strategic plan. June 2004 [PDF]
- Digital preservation flier: Spring 2004 [PDF]