SCAPE’s contribution to long-term digital preservation
The volume of digital content worldwide is increasing exponentially. This fact demands that preservation activities become more scalable, while the economics of long-term storage and access demand that they become more automated.
Standard tools are often overtaxed when faced with very large or complex digital objects; the same is true for standard workflows when faced with a very large number of objects or with heterogeneous collections. The lack of automated quality assurance tools for detecting and reporting errors in a preservation process increases preservation risks.
The SCAPE project addressed these issues by developing solutions for long-term preservation of large-scale and heterogeneous data sets. The resulting SCAPE platform allows the execution of semi-automated workflows which guarantee secure and targeted preservation as well as monitoring of the quality of the results. SCAPE software identifies the need for preservation actions in a repository through characterisation and trend analysis, and responds to such needs on the basis of institutional policies and generated preservation plans.
These project results were driven by requirements from, and validated through, four large-scale Testbeds:
- Web Content
- Digital Repositories
- Research Data Sets
- Data Center
Each Testbed was selected for the unique challenges it highlights.
The Web Content Testbed addressed challenges presented by heterogeneous collections and a rapidly changing delivery environment. The sheer volume of content in web archives requires fully automated, scalable archiving and preservation solutions. Archived web content can include files in a large set of formats and in multiple versions, including obsolete ones, that rely on a range of associated rendering tools.
The Digital Repositories Testbed addressed the challenge of carrying out preservation actions within an institutional context with a variety of legal constraints, policy requirements and substantial investments in legacy systems. Preservation challenges in such settings include issues of scalability along all of the dimensions covered in SCAPE.
The Research Data Sets Testbed was concerned with the pressing need to preserve long-term access to, and usability of, scientific data. Particular aspects of this Testbed include potentially very large data sets, a wide variety of practices and the unique requirement of preserving the original context of the experiment which generated the data in the first place.
The Data Center Testbed demonstrated the applicability of SCAPE solutions for academic and national data centers, allowing them to provide scalable preservation services to their user communities, which are typically not primarily focused on long-term digital preservation.