This report describes the year 3 activities of the SCAPE project in the Characterisation Components work package, and presents an evaluation of format identification tools for execution in a parallelised MapReduce environment. We report two general solutions that complement each other, each with its own strengths and weaknesses. We present a solution to the problem of different tools giving different results on the same data. We discuss the concept of policy-driven validation of digital objects according to an institutional preservation policy and refer to a concrete proof-of-concept solution. We present an evaluation of deploying Apache Tika and DROID on the SCAPE Azure platform as an alternative to the general SCAPE Execution Platform. Finally, we present a research project on extracting semantic information from web-based text corpora and discuss how such a system could be utilised by the digital preservation community.