This report describes the year 3 activities of the SCAPE project in the Characterisation Components work package and presents an evaluation of format identification tools for execution in a parallelised MapReduce environment. We report two general solutions that complement each other, each with its own advantages and drawbacks. We present a solution to the challenge of different tools giving different results on the same data. We discuss the concept of policy-driven validation of digital objects against an institutional preservation policy and refer to a concrete proof-of-concept solution. We present an evaluation of deploying Apache Tika and DROID on the SCAPE Azure platform as an alternative to the general SCAPE Execution Platform. Finally, we present a research project on extracting semantic information from web-based text corpora and discuss how such a system could be used by the digital preservation community.
The SCAPE Project closed on 2014-09-30.