Apache Hadoop as a Storage Backend for Fedora Commons

Frank Asseg, Matthias Razum, Matthias Hahn:
Apache Hadoop as a Storage Backend for Fedora Commons
In: OR2012, The 7th International Conference on Open Repositories (9-13 July 2012, Edinburgh, UK)

Abstract

Certain types of repositories are constantly growing in size. This is true for archives, national libraries, and research institutions. Research itself is increasingly data-driven (Hey & Trefethen, 2003), which leads to vast amounts of raw and preprocessed data. Web archiving, as done by, for example, the Internet Memory Foundation, requires the ingestion of tens of thousands of files on a daily basis. Aside from traditional text-based publications, libraries increasingly archive content such as video and audio. The resulting large-scale data repositories pose new performance challenges for digital preservation tasks. An example of a common preservation task is the regular calculation of checksums to detect data degradation. Running this task on a petabyte-scale video archive can take longer than the interval between scheduled executions of the task, as defined by an institution's preservation policy. Traditional repository architectures do not meet the requirements of such situations very well.
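
To make the checksum task concrete, here is a minimal sketch of a fixity check in Python. It is not taken from the paper: the manifest layout (file path mapped to the checksum recorded at ingest), the choice of SHA-256, and the function names are all assumptions for illustration. It runs sequentially on one machine, which is exactly the mode that stops scaling for very large archives; the per-file work it performs is the kind of unit a distributed backend could spread across many nodes.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_degraded(manifest):
    """Return the paths whose current checksum differs from the recorded one.

    `manifest` is a hypothetical dict: file path -> checksum stored at ingest.
    """
    return [path for path, recorded in manifest.items()
            if sha256_of(path) != recorded]


def demo():
    """Build two small files, corrupt one, and report what the check finds."""
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "good.bin"
        bad = Path(tmp) / "bad.bin"
        good.write_bytes(b"intact content")
        bad.write_bytes(b"original content")
        # Record checksums as they would be at ingest time.
        manifest = {str(good): sha256_of(good), str(bad): sha256_of(bad)}
        # Simulate silent data degradation in one file.
        bad.write_bytes(b"original c0ntent")
        return [Path(p).name for p in find_degraded(manifest)]


if __name__ == "__main__":
    print(demo())
```

Because every file has to be read in full, the running time of such a check grows with the total size of the archive, which is why a scheduled run can overrun its own interval.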


3 Responses to Apache Hadoop as a Storage Backend for Fedora Commons

  1. Hi,
    This is a great post. I think it will help me a lot with related work and is very useful to me. It is very well written; I appreciate it and must say good job.


  2.  sivanagamahesh says:

    Hi,
    Hadoop’s capabilities for distributed computation could prove useful in providing new kinds of digital object services and maintenance for ever-increasing amounts of data. We tested storage of Fedora Commons data in the Hadoop Distributed File System (HDFS) using an early development version of the Akubra-HDFS interface created by Frank Asseg. Thanks for sharing.

  3.  sivanagamahesh says:

    Hi,
    We chose Apache Hadoop because of its large and growing community and software ecosystem. Additionally, Hadoop’s capabilities for distributed computation could prove useful in providing new kinds of digital object services and maintenance for ever-increasing amounts of data. We tested storage of Fedora Commons data in the Hadoop Distributed File System (HDFS) using an early development version of the Akubra-HDFS interface created by Frank Asseg. This article examines the findings of our research study, which evaluated Fedora-Hadoop integration in the areas of performance, ease of access, security, disaster recovery, and costs. The author's writing is very impressive. Thanks for sharing.
