Duplicate Detection for Quality Assurance of Document Image Collections

Reinhold Huber-Mörk, Alexander Schindler, and Sven Schlarb:
Duplicate Detection for Quality Assurance of Document Image Collections.
In: iPRES 2012 – Proceedings of the 9th International Conference on Preservation of Digital Objects. Toronto 2012, 136-143.
ISBN 978-0-9917997-0-1

Abstract
Digital preservation workflows for image collections that involve automatic and semi-automatic image acquisition and processing are prone to quality loss. We present a method for quality assurance of scanned content based on computer vision. A visual dictionary derived from local image descriptors enables efficient perceptual image fingerprinting, which is used to compare scanned book pages and detect duplicated pages. A spatial verification step based on descriptor matching makes the approach more robust. Results are presented for a digitized book collection of approximately 35,000 pages. Duplicated pages are identified with high reliability, in good agreement with results obtained independently by human visual inspection.
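
The fingerprinting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that local descriptors have already been extracted for each page and that a visual dictionary (a list of centroid vectors) is already available; the toy 2-D descriptors, the histogram-intersection similarity, the 0.9 threshold, and all function names are hypothetical choices made for illustration. The spatial verification step is omitted.

```python
import math
from collections import Counter


def nearest_word(descriptor, dictionary):
    """Return the index of the visual word (centroid) closest to the descriptor."""
    return min(range(len(dictionary)),
               key=lambda i: math.dist(descriptor, dictionary[i]))


def fingerprint(descriptors, dictionary):
    """Build an L1-normalized histogram of visual-word occurrences for one page."""
    counts = Counter(nearest_word(d, dictionary) for d in descriptors)
    total = len(descriptors)
    return [counts[i] / total if total else 0.0
            for i in range(len(dictionary))]


def similarity(fp_a, fp_b):
    """Histogram intersection: 1.0 means identical visual-word distributions."""
    return sum(min(a, b) for a, b in zip(fp_a, fp_b))


def duplicate_candidates(fingerprints, threshold=0.9):
    """Return page pairs whose fingerprint similarity reaches the threshold."""
    names = sorted(fingerprints)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if similarity(fingerprints[a], fingerprints[b]) >= threshold:
                pairs.append((a, b))
    return pairs


# Toy data (assumption): three visual words and a few 2-D "descriptors" per page.
dictionary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
pages = {
    "page_001": [(0.1, 0.0), (0.9, 1.1), (5.2, 4.9), (0.0, 0.2)],
    "page_002": [(0.0, 0.1), (1.0, 0.9), (4.8, 5.1), (0.1, 0.1)],
    "page_003": [(5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (1.0, 1.0)],
}
fingerprints = {name: fingerprint(desc, dictionary)
                for name, desc in pages.items()}
candidates = duplicate_candidates(fingerprints)
print(candidates)
```

In this toy collection only `page_001` and `page_002` share the same visual-word distribution, so they form the single candidate pair. In a pipeline like the one the abstract describes, such candidates would then go through spatial verification by descriptor matching before being reported as duplicates.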

