A heuristic measure for detecting influence of lossy JP2 compression on OCR in the absence of ground truth

Sven Schlarb, Clemens Neudecker:
A heuristic measure for detecting influence of lossy JP2 compression on OCR in the absence of ground truth
In: Archiving 2012, June 2012, Vol. 8, pp. 250-254; ISBN / ISSN: 978-0-89208-300-8

Abstract
Over the last decade, cultural heritage institutions such as libraries, museums, and archives have carried out large-scale digitisation projects. The question of how to store digital master images cost-effectively has made the JPEG 2000 standard (ISO/IEC 15444-1), and in particular the JP2 image file format (JPEG 2000 Part 1), popular in this community. Lossy JP2 encoding of page image masters offers an especially good balance between file size reduction and preservation of the visible properties of a master image. Lossy encoding means that the original file cannot be restored at the bit level, even if the differences are indistinguishable to the human eye. However, the absence of visible changes does not imply that computational processing of the images is unaffected. In this context, we present a heuristic measure that helps to detect undesired influence of lossy JP2 compression on the OCR result, even in the absence of ground truth.

Download: link
