A prototype JP2 validator and properties extractor 14 Dec 2011

A few months ago I wrote a blog post on a simple JP2 file structure checker. This led to some interesting online discussions on JP2 validation. Some people asked me about the feasibility of expanding the tool to a full-fledged JP2 validator. Despite some initial reservations, I eventually decided to dedicate a couple of weeks to writing a rough prototype. The first results of this work are now ready in the form of the jpylyzer tool. Although I initially intended to limit its functionality to validation (i.e. verification against the format specifications), I quickly realised that since validation would require the tool to extract and verify all header properties anyway, it would make little sense not to include this information in its output. As a result, jpylyzer is both a validator and a properties extractor.

More ...

Evaluation of identification tools: first results from SCAPE 21 Sep 2011

As I already briefly mentioned in a previous blog post, one of the objectives of the SCAPE project is to develop an architecture that will enable large-scale characterisation of digital file objects. As a first step, we are evaluating existing characterisation tools. The overall aim of this work is twofold. First, we want to establish which tools are suitable candidates for inclusion in the SCAPE architecture. Second, as the enhancement of existing tools is another goal of SCAPE, the evaluation is also aimed at getting a better idea of the specific strengths and weaknesses of each individual tool. The outcome of this will be helpful for deciding what modifications and improvements are needed. Also, many of these tools are widely used outside of the SCAPE project, which means that the results will most likely be relevant to a wider audience (including the original tool developers).

More ...

A simple JP2 file structure checker 01 Sep 2011

Over the last few weeks I’ve been working on the design of a workflow that the KB is planning to use for the migration of a collection of (mostly old) TIFF images to JP2. One major risk of such a migration is that hardware failures during the migration process may result in corrupted images. For instance, one could imagine a brief network or power interruption that occurs while an image is being written to disk. In that case data may be missing from the written file. Ideally we would be able to detect such errors using format validation tools such as JHOVE. Some time ago Paul Wheatley reported that the BL at some point were dealing with corrupted, incomplete JP2 files that were nevertheless deemed “well-formed and valid” by JHOVE. So I started doing some experiments in which I deliberately butchered up some images, and subsequently checked to what extent existing tools would detect this.
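The kind of experiment described above can be sketched in a few lines of Python. This is a minimal illustration only, not the checker from the post: the function names `looks_complete` and `truncate_copy` are hypothetical, and the check assumes that the codestream is the last thing in the file (the format allows other boxes to follow it, so a real checker would have to parse the box structure). It relies on two facts from the standard: every JP2 file starts with a fixed 12-byte signature box, and every codestream ends with the end-of-codestream marker `0xFFD9`.

```python
import os
import tempfile

# JP2 signature box: the fixed first 12 bytes of every JP2 file
JP2_SIGNATURE = bytes.fromhex("0000000C6A5020200D0A870A")
# End-of-codestream (EOC) marker that terminates a JPEG 2000 codestream
EOC_MARKER = b"\xff\xd9"


def looks_complete(path):
    """Rough check (hypothetical helper): signature at start, EOC at the very end.

    Assumes the codestream is the last item in the file.
    """
    with open(path, "rb") as f:
        data = f.read()
    return data.startswith(JP2_SIGNATURE) and data.endswith(EOC_MARKER)


def truncate_copy(src, dst, n_bytes):
    """Write a copy of src without its last n_bytes, mimicking an interrupted write."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(data[:len(data) - n_bytes])


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "fake.jp2")
    damaged = os.path.join(tmp, "fake_truncated.jp2")
    # Synthetic stand-in, NOT a valid JP2: signature, filler bytes, EOC marker
    with open(original, "wb") as f:
        f.write(JP2_SIGNATURE + b"\x00" * 100 + EOC_MARKER)
    truncate_copy(original, damaged, 10)
    original_ok = looks_complete(original)
    damaged_ok = looks_complete(damaged)

print(original_ok, damaged_ok)  # True False
```

A check like this only catches files that were cut off at the end. Damage in the middle of a file would need a proper walk through the box structure.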

More ...

Improved identification of XML: a Python experiment 11 Jul 2011

As a part of the SCAPE project, I’m currently heavily involved in the evaluation of various file format identification tools. The overall aim of this work is to determine which tools are suitable candidates for inclusion in the SCAPE architecture. In addition, we’re also trying to get a better idea of each tool’s specific strengths and weaknesses, which will hopefully serve as useful input to the developer community. We’re actually planning to publish the first results of this work on the OPF blog some time soon, so you may want to keep your eyes peeled for that.

More ...

Paper on JPEG 2000 for preservation 06 Jun 2011

The JPEG 2000 compression standard is steadily becoming more and more popular in the archival community. Several large (national) libraries are now using the JP2 format (which corresponds to Part 1 of the standard) as the master format in mass digitisation projects. However, some aspects of the JP2 file format are defined in ways that are open to multiple interpretations. This applies to the embedding of ICC profiles (which are used to define colour space information), and the definition of grid resolution. This situation has led to a number of interoperability issues that are potential risks for long-term preservation.

More ...