Improved identification of XML: a Python experiment

11 July 2011

As part of the SCAPE project, I’m currently heavily involved in the evaluation of various file format identification tools. The overall aim of this work is to determine which tools are suitable candidates for inclusion in the SCAPE architecture. We’re also trying to get a better idea of each tool’s specific strengths and weaknesses, which will hopefully serve as useful input to the developer community. We’re planning to publish the first results of this work on the OPF blog soon, so you may want to keep your eyes peeled for that.


Paper on JPEG 2000 for preservation

06 June 2011

The JPEG 2000 compression standard is steadily becoming more popular in the archival community. Several large (national) libraries now use the JP2 format (which corresponds to Part 1 of the standard) as the master format in mass digitisation projects. However, some aspects of the JP2 file format are defined in ways that are open to multiple interpretations. This applies in particular to the embedding of ICC profiles (which are used to define colour space information) and to the definition of grid resolution. This situation has led to a number of interoperability issues that are potential risks for long-term preservation.


Ensuring the suitability of JPEG 2000 for preservation

02 December 2010

In my presentation at the Wellcome Trust’s JPEG 2000 seminar I discussed the suitability of JPEG 2000 (and more specifically its JP2 format) for long-term preservation. I highlighted the erroneous restriction in the JP2 (and JPX) format specification that only allows ICC profiles of the ‘input’ class to be used. This effectively prohibits the use of working colour spaces such as Adobe RGB, which are defined using ‘display device’ profiles. I also showed how different software vendors interpret the format specification in subtly different ways, and how these differences can cause problems in the long term, for example the loss of colour space and resolution information after some future migration.
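To make the ‘input’ class restriction more concrete: every ICC profile declares its class in a four-byte signature at bytes 12 to 15 of its 128-byte header. Input profiles use `scnr`, while display device profiles such as Adobe RGB use `mntr`. The minimal Python sketch below reads that signature. It assumes the raw profile bytes have already been extracted from the JP2 file, and it does not show how the JP2 boxes are parsed. The function names are hypothetical. The check looks only at the class field and does not cover every requirement the JP2 specification places on embedded profiles.

```python
# ICC profile/device class signatures, stored at header bytes 12-15 (ICC.1).
ICC_CLASSES = {
    b"scnr": "input device",
    b"mntr": "display device",
    b"prtr": "output device",
    b"link": "device link",
    b"spac": "colour space",
    b"abst": "abstract",
    b"nmcl": "named colour",
}


def icc_profile_class(profile: bytes) -> str:
    """Return a readable name for the class of a raw ICC profile (hypothetical helper)."""
    if len(profile) < 128:
        # A valid ICC profile always starts with a 128-byte header.
        raise ValueError("ICC profile header must be at least 128 bytes")
    signature = profile[12:16]
    return ICC_CLASSES.get(signature, "unknown (%r)" % signature)


def allowed_by_input_class_restriction(profile: bytes) -> bool:
    """Hypothetical check: is this an 'input' class profile, as the JP2 restriction demands?"""
    return icc_profile_class(profile) == "input device"


if __name__ == "__main__":
    # Build a dummy header that claims to be a display device profile.
    header = bytearray(128)
    header[12:16] = b"mntr"
    print(icc_profile_class(bytes(header)))
    print(allowed_by_input_class_restriction(bytes(header)))
```

Under these assumptions, an Adobe RGB profile would be reported as a display device profile and rejected by the check, which is exactly the conflict with working colour spaces described above.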