EPUB for archival preservation 18 Jun 2012

Over the last few years, the EPUB format has gained widespread popularity in the consumer market. The KB has been approached by a number of publishers that wish to use EPUB for delivering some of their electronic publications. Surprisingly little information is available on the format’s suitability for archival preservation, apart from the Library of Congress’ Sustainability of Digital Formats web pages, which contain entries on EPUB 2 and EPUB 3.

So, the KB’s Departments of Collection and Collection Care requested a more detailed investigation of EPUB’s preservation credentials. More specifically, answers were needed to the following questions:

  • What are the main characteristics of EPUB?

  • What functionality does EPUB provide, and is this sufficient for representing e.g. content with sophisticated layout and typography requirements?

  • How well is EPUB supported by software tools that are used in (pre-)ingest workflows?

  • How suitable is EPUB for archival preservation? What are the main risks?

Update on jpylyzer 23 Apr 2012


In this blog post I will give a brief update on the latest jpylyzer developments. Jpylyzer is a validation and feature extraction tool for the JP2 (JPEG 2000 Part 1) still image format.

Jpylyzer documentation 10 Jan 2012

This will be my shortest blog post ever. Following up on my previous blog post on a prototype JP2 validator and properties extractor (jpylyzer), there is now a comprehensive User Manual of the tool. Just follow the link below:


Link to jpylyzer home page:


Meanwhile, work on jpylyzer is ongoing, so watch this space for updates.

Update February 2019: updated links in original blog post

Originally published at the Open Preservation Foundation blog

A prototype JP2 validator and properties extractor 14 Dec 2011

A few months ago I wrote a blog post on a simple JP2 file structure checker. This led to some interesting online discussions on JP2 validation. Some people asked me about the feasibility of expanding the tool to a full-fledged JP2 validator. Despite some initial reservations, I eventually decided to dedicate a couple of weeks to writing a rough prototype. The first results of this work are now ready in the form of the jpylyzer tool. Although I initially intended to limit its functionality to validation (i.e. verification against the format specifications), I quickly realised that since validation would require the tool to extract and verify all header properties anyway, it would make little sense not to include this information in its output. As a result, jpylyzer is both a validator and a properties extractor.
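To illustrate why validation and property extraction go hand in hand, here is a minimal Python sketch. It is not jpylyzer's actual code: the function name `check_jp2_start` and the layout of the returned dictionary are made up for this example, and it only looks at the first two boxes of a JP2 file (the Signature box and the File Type box). The validity condition is deliberately simplified to "the compatibility list contains `jp2 `".

```python
import struct

# The JP2 Signature box is a fixed 12-byte sequence:
# box length (12), box type 'jP  ', and the content 0x0D0A870A.
JP2_SIGNATURE = b"\x00\x00\x00\x0cjP  \r\n\x87\n"


def check_jp2_start(data: bytes) -> dict:
    """Check the first two boxes of a JP2 byte stream (hypothetical helper).

    Returns a dict with an 'isValid' flag and whatever 'properties'
    had to be read along the way in order to decide on validity.
    """
    result = {"isValid": False, "properties": {}}

    # 1. The Signature box must match byte for byte.
    if data[:12] != JP2_SIGNATURE:
        return result

    # 2. The File Type box must follow: 8-byte header, then brand,
    #    minor version and a list of 4-byte compatibility codes.
    if len(data) < 28:
        return result
    length, box_type = struct.unpack(">I4s", data[12:20])
    # Simplification: extended (length 1) and open-ended (length 0)
    # boxes are not handled here and are treated as invalid.
    if box_type != b"ftyp" or length < 16 or 12 + length > len(data):
        return result

    brand, minor_version = struct.unpack(">4sI", data[20:28])
    compatibility = [
        data[i:i + 4].decode("ascii", "replace")
        for i in range(28, 12 + length, 4)
    ]

    # The values we needed for the check double as extracted properties.
    result["properties"] = {
        "brand": brand.decode("ascii", "replace"),
        "minorVersion": minor_version,
        "compatibilityList": compatibility,
    }
    # Simplified validity rule for this sketch only.
    result["isValid"] = "jp2 " in compatibility
    return result


if __name__ == "__main__":
    # A synthetic 32-byte header, built by hand for demonstration.
    sample = JP2_SIGNATURE + struct.pack(
        ">I4s4sI4s", 20, b"ftyp", b"jp2 ", 0, b"jp2 "
    )
    print(check_jp2_start(sample))
```

Even in this tiny example the brand, minor version and compatibility list have to be read before anything can be said about validity, so reporting them costs nothing extra.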

Evaluation of identification tools: first results from SCAPE 21 Sep 2011

As I briefly mentioned in a previous blog post, one of the objectives of the SCAPE project is to develop an architecture that will enable large-scale characterisation of digital file objects. As a first step, we are evaluating existing characterisation tools. The overall aim of this work is twofold. First, we want to establish which tools are suitable candidates for inclusion in the SCAPE architecture. Second, since the enhancement of existing tools is another goal of SCAPE, the evaluation is also aimed at getting a better idea of the specific strengths and weaknesses of each individual tool. The outcome will help us decide what modifications and improvements are needed. Also, many of these tools are widely used outside of the SCAPE project, which means that the results will most likely be relevant to a wider audience (including the original tool developers).

