23 April 2013
About a year ago, work started on packaging SCAPE tools. Jpylyzer was the first SCAPE tool that was turned into a Debian package. Some time later, the OPF set up a couple of machine images at Amazon Web Services, which can be used to create packages repeatedly using a virtual machine. Even though I’ve used the Amazon service a couple of times myself, I really know next to nothing about Debian packages, and it’s safe to say that the underlying build process has been more or less a complete mystery to me.
To get a better understanding of the process for building Debian packages, I had a try at packaging jpylyzer on my local machine (which runs on Linux Mint 14). Some time ago Dave Tarrant and Rui Castro wrote a nice step-by-step guide on building Debian packages on the OPF Wiki, so I tried to follow the instructions there. While working on this, I made some notes, mainly to remind myself of what I was doing. Then I realised that some of this might be useful to others as well, so I decided to turn it into a blog post.
09 January 2013
The most important new feature of the recently released PDF/A-3 standard is that, unlike PDF/A-2 and PDF/A-1, it allows you to embed any file you like. Whether this is a good thing or not is the subject of some heated on-line discussions. But what do we actually mean by embedded files? As it turns out, the answer to this question isn’t as straightforward as you might think. One of the reasons for this is that in colloquial use we often talk about “embedded files” to describe the inclusion of any “non-text” element in a PDF (e.g. an image, a video or a file attachment). On the other hand, the word “embedded files” in the PDF standards (including PDF/A) refers to something much more specific, which is closely tied to PDF’s internal structure.
19 December 2012
The PDF format contains various features that may make it difficult to
access content that is stored in this format in the long term. Examples
include (but are not limited to):
- Encryption features, which may either restrict some functionality
(copying, printing) or make files inaccessible altogether.
- Multimedia features (embedded multimedia objects may be subject to
- Reliance on external features (e.g. non-embedded fonts, or
references to external documents)
04 September 2012
I’ve already written a number of blog posts on format validation of JP2 files. Format validation is only a one aspect of a quality assessment workflow. Digitisation guidelines typically impose various constraints on the technical characteristics of preservation and access images. For example, they may state that a preservation master must be losslessly compressed, and that its progression order must be RPCL. A format profile is a set of such technical constraints. The process that compares the technical characteristics of a file against a format profile is sometimes called Policy Driven Validation. This corresponds to what JHOVE2 refers to as Assessment (which I think is a better description).
This blog post describes a simple method for doing a rule-based assessment of JP2 images. It uses Schematron, which is a rule-based validation language, to ‘validate’ the output of jpylyzer against a profile. Before getting into any technical details, let’s first have a look at an example of a format profile.
09 August 2012
The purpose of this post is to give a brief introduction to creating, editing and submitting format signatures (or ‘magic’ entries) for the well-known File tool. The occasion for this was some work I did last week on improving File’s identification of the JPEG 2000 formats. I had some difficulty finding any easy-to-follow documentation that describes how to do this. The information is all out there, but it’s pretty fragmented. So, I wrote this brief tutorial, which is intended as an accessible introduction to magic editing. It only covers the very basics, but hopefully this is enough to overcome some initial stumbling blocks.