Automated assessment of JP2 against a technical profile

04 September 2012

I’ve already written a number of blog posts on format validation of JP2 files. Format validation is only a one aspect of a quality assessment workflow. Digitisation guidelines typically impose various constraints on the technical characteristics of preservation and access images. For example, they may state that a preservation master must be losslessly compressed, and that its progression order must be RPCL. A format profile is a set of such technical constraints. The process that compares the technical characteristics of a file against a format profile is sometimes called Policy Driven Validation. This corresponds to what JHOVE2 refers to as Assessment (which I think is a better description).

This blog post describes a simple method for doing a rule-based assessment of JP2 images. It uses Schematron, which is a rule-based validation language, to ‘validate’ the output of jpylyzer against a profile. Before getting into any technical details, let’s first have a look at an example of a format profile.

Example format profile

The table below shows the format profile that we’ll be using throughout this blog post, which is a typical ‘access’-oriented profile using lossy compression. Note that it is provided here for illustrative purposes only!

Parameter Value
File format JP2 (JPEG 2000 Part 1)
Compression type Lossy (irreversible 9-7 wavelet filter)
Colour transform Yes (only for colour images)
Number of decomposition levels 5
Progression order RPCL
Tile size 1024 x 1024
Code block size 64 x 64
Precinct size 256 x 256 for 2 highest resolution levels; 128 x 128 for remaining resolution levels
Number of quality layers 8
Target compression ratio 20:1
Error resilience Start-of-packet headers; end-of-packet headers; segmentation symbols
Grid resolution Stored in “Capture Resolution” fields
ICC profiles Embedded using “Restricted ICC” method
Capture metadata Embedded in XML box

Corresponding properties in jpylyzer output

Jpylyzer provides information on all of the technical characteristics that are listed in the table. You can check this yourself by running jpylyzer on any JP2 file and looking at the resulting output. A few examples:

  • Compression type - value of transformation field:

    /jpylyzer/properties/contiguousCodestreamBox/cod/transformation
    
  • Progression order - value of order field:

    /jpylyzer/properties/contiguousCodestreamBox/cod/order
    
  • ICC profiles - value of meth field:

    /jpylyzer/properties/jp2HeaderBox/colourSpecificationBox/meth
    
  • Grid resolution - presence of captureResolutionBox element:

    /jpylyzer/properties/jp2HeaderBox/resolutionBox
    

Expressing the profile as a set of assessable rules

In order to assess jpylyzer’s output against the profile, we first need to translate the profile to a set of assessable rules. This is where Schematron comes in. Look at parameter ‘Compression type’ in the table. In the previous section we saw that it corresponds to the transformation field in jpylyzer’s output. Below is a Schematron rule that asserts if transformation has the required value:

<s:rule context="/jpylyzer/properties/contiguousCodestreamBox/cod">
<s:assert test="transformation = '9-7 irreversible'">wrong transformation</s:assert>
</s:rule>

In words, the rule asserts that the value of transformation (which is located in /jpylyzer/properties/contiguousCodestreamBox/) equals 9-7 irreversible. If the rule fails, this will result in the error message “wrong transformation”.

Both the location (context) and the test statement are expressed using XPath syntax, which allows more complex tests as well.

Check that value doesn’t exceed threshold

The following rule checks if the compression ratio doesn’t exceed a threshold value (this is actually a bit tricky, as for images that don’t contain much information very high compression ratios may be obtained without losing quality):

<s:rule context="/jpylyzer/properties">
<s:assert test="compressionRatio &lt; 35">Too much compression</s:assert>
</s:rule>

(Note that the character reference “&lt” represents “<”, which isn’t allowed in XML.)

Check if element exists

The following Schematron rule checks if the captureResolutionBox element exists:

<s:rule context="/jpylyzer/properties/jp2HeaderBox/resolutionBox">
<s:assert test="captureResolutionBox">no capture resolution box</s:assert>
</s:rule>

Outcome depends on values of multiple elements

Here’s a more complex rule that checks whether a colour transformation (multipleComponentTransformation) was used while creating the image. A colour transformation is only possible for colour images, so in order to make this work for grayscale images as well, the rule must take into account that multipleComponentTransformation will be ‘no’ in that case (nC represents the number of image components):

<s:rule context="/jpylyzer/properties/contiguousCodestreamBox/cod">
<s:assert test=
    "(multipleComponentTransformation = 'yes') and 
        (../../jp2HeaderBox/imageHeaderBox/nC = '3') 
    or (multipleComponentTransformation = 'no') and 
        (../../jp2HeaderBox/imageHeaderBox/nC = '1')">
    no colour transformation</s:assert>
</s:rule>

Multiple element instances

Our profile states that the precinct size must be 256 x 256 for the 2 highest resolution levels, and 128 x 128 for the remaining ones. These occur as multiple instances of the precinctSize and precinctSizeY element in jpylyzer’s output, which we can handle as follows (note: for 5 decomposition levels we will have 6 resolution levels):

<s:rule context="/jpylyzer/properties/contiguousCodestreamBox/cod">
<s:assert test="precinctSizeY[1] = '128'">precinctSizeY doesn't match profile</s:assert>
<s:assert test="precinctSizeY[2] = '128'">precinctSizeY doesn't match profile</s:assert>
<s:assert test="precinctSizeY[3] = '128'">precinctSizeY doesn't match profile</s:assert>
<s:assert test="precinctSizeY[4] = '128'">precinctSizeY doesn't match profile</s:assert>
<s:assert test="precinctSizeY[5] = '256'">precinctSizeY doesn't match profile</s:assert>
<s:assert test="precinctSizeY[6] = '256'">precinctSizeY doesn't match profile</s:assert>
</s:rule>

The full profile as a schema

A sample schema that covers all aspects of the example format profile is available here.

Assessment of jpylyzer output against the schema

For the actual assessment (or validation) of jpylyzer output against the schema a couple of options exist. Probably the most widely-used one is the ISO Schematron reference implementation. Validation using that software involves a number of successive XSLT stylesheet transformations. A more accessible (but probaby less performant) alternative is the Probatron command-line executable. Using Probatron, asssessment of a JP2 would typically involve the following two steps:

1. Run jpylyzer

For example:

jpylyzer balloon.jp2 > balloon_jp2.xml

2. Validate jpylyzer’s output against the schema

Example:

java -jar probatron.jar balloon\_jp2.xml profile.sch > balloon_jp2_assessment.xml

Example output

The above procedure produces an XML file that contains a failed assert element for each test that failed. For example, the output below is generated if the number of layers is wrong:

<svrl:failed-assert test="layers = '8'" location="/jpylyzer[1]/properties[1]/contiguousCodestreamBox[1]/cod[1]" line="45" col="550">
<svrl:text>wrong number of layers</svrl:text>  
</svrl:failed-assert>

Demo

I created a small demo that illustrates the assessment procedure. It includes two JP2 images, the full schema of the example profile of this blog post, and a Windows batch file. For the moment it is located in my personal Github, but the schemas will probably be included in upcoming jpylyzer releases. To use the demo, just download the ZIP file, unzip it, open the batch file in a text editor and follow the instructions at the top of the file.

Final note

Although this blog post only covers the assessment of JP2 images using jpylyzer, the same procedure can be used for other formats and tools (provided that the tools are capable of producing XML output). Second, knowing that a JP2 is valid and conforms to a technical profile is certainly important, but it doesn’t say anything about the (quality of the) actual image content. So in an operational setting this will often require additional checks (e.g. a pixel-wise comparison between source and destination images).

Acknowledgement

Big thanks go out to Adam Retter (The National Archives) for his suggestion to use Schematron, just as I was struggling to make this work in XSD. Adam also shared some of his own Schematron schemas with me, which were a starting point for the work presented here.

Post script, February 2019

Since this post was originally published, jpylyzer’s output format has changed slightly: from version 1.14.0 onward, all output elements have an associated namespace. This means that the Schematron rules must be adapted accordingly. A set of example Schematron defintions that work with current versions of jpylyzer can be found here. They are part of jprofile, a simple tool that we use at the KB to assess JP2s from external suppliers. The source code of jprofile also demonstrates how to do this type of assessment in Python.


Originally published at the Open Preservation Foundation blog




Search

Tags

Archive

2024

March

2023

June

May

March

February

January

2022

November

June

April

March

2021

September

February

2020

September

June

April

March

February

2019

September

April

March

January

2018

July

April

2017

July

June

April

January

2016

December

April

March

2015

December

November

October

July

April

March

January

2014

December

November

October

September

August

January

2013

October

September

August

July

May

April

January

2012

December

September

August

July

June

April

January

2011

December

September

July

June

2010

December

Feeds

RSS

ATOM