Identification of physical storage media and devices with Python and the Windows API

14 June 2022
Still life of assorted storage media

This blog post covers some techniques for identifying storage media and storage devices using Python and the Windows API. These techniques can be useful for distinguishing between different types of portable storage media, such as floppy disks and USB thumb drives. The post also presents a demo script that integrates them.


Introducing Isolyzer 1.4

20 April 2022
Compact Discs still life

It’s been a while since the last release of the Isolyzer tool, but after four years of near-inactivity I just published Isolyzer 1.4. In this post I provide some background information on how this release came about, and I briefly explain the main changes.


Generating lossy access JP2s from lossless preservation masters

30 March 2022
Intensive Breeding by Jean Marc Cote, Public domain, via Wikimedia Commons.

At the KB we’ve been using JP2 (JPEG 2000 Part 1) as our primary image format for digitised newspapers, books and periodicals since 2007. The digitisation work is contracted out to external vendors, who supply the digitised pages as losslessly compressed preservation masters, as well as lossily compressed access images that are used within the Delpher platform.

Right now the KB is in the process of migrating its digital collections to a new preservation system. This prompted the question whether it would be feasible to generate access JP2s from the preservation masters in-house at some point in the future, using software that runs inside the preservation system. As a first step towards answering that question, I created some simple proof of concept workflows, using three different JPEG 2000 codecs. I then tested these workflows with preservation master images from our collection. The main objective of this work was to find a workflow that meets our current digitisation requirements and is also sufficiently performant.


On The Significant Properties of Spreadsheets

24 September 2021
Clippy saying It looks like you're migrating a spreadsheet to ... TIFF?!

Earlier this month saw the publication of The Significant Properties of Spreadsheets. This is the final report of a six-year research effort by the Open Preservation Foundation’s Archives Interest Group (AIG), which is composed of participants from the National Archives of the Netherlands (NANETH), the National Archives of Estonia (NAE), the Danish National Archives (DNA), and Preservica. The report caught my attention for two reasons. First, there’s the subject matter of spreadsheets, on which I’ve written a few posts in the past. Second, it marks a surprising (at least to me!) return of “significant properties”, a concept that was omnipresent in the digital preservation world between, roughly, 2005 and 2010, but which has largely fallen into disuse since then. In this post I’m sharing some of my thoughts on the report.


PDF processing and analysis with open-source tools

06 September 2021
Plumbers Tool Box by pszz on Flickr. Used under CC BY-NC-SA 2.0.

Over the years, I’ve been using a variety of open-source software tools for solving all sorts of issues with PDF documents. This post is an attempt to (finally) bring together my go-to PDF analysis and processing tools and commands for a variety of common tasks in one single place. It is largely based on a multitude of scattered lists, cheat-sheets and working notes that I made earlier. Starting with a brief overview of some general-purpose PDF toolkits, I then move on to a discussion of the following specific tasks:

  • Validation and integrity testing
  • PDF/A and PDF/UA compliance testing
  • Document information and metadata extraction
  • Policy/profile compliance testing
  • Text extraction
  • Link extraction
  • Image extraction
  • Conversion to other (graphics) formats
  • Inspection of embedded image information
  • Conversion of multiple images to PDF
  • Cross-comparison of two PDFs
  • Corrupted PDF repair
  • File size reduction of PDF with hi-res graphics
  • Inspection of low-level PDF structure
  • View, search and extract low-level PDF objects

