Why PDF/A validation matters, even if you don't have PDF/A - Part 2 08 Jul 2015

This is the second and final instalment of a 2-part blog on the use of PDF/A validators for identifying preservation risks in PDF. You can read the first part here. In Part 1 I showed how PDF/A validators can be used to identify preservation risks in a PDF. I illustrated this with an example that uses the PDF/A validator component of Adobe Acrobat’s Preflight tool. Needless to say, Acrobat does not scale to situations where you need to verify large volumes of PDFs. Luckily, several stand-alone PDF/A validators exist that are designed especially to do just that.

More ...

Why PDF/A validation matters, even if you don't have PDF/A 07 Jul 2015

This is the first instalment of a 2-part blog. It was prompted by the upcoming Digital Preservation Coalition briefing When is a PDF not a PDF?, for which I was asked to prepare a presentation. My initial idea was to give an overview of the work we did on PDF preservation risk assessment using a PDF/A validator in the SCAPE project. Most of this has already been covered by a series of earlier blog posts. Those blogs very much represent different stages of a work in progress, and I think this makes them somewhat challenging for readers who are new to the subject.

More ...

Top 50 file formats in the KB e-Depot 29 Apr 2015

The current version of the KB’s digital repository system (e-Depot) doesn’t include any tools for automated file format identification yet. Our previous DIAS system didn’t have identification functionality either. As a result, information on file formats in our digital collections is largely based on publisher metadata and file extensions. Neither is necessarily correct. Moreover, previous analyses revealed a number of prevalent file extensions that could not be easily linked to a specific format. One result of this situation was that we couldn’t even reliably tell to what extent patrons were able to view e-Depot content on the PCs in our reading rooms (the obviously common formats aside).

To get a better view of the formats in our collection, we did an analysis of the “top 50” most prevalent file extensions in our e-Depot: what are the corresponding formats, can these formats be automatically identified, and can we render them in our reading rooms? This blog post summarises the main findings of this work.
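As a rough illustration of the first step in such an analysis, the sketch below tallies file extensions across a list of paths and returns the most prevalent ones. It is not the method used for the e-Depot analysis; the function name `top_extensions`, the `(none)` label for files without an extension, and the sample paths are all assumptions made up for this example.

```python
from collections import Counter
from pathlib import PurePosixPath


def top_extensions(paths, n=50):
    """Return the n most common file extensions among the given paths.

    Extensions are lower-cased so that '.PDF' and '.pdf' are counted
    together; files without an extension are counted as '(none)'.
    """
    counts = Counter()
    for path in paths:
        ext = PurePosixPath(path).suffix.lower()
        counts[ext or "(none)"] += 1
    return counts.most_common(n)


# Hypothetical sample paths, for illustration only
sample = ["a/report.PDF", "b/scan.tif", "c/article.pdf", "d/README"]
print(top_extensions(sample, n=1))
```

A tally like this only shows which extensions dominate. As noted above, an extension is not a reliable indicator of the actual format, so each one would still need to be checked with a format identification tool.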

More ...

Policy-based assessment of EPUB with Epubcheck 13 Mar 2015

Back in 2012 the KB conducted a first investigation of the suitability of the EPUB format for long-term preservation. The KB will soon start receiving publications in this format, and in anticipation of this, our Collection Care department has formulated a policy on the minimum requirements an EPUB must meet to ensure long-term accessibility. The policy largely follows the recommendations from the 2012 report. This blog explores to what extent it is possible to automatically assess the EPUBs that we receive against our policy using a combination of the Epubcheck tool and Schematron rules.

More ...

Dutch newspaper wipes out articles citing fabricated sources - Internet Archive to the rescue! 06 Jan 2015

Shortly before Christmas, Dutch daily newspaper Trouw removed 126 articles from its website. These articles were all authored by Perdiep Ramesar, a former journalist of the newspaper. Ramesar had been fired by Trouw in November, after it turned out that many of the sources that are cited in his articles were fabricated. The most notorious example was a series of pieces about the so-called “Sharia Triangle”, a neighbourhood in the city of The Hague, which Ramesar claimed was being ruled by Sharia law. As it turned out, this story was largely based on fabricated sources. Nevertheless, it was taken at face value by most major Dutch news outlets at the time, and even prompted a parliamentary debate.

Trouw’s decision to remove the 126 articles overnight was met with considerable criticism. For example, historian Jan Dirk Snel noted that the removal of these articles makes it impossible to check what was wrong with them in the first place. Various other critics accused Trouw of trying to rewrite history.

More ...