Why PDF/A validation matters, even if you don't have PDF/A

07 July 2015

This is the first instalment of a two-part blog post. It was prompted by the upcoming Digital Preservation Coalition briefing “When is a PDF not a PDF?”, for which I was asked to prepare a presentation. My initial idea was to give an overview of the work we did in the SCAPE project on PDF preservation risk assessment using a PDF/A validator. Most of this has already been covered by a series of earlier blog posts. Those posts very much represent different stages of a work in progress, and I think this makes them somewhat challenging for readers who are new to the subject.


Top 50 file formats in the KB e-Depot

29 April 2015

The current version of the KB’s digital repository system (e-Depot) doesn’t include any tools for automated file format identification yet. Our previous DIAS system didn’t have identification functionality either. As a result, information on file formats in our digital collections is largely based on publisher metadata and file extensions. Neither is necessarily correct. Moreover, previous analyses revealed a number of prevalent file extensions that could not be easily linked to a specific format. One result of this situation was that we couldn’t even reliably tell to what extent patrons were able to view e-Depot content on the PCs in our reading rooms (the obviously common formats aside).

To get a better view of the formats in our collection, we did an analysis of the “top 50” most prevalent file extensions in our e-Depot: what are the corresponding formats, can these formats be automatically identified, and can we render them in our reading rooms? This blog post summarises the main findings of this work.
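As a rough illustration of the kind of tally such an analysis starts from, the sketch below counts file extensions in a list of paths and returns the most prevalent ones. It is a minimal sketch, not the method used for the e-Depot analysis: the function name `top_extensions`, the `(none)` label for extensionless files, and the sample paths are all hypothetical.

```python
from collections import Counter
from pathlib import PurePath


def top_extensions(paths, n=50):
    """Return the n most common file extensions (lower-cased) with their counts."""
    counts = Counter()
    for p in paths:
        # suffix is the final ".ext" part of the name, or "" if there is none
        ext = PurePath(p).suffix.lower()
        counts[ext or "(none)"] += 1
    return counts.most_common(n)


# Hypothetical sample paths, for illustration only
sample = ["a/report.PDF", "b/page.html", "c/scan.pdf", "d/README", "e/img.tif"]
for ext, count in top_extensions(sample, n=3):
    print(ext, count)
```

Note that an extension count says nothing about the actual format of a file, which is exactly why extensions alone are an unreliable basis for format information.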


Policy-based assessment of EPUB with Epubcheck

13 March 2015

Back in 2012 the KB conducted a first investigation of the suitability of the EPUB format for long-term preservation. The KB will soon start receiving publications in this format, and in anticipation of this, our Collection Care department has formulated a policy on the minimum requirements an EPUB must meet to ensure long-term accessibility. The policy largely follows the recommendations from the 2012 report. This blog explores to what extent it is possible to automatically assess the EPUBs that we receive against our policy using a combination of the Epubcheck tool and Schematron rules.


Dutch newspaper wipes out articles citing fabricated sources - Internet Archive to the rescue!

06 January 2015

Shortly before Christmas, Dutch daily newspaper Trouw removed 126 articles from its website. These articles were all authored by Perdiep Ramesar, a former journalist of the newspaper. Ramesar had been fired by Trouw in November, after it turned out that many of the sources that are cited in his articles were fabricated. The most notorious example was a series of pieces about the so-called “Sharia Triangle”, a neighbourhood in the city of The Hague, which Ramesar claimed was being ruled by Sharia law. As it turned out, this story was largely based on fabricated sources. Nevertheless, it was taken at face value by most major Dutch news outlets at the time, and even prompted a parliamentary debate.

Trouw’s decision to remove the 126 articles overnight was met with considerable criticism. For example, historian Jan Dirk Snel noted that the removal of these articles makes it impossible to check what was wrong with them in the first place. Various other critics accused Trouw of trying to rewrite history.


Perdiep Ramesar in the Internet Archive

28 December 2014

Earlier this week, daily newspaper Trouw removed 126 articles from its website that had been written by dismissed journalist Perdiep Ramesar. The removal followed the investigation into the “untraceable” sources that Ramesar had cited. Trouw’s decision to take the unreliable articles off the site met with considerable criticism. Some called it a falsification of history. Historian Jan Dirk Snel rightly pointed out that, now that the pieces have been removed, nobody can check any longer what is or is not wrong with them.


