Roll the tape - recovering '90s data tapes in BitCurator 31 Jan 2019

When the KB web archive was launched in 2007, many sites from the “early” Dutch web had already gone offline. As a result, the period between (roughly) 1992 and 2000 is seriously under-represented in our web archive. To improve the coverage of web sites from this historically important era, we are now looking into Web Archaeology tools and methods. Over the last year our web archiving team has reached out to creators of “early” Dutch web sites that are no longer online. It’s not uncommon to find that these creators still have boxes of offline carriers with the original source data of those sites. Using these data, we would in many cases be able to reconstruct the sites, much as we reconstructed the first Dutch web index last year. Once reconstructed, they could be ingested into our web archive.

Crawling offline web content: the NL-menu case 11 Jul 2018

In a previous blog post I showed how we resurrected NL-menu, the first Dutch web index. That post explains how we recovered the site’s data from an old CD-ROM, and how we subsequently created a local copy of the site by serving the CD-ROM’s contents with the Apache web server. This follow-up post covers the final step: crawling the resurrected site to a WARC file that can be ingested into our web archive.

Resurrecting the first Dutch web index: NL-menu revisited 24 Apr 2018

NL-menu was the first Dutch web index. The site was originally founded by a consortium of SURFnet, Dutch universities and the KB. From the mid-nineties onwards it was maintained solely by the KB. NL-menu was discontinued in 2004, after which the site was taken offline. In 2006 the domain name was sold to a private company that used it for hosting a web index that was partially based on the original NL-menu site.

Update on Isolyzer: UDF, HFS+ and more! 12 Jul 2017

Earlier this year I blogged about Isolyzer, a tool designed to help detect broken ISO images. Today I released a shiny new beta version that adds a significant amount of new functionality. Below is an overview of the main changes, followed by some warnings and caveats.

Image and Rip Optical Media Like A Boss! 19 Jun 2017

Over the last few months we’ve been working on a provisional workflow for preserving the content of optical media in our collection. The main result thus far is Iromlab, a custom workflow application that streamlines the imaging and ripping process. This blog post gives an overview of Iromlab, as well as the reasons why we created it in the first place.
