We recently started using Microsoft OneDrive at work. The other day a colleague used OneDrive to share a folder with a large number of ISO images with me. Since I wanted to work with these files on my Linux machine at home, and no official OneDrive client for Linux exists a this point, I used OneDrive’s web client to download the contents of the folder. Doing so resulted in a 6 GB ZIP archive. When I tried to extract this ZIP file with my operating system’s (Linux Mint 19.3 MATE) archive manager, this resulted in an error dialog, saying that “An error occurred while loading the archive”:
The output from the underlying extraction tool (7-zip) reported a “Headers Error”, with an “Unconfirmed start of archive”. It also reported a warning that “There are data after the end of archive”. No actual data were extracted whatsoever. This all looked a bit worrying, so I decided to have a more in-depth look at this problem.
Following earlier work on the preservation of optical media and data tapes, I recently got a request to make an inventory of offline digital data carriers in the KB’s deposit collection. The goal was to obtain approximate figures on the various carrier types in the collection. This was partially prompted by a project on at-risk digital heritage on physical carriers by the Dutch Digital Heritage Network (NDE) that the KB is participating in. This blog post presents the results.
A few weeks ago one of my web archiving colleagues approached me with an interesting question. From a list of Dutch web domains, he wanted to identify the (Dutch) province in which each domain is hosted. He was particularly interested in domains hosted in the province of Friesland. After some experimentation I was able to answer this question using a two-step procedure:
Geo-locate the web domains using a custom Python script.
Combine the results of the geolocation exercise with openly available geographical data using QGIS, an open-source geographical information system (GIS).
Even though the outcome of the analysis is not particularly interesting, I imagine both the geolocation methodology and the GIS analysis steps might be useful to others. So, this blog post is primarily intended as a tutorial that gives a walkthrough of the steps I followed.
Earlier this year I published this blog post on the recovery of data from ’90s data tapes. I will give a presentation on this during the upcoming iPres 2019 conference, and wrote a paper that discusses this work in more detail than my earlier blog post. The original paper (in PDF format) can be found here. The paper references a wealth of useful resources, but some of these are not easily accessible because the LaTeX template used does not handle hyperlinks well (this will be fixed in the final, post-conference version of the paper). Because of this I’ve created a web-friendly version of the paper below.
As I explained in the introduction of this earlier blog post, as part of our ongoing web archaeology project we are currently developing workflows for reading data from a variety of physical carrier formats. After the earlier work on data tapes and optical media, the next job was to image a small box with 3.5” floppy disks. Easy enough, and my first thought was to fire up Guymager and be done with it. This turned out to be less straightforward than expected, which led to the development of yet another workflow tool: diskimgr. In the remainder of this post I will first show the issues I ran into with Guymager, and then demonstrate how these issues are remedied by diskimgr.