Towards a preservation workflow for mobile apps

24 February 2021
Production photo from "2001: A Space Odyssey". ©Stanley Kubrick Archives/TASCHEN.

My previous post addressed the emulation of mobile Android apps. In this follow-up, I’ll explore some other aspects of mobile app preservation, with a focus on acquisition and ingest processes. The 2019 iPres paper on the Acquisition and Preservation of Mobile eBook Apps by Maureen Pennock, Peter May and Michael Day was again the departure point. In its concluding section, the authors recommend:

In terms of target formats for acquisition, we reach the undeniable conclusion that acquisition of the app in its packaged form (either an IPA file or an APK file) is optimal for ensuring organisations at least acquire a complete published object for preservation.

And:

[T]his form should at least also include sufficient metadata about inherent technical dependencies to understand what is needed to meet them.

In practical terms, this means that the workflows that are used for acquisition and (pre-)ingest must include components that are able to deal with the following aspects:

  1. Acquisition of the app packages (either by direct deposit from the publisher, or using the app store).
  2. Identification of the package format (APK for Android, IPA for iOS).
  3. Identification of metadata about the app’s technical dependencies.

The main objective of this post is to get an idea of what would be needed to implement these components. Is it possible to do all of this with existing tools? If not, what are the gaps? The underlying assumption here is an emulation-based preservation strategy.
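As a rough illustration of the second component (identifying the package format), the sketch below relies on two structural conventions: an APK is a ZIP archive with an `AndroidManifest.xml` entry at its root, and an IPA is a ZIP archive whose app bundle sits under `Payload/<name>.app/`. The function name `identify_package` is hypothetical, and this is only a minimal heuristic under those assumptions, not a substitute for a proper format identification tool.

```python
import zipfile


def identify_package(path):
    """Guess whether a file is an APK or an IPA from its ZIP entries.

    Hypothetical helper for illustration only: returns "apk", "ipa"
    or "unknown".
    """
    # Both package formats are ZIP containers; anything else is out of scope.
    if not zipfile.is_zipfile(path):
        return "unknown"

    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()

    # Android packages carry a (binary) manifest at the archive root.
    if "AndroidManifest.xml" in names:
        return "apk"

    # iOS packages keep the app bundle under Payload/<name>.app/.
    if any(n.startswith("Payload/") and ".app/" in n for n in names):
        return "ipa"

    return "unknown"
```

A call such as `identify_package("some-app.apk")` would then return `"apk"` for a well-formed Android package. Note that this says nothing yet about the app's technical dependencies, which are the subject of the third component.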


Four Android emulators, two apps

09 February 2021
"Android Robot" by Google Inc., used under CC BY 3.0, via Wikimedia Commons.

So far the KB hasn’t actively pursued the preservation of mobile apps. However, born-digital publications in app-only form have become increasingly common, as well as “hybrid” publications, with apps that are supplemental to traditional (paper) books. At the request of our Digital Preservation department, I’ve started some exploratory investigations into how to preserve mobile apps in the near future. The 2019 iPres paper on the Acquisition and Preservation of Mobile eBook Apps by the British Library’s Maureen Pennock, Peter May and Michael Day provides an excellent starting point on the subject, and it highlights many of the challenges involved.

Before we can start archiving mobile apps ourselves, some additional aspects need to be addressed in more detail. One of these is the question of how to ensure long-term access. Emulation is the obvious strategy here, but I couldn’t find much information on the emulation of mobile platforms within a digital preservation context. In this blog post I present the results of some simple experiments, where I tried to emulate two selected apps. The main objective here was to explore the current state of emulation of mobile devices, and to get an initial impression of the suitability of some existing emulation solutions for long-term access.

For practical reasons I’ve limited myself to the Android platform. Attentive readers may recall I briefly touched on this subject back in 2014. As much of the information in that blog post has now become outdated, this new post presents a more up-to-date investigation. I should probably mention here that I don’t own or use any Android device, or any other kind of smartphone or tablet for that matter. This probably makes me the worst possible person to evaluate Android emulation, but who’s going to stop me trying anyway? No one, that’s who!


Mapping the Dutch web domain

09 September 2020
Satellite image of Wadden Sea
“World Wide Web - Digital Preservation” by Jørgen Stamp, used under CC BY 2.5 DK; “Wadden Sea” by Envisat satellite, used under CC BY-SA 3.0-IGO.

Earlier this year I wrote a blog post about geo-locating web domains, and the subsequent analysis of the resulting data in QGIS. At the time, this work was meant as a proof of concept, and I had only tried it out on a small set of test data. We have now applied this methodology to the whole of the Dutch (.nl) web domain, and this follow-up post presents the results of this exercise.


Restoring Liesbet's Virtual Home, a digital treasure from the early Dutch web

30 June 2020
Liesbet door
Original artwork copyright ©Liesbet Zikkenheimer.

In 2019, Dutch telecommunications company KPN announced its plans to phase out its subsidiary XS4ALL, which is one of the oldest internet service providers in the Netherlands. With this decision, thousands of homepages and personal web sites that are hosted under the XS4ALL domain are at risk of disappearing forever. The web archiving team of the National Library of the Netherlands (KB) has started an initiative to rescue a selection of these homepages, which includes some of the oldest born-digital publications of the Dutch web. This blog post describes an attempt to rescue and restore one of the oldest and most unique homepages from this collection: Liesbet’s Virtual Home (Liesbet’s Atelier), the personal web site of Dutch Internet pioneer Liesbet Zikkenheimer, which has a history that goes back to 1995. First I give some background information about XS4ALL, and the KB-led rescue initiative. Then I move on to the various (mostly technical) aspects of restoring Liesbet’s Virtual Home. Finally, I address the challenges of capturing the restored site to an ingest-ready WARC file.


ISO/IEC TS 22424 standard on EPUB3 preservation

30 April 2020
Scream
“The Scream”, undated drawing by Edvard Munch, Bergen Kunstmuseum, Public domain.

Earlier this week the Library of Congress added a new entry on the standard “Digital publishing — EPUB3 preservation” (ISO/IEC TS 22424) to its excellent Digital Formats web site. This standard was developed by the ISO Technical Committee on Document description and processing languages, and was published in January this year (2020).

According to its authors, “the ISO/IEC TS 22424 series supports long-term preservation of EPUB publications via a dual strategy”. The standard is made up of 2 parts, which are sold as separate documents on the ISO website:

  1. Part 1: Principles (ISO/IEC TS 22424-1:2020)
  2. Part 2: Metadata requirements (ISO/IEC TS 22424-2:2020)

In this blog post I will take a closer look at both parts of the standard. What do they claim to do, what is their scope, and to what degree do they live up to their stated promises? Readers who are only interested in the most important findings may want to jump to the “Summary and discussion” section at the end of this post.


