Identification of physical storage media and devices with Python and the Windows API
This blog post covers some techniques that can be used to identify storage media and storage devices using Python and the Windows API. This can be useful for distinguishing between different types of portable storage media, such as floppy disks and USB thumb drives. It also presents a demo script that integrates these techniques.
Preservation of portable storage media
In 2019 the KB started a major initiative to safeguard the information on optical data carriers in its collection, such as CD-ROMs, DVDs and audio CDs. The Iromlab software is a central component of the workflow we’re using for this. The optical media preservation project will most likely reach its completion near the end of this year. As the KB collection also contains many other types of portable digital storage media (see this blog post for an overview), we’re currently looking into ways to preserve those as well. The first candidate will be 3.5 inch floppy disks. I’m currently working on a derivative of the Iromlab software that can be used for imaging floppies, but also other types of portable media, such as USB thumb drives. Like Iromlab, it wraps around IsoBuster to do the actual imaging. Since I’d like the new software to be able to deal with a variety of physical storage media types and devices, it would be useful to have a way to automatically identify the medium type prior to the imaging stage. On Windows (which is the target environment for this work), this kind of hardware identification is possible through the Win32 API.
In Python, parts of the Win32 API can be accessed using the pywin32 wrapper extensions, but using these extensions can be a bit of a challenge. There are a couple of reasons for this. First, the pywin32 documentation is incomplete and partially outdated. This means you’ll often have to rely on Microsoft’s documentation of the underlying C++ API. Using the API also involves some fairly low-level operations, such as creating file handles, defining output buffers and parsing binary output. Many of the (few) relevant code examples that can be found online are also outdated, which meant I had to combine examples and documentation from a variety of sources in order to get things working. Taken together, this all makes working with the Win32 API pretty daunting.
Purpose of this post
In this post I’ll try to document how I made basic media and device identification work for me. Given any device attached to a logical Windows drive (e.g. the “A drive”, “D drive”, etcetera), the objectives here are to identify:
- the storage media type (hard disk, floppy disk, etc.);
- the hardware device, and the storage media types that are supported by it.
I’ll link to the relevant documentation throughout. At the end of this post I also present a simple demo script that ties everything together, and which could be used as a basis for your own code.
In order to use the techniques described here, you’ll need pywin32, which you can install using:
pip install pywin32
Then create a new Python file, and add the following imports:
import sys
import struct
import argparse
import win32api
import win32file
import winioctlcon
First of all we need the device name that corresponds to the logical drive we’re interested in (e.g. drive A, C, D, and so on). Using the “A” drive as an example, we construct it like this:
drive = "A"
# Low-level device name of device assigned to logical drive
driveDevice = "\\\\.\\" + drive + ":"
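Drive names can arrive in slightly different forms (for instance “a”, “A:” or “A:\”) when they come from user input. The sketch below shows one possible way to normalise them before building the device name. The function name device_path is a hypothetical helper, not part of pywin32 or of the demo script.

```python
def device_path(drive):
    """Return the low-level device name for a logical drive.

    Hypothetical helper: accepts forms such as "a", "A:" or "A:\\".
    """
    # Drop surrounding whitespace, trailing slashes and the colon
    letter = drive.strip().rstrip("\\/").rstrip(":").upper()
    # Only a single letter A-Z is a valid logical drive name
    if len(letter) != 1 or not ("A" <= letter <= "Z"):
        raise ValueError("not a drive letter: %r" % drive)
    return "\\\\.\\" + letter + ":"


print(device_path("a:"))  # prints \\.\A:
```

The result can be passed on as the driveDevice value in the steps that follow.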
Create file handle
Next we create a file handle for this device, using the win32file.CreateFile method:
handle = win32file.CreateFile(driveDevice, 0, win32file.FILE_SHARE_READ, None, win32file.OPEN_EXISTING, 0, None)
Retrieve disk geometry info
We can now use the win32file.DeviceIoControl function to obtain information about a physical disk:
diskGeometry = win32file.DeviceIoControl(handle, winioctlcon.IOCTL_DISK_GET_DRIVE_GEOMETRY, None, 24)
In this case, the DeviceIoControl call contains four arguments:
- the file handle for the device;
- the control code winioctlcon.IOCTL_DISK_GET_DRIVE_GEOMETRY (documented here), which tells DeviceIoControl to retrieve information about a physical disk’s geometry;
- an input buffer, which is set to None in this case because it is not used;
- the output buffer size in bytes.
The output buffer size must be equal to (or larger than) the size of the output that is returned by the function call. The output (a string of raw bytes) is defined by the DISK_GEOMETRY structure, which is documented here. It contains 5 fields. The first field is a large integer (8 bytes); the remaining four fields are all 4-byte unsigned integers. This means the total size of the DISK_GEOMETRY structure is 8 + 4 × 4 = 24 bytes, so we use this as the buffer size.
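As a quick sanity check, the same size can be derived with Python’s struct module rather than by hand. This is a minimal sketch that assumes the field layout described above (one 8-byte integer followed by four 4-byte unsigned integers). The constant names are my own and do not exist in pywin32.

```python
import struct

# Assumed layout of DISK_GEOMETRY as a struct format string:
#   <  little-endian, no padding
#   q  Cylinders (8-byte integer)
#   I  MediaType (4-byte unsigned integer)
#   I  TracksPerCylinder
#   I  SectorsPerTrack
#   I  BytesPerSector
DISK_GEOMETRY_FORMAT = "<qIIII"

# Size in bytes of a structure with this layout
DISK_GEOMETRY_SIZE = struct.calcsize(DISK_GEOMETRY_FORMAT)

print(DISK_GEOMETRY_SIZE)  # prints 24
```

Deriving the buffer size from a format string like this keeps the size and the parsing code in sync if you later decide to unpack more fields.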
The MediaType field is the second item of the DISK_GEOMETRY structure. It is an unsigned integer (4 bytes) that starts at byte offset 8 of our diskGeometry variable. We can use Python’s struct module to interpret the raw bytes as an integer value:
offset = 8
mediaTypeCode = struct.unpack("<I", diskGeometry[offset:offset + 4])[0]
Here the “<I” format string informs unpack that the bytes represent a little-endian unsigned integer. Note that struct.unpack always returns a tuple, hence the “[0]” index at the end.
The resulting mediaTypeCode value is an integer number. These numbers can be mapped back to media type strings using the MEDIA_TYPE enumeration in winioctlcon.py. These strings are in turn documented here.
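To illustrate the mapping step without needing a Windows machine, the sketch below decodes all five DISK_GEOMETRY fields from a synthetic buffer and looks up the media type in a small hand-copied table. The buffer is made up (it mimics a 1.44 MB floppy), the table is only a subset of the MEDIA_TYPE enumeration, and parse_disk_geometry is a hypothetical helper name.

```python
import struct

# Hand-copied subset of the MEDIA_TYPE enumeration (illustration only)
MEDIA_TYPES = {
    0: "Unknown",
    2: "F3_1Pt44_512",
    5: "F3_720_512",
    11: "RemovableMedia",
    12: "FixedMedia",
}


def parse_disk_geometry(raw):
    """Decode the first 24 bytes of a DISK_GEOMETRY buffer into a dict."""
    cylinders, code, tracks, sectors, sectorSize = struct.unpack(
        "<qIIII", raw[:24])
    return {
        "mediaTypeCode": code,
        # Fall back to a descriptive string for codes not in the table
        "mediaType": MEDIA_TYPES.get(code, "code %d not in table" % code),
        # Capacity implied by the geometry values
        "sizeBytes": cylinders * tracks * sectors * sectorSize,
    }


# Synthetic buffer, NOT real device output: 80 cylinders, media type 2,
# 2 tracks per cylinder, 18 sectors per track, 512 bytes per sector
sample = struct.pack("<qIIII", 80, 2, 2, 18, 512)
info = parse_disk_geometry(sample)
print(info["mediaType"], info["sizeBytes"])  # prints F3_1Pt44_512 1474560
```

In real use, raw would be the diskGeometry value returned by DeviceIoControl, and the full enumeration from winioctlcon.py would replace the shortened table.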
Additional device info
Although the above method already allows us to identify, as an example, many types of floppy disks, it cannot distinguish between a USB thumb drive and a CD-ROM drive, both of which are simply identified as “RemovableMedia”. For some more granularity, we can call win32file.DeviceIoControl with the IOCTL_STORAGE_GET_MEDIA_TYPES_EX control code (documented here):
getMediaTypes = win32file.DeviceIoControl(handle, winioctlcon.IOCTL_STORAGE_GET_MEDIA_TYPES_EX, None, 2048)
The function arguments are largely identical to the earlier (disk geometry) call, but this time we use a 2048-byte output buffer. This is a somewhat arbitrary value. Unlike IOCTL_DISK_GET_DRIVE_GEOMETRY, the output size of IOCTL_STORAGE_GET_MEDIA_TYPES_EX is not fixed (see also the explanation that follows below), so I’m simply using a fairly large value to ensure the buffer will be large enough to fit the output. The output (again a string of raw bytes) is defined by the GET_MEDIA_TYPES structure, which is documented here.
The GET_MEDIA_TYPES structure is made up of the following fields:
- a 4-byte unsigned integer that represents a device type code;
- another 4-byte unsigned integer that represents the number of DEVICE_MEDIA_INFO structures;
- a pointer to the first DEVICE_MEDIA_INFO structure.
We can read the device type code and the number of DEVICE_MEDIA_INFO structures like this:
deviceCode = struct.unpack("<I", getMediaTypes[0:4])[0]
mediaInfoCount = struct.unpack("<I", getMediaTypes[4:8])[0]
The resulting deviceCode value is an integer number, which again can be mapped back to a device type string using an enumeration in winioctlcon.py. The value of mediaInfoCount tells us the number of DEVICE_MEDIA_INFO structures we need to parse.
The DEVICE_MEDIA_INFO structure itself is documented here. At first sight this may look a little intimidating, but essentially it just describes a “union” of 3 possible 32-byte structures. This means that each DEVICE_MEDIA_INFO instance is made up of one of those structures. For the purpose of this post, we’re only interested in the MediaType field here, and this field is at an identical location (the second item of the DEVICE_MEDIA_INFO structure, after an 8-byte cylinders value) for two of the three possible variants. Only in the case of a tape device is MediaType the first item. Tape devices can be identified from the value of deviceCode. Using this information, we can use the code below to iterate over all DEVICE_MEDIA_INFO structures, and extract their respective media type codes:
# Start position in GET_MEDIA_TYPES structure
# (remember we already read two 4-byte integers from it,
# hence the 8 byte start offset!)
offset = 8
# Loop over DEVICE_MEDIA_INFO structures
for _ in range(mediaInfoCount):
    if deviceCode in [31, 32]:
        # Tape device, mediaTypeCode is first item
        mediaTypeCode = struct.unpack("<I", getMediaTypes[offset:offset + 4])[0]
        offset += 8
    else:
        # Not a tape device, so skip 8 byte cylinders value
        offset += 8
        mediaTypeCode = struct.unpack("<I", getMediaTypes[offset:offset + 4])[0]
    # Skip to position of next DEVICE_MEDIA_INFO structure
    offset += 24
The resulting mediaTypeCode values are integer numbers that can be mapped back to media type strings using the MEDIA_TYPE and STORAGE_MEDIA_TYPE enumerations in winioctlcon.py. These strings are in turn documented here and here.
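The parsing logic can also be wrapped in a function that collects all media type codes into a list, which makes it testable against synthetic buffers. The sketch below is one possible arrangement, not the demo script’s actual code. It assumes the 32-byte DEVICE_MEDIA_INFO size and the tape device codes (31 and 32) used in the loop above. The name parse_media_types and the sample buffers are invented for illustration.

```python
import struct

TAPE_DEVICE_CODES = (31, 32)   # tape device codes, as in the loop above
DEVICE_MEDIA_INFO_SIZE = 32    # assumed size of each union member


def parse_media_types(raw):
    """Return (deviceCode, [mediaTypeCode, ...]) from a GET_MEDIA_TYPES buffer."""
    deviceCode, mediaInfoCount = struct.unpack_from("<II", raw, 0)
    # Tape: MediaType is the first item; otherwise it follows the
    # 8-byte cylinders value
    fieldOffset = 0 if deviceCode in TAPE_DEVICE_CODES else 8
    codes = []
    for i in range(mediaInfoCount):
        start = 8 + i * DEVICE_MEDIA_INFO_SIZE
        if start + DEVICE_MEDIA_INFO_SIZE > len(raw):
            raise ValueError("buffer too short for %d entries" % mediaInfoCount)
        codes.append(struct.unpack_from("<I", raw, start + fieldOffset)[0])
    return deviceCode, codes


# Synthetic buffers, NOT real device output.
# Disk-like device (code 7) with two entries: media types 11 and 12
diskSample = (struct.pack("<II", 7, 2)
              + struct.pack("<qI20x", 0, 11)
              + struct.pack("<qI20x", 0, 12))
# Tape-like device (code 31) with one entry: media type 32
tapeSample = struct.pack("<II", 31, 1) + struct.pack("<I28x", 32)

print(parse_media_types(diskSample))  # prints (7, [11, 12])
print(parse_media_types(tapeSample))  # prints (31, [32])
```

Using struct.unpack_from with an explicit offset avoids the running offset variable, at the cost of hard-coding the structure size.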
Putting it all together
I created a simple demo script that ties everything discussed here together. It also contains lookup functions that translate the media type and device codes into human-readable strings. You can run it from the command prompt with one or more logical drive names as command-line arguments. For example:
python detectStorageMediaType.py A D E
The script then tries to establish the media type (from the disk geometry), the device type and the media types that are supported by the device, and reports the results back like this:
Drive: A
Media type: F3_1Pt44_512
Drive: D
Media type: RemovableMedia
Device type: FILE_DEVICE_CD_ROM
Supported media types: CD_ROM RemovableMedia
Drive: E
Media type: RemovableMedia
Device type: FILE_DEVICE_DISK
Supported media types: RemovableMedia
You may note that the output is incomplete in some cases, which is because the API calls do not work for all devices. In the above example, the IOCTL_STORAGE_GET_MEDIA_TYPES_EX control code could not be used to access the floppy drive attached to logical drive A, so no device output is produced for that drive.
This example also showcases the level of detail that is reported for floppy disks. As can be seen here, the code F3_1Pt44_512 indicates a 3.5” floppy disk with a capacity of 1.44 MB and 512 bytes per sector.
Finally, the logical drive D in the above example was a virtual optical drive with an ISO image of a DVD-ROM attached to it. Despite that, the device is identified as “FILE_DEVICE_CD_ROM” (not “FILE_DEVICE_DVD”!), and DVDs are not listed as a supported media type. This could be a limitation of my test approach, which was based on Windows 10 running in VirtualBox inside a Linux host environment [1]. This would need additional testing with physical hardware.
This post only scratches the surface of what’s possible with the Windows API, but hopefully it will be useful to others to start their own explorations. I’ve only tested the code presented here with a limited number of devices (both physical and virtual ones) attached to a virtual machine running Windows 10. If you run into any surprises, or have suggestions for improvements, feel free to leave a comment here, or open a pull request for the demo script.
[1] For reasons unknown to me I was unable to attach a physical DVD drive to this virtual machine for my tests.