App Disk Images

Posted by Thoughts and Ramblings on Friday, September 23, 2022

If you’ve ever installed Xcode via the Mac App Store, you know it can take an hour to install. The cause is not its size but the large number of individual files. What if it didn’t have so many small files? Could optimizations made here apply to other apps as well?

Overview

The idea is simple: instead of storing an app as a constellation of individual files, store it as a disk image with its own filesystem. The idea isn’t new and has been used elsewhere, so much of what I’m going to outline here is what anyone familiar with it might expect. Instead of an app being a directory (its current representation), the contents of that directory live in a read-only filesystem stored in a disk image. The app is then really a single file: the image itself. Accessing contents within the app is simply a matter of traversing the filesystem within the image.

Speed and Optimizations

Certain optimizations that the OS currently enjoys with individual files would need to be preserved:

  1. The image must be uncompressed. Traversing a compressed image usually requires decompressing large chunks of the image to access small pieces. If compression were limited to small blocks, such as a few sectors, it could be worth the overhead, but with today’s processor and NVMe speeds this seems unlikely.
  2. Match sector size. Typically this means that the sector size inside the disk image will be 4 kB. This way, loading a sector inside the image corresponds to loading a sector on the host filesystem.
  3. Support for mmap. Memory mapping files is commonly used for executables. Supporting mmap of a file within an image is simply a matter of determining the file’s sector range within the image and memory mapping that range of the image file.
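To make the mmap point concrete, here is a minimal sketch in Python. It assumes the image’s metadata has already told us a file’s start sector and byte length; the names `SECTOR_SIZE` and `map_file_in_image` are made up for illustration. One wrinkle is that the platform’s mapping granularity may be larger than 4 kB, so the sketch rounds the offset down and skips the slack.

```python
import mmap

SECTOR_SIZE = 4096  # assumed to match the host filesystem's sector size


def map_file_in_image(image_file, start_sector, byte_length):
    """Map one inner file of a disk image read-only.

    Returns (mapping, view). `view` is a memoryview over exactly the inner
    file's bytes; `mapping` is the underlying mmap object the caller closes.
    The start sector and byte length are assumed to come from the image's
    own (hypothetical) filesystem metadata.
    """
    offset = start_sector * SECTOR_SIZE
    # mmap offsets must be a multiple of the allocation granularity, which
    # can exceed 4 kB on some platforms, so round down and skip the slack.
    aligned = offset - (offset % mmap.ALLOCATIONGRANULARITY)
    slack = offset - aligned
    mapping = mmap.mmap(
        image_file.fileno(),
        slack + byte_length,
        access=mmap.ACCESS_READ,
        offset=aligned,
    )
    return mapping, memoryview(mapping)[slack:slack + byte_length]
```

A caller would release the view before closing the mapping, for example `view.release()` followed by `mapping.close()`.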

Updating Images

Ideally, updating an app would download only what actually changed. This is possible if the host filesystem supports sparse files (as will be seen later). An update image is created as follows:

  1. Start with the image for the previous version (in blue). [Figure: Version 1 App]
  2. Store new files and new filesystem structures in the space after the end of the first image (in green). [Figure: Version 2 App]
  3. Record which sectors are freed in the process (either replaced by new data or removed in the new image; in red). [Figure: Version 3 App]
  4. Save this image (which is the previous version + new data) and the list of freed sectors.
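The steps above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the image’s metadata is modeled as a plain dictionary mapping a path to `(start_sector, byte_length)`, every file is contiguous, and serializing the new filesystem structures into the appended region is left out. The function and parameter names are hypothetical.

```python
SECTOR_SIZE = 4096


def sectors_needed(n_bytes):
    """Sectors occupied by a file of n_bytes (at least one, by assumption)."""
    return max(1, -(-n_bytes // SECTOR_SIZE))


def build_update(old_image, old_catalog, new_files):
    """Compute the data to append and the extents freed by a new version.

    old_image   -- bytes of the previous image (a whole number of sectors)
    old_catalog -- {path: (start_sector, byte_length)} for the old version
    new_files   -- {path: bytes} for the new version
    Returns (appended_bytes, new_catalog, freed_extents), where each freed
    extent is (start_sector, sector_count).
    """
    next_sector = len(old_image) // SECTOR_SIZE
    appended = bytearray()
    new_catalog = {}
    for path, data in new_files.items():
        old = old_catalog.get(path)
        if old is not None:
            start, length = old
            begin = start * SECTOR_SIZE
            if old_image[begin:begin + length] == data:
                new_catalog[path] = old  # unchanged: reuse the old sectors
                continue
        # New or changed file: place it after the end of the old image.
        count = sectors_needed(len(data))
        new_catalog[path] = (next_sector, len(data))
        appended += data.ljust(count * SECTOR_SIZE, b"\0")
        next_sector += count
    # Anything the new catalog no longer points at has been freed.
    freed = sorted(
        (start, sectors_needed(length))
        for path, (start, length) in old_catalog.items()
        if new_catalog.get(path) != (start, length)
    )
    return bytes(appended), new_catalog, freed
```

The publisher would then ship `appended_bytes` and `freed_extents` to clients that already have the previous version.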

The process of a client updating is similar. Say the client has revision 1 but revision 2 is available.

  1. Start with the image for version 1 (in blue). [Figure: Version 1 App]
  2. The client downloads the data after the end of the first revision’s image, along with the list of freed sectors.
  3. It appends this data to the revision 1 image on disk (in green). [Figure: Version 2 App]
  4. It goes through the list of freed sectors and punches holes in the disk image at these locations. [Figure: Version 3 App]
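Here is a rough sketch of the client side in Python. Real hole punching is platform specific (on Linux it is `fallocate` with `FALLOC_FL_PUNCH_HOLE`, on macOS `fcntl` with `F_PUNCHHOLE`), and Python’s standard library does not wrap either portably, so the sketch takes the punching routine as a parameter. The default, `zero_fill`, is only a stand-in: it makes the freed range read as zeros but does not reclaim any disk space. All names here are hypothetical.

```python
import os

SECTOR_SIZE = 4096


def zero_fill(image_file, offset, length):
    """Portable stand-in for hole punching.

    Overwrites the range with zeros so it reads like a hole. Unlike a real
    punch, this does NOT give the space back to the host filesystem.
    """
    image_file.seek(offset)
    image_file.write(b"\0" * length)


def apply_update(image_path, appended, freed_extents, punch_hole=zero_fill):
    """Append the new data to the image, then free the obsolete sectors.

    freed_extents is a list of (start_sector, sector_count) pairs.
    punch_hole(file, byte_offset, byte_length) is assumed to deallocate the
    range; swap in a platform-specific implementation to actually save space.
    """
    with open(image_path, "r+b") as image:
        image.seek(0, os.SEEK_END)
        image.write(appended)
        for start_sector, sector_count in freed_extents:
            punch_hole(image, start_sector * SECTOR_SIZE,
                       sector_count * SECTOR_SIZE)
```

Appending first and punching second means the image never lacks anything a revision needs: the revision 1 data is only dropped once the revision 2 data is already on disk.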

In the end, the holes are not actually stored on the disk, so the data stored on the disk looks more like the following diagram: [Figure: Version 4 App]

In this process, the system starts with only the data in revision 1, adds the new data for revision 2, then deletes the data that exists only in revision 1, leaving just the data in revision 2. Contrast that with the current setup, where the system has the data in revision 1, adds the compressed revision 2, adds the uncompressed revision 2, then removes the compressed revision 2 and the original revision 1.
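To put rough numbers on that contrast, the peak disk usage of each approach can be written down directly. The sizes in the usage note below are invented purely for illustration, and the function names are hypothetical.

```python
def peak_usage_image_update(rev1_size, new_data_size):
    """Peak usage when appending to the image: old image plus the new data.

    Holes are punched only afterwards, so this is the high-water mark.
    """
    return rev1_size + new_data_size


def peak_usage_current(rev1_size, rev2_compressed_size, rev2_size):
    """Peak usage in the current setup, as described above.

    The old version, the compressed download, and the uncompressed new
    version all exist at the same time before anything is removed.
    """
    return rev1_size + rev2_compressed_size + rev2_size
```

With made-up sizes in MB, a 10,000 MB app whose update changes 2,000 MB peaks at `peak_usage_image_update(10_000, 2_000)`, which is 12,000 MB. If the full new version compresses to 5,000 MB, the current setup peaks at `peak_usage_current(10_000, 5_000, 10_000)`, which is 25,000 MB.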

Fresh Install of Updated Images

A fresh installation of one of these updated images does not require transferring the data in the holes. For example, suppose the client is fetching version 2 of the image above. [Figure: Version 3 App] The client first gets the list of holes (the description of where the red is). Then the server sends the image but skips the holes (the parts in red). When writing to disk, the client writes this data but skips the space where the holes go (again, the parts in red). The result is a sparse file of the disk image, identical to the case where the image was updated from a previous version.
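A minimal sketch of both halves in Python, under the assumption that holes are described as `(start_sector, sector_count)` pairs: `data_extents` works out which ranges the server actually needs to send, and `write_sparse_image` writes them at the right offsets by seeking over the holes. Whether the skipped ranges end up as true holes depends on the host filesystem supporting sparse files. Both function names are hypothetical.

```python
SECTOR_SIZE = 4096


def data_extents(total_sectors, holes):
    """Yield (start_sector, sector_count) for every range that is not a hole.

    holes is a list of non-overlapping (start_sector, sector_count) pairs.
    """
    cursor = 0
    for start, count in sorted(holes):
        if start > cursor:
            yield (cursor, start - cursor)
        cursor = max(cursor, start + count)
    if cursor < total_sectors:
        yield (cursor, total_sectors - cursor)


def write_sparse_image(dest_path, total_sectors, chunks):
    """Write received chunks at their offsets, leaving the holes unwritten.

    chunks is an iterable of (start_sector, data) pairs. Seeking past
    unwritten ranges lets a sparse-capable filesystem skip allocating them.
    """
    with open(dest_path, "wb") as out:
        for start_sector, data in chunks:
            out.seek(start_sector * SECTOR_SIZE)
            out.write(data)
        # Extend to the full image size in case the image ends with a hole.
        out.truncate(total_sectors * SECTOR_SIZE)
```

For a six-sector image with a two-sector hole starting at sector 2, `list(data_extents(6, [(2, 2)]))` gives `[(0, 2), (4, 2)]`, so only four sectors cross the network.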

Full Images

There is going to be a cutoff where updating images as described above just isn’t worth doing. An app update could change everything, or change so much that the amount of data left unchanged is too small to be worth creating a delta. In this case, it’s easier to download the entire app as a fresh image and replace the old image with the new one. Even in this case, the process is more efficient than the current mechanism.

Reducing Overhead

A disk image carries some overhead, but this overhead can be significantly reduced. A filesystem often has a volume bitmap, but in this design the volume bitmap is simply the set of allocated blocks (those that are not holes) of the disk image, so there’s no need to store a volume bitmap at all. If one adds the restriction that files can only be stored in contiguous blocks, this removes the need for indirect blocks in the filesystem (though the host filesystem will certainly use indirect blocks for the disk image). A file’s entry in the filesystem metadata then stores only the start sector number and sector count, without dealing with one or two levels of indirection.

These two changes should bring the overhead down to a point where it is very close to that of storing the individual files in the host filesystem.
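As an illustration of how small such a metadata entry could be, here is a sketch of a contiguous-extent record in Python. The 12-byte on-disk layout (a little-endian 64-bit start sector followed by a 32-bit sector count) is my own assumption for the example, not part of any existing format.

```python
import struct
from dataclasses import dataclass

SECTOR_SIZE = 4096
# Hypothetical layout: little-endian u64 start sector, u32 sector count.
_EXTENT_LAYOUT = struct.Struct("<QI")


@dataclass(frozen=True)
class Extent:
    """Where a contiguous file lives in the image; no indirect blocks."""
    start_sector: int
    sector_count: int

    def pack(self):
        """Serialize to the fixed 12-byte record."""
        return _EXTENT_LAYOUT.pack(self.start_sector, self.sector_count)

    @classmethod
    def unpack(cls, raw):
        """Parse a 12-byte record back into an Extent."""
        return cls(*_EXTENT_LAYOUT.unpack(raw))

    def byte_range(self):
        """Return (byte_offset, byte_length) within the image file."""
        return (self.start_sector * SECTOR_SIZE,
                self.sector_count * SECTOR_SIZE)
```

Because the file is contiguous, `byte_range()` is all that is needed to read or memory map it; there is no chain of block pointers to follow.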

Conclusion

I’ve presented a simple system for software updates from an app store that provides fast installation, fast upgrades, and efficient disk usage. Hopefully this, or something like it, is adopted soon.