Space-efficient source code storage


While implementing DistTest, I needed to build a lot of different Linux kernel versions. My first solution was to download a source archive for each version used. But I soon realized that about 1000 source trees of 0.5 to 1 GB each (roughly 0.5 to 1 TB in total) would consume a lot of disk space. This approach also makes it impossible to build a kernel at the precision of an exact commit.
A set of base versions plus the corresponding patches saves disk space, but applying the patches causes a lot of random I/O. That is slow on an HDD and uses up the limited write endurance of an SSD. Since the sources are only needed temporarily, the natural conclusion is “use tmpfs”. aufs, however, offers a method that needs much less RAM: keep only the diffs in RAM.

It is also much easier to check out sources with commit-level precision when they do not have to be stored in a persistent location.
In this case the aufs setup has the following layout:

  • repo – repository location (with the .git directory)
  • seed – sources of some version close to the needed one
  • tmp – storage location for the diffs
  • aufs – aufs mount point that combines all the directories above

For example, Linux v4.5 can be built as follows:
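A minimal sketch of such a build, written in Python and only printing the commands rather than running them. It assumes a hypothetical base directory /srv/kernel holding the four directories from the layout above, that repo contains the .git directory whose HEAD and index match the version checked out in seed, and that "defconfig" is an acceptable configuration (all of these are illustrative assumptions). The branch order passed via the aufs `br=` option puts tmp on top as the only writable branch, so every file that `git checkout` rewrites is copied up into RAM while repo and seed stay untouched.

```python
import shlex
from pathlib import Path


def build_commands(base: Path, tag: str, jobs: int = 4) -> list[list[str]]:
    """Return argv lists that build Linux `tag` on an aufs union (not executed)."""
    # Hypothetical layout: base/{repo,seed,tmp,aufs}, as in the list above.
    repo, seed, tmp, mnt = (base / d for d in ("repo", "seed", "tmp", "aufs"))
    # Leftmost branch is the top one: tmp is writable, repo and seed are read-only.
    branches = f"br={tmp}=rw:{repo}=ro:{seed}=ro"
    return [
        # tmp lives in RAM, so only the diffs against repo/seed consume memory.
        ["mount", "-t", "tmpfs", "tmpfs", str(tmp)],
        ["mount", "-t", "aufs", "-o", branches, "none", str(mnt)],
        # git sees .git from repo and the working tree from seed; checkout
        # rewrites only the files that differ, and those copies land in tmp.
        ["git", "-C", str(mnt), "checkout", tag],
        # "defconfig" is an assumed configuration target, not the author's.
        ["make", "-C", str(mnt), "defconfig"],
        ["make", "-C", str(mnt), f"-j{jobs}"],
        # Unmounting discards every change; repo and seed are never modified.
        ["umount", str(mnt)],
        ["umount", str(tmp)],
    ]


if __name__ == "__main__":
    for argv in build_commands(Path("/srv/kernel"), "v4.5"):
        print(shlex.join(argv))
```

Printing instead of calling `subprocess.run` keeps the sketch safe to try without root; the output can be reviewed and pasted into a root shell.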

This approach has a single drawback: mounting must be performed by the root user. This is easily fixed by adding the corresponding records to /etc/fstab:
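A minimal Python sketch that generates such /etc/fstab records, again assuming the hypothetical /srv/kernel layout and a hypothetical 4G size limit for the tmpfs. It relies on standard mount(8) options: `noauto` keeps the entries from being mounted at boot, and `user` lets an ordinary user mount them. Because `user` implies `noexec,nosuid,nodev`, `exec` is added after it; a kernel build runs helper programs it compiles inside the source tree, so a `noexec` mount would likely break it.

```python
from pathlib import Path


def fstab_lines(base: Path, tmpfs_size: str = "4G") -> list[str]:
    """Return fstab records for the tmpfs and aufs mounts (hypothetical layout)."""
    repo, seed, tmp, mnt = (base / d for d in ("repo", "seed", "tmp", "aufs"))
    # noauto: skip at boot; user: allow non-root mount; exec: undo user's noexec.
    opts = "noauto,user,exec"
    return [
        # Fields: spec, mount point, type, options, dump, fsck pass.
        f"tmpfs {tmp} tmpfs {opts},size={tmpfs_size} 0 0",
        # aufs separates branches with ':' so they do not clash with ','.
        f"none {mnt} aufs {opts},br={tmp}=rw:{repo}=ro:{seed}=ro 0 0",
    ]


if __name__ == "__main__":
    print("\n".join(fstab_lines(Path("/srv/kernel"))))
```

With these records in place, a regular user would run `mount /srv/kernel/tmp` followed by `mount /srv/kernel/aufs` (and `umount` in reverse order) without sudo.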

Fixed mount points don’t give much flexibility, but at least symlinks can be used as aufs branches:
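A minimal Python sketch of switching the seed between mounts, under the assumption that the fstab record names a fixed symlink path (for example a hypothetical /srv/kernel/seed) and that aufs resolves that symlink when the branch is mounted, so repointing it only takes effect on the next mount. The helper `repoint` is hypothetical; it creates a new link under a staging name and renames it over the old one, so the link path always exists and points either to the old or to the new target.

```python
import os


def repoint(link: str, target: str) -> None:
    """Atomically make symlink `link` point to `target` (hypothetical helper)."""
    staging = link + ".new"
    # Remove a stale staging link left over from an interrupted run.
    try:
        os.remove(staging)
    except FileNotFoundError:
        pass
    os.symlink(target, staging)
    # rename(2) replaces the old symlink itself, not the directory it points to.
    os.replace(staging, link)
```

For example, `repoint("/srv/kernel/seed", "/data/seeds/v4.4")` (both paths hypothetical) followed by `mount /srv/kernel/aufs` would mount a union based on a v4.4 seed while the fstab record stays unchanged.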

Yuriy Nazarov
Software engineer
Love machine learning