7ZIP compression benchmark for large files (Virtual Machines)

Large files from virtual machines (VMs) are awkward to move around or store. Such VMs typically have file sizes between 5 GByte and 200-300 GByte or larger. USB3 copies run at around 50 MByte/s to 150 MByte/s; cheap USB3 sticks are usually in the lower 50 MByte/s range, also depending on target drive fragmentation. Compressing such VMs helps save some time and space during file transfers.

Modern compression tools such as the free 7ZIP are multi-threaded (they use multiple CPU cores) and can compress large files quickly. Hence, if the file copy process takes 10 minutes, it's worthwhile to compress the file first to speed up file-copy operations and save some space.
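The trade-off above can be checked with simple arithmetic. The following is a minimal sketch, not a measurement: the function names are made up for illustration, 1 GByte is assumed to be 1000 MByte, and the example figures (11 GByte shrinking to 5.85 GByte, about 1 minute to compress, about 2 minutes to decompress) are the ones reported in the benchmark further down.

```python
def transfer_seconds(size_mb: float, speed_mb_per_s: float) -> float:
    """Time to copy size_mb at a sustained transfer speed."""
    if speed_mb_per_s <= 0:
        raise ValueError("speed must be positive")
    return size_mb / speed_mb_per_s


def compressed_transfer_seconds(size_mb: float, ratio: float,
                                speed_mb_per_s: float,
                                compress_s: float,
                                decompress_s: float = 0.0) -> float:
    """Compress, copy the smaller archive, optionally decompress at the target.

    ratio is compressed size divided by original size (0.5 means half).
    """
    copy_s = transfer_seconds(size_mb * ratio, speed_mb_per_s)
    return compress_s + copy_s + decompress_s


if __name__ == "__main__":
    size_mb = 11_000          # 11 GByte VM, assuming 1 GByte = 1000 MByte
    ratio = 5.85 / 11         # compressed size / original size
    for speed in (50, 150):   # MByte/s, slow and fast USB3 targets
        plain = transfer_seconds(size_mb, speed)
        packed = compressed_transfer_seconds(size_mb, ratio, speed, 60)
        unpacked = compressed_transfer_seconds(size_mb, ratio, speed, 60, 120)
        print(f"{speed} MByte/s: plain {plain:.0f} s, "
              f"compress+copy {packed:.0f} s, "
              f"compress+copy+decompress {unpacked:.0f} s")
```

The result depends heavily on the drive speed and on whether decompression time at the target is counted, so it is worth plugging in your own numbers.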

7ZIP is multi-threaded and can use multiple CPUs during compression

The following comparison investigates the compression of a large 11 GByte Windows XP VM and how long compression and decompression take. The VM was compressed on a free OSFMount RAMDISK. The 7ZIP algorithms LZMA2 and BZIP2 (stored as a *.7z file) were the fastest, with a compression time of around 1 minute and decompression times of around 2 minutes (decompression is not multi-threaded). The file shrank from 11 GByte to 5.85 GByte; the ULTRA setting would save 15% more space, but at almost double the compression time. The ultimate goal here is to save space on USB sticks and save time during transfer.

The "Fastest" mode was only beaten in speed by simply storing the file in an archive without compressing it, but even the Fastest mode yielded almost 50% compression of a VM file. These values will of course differ for files that are already compressed, such as pictures or other ZIP files.
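A comparison of this kind can be reproduced in miniature with Python's standard library. This is only a sketch under several assumptions: Python's `lzma` and `bz2` modules are single-threaded and write the xz and bz2 formats, not *.7z; the low and default presets used here are only a loose analogy to the 7ZIP "Fastest" and stronger modes; and the sample data (half repetitive, half random) is a hypothetical stand-in for a VM image.

```python
import bz2
import lzma
import os
import time


def make_sample(size=1_000_000):
    """Build test data: first half repetitive, second half incompressible."""
    half = size // 2
    pattern = b"empty virtual disk sector "
    repetitive = (pattern * (half // len(pattern) + 1))[:half]
    return repetitive + os.urandom(size - half)


def benchmark(name, compress, decompress, data):
    """Time one compress/decompress round trip and report the size ratio."""
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    unpacked = decompress(packed)
    t2 = time.perf_counter()
    if unpacked != data:
        raise RuntimeError(f"{name}: round trip changed the data")
    return {
        "name": name,
        "ratio": len(packed) / len(data),  # compressed / original size
        "compress_s": t1 - t0,
        "decompress_s": t2 - t1,
    }


def run_benchmarks(data):
    """Compare a low and a stronger setting for LZMA and BZIP2."""
    candidates = [
        ("lzma preset 1", lambda d: lzma.compress(d, preset=1), lzma.decompress),
        ("lzma preset 6", lambda d: lzma.compress(d, preset=6), lzma.decompress),
        ("bz2 level 1", lambda d: bz2.compress(d, compresslevel=1), bz2.decompress),
        ("bz2 level 9", lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
    ]
    return [benchmark(n, c, d, data) for n, c, d in candidates]


if __name__ == "__main__":
    for row in run_benchmarks(make_sample()):
        print(f"{row['name']:14s} ratio {row['ratio']:.2f}  "
              f"compress {row['compress_s']:.3f} s  "
              f"decompress {row['decompress_s']:.3f} s")
```

Absolute timings from such a small in-memory sample say little about an 11 GByte VM; only the relative differences between settings are of interest.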

The decompression itself was not multi-threaded and hence much slower; BZIP2 provided the fastest decompression time of around 2 minutes (see below), which is OK.

Error correction features (such as bit-error-correcting Hamming codes) are not implemented in 7ZIP yet (well, it's only the year 2013). That means the archiver can only check whether the archive is corrupt via CRC methods, but cannot correct the error. This was confirmed by a series of tests in which a compressed archive was deliberately corrupted and then uncompressed. Tests with other available compression tools that offer variable redundancy options (the error correction data can add between 10% and 100% of the file size) also showed that *bit errors* can be corrected, but neither large missing byte chunks nor single *byte* errors. The lesson learned is that ZIP files do not provide any form of safety (other than having a smaller copy) and regular backups are still needed.
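The difference between detecting and correcting an error can be illustrated with a small sketch. It is not how 7ZIP or any of the tested tools work internally: it uses `zlib.crc32` from the Python standard library for the detection side and a textbook Hamming(7,4) code, written out here for illustration, for the correction side.

```python
import zlib


def hamming74_encode(data_bits):
    """Encode 4 data bits as the codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Repair at most one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4   # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


if __name__ == "__main__":
    # CRC: the damage is noticed, but nothing says where it is or how to fix it.
    payload = b"virtual machine image"
    stored_crc = zlib.crc32(payload)
    damaged = bytearray(payload)
    damaged[5] ^= 0x04                       # flip a single bit
    print("CRC mismatch detected:", zlib.crc32(bytes(damaged)) != stored_crc)

    # Hamming(7,4): one flipped bit per codeword can be repaired.
    data = [1, 0, 1, 1]
    word = hamming74_encode(data)
    word[2] ^= 1
    print("one flipped bit repaired:", hamming74_decode(word) == data)

    # Two flipped bits in the same codeword exceed what the code can repair.
    word = hamming74_encode(data)
    word[2] ^= 1
    word[5] ^= 1
    print("two flipped bits repaired:", hamming74_decode(word) == data)
```

The first two checks succeed and the last one fails, which matches the observation above: isolated bit errors are recoverable with such codes, while a damaged byte (several bits in one codeword) or a missing chunk is not. The price is redundancy, here 3 extra bits for every 4 data bits.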

The DUAL-E5-2687W with 16 cores and 32 threads provides around 67000 MIPS in the 7ZIP benchmark, which is decent compared to older single-threaded chips.

Conclusion: For 7ZIP, use the *.7z format with LZMA2 or BZIP2 and the "Fastest" setting to obtain around 50% compression in a reasonable amount of time. Because 7ZIP is open source, there should be no compatibility issues in the near future.
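For scripted use, the recommended settings can be passed to the 7ZIP command-line tool. The sketch below assumes the executable is installed and reachable as `7z` on the PATH (the name differs on some systems), and the archive and VM file names are hypothetical. The switches used are `a` (add to archive), `-t7z` (archive type), `-m0=` (compression method), `-mx=1` (the fastest compression level) and `-mmt=on` (multi-threading).

```python
import shutil
import subprocess


def build_7z_command(archive, source, method="lzma2", level=1, exe="7z"):
    """Return the argument list for a fast, multi-threaded *.7z archive."""
    return [
        exe, "a",
        "-t7z",            # write the 7z container format
        f"-m0={method}",   # e.g. lzma2 or bzip2
        f"-mx={level}",    # 1 = fastest, 9 = ultra
        "-mmt=on",         # use multiple threads where the method supports it
        archive,
        source,
    ]


def compress_vm(archive, source, **options):
    """Run 7z with the settings above; raises if 7z is missing or fails."""
    command = build_7z_command(archive, source, **options)
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"{command[0]} not found on PATH")
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the resulting call.
    print(" ".join(build_7z_command("winxp-vm.7z", "winxp-vm.vmdk")))
```

Passing `method="bzip2"` switches to the other algorithm from the comparison; the level and thread settings stay the same.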