Compression Algorithm Testing (12023-06-16)

.Zip It Good!

I ran some tests to see which compression formats squeeze which file types best. Because apparently even my backups need to be optimized.

PeaZip 7.32

Initial testing was conducted in July 12020 with PeaZip, which incorporates its own PEA format alongside 7-Zip (gzip, bzip2, xz, zip, 7z) and other FOSS algorithms (BCM, Brotli, FreeArc, ZPaq, zstandard).
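
For anyone who wants to try a similar size-and-time comparison without PeaZip, here is a minimal sketch. It is not how the numbers on this page were produced: it assumes only the three codecs in Python's standard library (gzip, bzip2, xz) at their default settings, and the `benchmark` helper and the sample input are made up for illustration.

```python
import bz2
import gzip
import lzma
import time

# Assumption: only the codecs bundled with Python, not PeaZip's full set.
CODECS = {
    "gz": gzip.compress,
    "bz2": bz2.compress,
    "xz": lzma.compress,
}


def benchmark(data: bytes) -> list[tuple[str, int, float]]:
    """Return (format, compressed bytes, seconds) rows, largest output first."""
    results = []
    for name, compress in CODECS.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        results.append((name, len(packed), elapsed))
    # Largest archive first, i.e. least size reduction at the top.
    return sorted(results, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Made-up, highly repetitive input; real folders will behave differently.
    sample = b"<p>Lorem ipsum dolor sit amet</p>\n" * 20000
    print(f"{len(sample):>9}  0.000  raw")
    for name, size, seconds in benchmark(sample):
        print(f"{size:>9}  {seconds:.3f}  .{name}")
```

Timings from a harness like this depend heavily on the machine, the compression level, and the input, so treat them as relative rather than absolute.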

Text

A folder containing mostly HTML files downloaded from AO3, plus some ASCII plaintext files:

LEAST SIZE REDUCTION --------- MOST SIZE REDUCTION
pea, zpaq, zip, gz, xz, br, zst, 7z, bz2, arc, bcm

SLOWEST ---------------------------------- FASTEST
bz2, gz, zip, 7z, zst, arc, br, bcm, zpaq, xz, pea

BYTES    TIME (SECONDS)  FORMAT
5108105  0.000  .html/.txt
1856844  0.591  .pea
1801952  0.691  .zpaq
1766751  5.900  .zip
1675464  6.400  .tar.gz
1567204  0.620  .tar.xz
1451784  1.200  .tar.br
1345611  2.800  .tar.zst
1332424  3.000  .7z
1280880  9.500  .tar.bz2
1072873  1.300  .arc
1072769  1.100  .tar.bcm
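
The byte counts above are easier to compare as percentages. This small sketch only redoes the arithmetic on the table's own numbers; the `reduction_percent` helper is a name invented here.

```python
# Sizes in bytes, copied from the text table above.
ORIGINAL = 5108105
ARCHIVES = {
    ".pea": 1856844,
    ".zpaq": 1801952,
    ".zip": 1766751,
    ".tar.gz": 1675464,
    ".tar.xz": 1567204,
    ".tar.br": 1451784,
    ".tar.zst": 1345611,
    ".7z": 1332424,
    ".tar.bz2": 1280880,
    ".arc": 1072873,
    ".tar.bcm": 1072769,
}


def reduction_percent(original: int, compressed: int) -> float:
    """Percentage of the original size that compression removed."""
    return 100.0 * (1 - compressed / original)


if __name__ == "__main__":
    for ext, size in ARCHIVES.items():
        print(f"{ext:<9} {reduction_percent(ORIGINAL, size):5.1f}%")
```

By this measure PEA trims roughly 64% off the folder, while FreeArc and BCM both land at roughly 79%.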

Images

PNGs and JPGs plus the occasional GIF:

LEAST SIZE REDUCTION --------- MOST SIZE REDUCTION
zpaq, pea, gz, zip, xz, bcm, bz2, arc, br, 7z, zst

SLOWEST ---------------------------------- FASTEST
bz2, br, zip, 7z, gz, bcm, arc, zst, pea, xz, zpaq

KB      TIME (M:SS)  FORMAT
141021  0:00  .gif/.png/.jpg
140488  0:02  .zpaq
138922  0:12  .pea
138773  1:00  .tar.gz
138768  1:28  .zip
138544  0:06  .tar.xz
138493  0:40  .tar.bcm
138150  9:00  .tar.bz2
137783  0:30  .arc
135883  1:50  .tar.br
135742  1:09  .7z
135560  0:13  .tar.zst

Videos

Mostly MP4s with the occasional FLV and WEBM:

LEAST SIZE REDUCTION --------- MOST SIZE REDUCTION
zpaq, arc, pea, xz, zip, gz, bz2, 7z, br, zst, bcm

SLOWEST ---------------------------------- FASTEST
bz2, br, zip, gz, 7z, arc, bcm, zst, pea, xz, zpaq

MB     TIME (MM:SS)  FORMAT
728.0  00:00  .flv/.mp4/.webm
719.5  00:09  .zpaq
717.0  04:26  .arc
711.0  01:14  .pea
710.3  00:51  .tar.xz
709.9  09:16  .zip
709.7  09:11  .tar.gz
709.2  49:25  .tar.bz2
708.0  07:30  .7z
707.3  17:19  .tar.br
706.4  01:26  .tar.zst
702.0  04:03  .tar.bcm

Flash

Purely SWF animations:

LEAST SIZE REDUCTION --------- MOST SIZE REDUCTION
zpaq, bcm, pea, bz2, zip, gz, arc, xz, br, zst, 7z

SLOWEST ---------------------------------- FASTEST
bz2, br, zip, gz, 7z, bcm, arc, zst, pea, xz, zpaq

MB     TIME (MM:SS)  FORMAT
253.1  00:00  .swf
238.5  00:04  .zpaq
238.3  01:12  .tar.bcm
236.6  00:22  .pea
236.5  16:43  .tar.bz2
236.2  03:16  .zip
236.2  03:01  .tar.gz
235.5  01:02  .arc
233.5  00:12  .tar.xz
232.1  03:21  .tar.br
231.9  00:22  .tar.zst
231.0  02:20  .7z

Others

PDF

1.36GB of PDF files, compressed with 7-Zip ZS 1.5.0 R1 (gzip, bzip2, xz, zip, 7z, lizard, lz4, lz5, zstandard):

LARGER --------------- SMALLER
lz5, lz4, gz, liz, 7z, zst, xz (1.19GB)

SLOWEST -------------- FASTEST
gz, 7z, lz5, xz, liz, zst, lz4

I did not test bzip2, as it was the slowest algorithm in every one of the PeaZip tests above.

HTML Dumps

With PeaZip 7.90, I tested out some archived message boards that were mostly HTML with very little CSS or images:

LEAST SIZE REDUCTION --------- MOST SIZE REDUCTION
zip, pea, gz, bz2, br, zst, zpaq, xz, 7z, bcm, arc

XML/JSON

I threw 7-Zip ZS 1.5.0 R1's formats at a small collection of bookmark files and JSON/OPML settings exports:

LARGER ------------------------- SMALLER
lz5, lz4, liz, zip, gz, bz2, zst, 7z, xz

Conclusions

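For text files and websites under 5GB in size, the FreeArc 0.67 alpha from 12014 absolutely dunks on nearly everything else, even 7+ years after it was made: it landed within about 100 bytes of BCM on the text folder and produced the smallest archive for the HTML dumps. For larger website mirrors, 7z was able to achieve smaller sizes.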

For images, videos, and animations, zstandard offers the best speed/size compromise, even where it doesn't have the best compression ratio. But if you absolutely must have the smallest backups, use BCM for video and 7z for SWFs.
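
One way to put a number on "speed/size compromise" is to keep only the formats that no other format beats on both size and time at once (a Pareto front). The sketch below applies that idea to the image results from earlier on this page, with the M:SS times converted to seconds. The `pareto_front` helper is invented here, and this is just one reasonable reading of "compromise", not the method used to reach the conclusions above.

```python
# (size in KB, time in seconds) from the image table, times converted from M:SS.
IMAGE_RESULTS = {
    "zpaq": (140488, 2),
    "pea": (138922, 12),
    "gz": (138773, 60),
    "zip": (138768, 88),
    "xz": (138544, 6),
    "bcm": (138493, 40),
    "bz2": (138150, 540),
    "arc": (137783, 30),
    "br": (135883, 110),
    "7z": (135742, 69),
    "zst": (135560, 13),
}


def pareto_front(results: dict[str, tuple[int, int]]) -> list[str]:
    """Formats that are not beaten on both size and time by any other format."""
    front = []
    for name, (size, secs) in results.items():
        dominated = any(
            o_size <= size and o_secs <= secs and (o_size < size or o_secs < secs)
            for other, (o_size, o_secs) in results.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    print(pareto_front(IMAGE_RESULTS))
```

For the image data this leaves zpaq (fastest), xz (in between), and zst (smallest), which matches the pick of zstandard when size matters most.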

To save a bit of space, I replaced PeaZip with 7-Zip ZS, but kept FreeArc's CLI utility around to crunch them HTTrack backups since it's exceptional at doing so. The rest of my backups are stored in 7Zs for general files, ZSTs for images, and XZ for PDFs.
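
That storage scheme boils down to a lookup by file type. A minimal sketch, assuming a hypothetical `pick_format` helper and a flag for HTTrack mirrors (both invented here; the extension list, including `.jpeg`, is also an assumption):

```python
from pathlib import Path

# Assumed image extensions; the page only mentions PNGs, JPGs, and GIFs.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif"}


def pick_format(path: str, httrack_mirror: bool = False) -> str:
    """Return the archive format suggested by the conclusions for a given path."""
    if httrack_mirror:
        return "arc"  # FreeArc for HTTrack website backups
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return "xz"
    if suffix in IMAGE_EXTENSIONS:
        return "zst"
    return "7z"  # general files


if __name__ == "__main__":
    for example in ("scan.pdf", "photo.JPG", "notes.txt"):
        print(example, "->", pick_format(example))
    print("site-mirror ->", pick_format("site-mirror", httrack_mirror=True))
```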



