
I am surprised all major browsers didn't implement this years ago. Zstd is way better than gzip and brotli.


How much better is Zstd? Compression seems like one of those things where diminishing returns kick in very quickly. Whenever I need compression, it's always just been "throw gzip at it, because it does a lot without much of a performance hit". It does the job.


Try this: time tar -c /usr/share/doc | gzip | wc -c vs. time tar -c /usr/share/doc | zstd | wc -c

Repeat a few times to warm up your disk cache if needed. On my host (with an NVMe disk), zstd got a slightly better compression ratio than gzip, but took 1 second instead of 9 to compress. Compare against something like lzop, which is about the same speed as zstd but produces much worse compression.

Of course, with gzip, if you have multiple cores you have the option of using pigz, which brings the wall-clock time of gzip down to something comparable to zstd and lzop.


> with gzip, if you have multiple cores you have the option of using pigz, which brings the wall-clock time of gzip down to something comparable to zstd and lzop.

(But then you should use zstd -T0 for an apples to apples comparison.)
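A minimal sketch of that apples-to-apples run, assuming pigz is installed and reusing the directory from above (paths are just illustrative):

    # multi-threaded gzip vs multi-threaded zstd
    time tar -c /usr/share/doc | pigz | wc -c
    time tar -c /usr/share/doc | zstd -T0 | wc -c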


Thank you - that's a great, simple test.


For a benchmark on a standard set: https://github.com/inikep/lzbench/blob/master/lzbench18_sort... Of course, you may get different results with another dataset.

gzip (zlib -6) [ratio=32%] [compr=35 MB/s] [dec=407 MB/s]

zstd (zstd -2) [ratio=32%] [compr=356 MB/s] [dec=1067 MB/s]

NB1: The default for zstd is -3, but the table only had -2. The difference is probably small. The range is 1-22 for zstd and 1-9 for gzip.

NB2: The default gzip program (at least on Debian) is the executable from zlib. In my workflows, libdeflate-gzip is compatible and noticeably faster.

NB3: This benchmark is 2 years old. The latest releases of zstd are much better, see https://github.com/facebook/zstd/releases

For high compression, xz can do slightly better according to this benchmark, if you're willing to pay a roughly 10× penalty on decompression speed.

xz -9 [ratio=23%] [compr=2.6 MB/s] [dec=88 MB/s]

zstd -18 [ratio=25%] [compr=3.6 MB/s] [dec=912 MB/s]
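If you want to see that decompression gap yourself, here's a rough sketch (file.txt stands in for any large, compressible file you have around):

    # compress once at high settings, then time only the decompression
    xz -9 -c file.txt > file.txt.xz
    zstd -18 -c file.txt > file.txt.zst
    time xz -dc file.txt.xz > /dev/null
    time zstd -dc file.txt.zst > /dev/null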


If you are not too constrained in I/O rate, Zstandard has no match; it can easily clock more than 1 GB/s with a good compression ratio, and it can automatically adapt to a changing I/O rate. Web browsers typically have much less bandwidth to work with, though, so Brotli and Zstandard end up virtually identical on the client.
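For what it's worth, zstd exposes that adaptation directly. A sketch of streaming a backup over a link with varying throughput, assuming a reasonably recent zstd and a made-up remote host:

    # --adapt raises or lowers the compression level as the pipe speeds up or stalls
    tar -c /data | zstd -T0 --adapt | ssh backup-host 'cat > data.tar.zst'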


It’s not so much about I/O rate as about access times.

If compression and decompression speed are critical, lz4. Zstd for pretty much everything else.
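For the speed-critical case, lz4's CLI works much like gzip's, so a quick sketch (paths illustrative):

    # much faster than gzip, at a worse ratio
    tar -c /usr/share/doc | lz4 > doc.tar.lz4
    # streaming decompression; here just listing the archive contents
    lz4 -dc doc.tar.lz4 | tar -t | head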

There are edge cases where compression time doesn’t matter but decompression time does. This used to be the case for game torrents back in the old days, and UHARC was used for that to great effect. Not sure what the current king is for that purpose.


Something like FreeArc would typically be used nowadays for repacks - there are gains to be made by tailoring your procedure to the type of game asset, and some of those decisions won't apply to general compression.

FreeArc Next does actually use zstd as mentioned above, but it also does a lot of tricks with compression, dictionaries, etc., while taking much longer to process.

As an example, looking at FitGirl's COD:BO3 repack, 180GB->42.4GB entirely losslessly. Not sure how regular compression would fare on the original game, though.


One of the systems I maintain at work uses enormous, highly compressible text files, which need to be compressed once and decompressed many times. Decompression speed isn't critical; it just needs to keep up with processing the decompressed data to avoid being the bottleneck. We optimize primarily for compression ratio.

For that system, we haven't found something that beats `xz -9`.
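For anyone with a similar compress-once, decompress-many workload, a sketch of that setup with the standard xz CLI (filenames made up):

    # compress once, as hard as practical (-e squeezes out a bit more, slowly)
    xz -9e huge-report.txt              # leaves huge-report.txt.xz
    # decompress many times, streaming straight into the consumer
    xz -dc huge-report.txt.xz | ./process-records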


Brotli has a higher potential for the compression ratio due to its internal structure, so that edge case would be better served by Brotli than by Zstandard.


Hardly surprising when you consider that gzip's DEFLATE dates from the early 1990s and was developed by one guy as shareware, while zstd was produced by a megacorp in 2015.


Zstandard was also mainly designed and produced by a single person, Yann Collet.


Yann also developed LZ4


Yes, but they also work for megacorp Facebook, and according to https://github.com/facebook/zstd/graphs/contributors, 300+ other contributors have made 4500+ commits to the zstd repo.

It's not quite as small-scale as 90's-style shareware was.


zstd has improved a lot since it was brought in under the Facebook roof, but those are incremental changes; it already existed before then.

Here's the good old blog about that http://fastcompression.blogspot.com/


I recommend the CoRecursive episode about how LZ4 and zstd came to be. It started with making games for an HP calculator. https://corecursive.com/data-compression-yann-collet/


Yes, it's worth taking a minute to appreciate how well DEFLATE and gzip have stood up over the years. It's a brilliant format and tool. I was definitely a little too ready to believe the paper that claimed gzip's ability to classify text was better than deep neural nets; alas, it has some limits after all!


Highly dependent on several factors.

I was comparing about ten compression algos for compressing JSON data, mainly needing something reasonably fast that compresses well. zstd did well, but brotli absolutely crushed it on every metric. Of course, it's a data point of one, but it exists.
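Anyone can rerun that kind of comparison on their own data with the stock brotli and zstd CLIs; a sketch (data.json is a placeholder):

    # max-quality brotli vs high-level zstd: compare output sizes and wall time
    time brotli -q 11 -c data.json | wc -c
    time zstd -19 -c data.json | wc -c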



