Steam is working on adding zstd compression for game chunks (every game file is split into 1 MB chunks); it currently uses LZMA.
Wonder how much of an overall improvement it will be.
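No clue what Valve's chunk format actually looks like on disk, but the basic per-chunk idea is easy to play with locally (GAME.FILE and the chunk. prefix below are made up, not anything Steam uses):
# split a file into 1 MB pieces, then compress each piece on its own:
% split --bytes=1M GAME.FILE chunk.
% zstd -q chunk.*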
zstd is generally stupidly fast and quite efficient.
Probably not exactly how Steam does it, or even close, but as a quick & dirty comparison: compressed and decompressed a random CD image (~375 MB .iso) I had lying around with zstd and lzma, using a 1 MB dictionary:
Test system: Arch Linux (btw, as is customary) laptop with an AMD Ryzen 7 PRO 7840U CPU.
used commands & results:
Zstd:
# compress (--maxdict 1048576 sets the compression dictionary size to 1 MB):
% time zstd --maxdict 1048576 < DISC.ISO > DISC.zstd
zstd --maxdict 1048576 < DISC.ISO > DISC.zstd 1,83s user 0,42s system 120% cpu 1,873 total
# decompress:
% time zstd -d < DISC.zstd > /dev/null
zstd -d < DISC.zstd > /dev/null 0,36s user 0,08s system 121% cpu 0,362 total
resulting archive was 229 MB, ~61% of original.
~1.9s to compress
~0.4s to decompress
So, pretty quick all around.
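(Decompression above just goes to /dev/null, so it only measures speed; if you want to confirm the round trip actually reproduces the input, comparing checksums is enough:)
# both of these should print the same hash:
% sha256sum < DISC.ISO
% zstd -d < DISC.zstd | sha256sum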
Lzma:
# compress (-1e selects preset 1 in extreme mode, which uses a 1 MB dictionary; see the note below the results):
% time lzma -1e < DISC.ISO > DISC.lzma
lzma -1e < DISC.ISO > DISC.lzma 172,65s user 0,91s system 98% cpu 2:56,16 total
# decompress:
% time lzma -d < DISC.lzma > /dev/null
lzma -d < DISC.lzma > /dev/null 4,37s user 0,08s system 98% cpu 4,493 total
~179 MB archive, ~48% of original.
~3min to compress
~4.5s to decompress
This one felt like forever to compress.
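(Side note: the lzma command is nowadays just xz writing the old .lzma format, so if you'd rather spell the dictionary size out instead of trusting the preset, something like this should work; I didn't re-run the numbers that way:)
# same preset, but with the 1 MiB dictionary set explicitly:
% xz --format=lzma --lzma1=preset=1e,dict=1MiB < DISC.ISO > DISC.lzma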
So, my takeaway here is that LZMA's compression time cost is high enough to justify wasting a bit of disk space for the sake of speed.
And lastly, just because I was curious, ran zstd at a higher compression level (-9) too:
% time zstd --maxdict 1048576 -9 < DISC.ISO > DISC.2.zstd
zstd --maxdict 1048576 -9 < DISC.ISO > DISC.2.zstd 10,98s user 0,40s system 102% cpu 11,129 total
% time zstd -d < DISC.2.zstd > /dev/null
zstd -d < DISC.2.zstd > /dev/null 0,47s user 0,07s system 111% cpu 0,488 total
~11s compression time, ~0.5s decompression, archive size was ~211 MB.
Deemed it wasn't necessary to spend the time compressing the archive with lzma's max settings.
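(If someone wants a less hand-rolled version of this, zstd also has a built-in benchmark mode that sweeps a range of levels over the same file in one go; I didn't bother running it here:)
# benchmark levels 1 through 19 on the ISO and print ratio + speed for each:
% zstd -b1 -e19 DISC.ISO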
Now I’ll be taking notes when people start correcting me & explaining why these “benchmarks” are wrong :P
edit:
goofed a bit with the higher-compression zstd run; added the same 1 MB dictionary setting there too.
edit 2: one of the reasons for the change might be syncing files between their servers. IIRC zstd output can be made "rsync friendly", allowing partial file syncs instead of re-syncing the entire file, saving bandwidth. Not sure if lzma can do the same.
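If I'm remembering right, the flag in question is --rsyncable; roughly like this (untested on this ISO):
# --rsyncable makes the output friendlier to rsync-style delta transfers,
# at a small cost in ratio; -T0 = use all cores (rsyncable mode piggybacks
# on the multithreaded compressor, if I remember correctly)
% zstd -T0 --rsyncable < DISC.ISO > DISC.rsync.zst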