We are approaching Maximum Compression

TL;DR

New compression standards and technologies are providing diminishing returns, signaling we are entering the long tail of space savings - the big gains of the past are coming to an end.

LONGER

DEF CON has a limited budget and limited bandwidth, so it makes sense to get the most out of both. We also have a highly technical community, so we worry less about backwards compatibility. This lets us adopt new features faster than a purely consumer site focused on maximum compatibility could.

During a recent round of web server upgrades I took the opportunity to max out these four compression options:
  1. Switch from static GZip to Brotli compression.
  2. Switch from dynamic GZip to Brotli compression.
  3. Standardize on .woff2 fonts.
  4. Start using .webp images by default instead of .jpg or .png.
I'll go through the savings of these and why we are near the limit:
  • Back in 1993, the year of DEF CON 1, gzip was created. It has been the standard for static and dynamic HTTP compression since 1999 with the HTTP/1.1 standard. 20 years of additional compression knowledge later saw the creation of brotli in 2013.
  • Brotli is a newer compression algorithm designed by Google and released in 2013. Between 2016 and 2017 it was adopted by all major browsers as an additional compression option alongside gzip.
Unlike gzip, browsers only accept brotli over HTTPS connections. If you still serve unencrypted port 80 traffic you must use the older gzip. Who still serves only port 80 traffic, you say? Any site running Tor and providing a "hidden" .onion site. DEF CON serves an .onion of all our web sites, so we have to support both gzip and brotli.
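
To make the "support both" point concrete, here is a minimal sketch of how a server might pick between brotli and gzip for one request. It is not how the DEF CON servers are configured; the function names and the is_https flag are made up for illustration, and the parsing is simplified (it ignores the "*" wildcard, for example).

```python
def parse_accept_encoding(header):
    """Return {coding: q} parsed from an Accept-Encoding header value."""
    weights = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        coding, _, params = part.partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        weights[coding.strip().lower()] = q
    return weights


def pick_encoding(header, is_https, server_order=("br", "gzip")):
    """Pick the first server-preferred coding that the client accepts."""
    weights = parse_accept_encoding(header)
    for coding in server_order:
        # Assumption taken from the text: only offer brotli over HTTPS.
        if coding == "br" and not is_https:
            continue
        if weights.get(coding, 0.0) > 0:
            return coding
    return "identity"  # fall back to the uncompressed response


print(pick_encoding("gzip, deflate, br", is_https=True))   # br
print(pick_encoding("gzip, deflate", is_https=False))      # gzip
```

A plain HTTP .onion request falls through to gzip, while an HTTPS request from a modern browser gets brotli.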

Compressed files can be served from a web server as either "static" or "dynamic" or both. DEF CON sites are mostly static, while forum.defcon.org is dynamic.

In static compression you pre-compress all the file types you want on the web server in advance, and the server sends those compressed versions to the user instead of the uncompressed (original) files. This provides the best space savings because you can compress at the maximum setting once and then just serve the same file over and over. This works great for text based files like .css .rss .html .js etc.
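
As a rough sketch of what pre-compressing in advance can look like, the script below walks a directory and writes a maximum-level .gz copy next to every text file. It uses only Python's built-in gzip module; the function name and the suffix list are illustrative assumptions, and a brotli pass would need the separate brotli tool (for example the brotli --best command used for the table further down).

```python
import gzip
from pathlib import Path

# Illustrative list of text-based types worth pre-compressing.
TEXT_SUFFIXES = {".css", ".rss", ".html", ".js", ".xml"}


def precompress(root):
    """Write a max-level .gz sibling next to every text file under root."""
    written = []
    # sorted() materializes the listing before any new files are created.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        data = path.read_bytes()
        target = path.with_name(path.name + ".gz")
        # mtime=0 keeps the output byte-identical between runs.
        target.write_bytes(gzip.compress(data, compresslevel=9, mtime=0))
        written.append(target)
    return written
```

The web server still has to be told to serve the .gz sibling when the client accepts gzip; nginx's gzip_static module is one way to do that.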

#1 SWITCH FROM STATIC GZIP TO BROTLI
https://en.wikipedia.org/wiki/Brotli
https://github.com/google/brotli
https://medium.com/oyotech/how-brotli-compression-gave-us-37-latency-improvement-14d41e50fee4
http://mattmahoney.net/dc/text.html#2241

Static gzip vs. static brotli on www.defcon.org
Filename          Uncompressed  gzip -9  Savings        brotli --best  Savings        Additional Savings
defconrss.xml        1,039,978  281,830  .270 (73%)           214,692  .206 (79.4%)   .761 (23.9%)
index.html              35,974    9,471  .263 (73.7%)           7,760  .215 (78.5%)   .819 (18.1%)
dc-28-layout.css        26,675    5,385  .201 (79.9%)           4,712  .176 (82.4%)   .875 (12.5%)
TOTALS               1,102,672  296,686  .269 (73.1%)         227,164  .206 (79.4%)   .765 (23.5%)

The 27-year-old gzip at maximum compression (-9) produces a space savings of 73.1% over the original sizes of our three text files.
The 7-year-old brotli at maximum compression (--best) produces a space savings of 79.4%, which is 6.3 percentage points better overall and a 23.5% reduction relative to the gzip output.
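
For anyone who wants to check the math, here is a small sketch of how those percentages fall out of the TOTALS row. The helper name is my own. Because the table truncates ratios to three digits, the last figure computes to about 23.4% rather than the 23.5% shown.

```python
def savings_pct(original, compressed):
    """Space saved relative to the original size, as a percentage."""
    return (1 - compressed / original) * 100


# TOTALS row from the table above, in bytes.
uncompressed, gz, br = 1_102_672, 296_686, 227_164

gz_saved = savings_pct(uncompressed, gz)
br_saved = savings_pct(uncompressed, br)
points = br_saved - gz_saved          # difference in percentage points
relative = savings_pct(gz, br)        # brotli output vs. gzip output

print(f"gzip saves   {gz_saved:.1f}% of the original")
print(f"brotli saves {br_saved:.1f}% of the original")
print(f"difference   {points:.1f} percentage points")
print(f"brotli output is {relative:.1f}% smaller than the gzip output")
```

The two "better" numbers measure different things: the first compares both formats against the uncompressed files, the second compares brotli against gzip directly.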

CONCLUSION: Nearing Maximum Compression!

Use static compression everywhere you can! You get the best file size savings, which translates into less bandwidth usage. Compared to dynamic compression, static compression gives you smaller files and costs no additional CPU at request time.

Brotli, 20 years newer, produced about 6 percentage points of additional overall space savings for the types of files we serve. Great. But that means the 27 years since gzip was released have bought us only about 6 points. If we manage to pull off another 6 points in the next 20 years, that would be considered a great achievement. Lots of compute and larger dictionaries may do this, but not anytime soon.

From the Cloudflare article: "On average, Brotli at the maximal quality setting produces 1.19X smaller results than zlib at the maximal quality. For files smaller than 1KB the result is 1.38X smaller on average"


#2 SWITCH FROM DYNAMIC GZIP TO BROTLI
https://blog.cloudflare.com/results-experimenting-brotli/
https://medium.com/oyotech/how-brotli-compression-gave-us-37-latency-improvement-14d41e50fee4
https://tools.ietf.org/html/rfc2616

The fastest option is always serving static compressed data. If that isn't possible, such as on forum.defcon.org or dynamically created directory listings on https://infocon.org/ then dynamic compression is for you.

The added complication is deciding how much compute time to spend shrinking a response before sending it. The more traffic you have, the more CPU time this will take; a lightly loaded server can spend more time crunching than a busy one. The Cloudflare article linked above goes into the details of balancing the time to compress against the additional latency it adds to the network connection. In the end they see the largest improvements for this trade-off with 64KB files at brotli level 5.
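
If you want to see the size vs. time trade-off on your own content, a small timing loop is enough. The sketch below is a stand-in, not a brotli benchmark: it uses Python's built-in zlib (levels 1-9) because brotli is not in the standard library, and the sample page is a made-up directory listing. Brotli's quality levels run on a different scale, so treat the output as the shape of the curve only.

```python
import time
import zlib


def profile_levels(payload, levels=range(1, 10)):
    """Return {level: (compressed_size, seconds)} for one payload."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(payload, level)
        elapsed = time.perf_counter() - start
        # Sanity check: the data must survive a round trip.
        assert zlib.decompress(compressed) == payload
        results[level] = (len(compressed), elapsed)
    return results


# Hypothetical dynamically generated page, a few tens of kilobytes.
page = b"".join(
    b"<li><a href='/file-%d.pdf'>file %d</a></li>\n" % (i, i)
    for i in range(1500)
)

for level, (size, seconds) in profile_levels(page).items():
    print(f"level {level}: {size:6d} bytes in {seconds * 1000:.2f} ms")
```

Run it against a real response from your own site and look for the level where the byte count stops dropping much while the time keeps climbing.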

CONCLUSION: Nearing Maximum Compression!

You can't achieve the compression levels of static compression, and the trade-offs of better compression vs. time are really stark. Brotli will become optimized over time, so you will want to revisit your compression level calculations when you upgrade brotli. Short of dedicated compression hardware and a new standard, it is unrealistic to expect large improvements in dynamic compression times any time soon.


#3 STANDARDIZE ON .WOFF2 FONTS
https://en.wikipedia.org/wiki/TrueType
https://en.wikipedia.org/wiki/Web_Open_Font_Format
https://en.wikipedia.org/wiki/Embedded_OpenType

Font formats have evolved over the years, but there is really only one true winner, WOFF2. Every browser supports it, and the savings are fantastic. The savings from .woff -> .woff2 come from the use of brotli.

ttf = 1991
eot = 2008 uses LZ compression
woff = 2010 uses zlib compression
woff2 = 2018 uses brotli compression
FONT NAME            SIZE  SAVINGS                         ADDITIONAL SAVINGS
GrandHotel.svg    142,577
GrandHotel.ttf     81,732  .573 (42.7%) Smaller than .svg
GrandHotel.woff    39,844  .279 (72.1%) Smaller than .svg  51.3% smaller than ttf
GrandHotel.eot     35,181  .246 (75.4%) Smaller than .svg  11.8% smaller than woff
GrandHotel.woff2   31,688  .222 (77.8%) Smaller than .svg  10.0% smaller than eot

CONCLUSION: Nearing Maximum Compression!

Use only .woff2 fonts. Between 2008 and 2018 lossless font sizes shrank by approximately 20%. Is it realistic to think another 20% savings will occur in the next 10 years? Maybe, but I doubt it. Font sizes have been tied to compression algorithms, so unless there is a major lossless breakthrough, font sizes should bottom out around 28k or so in the next 10+ years.

#4 START USING .WEBP IMAGES INSTEAD OF .JPG OR .PNG
https://en.wikipedia.org/wiki/WebP
https://en.wikipedia.org/wiki/APNG
https://insanelab.com/blog/web-development/webp-web-design-vs-jpeg-gif-png/
https://www.keycdn.com/support/jpg-to-webp

gif = 1987 (lossless)
jpg = 1992 (lossy)
png = 1996 (lossless)
webp = 2010 (lossy / lossless)

ALERT! With the pending launch of iOS 14 and macOS 11, Safari, about 20% of the browser space, will FINALLY support webp, ending a long, painful double workflow of having to support .jpg or .png for Apple users while the rest of the world used .webp. Much like the story of brotli, webp starts with Google in 2010 and really became usable in 2012 with libwebp version 0.2.0. It is one of the "next-gen image formats" like jpeg-2000 or jpeg-xr, but webp has the best support and seems to be the winner.

There are a lot of good articles talking about different file formats. Reading the two articles linked above, you come to the conclusion that, on average, webp files are 25-35% smaller than .jpg and about 25% smaller than .png.

CONCLUSION: Nearing Maximum Compression!

Lossless image file sizes are about as small as they are going to get for web display. In the 24 years between the creation of the lossless png format and the webp of today, there is about a 25% space savings. In the lossy space, the 28-plus years between .jpg and the webp of today bought a 25-35% saving. With that in mind I do not expect to see large lossless image file size reductions in the next 10+ years. There are only so many compression tricks and lookup tables you can use and remain lossless. The real possible gains over the next decade are going to be in lossy compression.

Got a web site? Start trans-coding to webp or start saving webp files from your original images. You will save space and gain speed across all major browsers in a way that hasn't been possible until now. Well, until Apple releases the new Safari. 😎
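
A minimal sketch of a transcoding helper, assuming Google's cwebp command-line encoder is installed. The function names, file names, and the quality value of 80 are my own picks for illustration, not a recommendation; PNG sources are kept lossless and everything else is encoded lossy.

```python
import shutil
import subprocess
from pathlib import Path


def cwebp_command(src, quality=80):
    """Build a cwebp command line for one source image."""
    src = Path(src)
    dst = src.with_suffix(".webp")
    if src.suffix.lower() == ".png":
        mode = ["-lossless"]            # keep PNG sources lossless
    else:
        mode = ["-q", str(quality)]     # lossy encode for JPEG sources
    return ["cwebp", *mode, str(src), "-o", str(dst)]


def transcode(src):
    """Run cwebp if it is installed; return the output path or None."""
    if shutil.which("cwebp") is None:
        return None
    cmd = cwebp_command(src)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])


# Hypothetical file names, shown only to print the resulting commands.
print(cwebp_command("badge.png"))
print(cwebp_command("crowd.jpg"))
```

Where you can, encode from the original image rather than from an already compressed .jpg, so you are not stacking one lossy pass on top of another.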

So that's it. Upgrade to the latest standards and know you are getting the most out of your bandwidth and filesystem space. Five or 10 years from now, chances are you will still be using these technologies or seeing only a very minor improvement. Such is math.

The Dark Tangent