Smaller is Faster (and Safer Too)

Wednesday, July 15, 2009

We have just started using a new compression algorithm called Courgette to make Google Chrome updates small.

We have built Google Chrome to address multiple factors that affect browser security. One of the pillars of our approach is to keep the software up to date, so we push out updates to Google Chrome fairly regularly. On the stable channel these are mainly security bug fixes, but the updates are more adventurous and numerous on developer channel.

It is an anathema to us to push out a whole new 10MB update to give you a ten line security fix. We want smaller updates because it narrows the window of vulnerability. If the update is a tenth of the size, we can push ten times as many per unit of bandwidth. We have enough users that this means more users will be protected earlier. A secondary benefit is that a smaller update will work better for users who don't have great connectivity.

Rather then push put a whole new 10MB update, we send out a diff that takes the previous version of Google Chrome and generates the new version. We tried several binary diff algorithms and have been using bsdiff up until now. We are big fans of bsdiff - it is small and worked better than anything else we tried.

But bsdiff was still producing diffs that were bigger than we felt were necessary. So we wrote a new diff algorithm that knows more about the kind of data we are pushing - large files containing compiled executables. Here are the sizes for the recent 190.1->190.4 update on the developer channel:
  • Full update: 10,385,920 bytes
  • bsdiff update: 704,512 bytes
  • Courgette update: 78,848 bytes
The small size in combination with Google Chrome's silent update means we can update as often as necessary to keep users safe.

More information on how Courgette works can be found here.

81 comments:

Joel Martinez said...

Impressive ... thanks for talking about this. Hopefully others will take note and implement similar mechanisms. This will only make the entire ecosystem better :-)

Drew said...

I love you Google.

Don-J-Rude said...

Great idea. Move all the (simple) pointer recalculations onto the client to perform after the diff of the "body" is applied by using a disassembler! The hinting also makes total sense, use some standard magic bytes to identify binary types and work in the decompressed format.
Memory usage analysis? CPU usage analysis? Perhaps the future paper will explain but it seems very CPU intensive (common trade-off).

Jan de Mooij said...

Very nice, this inventiveness is what i love about Google.

Could be very useful for other vendors (Mozilla etc) as well...

Philip Arnason said...

I'm continually impressed with google's strive for speed. I've recently started using chrome almost exclusively. It's unreal how long it takes for a new tab to open up in MSIE.

John said...

Have you looked into my binary delta algorithm, BDelta? The GPL'd source code is available at deltup.org, and my tests show that it is much faster, much less memory-hungry, and produces better deltas than bsdiff. The released code is old, but still very usable, and I have some updates in the pipeline. Just note that the output of the program is not compressed, so you'll want to pass it to a good compression tool like bzip2.

If you're interested, I'd be glad to help you with the implementation. If not, why not? ;-)

greenPhil said...

1. Versioning is very important in this case.

2. One diff can go to all the clients with same version.

3. You don't need to calculate diff for each installed chrome browser.

4. Just calculate the diff(say version x.y x.y+1) in the upgrade server and push it to all browsers that have x.y version.

This can work very well if there is no change in format files that can potentially be touched by the browser user.

Kent Fenwick said...

That is so cool, you read about that stuff in Algorithms but to see it in Practice is really cool.

Its the little things that make you so impressive.

Kent

Harold Fowler said...

Hey, if its safer I am all for it dude!

RT
http://www.anonymize.tk

qmx said...

Sweet

The bad part is the "silent" update.

jack.feed said...

Firefox needs to make their updates smaller like Chromium. Bravo!

rdza said...

This post makes "ChromeOS smartphone" ring in my head

B.D. said...

This post has been removed by the author.

BrainAds said...

still can't believe that chrome is only for windows users, how about fixing that in an update?

muharem said...

Please make the source code available under GPL (or similar) so other projects/people can use it.

Ralph said...

The Courgette algorithm sounds tricky to test and debug. Was it a long process to get it working properly?

Sankar said...

Isn't the same technique already available as delta-rpms ?

Jaco Pretorius said...

Sounds like a real pain to develop and debug - well done

Papa Marteau said...

Wondering why any vendor think about it before. Probably the diff between versions is difficult to maintain. Especially if your soft is not a web browser.

Guangrong86 said...

[url=http://www.brighttrans.com/en/index.php]Translation company[/url]
thanks for talking about this,hopefully others will take note and implement similar mechanisms.this will only make the entire ecosystem better.

ahsankhan said...

If you can sell a new car for 5000 then your dealer network will grow faster than you would believemotor trade recruitment

vincent b. said...

You seriously called it Courgette?
:-D

Fabio Turati said...

Is this the reason that last week there were many more updates (3) than usual? Does it mean we can expect more frequent updates from now on?

Anyway, great job guys, the compression you achieved is impressive.

Michel En La Red said...

Realmente Chrome es una super maquina, felicitaciones a Google, y a todos los colaboradores de este excelente proyecto..........

Art Vandalay said...

Impressive, although calling it a "compression algorithm" could be misleading for people who don't bother to read and understand the details. Such people might assume "oh, look, a stupendously better general algorithm for shrinking binary data", which is not at all the thrust of this clever idea for reducing the necessary size of binary diffs of object code.

famille reuell said...

Impressive.

Is it just that everyone else is lazy or stupid, or that Google simply refuses to not question to status quo?

Once again, simply well done and impressive.

Òscar said...

Will it be released as free software?

PWRadmin said...

Will you guys be open sourcing this amazing little tool!

Balaji said...

sounds cool . is it available for 3rd parties to use it???

Michel S. said...

@Sankar

Probably not. The deltarpm package that Fedora ships, for instance, does not even do binary diffing -- it relies on bzip2. With delta RPMs, the idea is that most files remain unchanged between two different versions, and you can just transfer the changes. Within each file, it's just normal compression.

Greg said...

go google!

Stephen said...

This post has been removed by the author.

Stephen said...

@John-of-BDelta
Email me at sra@chromium dot org so we can figure out how to get you some sample input files for an apples-to-apples comparison.

@muharem, @Òscar, @PWRadmin, @Balaji
Courgette is open source as part of the Chromium project. See http://code.google.com/chromium/terms.html

Nicolás said...

Excellent stuff. If only more companies followed your philosophy.

ssj4Gogeta said...

Another example which shows that Google is made of "geeks" and not "suits".

Ralph said...

You may like to know that Colin Percival, bsdiff's author, recently made a request that if BigCo uses some FLOSS they offer one of the developers some schwag, e.g. a T-shirt.

http://www.daemonology.net/blog/2009-07-14-a-call-for-schwag.html

His tarsnap "backup to the cloud" system is shaping up nicely by the way. http://www.tarsnap.com/

Jonathan said...

re: Michel S.
deltarpm does do a binary diff (using a modified bsdiff algorithm, with the diff stored using bzip2), but it also uses an algorithm described in http://www.daemonology.net/papers/thesis.pdf to deal with the same problem courgette is trying to address (though in a far more generic way). I would be very interested in generating a diff using the deltarpm algorithm and the courgette algorithm and see what their respective savings are.

Jan Kotek said...

You did not make new diff algorithm. It is ugly hack to the bsdiff for one very specific case.

Jeet said...

Google are Great Forever.... We Love Google....

conrad said...

I don't quite see how this is a new "compression" algo. You're making the diff's smarter and _that_ is what is making the patches so much smaller.

Sameer said...

Makes total sense given the more frequent nature of updates on the dev channel.

عارف بن باسوكي said...

Cool!! can't wait the opportunity to use this in my project!

Afriza N. Arief

cheguaka said...

Fedora has chosen XZ instead of bzip2 for rpm compresion. Worth to check?
http://tukaani.org/xz/

Kevin said...

This really is impressive - its innovation like this that makes chrome feel more cutting-edge than other browsers.

Eldad said...

Hello,
Chrome is GREAT, thanks very much, you are doing a great job guys (and gals)!
It is now my default browser, taking over from the excellent but slower Avant Browser. A couple suggestions: Please add (perhaps to "Control current page"): "Email page", "Email URL", "Email selected", and if possible "Print selected". (selected text and/or graphics).

Avant Browser has a great feature - Forward and Back pages by: Right click then Left mouse click to forward, Left click then Right click to back a page. It is a great feature that really speeds up surfing. I hope you can add that to Chrome.
Thanks again,
Eldad (Saugerties NY)

raju said...

this is a good hacl dr. adams.
please try to embed a scheme interpreter into the next chrome update.

ikpil said...

Good!! So, How to download the source code?

Christophe said...

Stephen,

I would love to hear your word on my proposal - please see here:
http://groups.google.com/group/chromium-discuss/browse_thread/thread/eb9ee72b32cadaef

I created a PDF explaining the process which can be downloaded here:
http://www.multimedial.de/share/BitPatternLibOverview.pdf

I am looking into doing a reference implementation in python of this.

I would very much appreciate any kind of input you got to this. So far, I came to the conclusion that this is a modified Hufmann encoding scheme with a pre-build and non-optimized dictionary.

Any input very much welcome,
Christophe Leske

vitorlazo said...

is the source code available somewhere? ca we see the implementation?

Wei said...

yeah, I realy want to see the source code...

DiSCo said...

This post has been removed by a blog administrator.

carizmali said...

This post has been removed by a blog administrator.

cilemsin42 said...

This post has been removed by a blog administrator.

hakan said...

This post has been removed by a blog administrator.

sohbetmerkezi said...

This post has been removed by a blog administrator.

sohbetmerkezi said...

This post has been removed by a blog administrator.

sohbetmerkezi said...

This post has been removed by a blog administrator.

sohbetmerkezi said...

This post has been removed by a blog administrator.

sohbetmerkezi said...

This post has been removed by a blog administrator.

Sinan said...

This post has been removed by a blog administrator.

Tek BeDDuamSin said...

This post has been removed by a blog administrator.

Tek BeDDuamSin said...

This post has been removed by a blog administrator.

Yasin said...

This post has been removed by a blog administrator.

kAriZmA - Www.SesliDesin.Com said...

This post has been removed by a blog administrator.

irfan said...

This post has been removed by a blog administrator.

seldamuratim said...

This post has been removed by a blog administrator.

magic100 said...

This post has been removed by a blog administrator.

tanju özveren said...

This post has been removed by a blog administrator.

eurodizi said...

This post has been removed by a blog administrator.

cezzaa said...

This post has been removed by a blog administrator.

ByReCeP said...

This post has been removed by a blog administrator.

sesliyes said...

This post has been removed by a blog administrator.

Medical Advisor said...

This post has been removed by a blog administrator.

metin said...

This post has been removed by a blog administrator.

kartal said...

This post has been removed by a blog administrator.

eray said...

This post has been removed by a blog administrator.

altan alır said...

This post has been removed by a blog administrator.

Kizyurdu said...

This post has been removed by a blog administrator.

Ralph Corderoy said...

Please delete spam comments, it devalues the blog to let them remain.

happy said...

This post has been removed by a blog administrator.

allgood said...

This post has been removed by a blog administrator.