Thursday, April 26, 2012
Web browsers are big, complicated pieces of software that are extremely difficult to secure. In the case of Chrome, it’s an even more interesting challenge as we contend with a codebase that evolves at a blisteringly fast pace. All of this means that we need to move very quickly to keep up, and one of the ways we do so is with a scaled out fuzzing infrastructure.
Chrome’s fuzzing infrastructure (affectionately named "ClusterFuzz") is built on top of a cluster of several hundred virtual machines running approximately six-thousand simultaneous Chrome instances. ClusterFuzz automatically grabs the most current Chrome LKGR (Last Known Good Revision), and hammers away at it to the tune of around fifty-million test cases a day. That capacity has roughly quadrupled since the system’s inception, and we plan to quadruple it again over the next few weeks.
With that kind of volume, we’d be overloaded if we just automated the test case generation and crash detection. That’s why we’ve automated the entire fuzzing pipeline, including the following processes:
- Managing test cases and infrastructure - To run at maximum capacity we need to generate a constant stream of test cases, distribute them across thousands of Chrome instances running on hundreds of virtual machines, and track the results.
- Analyzing crashes - The only crashes we care about for security purposes are the exploitable ones. So we use Address Sanitizer to instrument our Chrome binaries and provide detailed reports on potentially exploitable crashes.
- Minimizing test cases - Fuzzer test cases are often very large files—usually as much as several hundred kilobytes each. So we take the generated test cases and distill them down to the few, essential pieces that actually trigger the crash.
- Identifying regressions - The first step in getting a crash fixed is figuring out where it is and who should fix it. So this phase tracks the crash down to the range of changes that introduced it.
- Verifying fixes - In order to verify when a crash is actually fixed, which we run the open crash cases against each new LKGR build.
In addition to manageability, this level of scale and automation provides a very important additional benefit. By aggressively tracking the Chrome LKGR builds, ClusterFuzz is evolving into a real-time security regression detection capability. To appreciate just what that means, consider that ClusterFuzz has detected 95 unique vulnerabilities since we brought it fully online at the end of last year. In that time, 44 of those vulnerabilities were identified and fixed before they ever had a chance to make it out to a stable release. As we further refine our process and increase our scale, we expect potential security regressions in stable releases to become increasingly less common.
Just like Chrome itself, our fuzzing work is constantly evolving and pushing the state of the art in both scale and techniques. In keeping with Chrome’s security principles, we’re helping to make the web safer by upstreaming the security fixes into projects we rely upon, like WebKit and FFmpeg. As we expand and improve ClusterFuzz, users of those upstream projects will continue to benefit.