Putting It to the Test

Friday, November 07, 2008

Historically, testing hasn't gotten much respect in the world of software development. As the old saying goes, "It compiles! Ship it!" It's only a joke, but like most jokes, it hides a grain of truth.

Not so for the Chromium project. Our philosophy is to test everything we possibly can, in as many ways as we can think of.

Test drive: why test?

It's easy to find arguments against testing. Writing tests takes time that developers could be using to write features, and keeping the test hardware and software infrastructure running smoothly isn't trivial.  (I'm one of the people largely responsible for the latter for Chromium, along with Nicolas Sylvain, so I know how time-consuming it can be.)  But in the long run, it's a big win, for at least two reasons.

A well-established set of tests that developers are expected to run before sending changes in makes it a lot easier to avoid causing problems, which lets other developers stay productive rather than chasing down regressions.  And testing submitted changes promptly keeps the code building cleanly and minimizes trouble in the longer term.

But even more importantly, an extensive set of automated tests gives us more confidence that Chromium is reliable, stable, and correct.  We're not afraid to rewrite major portions of the code, because verifying correctness afterward is easier. And we have the flexibility to iterate faster and produce releases more often, because we don't need a 6-month QA cycle before each one.

The test of time: performance testing

We run a lot of different tests. Tests of security. Tests of UI functionality. Tests of startup time, page-load speed, DOM manipulation, memory usage.  Tests for memory errors using Rational Purify. WebKit's suite of layout tests. Hundreds of unit tests to make sure that individual methods are still doing what they should. At last count, we run more than 9100 individual tests, typically 30-40 times every weekday.[1] You can find the full list in the developer documentation, but I'll talk more about one broad category here: performance testing.

With every change made in the tree, we keep track of Chromium's page-load time, memory usage, startup time, the time to open a new tab or switch to one, and more.  All these data points are available in graphs like this one:



Here the top, gold trace shows the startup time on XP for the tip-of-tree build; the green, bottom trace shows the startup time for a reference build so we can discount variation in the test conditions; and the blue, middle trace shows the startup time along a different code path that includes loading gears.dll. The light blue horizontal line is a reference marker. As you can see, whatever changed between the previous build and r3693, it increased the startup time (gold trace) by more than 8%. The developer responsible was able to see that and fix the problem a few builds later.

This graph also shows the usefulness of running a reference build. The spike in startup time that lasted only a single build also shows up in the reference-build time (the green trace). We can assume that it was something temporarily affecting the build machine, rather than a code change. (The problem must have cleared up by the time the Gears startup test ran.)
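To make the role of the reference build concrete, here is a minimal sketch of how a build-to-build change could be checked after discounting whatever moved the reference trace too. This is not the code behind Chromium's graphs: the function name `findRegressions`, the plain-array data layout, and the sample timings are all assumptions made up for illustration, and the 8% threshold simply echoes the example above.

```javascript
// Hypothetical helper, not part of Chromium's perf dashboard.
// tip and ref hold startup times in ms, one entry per build, in build order.
// Returns the indices of builds whose tip-of-tree time rose by more than
// `threshold` (a fraction of the previous tip time) beyond whatever the
// reference build did over the same step.
function findRegressions(tip, ref, threshold) {
  if (tip.length !== ref.length) {
    throw new Error("tip and ref must have the same length");
  }
  const flagged = [];
  for (let i = 1; i < tip.length; i++) {
    const tipDelta = tip[i] - tip[i - 1];
    const refDelta = ref[i] - ref[i - 1]; // variation in the test conditions
    const adjustedChange = (tipDelta - refDelta) / tip[i - 1];
    if (adjustedChange > threshold) {
      flagged.push(i);
    }
  }
  return flagged;
}

// Made-up timings: build 2 is a machine hiccup that hits both traces,
// build 4 is a slowdown that shows up only in the tip-of-tree build.
const tip = [200, 200, 260, 200, 218];
const ref = [190, 190, 250, 190, 190];
console.log(findRegressions(tip, ref, 0.08)); // flags only index 4
```

The spike at build 2 is ignored because the reference build spiked by the same amount, while the smaller but genuine slowdown at build 4 is caught.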

With so many performance graphs, it can be hard to watch them all, so there's also a summary page.

One final note about Chromium's performance graphs: they're written in HTML and JavaScript, and we're looking for someone to make them easier to use.  If you're interested, grab the code and start hacking!

Test bed: the Chromium buildbot

Nearly all of this testing is controlled by Chromium's buildbot, which automates the build/test cycle.  Every time a change is submitted, the buildbot master builds the tree, runs the tests on all the different platforms, and displays the results.  For a complete guide to the buildbot and its "waterfall" result page, see the Tour of the Chromium Buildbot in the developer docs.

Pro-test

Of course, once you have lots of tests running, the second important aspect of good tree hygiene is to keep them all passing.  But that's a subject for another post.

[1] It's hard to put a single number on it, because certain tests only apply to some parts of the code.  But however you count it, it's a lot of tests.

21 comments:

Simon said...

What's behind the moving absolute value of the reference build? If it is a fixed binary file and variations are machine-dependent, then graphs with t_ref subtracted could be useful?

Like this:
1) To save some bandwidth, load an empty graph: http://build.chromium.org/buildbot/perf/xp-release-dual-core/startup/report.html?history=-2

2) Open the JavaScript console and run the following:

/* prepare for fishing out data */
var realPlotter=Plotter, revNrs, plotData, traceNames, plotter;
Plotter=function (revs_, data_, names_, x,y,z) { revNrs=revs_; plotData=data_; traceNames=names_; }

/* get data: */ params.history="500"; fetch_summary();

/* mangle data: */
var nyTraceNames = ['gears-t_ref', 't-t_ref'], nyPlotData=new Array(plotData[0],plotData[1]), nyMax=plotData[0].length;
for (var i=0; i<nyMax; i++) { nyPlotData[0][i][0]-=plotData[2][i][0]; nyPlotData[1][i][0]-=plotData[2][i][0]; } /* subtract t_ref from each trace */
for (var i=0; i<nyMax; i++) { nyPlotData[0][i][1]=0; nyPlotData[1][i][1]=0; } /* zero the second value of each point */

/* plot: */ plotter = new realPlotter(revNrs, nyPlotData, nyTraceNames, units, document.getElementById("output"), true); plotter.onclick = on_clicked_plot;

/* bonus: */ function seeObject(o) { var t=''; for (var x in o) t+='\n'+x; return t; } seeObject(plotter);
plotter.plot(); /* and then you see it - some more filtering would be nice, if the overall trend is what we want to see */

MK said...

Since you mention tests of UI functionality... the placement of the Stop button on the right, far from the other navigation buttons, has been attributed to UI tests showing people rarely use Stop. However, when Stop is used, how often is it used in conjunction with the other navigation buttons? For example, if it's common for users to Reload right after Stopping (as I do), it would make sense to keep them close together instead of separating them.

Also, some new users can't even find the Stop button, since it's in a strange place and doesn't always show up. No other browser separates the Stop and Reload buttons, and I'm still skeptical there's a net usability gain from Chrome's doing so.

Unfortunately, Ben rejects every report on this issue by simply saying it's "by design", without actually addressing people's valid critiques of said design. I still hope for a less peremptory justification at some point.

dkgoodman said...

Stop the unethical testing of software!

Humor: http://www.annoyances.org/exec/show/article09-200

:)

Richard Heyes said...

@mk:

A compromise would be to leave it where it is (to satisfy the usability bods), but make it draggable (not an option). Then if you don't like it on the right, you could simply drag it to the left. Everyone's happy.

Darshan Chande said...

Testing is good. I am sure something new will come out of it. Google is great! Chrome rocks!

Tzvetan Mikov said...

Testing continuously! Man, who would have thought of that? We learn something new every day. Thank you, Google!

Planned subject of next post: Compiler Warnings Matter.

(Seriously though, guys, you are doing a great job with Chromium, so keep up the good work.)

Lasse Koskela said...

Pamela, the heading "Test drive: why test?" makes me wonder if you're using TDD or just writing a lot of tests?

Fine Print. said...

When are we going to be able to add RSS feeds, like Flock does? When I had Flock, that feature was very useful to me. I hope you guys release that feature with the next beta update.

Tom said...

Continuing on from mk's comment, I think Safari's solution to this problem is a good one: In Safari, one of the buttons switches between the 'Stop' and 'Reload' actions depending on the state of the page (loading or loaded).

Many people don't like taking suggestions from others, particularly not rivals, but being able to admit one person's idea is better than your own is a key skill in engineering.

Olof Bjarnason said...

Pamela: I'd like to reiterate Lasse's question. With "Test drive" in the heading, and the wording "test-driven design" in the Chrome comic strip, I get the feeling you used TDD (test-driven development/design) to some extent when developing Google Chrome?

Richard Heyes said...

@Fine print:
Considering the competition have it, I'd be quite surprised if Chrome didn't support better handling of RSS at some point.

@Tom:
I'd say that a person's ability to listen to me tell them how to do things better is very much a life skil... :-)

Andy said...

It's all good and nice and everything, but when will we get the Mac and Linux versions? For now I can only read about your browser, since I don't use Windows and I don't plan to.

Spacetech said...

In the new release notes (http://dev.chromium.org/getting-involved/dev-channel/release-notes)

"r4423 Adds a focus indicator for buttons when you use Windows Claassic theme. (Issue 135)"
You mean Classic :)

Richard Heyes said...

Keh? I'm using the windows classic theme, but still get the Vistaa style title bar. Am I missing something?

Pam said...

Chromium has used a combination of test-driven and other development processes. Sometimes we write tests first and then implement features to pass them, sometimes we use existing tests as a guide to what to work on next, and sometimes we implement first and test afterward. It depends on the subject at hand, and on the individual developer.

Pies said...

As to the Reload/Stop button suggestion: what about people who routinely double-click things? I've met at least three people who did that on a daily basis.

I think the point is that Stop is so rarely used, it's mostly there because otherwise people can feel uneasy (not being able to stop page loading). It's like the I'm Feeling Lucky button in Google Search -- people don't use it, but they prefer it to be there.

BTW, I don't think you need to click Stop before clicking Reload -- the latter implies the former.

(Disclaimer: I'm not a Chromium developer, this is just how I see it as an experienced web developer and long-time usability enthusiast.)

Navrang said...

Why does the parent window close when more than one tab is open? I'm wondering! There should be some warning before closing more than one tab.

Regards
Navrang

Rick said...

The Stop button is on the right because that's where the Go button is... Those people who are likely to use a Stop button instead of Escape are the ones that'll use the Go button instead of Enter.

Connors said...

Chrome is mostly great, but it really needs a warning when you press close and there is more than one tab open. I've had to go searching through my history for more than one lost page that I closed that way.
