Understanding Phishing and Malware Protection in Google Chrome

Friday, November 14, 2008

Google Chrome includes features to help protect users against phishing and malware attacks. If you have ever hit a red page with the title "Warning: Visiting this site may harm your computer!" (such as our test page) or "Warning: Suspected phishing site!" then you have already seen these features in action. While we try to provide an explanation of what's happening on that warning page, a number of people have asked for more information about how this feature works: where the data behind those warnings comes from, how that data gets to the computer, and what privacy implications the feature has.

Where does the phishing and malware data come from?

Google is constantly crawling and re-crawling the web, all the while finding new and changed websites. These websites are found by following links from other websites, crawling URLs submitted by webmasters and users, and so forth. Sometimes, during that process, we discover a website where something doesn't seem right. A website may look like a phishing website, designed to steal your personal information, or it may contain signs of potentially malicious activity that would install malware onto your computer without your consent. If we find a website that looks like it's a phishing page, it gets added to a list of suspected phishing websites. If we find a website that contains signs of potentially malicious activity, we start up a virtual machine, browse to that website, and watch what happens. If we see certain activities happen on that virtual machine (such as viruses being downloaded and installed), we add that website to a list of suspected malware-infected websites. The process for discovering suspected malware-infected websites is described in more detail in a paper written by Niels Provos and colleagues from Google's anti-malware team.

How does this data get to my computer? 

If you have phishing and malware protection enabled, then Google Chrome will contact servers at Google within five minutes of startup, and approximately every half hour thereafter, to download updated lists of suspected phishing and malware websites. These lists are then stored on your computer, so that as you browse the web, each page can be checked against the list of suspected phishing and malware websites locally, without sending the address of each webpage you visit to Google. This is designed to offer both performance (by not having to wait on a round-trip request to Google's servers) and privacy (by not sending a record of your browsing session to Google).

As the lists are large (hundreds of thousands of entries), we looked for ways to reduce the amount of information that had to be sent to and stored on users' computers, to reduce the amount of bandwidth and storage space consumed. One way we achieve this is by using partial hashes of URLs in the lists downloaded by the computer. What this means is that rather than sending down the full URL of each website, we do the following. First, we hash the URL using SHA-256. Then, we add the first 32 bits of that 256-bit hash to the list of phishing or malware websites. Those lists of 32-bit hash prefixes are then downloaded by Google Chrome in the background as described earlier.
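The two steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not Google Chrome's actual code: the function names are made up for this example, and it hashes the URL string exactly as given, whereas a real client would likely normalize the URL first.

```python
import hashlib

PREFIX_BYTES = 4  # 32 bits, as described in the post


def full_hash(url: str) -> bytes:
    """Return the 256-bit (32-byte) SHA-256 hash of a URL string.

    Assumption: the URL is hashed as-is, with no normalization.
    """
    return hashlib.sha256(url.encode("utf-8")).digest()


def hash_prefix(url: str) -> bytes:
    """Return the first 32 bits (4 bytes) of the URL's SHA-256 hash."""
    return full_hash(url)[:PREFIX_BYTES]


def build_prefix_list(urls):
    """Build the compact list of 32-bit prefixes that a client would download."""
    return {hash_prefix(u) for u in urls}
```

Each entry in such a list takes 4 bytes instead of a full URL, which is where the bandwidth and storage savings come from.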

How is this data used, and what is sent back to Google?

When you browse the web using Google Chrome, the hash of each URL is computed, and the first 32 bits of that URL's hash are compared against the list of suspected phishing and malware websites. This includes the URL of the website you are visiting, as well as the URL of any included resources (such as included JavaScript or Adobe Flash movies). If the first 32 bits of the hash match an entry in the list, it is likely that the URL is on the list of suspected phishing or malware websites. At this point, we can only say likely, because there is still a reasonable chance of hash collisions in the 32-bit space: two distinct URLs with distinct 256-bit hashes where the first 32 bits of those hashes are the same. To confirm that the URL is suspected as a phishing or malware website, and not just a 32-bit hash collision, the 32-bit hash prefix is sent to Google. Google then returns the full 256-bit hashes that start with those 32 bits and are suspected of being phishing or malware. The full 256-bit hash of the URL in question can then be compared against the 256-bit hash(es) returned by Google, to determine whether the URL in question is in fact on the list of suspected phishing or malware websites.

Using this scheme, Google Chrome is able to quickly check the website and its resources against a local database, and only sends information back to Google when the site matches an entry on the locally stored lists. In the case where information is sent to Google to verify such a suspicion, that information consists only of a part of the hash of a URL, not the URL itself. As such, Google never gets information that would definitively indicate whether a user has visited a particular website or not. The end result is a low-overhead, efficient mechanism to help protect against phishing and malware, while also helping to protect users' privacy.
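The lookup described above can be sketched as follows. This is a simplified model under stated assumptions, not the real client or protocol: `FakeServer` is a hypothetical in-memory stand-in for Google's servers, `is_suspected` is an invented name, and URLs are hashed as plain strings.

```python
import hashlib

PREFIX_BYTES = 4  # 32-bit prefix


def sha256(url: str) -> bytes:
    return hashlib.sha256(url.encode("utf-8")).digest()


class FakeServer:
    """Hypothetical stand-in for the server side; holds the full 256-bit hashes."""

    def __init__(self, bad_urls):
        self._full_hashes = [sha256(u) for u in bad_urls]
        self.received = []  # everything the client sent, kept for inspection

    def prefixes(self):
        """The compact list a client downloads in the background."""
        return {h[:PREFIX_BYTES] for h in self._full_hashes}

    def full_hashes_for(self, prefix: bytes):
        """Return every listed full hash that starts with the given prefix."""
        self.received.append(prefix)
        return [h for h in self._full_hashes if h.startswith(prefix)]


def is_suspected(url: str, local_prefixes, server) -> bool:
    digest = sha256(url)
    prefix = digest[:PREFIX_BYTES]
    if prefix not in local_prefixes:
        return False  # common case: decided locally, nothing is sent
    # Prefix matched: only the 32-bit prefix goes to the server, never the URL.
    return digest in server.full_hashes_for(prefix)
```

In this sketch a URL whose prefix merely collides with a listed entry costs one extra round trip, but is still reported as not suspected, because its full 256-bit hash does not appear in the server's reply.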

19 comments:

Onekopaka said...

Firefox 3 and up also block the test page. As some may know, Firefox 3 and up use the same API (Safebrowsing API, read more at http://code.google.com/apis/safebrowsing/) as Google Chrome to tell whether the site is a phishing site. The API is available for anyone to use in their application (it's just SHA-256 hashes, so many programming languages are supported). Thanks Google!

Speaking Freely said...

Someone needs to monitor the developer discussion group pages
(http://code.google.com/chromium/)

These are the most recent posts showing up:


Naughty Angels, From Good to Sexy Girls Halloween Costume [link] ...
Hot Sexy Video Models Persian [link] ...
Sexy, Beautiful, and Hot Arab Women [link] ...

MK said...

Unlike normal downloads, the auto-downloading of phishing/malware hashes seems to cause long stretches of hard disk activity. Firefox 3 had this problem, too, for days after I first installed it. Firefox eventually stopped doing it, but with Chrome I got impatient and disabled the protection after a few days, since the periodic spates of grinding and slowdown got annoying. (There's been a few version updates since then, so, for all I know, it's been fixed. That would be nice.)

If constructing the database piecemeal is so resource-intensive, downloading the whole thing in a single preconstructed lump sounds less troublesome. Too bad that isn't an option, to my knowledge.

Iron Guts Morla said...

@mk: But the database is never final as new malicious sites appear every minute.

MK said...

@iron: But you can offer something reasonably up-to-date and reduce the amount of updating that has to happen afterward. That's how it works with antivirus software, right? Presumably there's a large bulk of old listings that are pretty settled.

(I'm assuming the heavy hard disk activity stops after you've mostly caught up, since Firefox stopped after a while. It's either that or they fixed whatever was wrong with the update process.)

Taty said...

I don't know if this is the right place to report something but, here I go. I'm loving my time with chrome... the biiig problem is regarding the close button. It must be like IE and ask if you really want to close it when u have more than one page opened in the same session.

thomas said...

i've been searching for awhile on how to fix the bug of not being able to scroll (up or down) on my HP Pavilion zv6000 laptop...any suggestions?? thanks!

RoyiAvital said...

You must do something with the Chrome Discussions Group.

It's flooded with spam.

Taty said...

thomas,

are u sure that the scroll is enabled in your notebook? in my case, the windows automatically changed the driver and I could not do it anymore...i just undid the driver update...

Darky said...

The thing I want to know most is, which part of the local anti-phishing/malware database updating process caused the severe hard disk thrashing in the early builds of Chrome/Chromium, and how did Google fix that? Can anyone go into the details? thanks.

RichB said...

Is there a list of which 'Good' sites have 32bit hashes which collide with 'Bad' sites? The owners of those domains should be notified their users will see a decrease in performance when the user visits their site.

Ahmad said...

Very nice explanation. Thanks.

Jason said...

On a side note, you can disable this feature with reassurance that you're still protected if you use OpenDNS (http://www.opendns.com).

OpenDNS protects its users from spyware/malware and phishing sites as well. It does so with help from the fellows at St. Bernard (iGuard, malware) and the list of phishing sites from PhishTank (owned and operated by the creators of OpenDNS).

Personally, I would rather rely on OpenDNS's level of security, as it doesn't require anything to be done on my side -- all the protection is server sided, so it's relatively simple to stay protected without downloading massive databases ;-)

Li said...

A lot of friends around me want everything from google toolbar in Chrome. If this has not been done, none of us would switch from firefox to Chrome? How can you imagine a google browser without google applications???? Hope you can solve this problem in the next beta version!!

Jason said...

@Li
That's a common misconception, you see: a lot of the Google toolbar features are actually built into Chrome, so there's no need for the toolbar.

The following features are features offered by the toolbar that are built into Chrome:

* Safe browsing
* AutoFill
* Word Find
* Highlight Search Terms
* SpellCheck
* Enhanced Search Box (via omni bar)

Those, in my opinion, are the only important features of the toolbar, so it makes sense that they are built into Chrome.

The other features in the toolbar are basically just automations/macros to help you navigate -- which can be added into Chrome later via plugins.

Kito said...

Google Chrome = Opera Browser..

Open your eyes, only the independent tabs control is different..

Opera is much much better!

JP said...

I don't know if this is the right place to comment but it would be nice if CTRL-clicking the HOME button produced a home page in a new tab.

Steve said...

Chrome performs a significant amount of disk activity at every startup. This activity can last anywhere from 30 seconds to several minutes. This heavy disk i/o is not experienced on three other browsers; IE, FF, or Opera. This is happening on a high end PC that has 10K RPM drives. After a single application startup from the other browsers, the XP system cache allows them to start up very quickly.

zillah975 said...

Well, Chrome let malware onto my computer, and I can't figure out how. I didn't visit a phishing site, just my usual livejournal and facebook sites, but shortly after logging on this morning I started getting popups from Antivirus Pro 2009. I think I know how to get rid of it, but I started doublechecking Chrome's settings. Malware/phishing protection is on, but under "Security" there's this: "You have chosen to open certain file types automatically after downloading. You can clear these settings so that downloaded files don't open automatically." But the "clear auto-opening settings" button is greyed out and I can't find anyplace else where the settings might be. A google search returned a thread on the support forum complaining about just that, but with no answers.

http://www.google.com/support/forum/p/Chrome/thread?tid=4d67be07c18033d8&hl=en

Now, I know I didn't set it to automatically open ANY downloads. I'm not crazy. So wth? And how do I change this?