Chromium Blog
News and developments from the open source browser project
Understanding Phishing and Malware Protection in Google Chrome
Friday, November 14, 2008
Google Chrome includes features to help protect users against phishing and malware attacks. If you have ever hit a red page with the title "Warning: Visiting this site may harm your computer!" (such as our test page) or "Warning: Suspected phishing site!" then you have already seen these features in action. While we try to provide an explanation of what's happening on that warning page, a number of people have asked for more information about how this feature works, in terms of where the data behind those warnings come from, how that data gets to the computer, and what privacy implications the feature has.
Where does the phishing and malware data come from?
Google is constantly crawling and re-crawling the web, all the while finding new and changed websites. These websites are found by following links from other websites, crawling URLs submitted by webmasters and users, and so forth. Sometimes, during that process, we discover a website where something doesn't seem right. A website may look like a phishing website, designed to steal your personal information, or it may contain signs of potentially malicious activity that would install malware onto your computer without your consent. If we find a website that looks like a phishing page, it gets added to a list of suspected phishing websites. If we find a website that contains signs of potentially malicious activity, we start up a virtual machine, browse to that website, and watch what happens. If we see certain activities happen on that virtual machine (such as viruses being downloaded and installed), we add that website to a list of suspected malware-infected websites. The process for discovering suspected malware-infected websites is described in more detail in a paper written by Niels Provos and colleagues from Google's anti-malware team.
How does this data get to my computer?
If you have phishing and malware protection enabled, then Google Chrome will contact servers at Google within five minutes of startup, and approximately every half hour thereafter, to download updated lists of suspected phishing and malware websites. These lists are then stored on your computer, so that as you browse the web, each page can be checked against the list of suspected phishing and malware websites locally, without sending the address of each webpage you visit to Google. This is designed to offer both performance (by not having to wait on a round-trip request to Google's servers) and privacy (by not sending a record of your browsing session to Google).
As the lists are large (hundreds of thousands of entries), we looked for ways to reduce the amount of information that had to be sent to and stored on users' computers, to reduce the amount of bandwidth and storage space consumed. One way we achieve this is by using partial hashes of URLs in the lists downloaded by the computer. What this means is that rather than sending down the full URL of each website, we do the following. First, we hash the URL using SHA-256. Then, we add the first 32 bits of that 256-bit hash to the list of phishing or malware websites. Those lists of 32-bit hash prefixes are then downloaded by Google Chrome in the background as described earlier.
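The two steps above (hash with SHA-256, keep the first 32 bits) can be sketched in a few lines of Python. This is only an illustration, not Chrome's actual code: the function names `hash_prefix` and `build_prefix_list` are hypothetical, and the sketch hashes the URL string exactly as given, ignoring any URL normalization a real implementation might apply first.

```python
import hashlib


def hash_prefix(url: str, prefix_bytes: int = 4) -> bytes:
    """Return the first 32 bits (4 bytes) of the SHA-256 hash of a URL.

    Hypothetical helper: the URL is hashed as-is, with no normalization.
    """
    full_hash = hashlib.sha256(url.encode("utf-8")).digest()  # 32 bytes = 256 bits
    return full_hash[:prefix_bytes]


def build_prefix_list(urls):
    """Build the set of 32-bit prefixes that would be shipped to the browser."""
    return {hash_prefix(url) for url in urls}


if __name__ == "__main__":
    # Placeholder URLs, used only to show the shape of the data.
    suspected = ["http://phish.example/login", "http://malware.example/"]
    for prefix in sorted(build_prefix_list(suspected)):
        print(prefix.hex())
```

Each entry costs 4 bytes instead of a full URL or a full 32-byte hash, which is where the bandwidth and storage savings come from.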
How is this data used, and what is sent back to Google?
When you browse the web using Google Chrome, the hash of each URL is computed, and the first 32 bits of that URL's hash are compared against the list of suspected phishing and malware websites. This includes the URL of the website you are visiting, as well as the URL of any included resources (such as included JavaScript or Adobe Flash movies). If the first 32 bits of the hash match an entry in the list, it is likely that the URL is on the list of suspected phishing or malware websites. At this point, we can only say likely, because there is still a reasonable chance of hash collisions in the 32-bit space: two distinct URLs with distinct 256-bit hashes where the first 32 bits of those hashes are the same.
To confirm that the URL is suspected as a phishing or malware website, and not just a 32-bit hash collision, the 32-bit hash prefix is sent to Google. Google then returns the full 256-bit hashes that start with those 32 bits and are suspected of being phishing or malware. The full 256-bit hash of the URL in question can then be compared against the 256-bit hash(es) returned by Google, to determine whether the URL in question is in fact on the list of suspected phishing or malware websites.
Using this scheme, Google Chrome is able to quickly check the website and its resources against a local database, and only sends information back to Google when the site matches an entry on the locally stored lists. In the case where information is sent to Google to verify such a suspicion, that information consists only of a part of the hash of a URL, not the URL itself. As such, Google never gets information that would definitively indicate whether a user has visited a particular website or not. The end result is a low-overhead, efficient mechanism to help protect against phishing and malware, while also helping to protect users' privacy.
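The lookup flow described in this section can be sketched as follows. This is a simplified model under stated assumptions, not the real client or protocol: `FakeServer` is a hypothetical in-memory stand-in for Google's servers, `is_suspected` is an illustrative name, and URLs are hashed as plain strings.

```python
import hashlib


def full_hash(url: str) -> bytes:
    """Full 256-bit SHA-256 hash of the URL string (no normalization)."""
    return hashlib.sha256(url.encode("utf-8")).digest()


class FakeServer:
    """Hypothetical stand-in for the server side: maps prefix -> full hashes."""

    def __init__(self, suspected_urls):
        self.by_prefix = {}
        self.requests = []  # every prefix the client has sent
        for url in suspected_urls:
            h = full_hash(url)
            self.by_prefix.setdefault(h[:4], set()).add(h)

    def prefixes(self):
        """The list of 32-bit prefixes the browser downloads in the background."""
        return set(self.by_prefix)

    def full_hashes_for(self, prefix: bytes):
        """Return all suspected full hashes that start with this prefix."""
        self.requests.append(prefix)
        return self.by_prefix.get(prefix, set())


def is_suspected(url: str, local_prefixes, server) -> bool:
    h = full_hash(url)
    prefix = h[:4]
    if prefix not in local_prefixes:
        return False  # common case: decided locally, nothing is sent
    # Possible match: send only the 32-bit prefix, then compare full hashes
    # locally to rule out a prefix collision.
    return h in server.full_hashes_for(prefix)


if __name__ == "__main__":
    server = FakeServer(["http://bad.example/"])
    local = server.prefixes()
    print(is_suspected("http://bad.example/", local, server))
    print(is_suspected("http://good.example/", local, server))
    print(len(server.requests), "prefix request(s) sent")
```

Note that the server in this sketch only ever sees 4-byte prefixes, and only for URLs whose prefix already appears in the local list, which mirrors the privacy property described above.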
Posted by Ian Fette, Product Manager