Used billions of times each day, the Chrome address bar (which we call the “omnibox”) is a powerful tool to make searching the web easier, whether you’re trying to quickly find your tabs or bookmarks, return to a web page you previously visited, or find information.
With the latest release of Chrome (M124), we’re integrating machine learning models to power the Chrome omnibox on desktop, so that web page suggestions are more precise and relevant to you. In the future, these models will also help improve the relevance scoring of search suggestions. Here’s a closer look at some of the important insights that help our team build this integration and where we hope the new model takes us.
As the engineering lead for the team responsible for the omnibox, every launch feels special, but this one is truly near and dear to my heart. When I first started working on the Chrome omnibox, I asked around for ideas on how we could make it better for users. The number one answer I heard was, "improve the scoring system." The issue wasn't that the scoring was bad. In fact, the omnibox often feels magical in its ability to surface the URL or query you want! The issue was that it was inflexible. A set of hand-built and hand-tuned formulas did the job well, but were difficult to improve or to adapt to new scenarios. As a result, the scoring system went largely untouched for a long time.
For most of that time, an ML-trained scoring model was the obvious path forward. But it took many false starts to finally get here. Our inability to tackle this challenge for so long was due to the difficulty of replacing the core mechanism of a feature used literally billions of times every day. Software engineering projects are sometimes described as "building the plane while flying it." This project felt more like "replacing all the seats in every plane in the world while they're all flying." The scale was enormous and the changes are felt directly by every user.
This ambitious undertaking would not have been possible without the work of such a talented and dedicated team. There were bumps in the road, walls we had to break through, and unanticipated issues that slowed us down, but the team was driven by a sincere belief in the impact of getting this right for our users.
One of the fun things about working with ML systems is that the training considers all the data at a scale that would be difficult to impossible for any individual person or team. And that can lead to surprising insights.
The coolest example of this phenomenon on this project was when we looked at the scoring curve of one particular signal: time since last navigation. The expectation with this signal is that the smaller it is (the more recently you've navigated to a particular URL), the bigger the contribution that signal should make towards a higher relevance score.
And that is, in fact, what the model learned. But when we looked closer, we noticed something surprising: when the time since navigation was very low (seconds instead of hours, days or weeks), the model was decreasing the relevance score. It turns out that the training data reflected a pattern where users sometimes navigate to a URL that was not what they really wanted and then immediately return to the Chrome omnibox and try again. In that case, the URL they just navigated to is almost certainly not what they want, so it should receive a low relevance score during this second attempt.
In retrospect, this is obvious. And if we had not launched ML scoring, we definitely would have added a new rule to the old system to reflect this scenario. But before the training system observed and learned from this pattern, it never occurred to anyone that this might be happening.
With the new ML models, we believe this will open up many new possibilities to improve the user experience by potentially incorporating new signals, like differentiating between time of the day to improve relevance. We want to explore training specialized versions of the model for particular environments: for example, mobile, enterprise or academic users, or perhaps different locales.
Additionally, we observe that the way users interact with the Chrome omnibox changes over time and we believe the relevance scoring should change with them. With the new scoring system, we can now simply collect fresher signals, re-train, evaluate, and deploy new models periodically over time.
By Justin Donnelly, Chrome software engineer
Cookies – small files created by sites you visit – are fundamental to the modern web. They make your online experience easier by saving browsing information, so that sites can do things like keep you signed in and remember your site preferences. Due to their powerful utility, cookies are also a lucrative target for attackers.
Many users across the web are victimized by cookie theft malware that gives attackers access to their web accounts. Operators of Malware-as-a-Service (MaaS) frequently use social engineering to spread cookie theft malware. These operators even convince users to bypass multiple warnings in order to land the malware on their device. The malware then typically exfiltrates all authentication cookies from browsers on the device to remote servers, enabling the attackers to curate and sell the compromised accounts. Cookie theft like this happens after login, so it bypasses two-factor authentication and any other login-time reputation checks. It’s also difficult to mitigate via anti-virus software since the stolen cookies continue to work even after the malware is detected and removed. And because of the way cookies and operating systems interact, primarily on desktop operating systems, Chrome and other browsers cannot protect them against malware that has the same level of access as the browser itself.
To address this problem, we’re prototyping a new web capability called Device Bound Session Credentials (DBSC) that will help keep users more secure against cookie theft. The project is being developed in the open at github.com/WICG/dbsc with the goal of becoming an open web standard.
By binding authentication sessions to the device, DBSC aims to disrupt the cookie theft industry since exfiltrating these cookies will no longer have any value. We think this will substantially reduce the success rate of cookie theft malware. Attackers would be forced to act locally on the device, which makes on-device detection and cleanup more effective, both for anti-virus software as well as for enterprise managed devices.
Learning from prior work, our goal is to build a technical solution that’s practical to deploy to all sites large and small, to foster industry support to ensure broad adoption, and to maintain user privacy.
At a high level, the DBSC API lets a server start a new session with a specific browser on a device. When the browser starts a new session, it creates a new public/private key pair locally on the device, and uses the operating system to safely store the private key in a way that makes it hard to export. Chrome will use facilities such as Trusted Platform Modules (TPMs) for key protection, which are becoming more commonplace and are required for Windows 11, and we are looking at supporting software-isolated solutions as well.
The API allows a server to associate a session with this public key, as a replacement or an augmentation to existing cookies, and verify proof-of-possession of the private key throughout the session lifetime. To make this feasible from a latency standpoint and to aid migrations of existing cookie-based solutions, DBSC uses these keys to maintain the freshness of short-lived cookies through a dedicated DBSC-defined endpoint on the website. This happens out-of-band from regular web traffic, reducing the changes needed to legacy websites and apps. This ensures the session is still on the same device, enforcing it at regular intervals set by the server. For current implementation details please see the public explainer.
Each session is backed by a unique key and DBSC does not enable sites to correlate keys from different sessions on the same device, to ensure there's no persistent user tracking added. The user can delete the created keys at any time by deleting site data in Chrome settings. The out-of-band refresh of short-term cookies is only performed if a user is actively using the session (e.g. browsing the website).
DBSC doesn’t leak any meaningful information about the device beyond the fact that the browser thinks it can offer some type of secure storage. The only information sent to the server is the per-session public key which the server uses to certify proof of key possession later.
We expect Chrome will initially support DBSC for roughly half of desktop users, based on the current hardware capabilities of users' machines. We are committed to developing this standard in a way that ensures it will not be abused to segment users based on client hardware. For example, we may consider supporting software keys for all users regardless of hardware capabilities. This would ensure that DBSC will not let servers differentiate between users based on hardware features or device state (i.e. if a device is Play Protect certified or not).
DBSC will be fully aligned with the phase-out of third-party cookies in Chrome. In third-party contexts, DBSC will have the same availability and/or segmentation that third-party cookies will, as set by user preferences and other factors. This is to make sure that DBSC does not become a new tracking vector once third-party cookies are phased out, while also ensuring that such cookies can be fully protected in the meantime. If the user completely opts out of cookies, third-party cookies, or cookies for a specific site, this will disable DBSC in those scenarios as well.
We are currently experimenting with a DBSC prototype to protect some Google Account users running Chrome Beta. This is an early initiative to gauge the reliability, feasibility, and the latency of the protocol on a complex site, while also providing meaningful protection to our users. When it’s deployed fully, consumers and enterprise users will get upgraded security for their Google accounts under the hood automatically. We are also working to enable this technology for our Google Workspace and Google Cloud customers to provide another layer of account security.
This prototype is integrated with the way Chrome and Google Accounts work together, but is validating and informing all aspects of the public API we want to build.
Many server providers, identity providers (IdPs) such as Okta, and browsers such as Microsoft Edge have expressed interest in DBSC as they want to secure their users against cookie theft. We are engaging with all interested parties to make sure we can present a standard that works for different kinds of websites in a privacy preserving way.
Development happens on GitHub and we have published an estimated timeline. This is where we will post announcements and updates to the expected timelines as needed. Our goal is to allow origin trials for all interested websites by the end of 2024. Please reach out if you'd like to get involved. We welcome feedback from all sources, either by opening a new issue or starting a discussion on GitHub.