Chromium Blog
News and developments from the open source browser project
Irregexp, Google Chrome's New Regexp Implementation
mercoledì 4 febbraio 2009
One of the new features in the most recent
dev-channel
release of Google Chrome (2.0.160.0) is Irregexp, a completely new implementation of
regular expressions
(regexps) in the V8 JavaScript engine. Irregexp builds on V8's existing infrastructure for memory management and native code generation and is tailored to work well for the kinds of regexps used by JavaScript programs on the web. The result is a considerable improvement in V8's regexp performance.
While the V8 team has been working hard to improve JavaScript performance, one part of the language that we have so far not given much attention is regexps. Our previous implementation was based on the widely used
PCRE
library developed by Philip Hazel at the University of Cambridge. The version we used, known as JSCRE, was adapted and improved by the WebKit project for use with JavaScript. Using JSCRE gave us a regular expression implementation that was compatible with industry standards and has served us well. However, as we've improved other parts of the language, regexps started to stand out as being slower than the rest. We felt it should be possible to improve performance by integrating with our existing infrastructure rather than using an external library. The SquirrelFish team is following a similar approach with their JavaScript engine.
A fundamental decision we made early in the design of Irregexp was that we would be willing to spend extra time compiling a regular expression if that would make running it faster. During compilation Irregexp first converts a regexp into an intermediate
automaton
representation. This is in many ways the "natural" and most accessible representation and makes it much easier to analyze and optimize the regexp. For instance, when compiling /Sun|Mon/ the automaton representation lets us recognize that both alternatives have an 'n' as their third character. We can quickly scan the input until we find an 'n' and then start to match the regexp two characters earlier. Irregexp looks up to four characters ahead and matches up to four characters at a time.
After optimization we generate native machine code which uses backtracking to try different alternatives. Backtracking can be time-consuming so we use optimizations to avoid as much of it as we can. There are
techniques
to avoid backtracking altogether but the nature of regexps in JavaScript makes it difficult to apply them in our case, though it is something we may implement in the future.
During development we have tested Irregexp against one million of the most popular webpages to ensure that the new implementation stays compatible with our previous implementation and the web. We have also used this data to create a new benchmark which is included in version 3 of the
V8 Benchmark Suite
. We feel this is a good reflection of what is found on the web.
If you want to try this out, and help us test it in the process, you can subscribe to the
dev-channel
and if you see problems that might be related to Irregexp consider filing a
bug
.
And BTW, we'll have sessions on V8 and other Chrome-related topics in May at
Google I/O
, Google's largest developer conference.
Posted by Erik Corry, Christian Plesner Hansen and Lasse Reichstein Holst Nielsen, Software Engineers
Etichette
$200K
1
10th birthday
4
abusive ads
1
abusive notifications
2
accessibility
3
ad blockers
1
ad blocking
2
advanced capabilities
1
android
2
anti abuse
1
anti-deception
1
background periodic sync
1
badging
1
benchmarks
1
beta
83
better ads standards
1
billing
1
birthday
4
blink
2
browser
2
browser interoperability
1
bundles
1
capabilities
6
capable web
1
cds
1
cds18
2
cds2018
1
chrome
35
chrome 81
1
chrome 83
2
chrome 84
2
chrome ads
1
chrome apps
5
Chrome dev
1
chrome dev summit
1
chrome dev summit 2018
1
chrome dev summit 2019
1
chrome developer
1
Chrome Developer Center
1
chrome developer summit
1
chrome devtools
1
Chrome extension
1
chrome extensions
3
Chrome Frame
1
Chrome lite
1
Chrome on Android
2
chrome on ios
1
Chrome on Mac
1
Chrome OS
1
chrome privacy
4
chrome releases
1
chrome security
10
chrome web store
32
chromedevtools
1
chromeframe
3
chromeos
4
chromeos.dev
1
chromium
9
cloud print
1
coalition
1
coalition for better ads
1
contact picker
1
content indexing
1
cookies
1
core web vitals
2
csrf
1
css
1
cumulative layout shift
1
custom tabs
1
dart
8
dashboard
1
Data Saver
3
Data saver desktop extension
1
day 2
1
deceptive installation
1
declarative net request api
1
design
2
developer dashboard
1
Developer Program Policy
2
developer website
1
devtools
13
digital event
1
discoverability
1
DNS-over-HTTPS
4
DoH
4
emoji
1
emscriptem
1
enterprise
1
extensions
27
Fast badging
1
faster web
1
features
1
feedback
2
field data
1
first input delay
1
Follow
1
fonts
1
form controls
1
frameworks
1
fugu
2
fund
1
funding
1
gdd
1
google earth
1
google event
1
google io 2019
1
google web developer
1
googlechrome
12
harmful ads
1
html5
11
HTTP/3
1
HTTPS
4
iframes
1
images
1
incognito
1
insecure forms
1
intent to explain
1
ios
1
ios Chrome
1
issue tracker
3
jank
1
javascript
5
lab data
1
labelling
1
largest contentful paint
1
launch
1
lazy-loading
1
lighthouse
2
linux
2
Lite Mode
2
Lite pages
1
loading interventions
1
loading optimizations
1
lock icon
1
long-tail
1
mac
1
manifest v3
2
metrics
2
microsoft edge
1
mixed forms
1
mobile
2
na
1
native client
8
native file system
1
New Features
5
notifications
1
octane
1
open web
4
origin trials
2
pagespeed insights
1
pagespeedinsights
1
passwords
1
payment handler
1
payment request
1
payments
2
performance
20
performance tools
1
permission UI
1
permissions
1
play store
1
portals
3
prefetching
1
privacy
2
privacy sandbox
4
private prefetch proxy
1
profile guided optimization
1
progressive web apps
2
Project Strobe
1
protection
1
pwa
1
QUIC
1
quieter permissions
1
releases
3
removals
1
rlz
1
root program
1
safe browsing
2
Secure DNS
2
security
36
site isolation
1
slow loading
1
sms receiver
1
spam policy
1
spdy
2
spectre
1
speed
4
ssl
2
store listing
1
strobe
2
subscription pages
1
suspicious site reporter extension
1
TCP
1
the fast and the curious
23
TLS
1
tools
1
tracing
1
transparency
1
trusted web activities
1
twa
2
user agent string
1
user data policy
1
v8
6
video
2
wasm
1
web
1
web apps
1
web assembly
2
web developers
1
web intents
1
web packaging
1
web payments
1
web platform
1
web request api
1
web vitals
1
web.dev
1
web.dev live
1
webapi
1
webassembly
1
webaudio
3
webgl
7
webkit
5
WebM
1
webmaster
1
webp
5
webrtc
6
websockets
5
webtiming
1
writable-files
1
yerba beuna center for the arts
1
Archive
2024
dic
ago
giu
mag
apr
mar
feb
2023
nov
ott
set
ago
giu
mag
apr
feb
2022
dic
set
ago
giu
mag
apr
mar
feb
gen
2021
dic
nov
ott
set
ago
lug
giu
mag
apr
mar
feb
gen
2020
dic
nov
ott
set
ago
lug
giu
mag
apr
mar
feb
gen
2019
dic
nov
ott
set
ago
lug
giu
mag
apr
mar
feb
gen
2018
dic
nov
ott
set
ago
lug
giu
mag
apr
mar
feb
gen
2017
dic
nov
ott
set
ago
lug
giu
mag
apr
mar
feb
gen
2016
dic
nov
ott
set
ago
giu
mag
apr
mar
feb
gen
2015
dic
nov
ott
set
ago
lug
giu
mag
apr
mar
feb
gen
2014
dic
nov
ott
set
ago
lug
giu
mag
apr
mar
feb
gen
2013
dic
nov
ott
set
ago
lug
giu
mag
apr
mar
feb
gen
2012
dic
nov
ott
set
ago
lug
giu
mag
apr
mar
feb
gen
2011
dic
nov
ott
set
ago
lug
giu
mag
apr
mar
feb
gen
2010
dic
nov
ott
set
ago
lug
giu
mag
apr
mar
feb
gen
2009
dic
nov
set
ago
lug
giu
mag
apr
mar
feb
gen
2008
dic
nov
ott
set
Feed
Follow @ChromiumDev
Give us feedback in our
Product Forums
.