8.14.0.2
9.1.1 Regexp Patterns

The portion of a rx or rx_in form within '' is a pattern that is written with regexp pattern operators. Some pattern operators overlap with expression operators, but they have different meanings and precedence in a pattern. For example, the pattern operator * creates a repetition pattern, instead of multiplying like the expression * operator.
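As a small illustration of that difference (a sketch that reuses only forms documented on this page), the same * token multiplies in an expression but creates a repetition inside a pattern:

```rhombus
> 2 * 3
6
> rx'"a"*'.match("aaa")
RXMatch("aaa", [], {})
```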

regexp operator

#%literal string

 

regexp operator

#%literal bytes

A literal string or byte string can be used as a pattern. It matches the string’s characters or bytes literally. See also case_insensitive.

> rx'"hello"'.match("hello")

RXMatch("hello", [], {})

> rx'"hello"'.match("olleh")

#false

> rx'#"a"'.match(#"a")

RXMatch(Bytes.copy(#"a"), [], {})

regexp operator

pat #%juxtapose pat

 

regexp operator

pat ++ pat

 

regexp operator

pat #%call (pat)

Patterns that are adjacent in a larger pattern match in sequence. The ++ operator can be used to make sequencing explicit. An implicit #%call form is treated like #%juxtapose, consistent with implicit uses of parentheses for grouping as handled by #%parens.

> rx'"hello" " " "world"'.match("hello world")

RXMatch("hello world", [], {})

> rx'"hello" ++ " " ++ "world"'.match("hello world")

RXMatch("hello world", [], {})

> rx'"hello"

       ++ " "

       ++ "world"'.match("hello world")

RXMatch("hello world", [], {})

regexp operator

pat || pat

Matches as either the first pat or second pat. The first pat is tried first.

> rx'"a" || "b"'.match("a")

RXMatch("a", [], {})

> rx'"a" || "b"'.match("b")

RXMatch("b", [], {})

> rx'"a" || "b"'.match("c")

#false

regexp operator

#%parens (pat)

A parenthesized pattern is equivalent to the pat inside the parentheses. That is, parentheses are just for grouping and resolving precedence mismatches. See $ for information about capture groups, which are not implicitly created by parentheses (as they are in some traditional regexp languages).

> rx'"a" || "b" ++ "c"'.match("ac")

#false

> rx'("a" || "b") ++ "c"'.match("ac")

RXMatch("ac", [], {})

regexp operator

#%brackets [charset]

 

regexp operator

pat #%index [charset]

A [] pattern, which is an implicit use of #%brackets, matches a single character or byte, where charset determines the matching characters or bytes. An implicit #%index form (see Implicit Forms) is treated as a sequence of a pat and #%brackets.

See Regexp Character Sets for character set forms that can be used in charset.

> rx'["a"-"z"]'.match("m")

RXMatch("m", [], {})

> rx'["a"-"z"]'.match("0")

#false

regexp operator

pat *

 

regexp operator

pat * mode

 

mode = ~greedy | ~nongreedy | ~possessive

Matches a sequence of 0 or more matches to pat.

> rx'any*'.match("abc")

RXMatch("abc", [], {})

> rx'any*'.match("")

RXMatch("", [], {})

By default, the match uses ~greedy mode, where a larger number of matches is tried first, but subsequent patterns may cause backtracking to a shorter match. In ~nongreedy mode, shorter matches are tried first. The ~possessive mode is like ~greedy, but without backtracking (i.e., the longest match must succeed overall for the enclosing pattern); see also cut.

> rx'($head: any*) ($tail: any*)'.match("abc")

RXMatch("abc", ["abc", ""], {#'head: 1, #'tail: 2})

> rx'($head: any* ~nongreedy) ($tail: any*)'.match("abc")

RXMatch("abc", ["", "abc"], {#'head: 1, #'tail: 2})

> rx'any* ~greedy "z"'.match("abcz")

RXMatch("abcz", [], {})

> rx'any* ~possessive "z"'.match("abcz")

#false

regexp operator

pat +

 

regexp operator

pat + mode

Like *, but matches 1 or more instances of pat.

> rx'any+'.match("abc")

RXMatch("abc", [], {})

> rx'any+'.match("")

#false
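The mode forms described for * are accepted by + as well. As a sketch adapted from the * examples (the capture-group names head and tail are only illustrative), a ~nongreedy + still has to consume at least one character before the rest of the pattern is tried:

```rhombus
> rx'($head: any+ ~nongreedy) ($tail: any*)'.match("abc")
RXMatch("abc", ["a", "bc"], {#'head: 1, #'tail: 2})
```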

regexp operator

pat ?

 

regexp operator

pat ? mode

Similar to *, but matches 0 or 1 instances of pat.

> rx'any?'.match("a")

RXMatch("a", [], {})

> rx'any?'.match("")

RXMatch("", [], {})

> rx'any?'.match("abc")

#false

regexp operator

pat #%comp {count}

 

regexp operator

pat #%comp {min ..}

 

regexp operator

pat #%comp {min .. max}

Using {} after a pattern, which is a use of the implicit #%comp form, specifies a repetition like * or + more generally. If a single count is provided, it specifies an exact number of repetitions. If just min is provided, then it specifies a minimum number of repetitions, and there is no maximum. Finally, min and max both can be specified. A count, min, or max must be a literal nonnegative integer.

> rx'any{2}'.match("aa")

RXMatch("aa", [], {})

> rx'any{2}'.match("aaa")

#false

> rx'any{2..}'.match("aa")

RXMatch("aa", [], {})

> rx'any{2..}'.match("aaa")

RXMatch("aaa", [], {})

> rx'any{2..3}'.match("aa")

RXMatch("aa", [], {})

> rx'any{2..3}'.match("aaa")

RXMatch("aaa", [], {})

> rx'any{2..3}'.match("aaaa")

#false

regexp operator

.

 

regexp operator

any

 

regexp operator

char

 

regexp operator

byte

Matches a single character or byte. The . and any patterns are equivalent, and they do not match a newline character unless they are used under enable_newline. The char and byte patterns match any character or byte, including a newline, and also imply that the enclosing regexp matches strings or byte strings, respectively.

> rx'.'.match("a")

RXMatch("a", [], {})

> rx'.'.match("\n")

#false

> rx'enable_newline: .'.match("\n")

RXMatch("\n", [], {})

> rx'char'.match("\n")

RXMatch("\n", [], {})

> rx'byte'.match("\n")

RXMatch(Bytes.copy(#"\n"), [], {})

regexp operator

.*

 

regexp operator

.* mode

 

regexp operator

.?

 

regexp operator

.? mode

Equivalent to . * and . ?, but allowing the space between the operators to be omitted.

> rx'.*'.match("abc")

RXMatch("abc", [], {})
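Because . and any are equivalent, .? and a mode after .* should behave like the corresponding any forms shown earlier on this page. A brief sketch under that assumption (the group names head and tail are illustrative):

```rhombus
> rx'.?'.match("a")
RXMatch("a", [], {})
> rx'.?'.match("")
RXMatch("", [], {})
> rx'($head: .* ~nongreedy) ($tail: .*)'.match("abc")
RXMatch("abc", ["", "abc"], {#'head: 1, #'tail: 2})
```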

regexp operator

^

 

regexp operator

bof

Matches the start of input or, in the case of ^ when not under enable_newline, the position after a newline. The bof operator always matches the beginning of input and is not affected by enable_newline.

A regexp created with rx (as opposed to rx_in) is implicitly prefixed with bof for use with methods like Regexp.match (as opposed to Regexp.match_in).

> rx'^ "a"'.match_in("a")

RXMatch("a", [], {})

> rx'^ "a"'.match_in("xa")

#false

> rx'^ "a"'.match_in("x\na")

RXMatch("a", [], {})

> rx'bof "a"'.match_in("x\na")

#false

> rx'enable_newline: ^ "a"'.match_in("x\na")

#false

regexp operator

$

 

regexp operator

eof

 

regexp operator

$ identifier: pat

 

regexp operator

$ identifier

 

regexp operator

$ int

 

regexp operator

$ expr

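The $ operator is overloaded for multiple uses. Alone, $ matches the end of input or, when not under enable_newline, the position before a newline; eof always matches the end of input and is not affected by enable_newline. The form $ identifier: pat matches pat as a capture group named identifier. The forms $ identifier and $ int are backreferences to a named or numbered capture group. The form $ expr uses the result of an expression as part of the pattern.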
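A few of these uses can be sketched with forms that also appear elsewhere on this page: a named capture group with a backreference, and $ versus eof as end anchors. The results shown are inferred from the corresponding ^, bof, and ~~ examples rather than taken from the original text, so treat them as assumptions.

```rhombus
> rx'($x: any) $x'.match("aa")
RXMatch("aa", ["a"], {#'x: 1})
> rx'"a" $'.match_in("a\nb")
RXMatch("a", [], {})
> rx'"a" eof'.match_in("a\nb")
#false
```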

regexp operator

~~ pat

Matches pat as an unnamed capture group. The capture group’s match can only be referenced by index (counting from 1).

> rx'any ~~any any*'.match("abc")[1]

"b"

> rx'any ~~any $1'.match("abb")

RXMatch("abb", ["b"], {})

regexp operator

lookahead(pat)

 

regexp operator

lookback(pat)

 

regexp operator

! lookahead(pat)

 

regexp operator

! lookback(pat)

Matches an empty position in the input where the subsequent (for lookahead) or preceding (for lookback) input matches pat, or does not match pat when a ! prefix is used.

> rx'. "a" lookahead("p")'.match_in("cat nap")

RXMatch("na", [], {})

> rx'. "a" !lookahead("t")'.match_in("cat nap")

RXMatch("na", [], {})

> rx'lookback("n") "a" .'.match_in("cat nap")

RXMatch("ap", [], {})

> rx'!lookback("c") "a" .'.match_in("cat nap")

RXMatch("ap", [], {})

regexp operator

word_boundary

 

regexp operator

word_continue

Matches an empty position in the input. The word_boundary pattern matches between an alphanumeric ASCII character (a-z, A-Z, or 0-9) or _ and another character that is not alphanumeric or _. The word_continue pattern matches positions that do not match word_boundary.

> rx'any+ ~nongreedy word_boundary'.match_in("cat nap")

RXMatch("cat", [], {})

> rx'any+ ~nongreedy word_continue'.match_in("cat nap")

RXMatch("c", [], {})

regexp operator

if lookahead(pat) | then_pat | else_pat

 

regexp operator

if lookback(pat) | then_pat | else_pat

 

regexp operator

if ! lookahead(pat) | then_pat | else_pat

 

regexp operator

if ! lookback(pat) | then_pat | else_pat

 

regexp operator

if $ identifier | then_pat | else_pat

 

regexp operator

if $ int | then_pat | else_pat

Matches as then_pat or else_pat, depending on the form immediately after if, which must be either a lookahead, lookback, or backreference pattern.

> rx'($x: "x")* if $x | "s" | "."'.match_in("xxxs")

RXMatch("xxxs", ["x"], {#'x: 1})

> rx'($x: "x")* if $x | "s" | "."'.match_in(".")

RXMatch(".", [#false], {#'x: 1})
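The examples for this operator use a backreference as the test. As a sketch of the lookahead variant, following the if lookahead(pat) | then_pat | else_pat form, the test below selects a branch by peeking at the next character without consuming it. The expected results assume that else_pat is not tried once the test has selected then_pat.

```rhombus
> rx'if lookahead("a") | "ab" | "cd"'.match("ab")
RXMatch("ab", [], {})
> rx'if lookahead("a") | "ab" | "cd"'.match("cd")
RXMatch("cd", [], {})
> rx'if lookahead("a") | "ab" | "cd"'.match("ad")
#false
```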

regexp operator

cut

Matches an empty position in the input. The first potential match that reaches cut is the only one that is allowed to succeed. Note that a possessive repetition mode like * ~possessive is equivalent to using cut after the repetition.

In the case of a rx_in pattern or use of RX.match_in, cut applies only to a match attempt at a given input position. It does not prevent trying the match at a later position.

> rx'("ax" || "a") cut "x"'.match("ax")

#false

> rx'("a" || "ax") cut "x"'.match("ax")

RXMatch("ax", [], {})

regexp operator

bytes: pat

 

regexp operator

string: pat

Matches the same as pat, but specifies explicitly either byte-string mode or string mode.

> rx'string: "a"'.match("a")

RXMatch("a", [], {})

> rx'bytes: "a"'.match("a")

RXMatch(Bytes.copy(#"a"), [], {})

> rx'string: any'.match(#"\x80") // not UTF-8

#false

> rx'bytes: any'.match(#"\x80")

RXMatch(Bytes.copy(#"\200"), [], {})

regexp operator

case_sensitive: pat

 

regexp operator

case_insensitive: pat

Adjusts the treatment of literal strings and ranges in pat to match case-sensitively (the default) or case-insensitively. In case-insensitive mode, characters are folded individually (as opposed to folding a string sequence, which can change its length).

> rx'"hello"'.match("HELLO")

#false

> rx'case_insensitive: "hello"'.match("HELLO")

RXMatch("HELLO", [], {})
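The description also covers ranges. As a sketch combining case_insensitive with the range syntax shown for [] patterns, a lowercase range is expected to match uppercase input only in case-insensitive mode:

```rhombus
> rx'["a"-"z"]+'.match("Hello")
#false
> rx'case_insensitive: ["a"-"z"]+'.match("Hello")
RXMatch("Hello", [], {})
```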

regexp operator

enable_newline: pat

 

regexp operator

disable_newline: pat

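Adjusts the meaning of any (and .), ^, and $ in pat. Under enable_newline, any also matches a newline character, and ^ and $ match only at the start and end of input. Under disable_newline (the default), any does not match a newline, and ^ and $ also match the positions just after and just before a newline.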

> rx'"x" any "y"'.match("x\ny")

#false

> rx'enable_newline: "x" any "y"'.match("x\ny")

RXMatch("x\ny", [], {})

> rx'^ "x" $'.match_in("a\nx\nz")

RXMatch("x", [], {})

> rx'enable_newline: ^ "x" $'.match_in("a\nx\nz")

#false

regexp operator

alpha

 

regexp operator

upper

 

regexp operator

lower

 

regexp operator

digit

 

regexp operator

xdigit

 

regexp operator

alnum

 

regexp operator

word

 

regexp operator

blank

 

regexp operator

newline

 

regexp operator

space

 

regexp operator

graph

 

regexp operator

print

 

regexp operator

cntrl

 

regexp operator

ascii

 

regexp operator

latin1

 

regexp operator

unicode.Ll

 

regexp operator

unicode.Lu

 

regexp operator

unicode.Lt

 

regexp operator

unicode.Lm

 

regexp operator

unicode.Lx

 

regexp operator

unicode.Lo

 

regexp operator

unicode.L

 

regexp operator

unicode.Nd

 

regexp operator

unicode.Nl

 

regexp operator

unicode.No

 

regexp operator

unicode.N

 

regexp operator

unicode.Ps

 

regexp operator

unicode.Pe

 

regexp operator

unicode.Pi

 

regexp operator

unicode.Pf

 

regexp operator

unicode.Pc

 

regexp operator

unicode.Pd

 

regexp operator

unicode.Po

 

regexp operator

unicode.P

 

regexp operator

unicode.Mn

 

regexp operator

unicode.Mc

 

regexp operator

unicode.Me

 

regexp operator

unicode.M

 

regexp operator

unicode.Sc

 

regexp operator

unicode.Sk

 

regexp operator

unicode.Sm

 

regexp operator

unicode.So

 

regexp operator

unicode.S

 

regexp operator

unicode.Zl

 

regexp operator

unicode.Zp

 

regexp operator

unicode.Zs

 

regexp operator

unicode.Z

 

regexp operator

unicode.Cc

 

regexp operator

unicode.Cf

 

regexp operator

unicode.Cs

 

regexp operator

unicode.Cn

 

regexp operator

unicode.Co

 

regexp operator

unicode.C

Each of these names is bound both as a character set and as a pattern that can be used directly, instead of wrapping in []. See the alpha, etc., character set for more information.

> rx'alpha'.match("m")

RXMatch("m", [], {})

> rx'alpha'.match("0")

#false
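Because these names are bound as patterns, they compose with the repetition operators like any other pat. A small sketch using only names from this list:

```rhombus
> rx'alpha+ digit+'.match("abc123")
RXMatch("abc123", [], {})
> rx'alpha+ digit+'.match("abc")
#false
```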