5.5.4 Regexp Quick Reference
Regexp operators:
| string |
| Match literal string; see #%literal |
| bytes |
| Match literal byte string; see #%literal |
| pat pat |
| Match concatenation of matches; see #%juxtapose |
| pat ++ pat |
| Match concatenation of matches |
| (pat) |
| Same as pat |
| [charset] |
| Match any in charset; see #%brackets |
| pat || pat |
| Match either pat |
| pat * |
| Zero or more repetitions of pat |
| pat + |
| One or more repetitions of pat |
| pat ? |
| Zero or one matches of pat |
| pat {min .. max} |
| min to max-1 repetitions of pat; see #%comp |
| pat {min ..= max} |
| min to max repetitions of pat; see #%comp |
|
| Any non-newline character | |
|
| Any character | |
|
| Like any, but implies string mode | |
|
| Like any, but implies byte-string mode | |
|
| Beginning of input (i.e., “file”) | |
|
| End of input (i.e., “file”) | |
|
| Beginning of a line | |
|
| End of a line | |
| $ id: pat |
| Capture group, set id to match for pat |
| $ id |
| Backreference, match same as id |
| $ int |
| Backreference, match same as int |
| $ expr |
| Splice: match pattern produced by expr |
| ~~ pat |
| Anonymous capture group, match pat |
| lookahead(pat) |
| Match empty if pat matches after |
| lookbehind(pat) |
| Match empty if pat matches before |
| ! lookahead(pat) |
| Match empty if pat does not match after |
| ! lookbehind(pat) |
| Match empty if pat does not match before |
|
| Match empty outside of word | |
|
| Match empty within word | |
| if tst | pat | pat |
| Condition on lookahead, lookbehind, or backreference |
|
| Match empty, but limit backtracking | |
| string: pat |
| Specify string mode, match pat |
| bytes: pat |
| Specify byte-string mode, match pat |
| case_sensitive: pat |
| Match pat case-sensitively |
| case_insensitive: pat |
| Match pat case-insensitively |
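The following is a minimal sketch of how a few of the operators above behave, written with Python's `re` module as a stand-in. It is not Rhombus syntax: the helper names `repeat_exclusive` and `repeat_inclusive` are hypothetical, and the Python patterns are only rough analogues of `{min .. max}`, `{min ..= max}`, `$ id: pat` with a backreference, `lookahead(pat)`, and `! lookbehind(pat)`.

```python
import re

def repeat_exclusive(pat: str, lo: int, hi: int) -> str:
    """Analogue of `pat {lo .. hi}`: lo to hi-1 repetitions."""
    # Python's {m,n} quantifier is inclusive, so the upper bound is hi - 1.
    return f"(?:{pat}){{{lo},{hi - 1}}}"

def repeat_inclusive(pat: str, lo: int, hi: int) -> str:
    """Analogue of `pat {lo ..= hi}`: lo to hi repetitions."""
    return f"(?:{pat}){{{lo},{hi}}}"

two_to_three = re.compile(repeat_exclusive("a", 2, 4))
two_to_four = re.compile(repeat_inclusive("a", 2, 4))

# Analogue of a named capture group followed by a backreference to it.
doubled = re.compile(r"(?P<word>[a-z]+) (?P=word)")
doubled_word = doubled.search("say hello hello again").group("word")

# Analogue of lookahead(pat): digits only where "px" follows, without consuming it.
pixel_sizes = re.findall(r"\d+(?=px)", "10px 20em 30px")

# Analogue of ! lookbehind(pat): numbers not preceded by a dollar sign.
plain_numbers = re.findall(r"(?<!\$)\b\d+", "$5 and 7")
```

The half-open versus inclusive distinction is the detail most likely to cause off-by-one mistakes, which is why the sketch makes the `hi - 1` adjustment explicit.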
Character sets that can be used directly as regexp operators:
|
| ASCII letters a-z and A-Z | |
|
| ASCII uppercase letters A-Z | |
|
| ASCII lowercase letters a-z | |
|
| ASCII digits 0-9 | |
|
| Hexadecimal digits 0-9, a-f, and A-F | |
|
| ||
|
| alnum plus _ | |
|
| Newline (ASCII 10) | |
|
| Space (ASCII 32) and tab (ASCII 9) | |
|
| Newline (10), return (13), space (32), tab (9), and form feed (12) | |
|
| ASCII characters that print with ink | |
|
| ||
|
| ASCII control characters (ASCII 0 through 31) | |
|
| ASCII characters (ASCII 0 through 127) | |
|
| Latin-1 characters (Unicode 0 through 255) | |
|
| Character in Unicode general category Ll | |
| unicode.cat |
| Character in other Unicode general category... |
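As a quick check of the code points behind the whitespace and ASCII-based sets described above, here is a small Python sketch. It only models the sets as plain Python collections; the variable names are illustrative and are not part of the regexp library.

```python
import string

# Whitespace characters named in the table, mapped to their code points.
whitespace = {
    "newline": "\n",
    "return": "\r",
    "space": " ",
    "tab": "\t",
    "form feed": "\f",
}
code_points = {name: ord(ch) for name, ch in whitespace.items()}

# Hexadecimal digits: 0-9, a-f, and A-F.
hex_digits = set(string.hexdigits)

# Letters and digits plus underscore, as in the "alnum plus _" row.
word_chars = set(string.ascii_letters + string.digits + "_")

# ASCII control characters occupy code points 0 through 31.
control_chars = [chr(i) for i in range(32)]
```

Printing `code_points` gives newline 10, return 13, space 32, tab 9, and form feed 12.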
Charset operators, for use within [charset]:
| string |
| All characters in string |
| bytes |
| All bytes in bytes |
| (charset) |
| Same as charset |
| charset charset |
| Union of charsets; see #%juxtapose |
| charset - charset |
| Inclusive range between single-element charsets |
| charset || charset |
| Union of charsets |
| charset && charset |
| Intersection of charsets |
| charset -- charset |
| Difference of charsets |
| ! charset |
| Inverse of charset |
|
| All characters |
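The charset operators above behave like set algebra over characters. The sketch below models that with Python sets. It is an illustration under assumptions, not the library's implementation: `char_range` is a hypothetical helper, and the inverse is taken relative to ASCII only so that the universe stays small, whereas a real charset inverse would range over all characters (or bytes).

```python
# Universe used for the inverse in this sketch (assumption: ASCII only).
ASCII = {chr(i) for i in range(128)}

def char_range(lo: str, hi: str) -> set[str]:
    """Analogue of `charset - charset`: inclusive range between two single characters."""
    return {chr(i) for i in range(ord(lo), ord(hi) + 1)}

lower = char_range("a", "z")
vowels = set("aeiou")

union = lower | set("_")          # charset || charset (or juxtaposition)
intersection = lower & vowels     # charset && charset
consonants = lower - vowels       # charset -- charset
not_lower = ASCII - lower         # ! charset, relative to the ASCII universe
```

Note that the range operator is inclusive at both ends, so `char_range("a", "z")` contains all 26 lowercase letters.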