8.16.0.4
5.5.3 String, Byte String, and Port Matching
A regexp produced by rx matches in either character
mode or byte mode. The mode is inferred from the rx pattern.
For example, if a literal string is part of the pattern, then the regexp must be
in character mode, but if a literal byte string is part of the pattern,
it must be in byte mode. The string and
bytes forms can be used to make the choice
explicit.
Either mode can work with a string input to match, and either can work
with a byte string input to match. In character mode,
. and any each match a
Unicode character, so given a byte string input, they match only valid UTF-8
encoding sequences. Similarly, a byte-based regexp given
a string input matches against the UTF-8 encoding of the string.
RXMatch("abc", [], {})
RXMatch("λλλ", [], {})
RXMatch(Bytes.copy(#"abc"), [], {})
> byte_rx.match("λλλ") // six bytes in UTF-8
#false
RXMatch(Bytes.copy(#"abc"), [], {})
> char_rx.match(#"a\xFF\xFF") // not valid UTF-8
#false
> char_rx.match(#"\316\273\316\273\316\273")
RXMatch(Bytes.copy(#"\316\273\316\273\316\273"), [], {})
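The interaction between the two modes and the two input types can also be sketched outside Rhombus. The following is a rough analogy using Python's re module, not the rx API: Python does not mix string patterns with byte inputs, so the hypothetical helpers char_match and byte_match do the UTF-8 conversion by hand. The pattern of exactly three "any" elements is an assumption chosen to be consistent with the results above. Unlike rx, the character-mode helper reports its result as a string even when given bytes.

```python
import re

# Assumed pattern: exactly three "any" elements in each mode.
# re.DOTALL lets "." match newlines as well.
CHAR_RX = re.compile(r"...", re.DOTALL)   # character mode: "." is one code point
BYTE_RX = re.compile(rb"...", re.DOTALL)  # byte mode: "." is one byte


def char_match(data):
    """Match the whole input in character mode.

    Byte input is decoded as UTF-8 first; invalid UTF-8 cannot match.
    """
    if isinstance(data, bytes):
        try:
            data = data.decode("utf-8")
        except UnicodeDecodeError:
            return None
    return CHAR_RX.fullmatch(data)


def byte_match(data):
    """Match the whole input in byte mode.

    String input is matched through its UTF-8 encoding.
    """
    if isinstance(data, str):
        data = data.encode("utf-8")
    return BYTE_RX.fullmatch(data)
```

With these helpers, byte_match("λλλ") fails because the string encodes to six bytes, while char_match(b"\316\273\316\273\316\273") succeeds because those six bytes decode to three characters.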
A regexp match can be applied directly to an input port, as
opposed to reading bytes or strings from the port and then matching.
Direct port matching is especially useful with rx_in or
RX.match_in to find the first match, because bytes can be read
from the port lazily to find a match, and no further bytes will be
consumed after a match ends. A port is treated like a byte string for
input, so even if a character-based regexp is used, results are reported
in terms of bytes.
RXMatch(Bytes.copy(#"abc"), [], {})
RXMatch(Bytes.copy(#"def"), [], {})
#false
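The lazy-consumption behavior of port matching can be illustrated with a small Python sketch. This is an analogy built on io.BytesIO and the re module, not the implementation behind rx_in or RX.match_in. The helper name match_in is hypothetical here, and the sketch assumes a pattern whose match cannot grow once found (such as a fixed-length pattern), since it stops reading as soon as the buffered bytes contain a match.

```python
import io
import re


def match_in(pattern, port):
    """Find the first match of a bytes `pattern` in `port`.

    Reads one byte at a time and stops as soon as the buffer contains a
    match, so no bytes after the end of the match are consumed.
    Assumes a match cannot be extended by further input.
    """
    buffer = b""
    while True:
        found = pattern.search(buffer)
        if found:
            return found
        chunk = port.read(1)
        if not chunk:
            return None  # end of input without a match
        buffer += chunk


# Assumed example: three-byte matches taken repeatedly from one port.
port = io.BytesIO(b"abcdef")
three_bytes = re.compile(rb"...", re.DOTALL)
first = match_in(three_bytes, port)   # consumes b"abc"
second = match_in(three_bytes, port)  # consumes b"def"
third = match_in(three_bytes, port)   # port is exhausted
```

Because each call leaves the port positioned just after its match, the second call starts where the first one ended, and the results are byte strings regardless of how the pattern was written.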