8.14.0.2
9.1.3 Regexp Objects

class

class RX()

Represents a regexp as created with rx or rx_in. This class cannot be instantiated directly.

property

property (regexp :: RX).num_captures :: Int

 

property

property (regexp :: RX).capture_names :: Map

 

property

property (regexp :: RX).has_backreference :: Boolean

 

property

property (regexp :: RX).element :: (#'char || #'byte)

Properties of a regexp: its number of capture groups, a mapping of symbolic capture-group names to indices (counting from 1), whether the regexp is implemented with backreferences (which affects regexp splicing via $), and whether a match is in terms of characters or bytes.
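
As a rough illustration of these properties, the following sketch reuses patterns that appear in other examples on this page and assumes the rx form is in scope as it is there. The printed form of the map result is an assumption based on how maps with symbol keys are usually displayed.

> rx'"a"'.num_captures

0

> rx'any "x" ($last: any)'.num_captures

1

> rx'any "x" ($last: any)'.capture_names

{#'last: 1}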

method

method (regexp :: RX).match(input :: String || Bytes || Port.Input,

                            ~start: start :: Int = 0,

                            ~end: end :: maybe(Int) = #false,

                            ~input_prefix: input_prefix :: Bytes = #"",

                            ~unmatched_out: out :: maybe(Port.Output)

                                              = #false)

  :: maybe(RXMatch)

 

method

method (regexp :: RX).match_in(....) :: maybe(RXMatch)

 

method

method (regexp :: RX).match_range(....) :: maybe(RXMatch)

 

method

method (regexp :: RX).match_range_in(....) :: maybe(RXMatch)

 

method

method (regexp :: RX).is_match(....) :: Boolean

 

method

method (regexp :: RX).is_match_in(....) :: Boolean

Attempts to match a regular expression to input. For a regexp created with rx, the entire content (between start and end) must match for RX.match, while RX.match_in can match a portion of the input. For a regexp created with rx_in, both RX.match and RX.match_in can match a portion of the input.

The RX.match_range and RX.match_range_in methods are like RX.match and RX.match_in, but the resulting RXMatch object reports Range results instead of String or Bytes results. Range results are in terms of the start of the input, so if start is not 0, matching ranges will have only values of start and greater.

The RX.is_match and RX.is_match_in methods are like RX.match and RX.match_in, but report just a boolean instead of assembling an RXMatch value in the case of a match.

> rx'"a"'.match("a")

RXMatch("a", [], {})

> rx'"a"'.match("ab")

#false

> rx'"a"'.match_in("ab")

RXMatch("a", [], {})

> rx'"a"'.is_match("ab")

#false

> rx'"a"'.is_match_in("ab")

#true
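
For contrast, a regexp created with rx_in should behave the same way for RX.match as for RX.match_in. The following is a sketch that assumes rx_in accepts the same pattern syntax as rx; the results follow from the description above rather than from a recorded session.

> rx_in'"a"'.match("ab")

RXMatch("a", [], {})

> rx_in'"a"'.is_match("ab")

#true

> rx_in'"a"'.match("b")

#false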

The start and end arguments select a portion of the input to apply the match, where #false for end corresponds to the end of input. The start and end positions correspond to characters for a string as input, and they correspond to bytes for a byte string or input port as input. Portions of input outside of that range are ignored. For example, bof matches at the start offset, not at the beginning of the full input.

> rx'"a"*'.match_in("a aa aaa", ~start: 2)

RXMatch("aa", [], {})
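
The end argument can be combined with start in the same way. In this sketch, which follows from the description above, the whole-content requirement of RX.match applies only to the selected range, and a range of one character limits how much RX.match_in can consume.

> rx'"a"'.match("ab", ~end: 1)

RXMatch("a", [], {})

> rx'"a"*'.match_in("a aa aaa", ~start: 2, ~end: 3)

RXMatch("a", [], {})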

The input_prefix argument specifies bytes that effectively precede input for the purposes of ^ and other lookback matching. For example, a #"" prefix means that bof matches at the beginning of the input. With a #"\n" prefix, a start-of-line ^ can match the beginning of the input, but a start-of-file bof cannot.

> rx'^ "a"*'.match_in("aaa")

RXMatch("aaa", [], {})

> rx'^ "a"*'.match_in("aaa", ~input_prefix: #"x")

#false
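
The #"\n" case described above can be sketched as follows. The results shown are what that description implies, on the assumption that bof can be sequenced with other patterns the same way ^ is.

> rx'^ "a"*'.match_in("aaa", ~input_prefix: #"\n")

RXMatch("aaa", [], {})

> rx'bof "a"*'.match_in("aaa", ~input_prefix: #"\n")

#false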

If out is provided as an output port for the ~unmatched_out argument, the part of input from its beginning (including before start) that precedes the match is written to the port. All input up to end is written to out if no match is found. This functionality is most useful when input is an input port.

> def out = Port.Output.open_string()

> rx'"a"+'.match_in("before aaa after", ~unmatched_out: out)

RXMatch("aaa", [], {})

> out.get_string()

"before "
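
When no match is found, all of the input should end up in the port. This sketch illustrates that case; the name out2 is chosen here only for illustration.

> def out2 = Port.Output.open_string()

> rx'"z"'.match_in("before aaa after", ~unmatched_out: out2)

#false

> out2.get_string()

"before aaa after"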

method

method (regexp :: RX).try_match(input :: Port.Input,

                                ~start: start :: Int = 0,

                                ~end: end :: maybe(Int) = #false,

                                ~input_prefix: input_prefix :: Bytes = #"",

                                ~unmatched_out: out :: maybe(Port.Output)

                                                  = #false)

  :: maybe(RXMatch)

 

method

method (regexp :: RX).try_match_in(....) :: maybe(RXMatch)

Like RX.match and RX.match_in, but no bytes are consumed from input if the pattern does not match.

> def p = Port.Input.open_string("hello")

> rx'"hi"'.try_match(p)

#false

> p.peek_char()

#{#\h}

> rx'"hi"'.match(p)

#false

> p.peek_char()

Port.eof

method

method (regexp :: RX).matches(input :: String || Bytes || Port.Input,

                              ~start: start :: Int = 0,

                              ~end: end :: maybe(Int) = #false,

                              ~input_prefix: input_prefix :: Bytes = #"")

  :: List.of(String || Bytes)

 

method

method (regexp :: RX).split(input :: String || Bytes || Port.Input,

                            ~start: start :: Int = 0,

                            ~end: end :: maybe(Int) = #false,

                            ~input_prefix: input_prefix :: Bytes = #"")

  :: List.of(String || Bytes)

Like RX.match_in, but finding all non-overlapping matches. The RX.matches method returns the found matches, and RX.split returns the complement, i.e., the strings that are between matches. The result from RX.split will start or end with empty strings if the regexp matches the start or end of the input, respectively.

> rx'any ["abc"] any'.matches("xbx ycy")

["xbx", "ycy"]

> rx'any ["abc"] any'.matches(#"xbx ycy")

[Bytes.copy(#"xbx"), Bytes.copy(#"ycy")]
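
As a sketch of the complement relationship and of the empty strings at the edges, consider a comma separator. The inputs here are made up for illustration, and the results follow from the description above.

> rx'","'.matches("a,b,c")

[",", ","]

> rx'","'.split("a,b,c")

["a", "b", "c"]

> rx'","'.split(",a,")

["", "a", ""]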

method

method (regexp :: RX).replace(

  input :: String || Bytes,

  insert :: (String || Bytes || Function.of_arity(1+num_captures)),

  ~input_prefix: input_prefix :: Bytes = #""

) :: String || Bytes

 

method

method (regexp :: RX).replace_all(....)

  :: String || Bytes

Like RX.match_in, but restricted to string and byte string inputs, and returning the input with the partial matches replaced by insert. The RX.replace method replaces only the first partial match, while RX.replace_all replaces all non-overlapping partial matches.

If insert is a string or byte string, then it is used in place of a match for the output. If insert is a function, then it receives one argument for the matched text, plus an additional argument for each capture group in the regular expression; the result of calling insert for each match is used as the replacement for the match.

> rx'any "x" any'.replace("extra text", "_")

"_ra text"

> rx'any "x" any'.replace_all("extra text", "_")

"_ra t_"

> rx'any "x" any'.replace("extra text", fun (s): "(" ++ s ++ ")")

"(ext)ra text"

> rx'any "x" any'.replace_all("extra text", fun (s): "(" ++ s ++ ")")

"(ext)ra t(ext)"

> rx'any "x" ($last: any)'.replace_all("extra text",

                                       fun (s, l): "(" ++ l ++ ")")

"(t)ra t(t)"
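
Two further cases may be worth sketching: an input with no match, which is expected to come back unchanged, and a function that rebuilds the replacement from a capture. The capture name first is hypothetical and chosen only for this illustration.

> rx'"z"'.replace("extra text", "_")

"extra text"

> rx'($first: any) "x"'.replace_all("extra text",

                                     fun (s, f): f ++ "X")

"eXtra teXt"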

method

method (regexp :: RX).max_lookback()

Reports the maximum number of characters or bytes needed before the start of a match.

> rx'lookback("abc")'.max_lookback()

3

> rx'any lookback("abc")'.max_lookback()

2

> rx'any'.max_lookback()

0