A program for testing Perl-compatible regular expressions.
DEFAULT OUTPUT FROM PCRETEST
This section describes the output when the normal matching
function, pcre[16|32]_exec(), is being used.
When a match succeeds, pcretest outputs the list of captured
substrings that pcre[16|32]_exec() returns, starting with number
0 for the string that matched the whole pattern. Otherwise, it
outputs "No match" when the return is PCRE_ERROR_NOMATCH, and
"Partial match:" followed by the partially matching substring
when pcre[16|32]_exec() returns PCRE_ERROR_PARTIAL. (Note that
this is the entire substring that was inspected during the
partial match; it may include characters before the actual match
start if a lookbehind assertion, \K, \b, or \B was involved.) For
any other return, pcretest outputs the PCRE negative error number
and a short descriptive phrase. If the error is a failed UTF
string check, the offset of the start of the failing character
and the reason code are also output, provided that the size of
the output vector is at least two. Here is an example of an
interactive pcretest run.
$ pcretest
PCRE version 8.13 2011-04-30
re> /^abc(\d+)/
data> abc123
0: abc123
1: 123
data> xyz
No match
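The return-code handling described above can be sketched in C. This is a minimal, self-contained illustration that does not link against libpcre. The error-code values are copied from pcre.h for PCRE 8.x (an assumption; check your own header), describe_result() is a hypothetical helper, and only "No match" and "Partial match:" follow wording given in the text. The other messages are illustrative.

```c
#include <stdio.h>
#include <string.h>

/* Return codes as defined in pcre.h for PCRE 8.x (check your own header). */
#define PCRE_ERROR_NOMATCH (-1)
#define PCRE_ERROR_BADUTF8 (-10)
#define PCRE_ERROR_PARTIAL (-12)

/* Hypothetical helper: writes into buf a one-line summary of the result of
   pcre_exec(). ovector/ovecsize are the output vector that was passed in.
   Only substring 0 is shown on success; the wording of the error lines is
   illustrative, not pcretest's exact text. */
void describe_result(char *buf, size_t cap, const char *subject, int rc,
                     const int *ovector, int ovecsize)
{
    if (rc > 0) {
        /* Pair 0 holds the start and end offsets of the whole match. */
        snprintf(buf, cap, "0: %.*s", ovector[1] - ovector[0],
                 subject + ovector[0]);
    } else if (rc == 0) {
        /* Success, but the vector was too small for all the pairs. */
        snprintf(buf, cap, "match (output vector too small)");
    } else if (rc == PCRE_ERROR_NOMATCH) {
        snprintf(buf, cap, "No match");
    } else if (rc == PCRE_ERROR_PARTIAL) {
        /* Pair 0 delimits the whole inspected substring, which can start
           before the match proper. */
        snprintf(buf, cap, "Partial match: %.*s", ovector[1] - ovector[0],
                 subject + ovector[0]);
    } else if (rc == PCRE_ERROR_BADUTF8 && ovecsize >= 2) {
        /* Failed UTF check: offset of the bad character and reason code. */
        snprintf(buf, cap, "Error %d (bad UTF string) offset=%d reason=%d",
                 rc, ovector[0], ovector[1]);
    } else {
        snprintf(buf, cap, "Error %d", rc);
    }
}
```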
Unset capturing substrings that are not followed by one that is
set are not returned by pcre[16|32]_exec(), and are not shown by
pcretest. In the following example, there are two capturing
substrings, but when the first data line is matched, the second,
unset substring is not shown. An "internal" unset substring is
shown as "<unset>", as for the second data line.
re> /(a)|(b)/
data> a
0: a
1: a
data> b
0: b
1: <unset>
2: b
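The following sketch shows how the substring list can be built from the output vector. It assumes the pcre_exec() convention that a positive return value is one more than the highest-numbered pair that was set, and that unset pairs hold -1. list_substrings() is a hypothetical helper, not part of PCRE.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: appends one "n: text" line per captured pair to out.
   rc is the positive return value of pcre_exec(), so trailing unset groups
   are never visited; pairs inside that range that were not set hold -1
   and are shown as "<unset>". */
void list_substrings(char *out, size_t cap, const char *subject,
                     const int *ovector, int rc)
{
    size_t used = 0;
    out[0] = '\0';
    for (int i = 0; i < rc && used < cap; i++) {
        int start = ovector[2 * i], end = ovector[2 * i + 1];
        int n;
        if (start < 0)
            n = snprintf(out + used, cap - used, "%d: <unset>\n", i);
        else
            n = snprintf(out + used, cap - used, "%d: %.*s\n", i,
                         end - start, subject + start);
        if (n < 0)
            break;
        used += (size_t)n;  /* loop stops once output is truncated */
    }
}
```

For the second data line, pcre_exec() would return 3 with the vector {0, 1, -1, -1, 0, 1}, giving "0: b", "1: <unset>", and "2: b". For the first line it would return 2, so group 2 is never visited.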
If the strings contain any non-printing characters, they are
output as \xhh escapes if the value is less than 256 and UTF mode
is not set. Otherwise they are output as \x{hh...} escapes. See
below for the definition of non-printing characters. If the
pattern has the /+ modifier, the output for substring 0 is
followed by the rest of the subject string, identified by
"0+" like this:
re> /cat/+
data> cataract
0: cat
0+ aract
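The escaping rule for non-printing characters can be sketched as follows. The printing test is an assumption: it uses isprint() in the default C locale for values below 256. The definition pcretest actually uses is given later in this document. format_char() is a hypothetical helper name.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* ASSUMPTION: "printing" approximated by isprint() for values below 256. */
static int is_printing(unsigned int c)
{
    return c < 256 && isprint((int)c);
}

/* Hypothetical helper: writes the display form of one character into out.
   Non-printing values below 256 use \xhh when UTF mode is off; everything
   else that is non-printing uses \x{hh...}. */
void format_char(char *out, size_t cap, unsigned int c, int utf_mode)
{
    if (is_printing(c))
        snprintf(out, cap, "%c", (int)c);
    else if (c < 256 && !utf_mode)
        snprintf(out, cap, "\\x%02x", c);
    else
        snprintf(out, cap, "\\x{%02x}", c);
}
```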
If the pattern has the /g or /G modifier, the results of
successive matching attempts are output in sequence, like this:
re> /\Bi(\w\w)/g
data> Mississippi
0: iss
1: ss
0: iss
1: ss
0: ipp
1: pp
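The /g loop can be sketched as repeated matching in which each attempt resumes at the end offset of the previous match. "No match" corresponds to the first attempt failing. To keep the sketch self-contained, find_literal() is a hypothetical stand-in for pcre_exec() that matches only a fixed string using strstr(), and the empty-match case is simplified.

```c
#include <string.h>

/* Hypothetical stand-in for pcre_exec(): looks for the literal needle at or
   after start_offset and records its offsets in ovector[0..1]. Returns 1 on
   success and -1 (the value of PCRE_ERROR_NOMATCH) on failure. */
static int find_literal(const char *subject, int start_offset,
                        const char *needle, int *ovector)
{
    const char *p = strstr(subject + start_offset, needle);
    if (p == NULL)
        return -1;
    ovector[0] = (int)(p - subject);
    ovector[1] = ovector[0] + (int)strlen(needle);
    return 1;
}

/* Sketch of a /g-style loop: each attempt starts where the previous match
   ended. Stores up to max match start offsets in starts and returns how
   many there were; 0 means the first attempt failed ("No match"). */
int global_match(const char *subject, const char *needle, int *starts, int max)
{
    int length = (int)strlen(subject);
    int offset = 0, count = 0;
    int ovector[2];

    while (count < max && offset <= length &&
           find_literal(subject, offset, needle, ovector) > 0) {
        starts[count++] = ovector[0];
        /* SIMPLIFICATION: after an empty match, move on one character rather
           than retrying at the same offset with PCRE_NOTEMPTY_ATSTART and
           PCRE_ANCHORED, which a full implementation should do. */
        offset = (ovector[1] == ovector[0]) ? ovector[1] + 1 : ovector[1];
    }
    return count;
}
```

For example, global_match("Mississippi", "iss", starts, 8) finds two matches starting at offsets 1 and 4. These are the same positions as the first two matches in the /\Bi(\w\w)/g example.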
"No match" is output only if the first match attempt fails. Here
is an example of a failure message (the offset 4 that is
specified by \>4 is past the end of the subject string):
re> /xyz/
data> xyz\>4
Error -24 (bad offset value)
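The condition behind this error can be sketched as a range check on the start offset. The value -24 matches the output shown above. The name PCRE_ERROR_BADOFFSET is taken from pcre.h, and check_start_offset() is a hypothetical helper.

```c
/* Value as defined in pcre.h; the transcript above shows it as -24. */
#define PCRE_ERROR_BADOFFSET (-24)

/* Hypothetical helper: a start offset may equal the subject length (an
   empty match at the very end is still possible), but it must not be
   negative or larger than the length. */
int check_start_offset(int length, int start_offset)
{
    if (start_offset < 0 || start_offset > length)
        return PCRE_ERROR_BADOFFSET;
    return 0;
}
```

With the subject "xyz" (length 3), \>4 fails the check, but \>3 would be accepted.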
If any of the sequences \C, \G, or \L are present in a data line
that is successfully matched, the substrings extracted by the
convenience functions are output with C, G, or L after the string
number instead of a colon. This is in addition to the normal full
list. The string length (that is, the return from the extraction
function) is given in parentheses after each string for \C and \G.
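The following sketch shows what a \C extraction returns, assuming (as in the PCRE API) that \C is served by pcre_copy_substring(). copy_substring_sketch() is a hypothetical re-implementation for illustration only, not the library function. The error-code values are copied from pcre.h for PCRE 8.x.

```c
#include <string.h>

/* Values as defined in pcre.h for PCRE 8.x (check your own header). */
#define PCRE_ERROR_NOMEMORY    (-6)
#define PCRE_ERROR_NOSUBSTRING (-7)

/* Hypothetical re-implementation: copies captured substring number n into
   buf and returns its length, the number shown in parentheses after \C
   output. stringcount is pcre_exec()'s positive return value. */
int copy_substring_sketch(const char *subject, const int *ovector,
                          int stringcount, int n, char *buf, int bufsize)
{
    if (n < 0 || n >= stringcount)
        return PCRE_ERROR_NOSUBSTRING;
    /* An unset pair (-1, -1) yields length 0. */
    int len = ovector[2 * n + 1] - ovector[2 * n];
    if (len + 1 > bufsize)          /* leave room for the terminating NUL */
        return PCRE_ERROR_NOMEMORY;
    if (len > 0)
        memcpy(buf, subject + ovector[2 * n], (size_t)len);
    buf[len] = '\0';
    return len;
}
```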
Note that whereas patterns can be continued over several lines (a
plain ">" prompt is used for continuations), data lines may not.
However, newlines can be included in data by means of the \n
escape (or \r, \r\n, etc., depending on the newline sequence
setting).
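The following sketch shows how the \n and \r escapes in a data line might be expanded before matching. expand_newline_escapes() is hypothetical and handles only these two escapes; pcretest itself recognizes many more.

```c
#include <string.h>

/* Hypothetical helper: copies in to out, turning the two-character
   sequences \n and \r into the corresponding control characters. All other
   characters are copied unchanged. out must be at least as large as in. */
void expand_newline_escapes(const char *in, char *out)
{
    while (*in != '\0') {
        if (in[0] == '\\' && (in[1] == 'n' || in[1] == 'r')) {
            *out++ = (in[1] == 'n') ? '\n' : '\r';
            in += 2;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```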