A program for testing Perl-compatible regular expressions.
DATA LINES
Before each data line is passed to pcre[16|32]_exec(), leading
and trailing white space is removed, and it is then scanned for \
escapes. Some of these are pretty esoteric features, intended for
checking out some of the more complicated features of PCRE. If
you are just testing "ordinary" regular expressions, you probably
don't need any of these. The following escapes are recognized:
\a alarm (BEL, \x07)
\b backspace (\x08)
\e escape (\x1b)
\f form feed (\x0c)
\n newline (\x0a)
\qdd set the PCRE_MATCH_LIMIT limit to dd
(any number of digits)
\r carriage return (\x0d)
\t tab (\x09)
\v vertical tab (\x0b)
\nnn octal character (up to 3 octal digits); always
a byte unless > 255 in UTF-8 or 16-bit or 32-bit
mode
\o{dd...} octal character (any number of octal digits)
\xhh hexadecimal byte (up to 2 hex digits)
\x{hh...} hexadecimal character (any number of hex digits)
\A pass the PCRE_ANCHORED option to pcre[16|32]_exec()
or pcre[16|32]_dfa_exec()
\B pass the PCRE_NOTBOL option to pcre[16|32]_exec()
or pcre[16|32]_dfa_exec()
\Cdd call pcre[16|32]_copy_substring() for substring dd
after a successful match (number less than 32)
\Cname call pcre[16|32]_copy_named_substring() for substring
"name" after a successful match (name terminated
by next non-alphanumeric character)
\C+ show the current captured substrings at callout
time
\C- do not supply a callout function
\C!n return 1 instead of 0 when callout number n is
reached
\C!n!m return 1 instead of 0 when callout number n is
reached for the mth time
\C*n pass the number n (may be negative) as callout
data; this is used as the callout return value
\D use the pcre[16|32]_dfa_exec() match function
\F only shortest match for pcre[16|32]_dfa_exec()
\Gdd call pcre[16|32]_get_substring() for substring dd
after a successful match (number less than 32)
\Gname call pcre[16|32]_get_named_substring() for substring
"name" after a successful match (name terminated
by next non-alphanumeric character)
\Jdd set up a JIT stack of dd kilobytes maximum (any
number of digits)
\L call pcre[16|32]_get_substringlist() after a
successful match
\M discover the minimum MATCH_LIMIT and
MATCH_LIMIT_RECURSION settings
\N pass the PCRE_NOTEMPTY option to pcre[16|32]_exec()
or pcre[16|32]_dfa_exec(); if used twice, pass the
PCRE_NOTEMPTY_ATSTART option
\Odd set the size of the output vector passed to
pcre[16|32]_exec() to dd (any number of digits)
\P pass the PCRE_PARTIAL_SOFT option to
pcre[16|32]_exec() or pcre[16|32]_dfa_exec(); if used
twice, pass the PCRE_PARTIAL_HARD option
\Qdd set the PCRE_MATCH_LIMIT_RECURSION limit to dd
(any number of digits)
\R pass the PCRE_DFA_RESTART option to
pcre[16|32]_dfa_exec()
\S output details of memory get/free calls during
matching
\Y pass the PCRE_NO_START_OPTIMIZE option to
pcre[16|32]_exec()
or pcre[16|32]_dfa_exec()
\Z pass the PCRE_NOTEOL option to pcre[16|32]_exec()
or pcre[16|32]_dfa_exec()
\? pass the PCRE_NO_UTF[8|16|32]_CHECK option to
pcre[16|32]_exec()
or pcre[16|32]_dfa_exec()
\>dd start the match at offset dd (optional "-"; then
any number of digits); this sets the startoffset
argument for pcre[16|32]_exec() or
pcre[16|32]_dfa_exec()
\<cr> pass the PCRE_NEWLINE_CR option to
pcre[16|32]_exec()
or pcre[16|32]_dfa_exec()
\<lf> pass the PCRE_NEWLINE_LF option to
pcre[16|32]_exec()
or pcre[16|32]_dfa_exec()
\<crlf> pass the PCRE_NEWLINE_CRLF option to
pcre[16|32]_exec()
or pcre[16|32]_dfa_exec()
\<anycrlf> pass the PCRE_NEWLINE_ANYCRLF option to
pcre[16|32]_exec()
or pcre[16|32]_dfa_exec()
\<any> pass the PCRE_NEWLINE_ANY option to
pcre[16|32]_exec()
or pcre[16|32]_dfa_exec()
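The "if used twice" rule for \N and \P can be modelled with a small
Python sketch. This is an illustrative model only, not pcretest's
parser: the function name options_for is hypothetical, it takes a
sequence of escape letters rather than a raw data line, it covers only
a handful of the escapes listed above, and it assumes that the second
occurrence replaces the first option rather than adding to it. The
text does not say what a third occurrence does, so the sketch leaves
the selection unchanged in that case.

```python
# Illustrative model only: maps a few option-setting escapes from the
# table above to the option names they select.
SIMPLE_OPTIONS = {
    "A": "PCRE_ANCHORED",
    "B": "PCRE_NOTBOL",
    "Y": "PCRE_NO_START_OPTIMIZE",
    "Z": "PCRE_NOTEOL",
}

# Escapes whose meaning changes when they appear twice in one data line.
DOUBLED_OPTIONS = {
    "N": ("PCRE_NOTEMPTY", "PCRE_NOTEMPTY_ATSTART"),
    "P": ("PCRE_PARTIAL_SOFT", "PCRE_PARTIAL_HARD"),
}


def options_for(escapes):
    """Return the sorted option names selected by a sequence of escape letters."""
    selected = set()
    for letter in escapes:
        if letter in SIMPLE_OPTIONS:
            selected.add(SIMPLE_OPTIONS[letter])
        elif letter in DOUBLED_OPTIONS:
            first, second = DOUBLED_OPTIONS[letter]
            if first in selected:
                # Second occurrence (assumption): replace the first option.
                selected.discard(first)
                selected.add(second)
            elif second not in selected:
                selected.add(first)
    return sorted(selected)
```

For example, options_for("BNN") selects PCRE_NOTBOL together with
PCRE_NOTEMPTY_ATSTART, while options_for("P") selects only
PCRE_PARTIAL_SOFT.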
The use of \x{hh...} is not dependent on the use of the /8
modifier on the pattern. It is recognized always. There may be
any number of hexadecimal digits inside the braces; invalid
values provoke error messages.
Note that \xhh specifies one byte rather than one character in
UTF-8 mode; this makes it possible to construct invalid UTF-8
sequences for testing purposes. On the other hand, \x{hh} is
interpreted as a UTF-8 character in UTF-8 mode, generating more
than one byte if the value is greater than 127. When testing the
8-bit library not in UTF-8 mode, \x{hh} generates one byte for
values less than 256, and causes an error for greater values.
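The difference between \xhh and \x{hh} for the 8-bit library can be
sketched in Python as follows. The function expand_hex_escape and its
parameters are hypothetical names introduced for illustration, and the
sketch relies on Python's own UTF-8 encoder, which rejects surrogates
and values above 0x10FFFF; pcretest's handling of such values may
differ.

```python
def expand_hex_escape(value, braced, utf8_mode):
    """Return the bytes produced by \\xhh (braced=False) or \\x{hh...}
    (braced=True) for the 8-bit library, following the rules above."""
    if not braced:
        # \xhh: at most two hex digits, always exactly one byte.
        if not 0 <= value <= 0xFF:
            raise ValueError("\\xhh takes at most two hex digits")
        return bytes([value])
    if utf8_mode:
        # \x{hh...}: one character, encoded as UTF-8.
        return chr(value).encode("utf-8")
    if value < 256:
        # Not in UTF-8 mode: one byte for values below 256.
        return bytes([value])
    raise ValueError("value too large when not in UTF-8 mode")
```

With these rules, \xe9 in UTF-8 mode yields the single byte 0xe9,
which is not a valid UTF-8 sequence on its own, whereas \x{e9} yields
the two bytes 0xc3 0xa9.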
In UTF-16 mode, all 4-digit \x{hhhh} values are accepted. This
makes it possible to construct invalid UTF-16 sequences for
testing purposes.
In UTF-32 mode, all 4- to 8-digit \x{...} values are accepted.
This makes it possible to construct invalid UTF-32 sequences for
testing purposes.
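To make "invalid sequence" concrete, the following Python sketch
checks the usual well-formedness rules for UTF-16 and UTF-32 code
units. The function names are hypothetical and the checks follow the
Unicode encoding rules rather than PCRE's own validity checker, so
treat it as a rough model.

```python
def is_well_formed_utf16(units):
    """Check that a list of 16-bit code units has only properly paired surrogates."""
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: must be followed by a low surrogate.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                return False
            i += 2
        elif 0xDC00 <= unit <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            return False
        else:
            i += 1
    return True


def is_valid_utf32_unit(value):
    """A UTF-32 unit is valid if it is at most 0x10FFFF and not a surrogate."""
    return 0 <= value <= 0x10FFFF and not 0xD800 <= value <= 0xDFFF
```

Under this model, a data line containing only \x{d800} in UTF-16 mode
gives the unit list [0xD800], a lone high surrogate that fails the
check, while \x{d83d}\x{de00} forms a valid surrogate pair.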
The escapes that specify line ending sequences are literal
strings, exactly as shown. No more than one newline setting
should be present in any data line.
A backslash followed by any other character just escapes that
character. If the very last character is a backslash, it is ignored.
This gives a way of passing an empty line as data, since a real
empty line terminates the data input.
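The generic rule and the trailing-backslash case can be modelled in a
few lines of Python. The function unescape_generic is a hypothetical
name, and the sketch deliberately ignores every named escape from the
table (it would treat \n as a plain "n"), so it is only meaningful for
characters that have no special meaning after a backslash.

```python
def unescape_generic(line):
    """Apply only the generic rule: backslash + char -> char, and a final
    lone backslash is dropped. Named escapes such as \\n are not modelled."""
    line = line.strip()  # leading and trailing white space is removed first
    out = []
    i = 0
    while i < len(line):
        if line[i] == "\\":
            if i + 1 < len(line):
                out.append(line[i + 1])
            # Otherwise the backslash is the very last character: ignore it.
            i += 2
        else:
            out.append(line[i])
            i += 1
    return "".join(out)
```

A data line consisting of a single backslash therefore produces an
empty subject string instead of ending the data input.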
The \J escape provides a way of setting the maximum stack size
that is used by the just-in-time optimization code. It is ignored
if JIT optimization is not being used. Providing a stack that is
larger than the default 32K is necessary only for very
complicated patterns.
If \M is present, pcretest calls pcre[16|32]_exec() several
times, with different values in the match_limit and
match_limit_recursion fields of the pcre[16|32]_extra data
structure, until it finds the minimum numbers for each parameter
that allow pcre[16|32]_exec() to complete without error. Because
this is testing a specific feature of the normal interpretive
pcre[16|32]_exec() execution, the use of any JIT optimization
that might have been set up by the /S+ qualifier or the -s+
option is disabled.
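The text does not say how the trial values are chosen. One plausible
strategy, shown here purely as an assumption, is to double a starting
value until the match completes and then bisect down to the smallest
sufficient limit. In this Python sketch, completes is a hypothetical
callable standing in for one match attempt with a given limit; the
same search would be run once for match_limit and once for
match_limit_recursion.

```python
def find_minimum_limit(completes, start=64):
    """Return the smallest positive limit for which completes(limit) is True.

    completes is assumed to be monotonic: once a limit is large enough,
    every larger limit also succeeds. The loop does not terminate if no
    limit ever succeeds.
    """
    high = start
    while not completes(high):
        high *= 2  # grow until the match completes
    low = 1
    while low < high:  # bisect for the smallest sufficient value
        mid = (low + high) // 2
        if completes(mid):
            high = mid
        else:
            low = mid + 1
    return low
```

For instance, find_minimum_limit(lambda n: n >= 37) returns 37.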
The match_limit number is a measure of the amount of backtracking
that takes place, and checking it out can be instructive. For
most simple matches, the number is quite small, but for patterns
with very large numbers of matching possibilities, it can become
large very quickly with increasing length of subject string. The
match_limit_recursion number is a measure of how much stack (or,
if PCRE is compiled with NO_RECURSE, how much heap) memory is
needed to complete the match attempt.
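How quickly the amount of backtracking can grow is easy to see with a
toy matcher. The Python sketch below is not PCRE and does not compute
match_limit; it is a hypothetical recursive matcher for something like
^(?:a|aa)*$ that simply counts how many times its inner function is
entered, which is a rough stand-in for the quantity described above.

```python
def count_calls(subject):
    """Count the recursive calls a naive backtracking matcher makes while
    trying to split the whole subject into pieces "a" and "aa"."""
    calls = 0

    def match(pos):
        nonlocal calls
        calls += 1
        if pos == len(subject):
            return True  # consumed the whole subject
        for piece in ("a", "aa"):
            if subject.startswith(piece, pos) and match(pos + len(piece)):
                return True
        return False  # both alternatives failed: backtrack

    match(0)
    return calls
```

A subject of n "a" characters followed by "b" can never match, so every
way of splitting the run of "a"s is tried. In this model the call count
grows roughly like the Fibonacci numbers as n increases.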
When \O is used, the value specified may be higher or lower than
the size set by the -O command line option (or defaulted to 45);
\O applies only to the call of pcre[16|32]_exec() for the line in
which it appears.
If the /P modifier was present on the pattern, causing the POSIX
wrapper API to be used, the only option-setting sequences that
have any effect are \B, \N, and \Z, causing REG_NOTBOL,
REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to
regexec().
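The restriction under /P amounts to a three-entry lookup. In this
Python sketch the function posix_flags_for is a hypothetical name, the
input is a sequence of escape letters rather than a raw data line, and
the flags are given by name only, since their numeric values are not
stated here.

```python
# Only these escapes have any effect when the POSIX wrapper is in use.
POSIX_FLAGS = {
    "B": "REG_NOTBOL",
    "N": "REG_NOTEMPTY",
    "Z": "REG_NOTEOL",
}


def posix_flags_for(escapes):
    """Return the regexec() flag names selected by a sequence of escape
    letters, in order of first appearance; all other letters are ignored."""
    names = (POSIX_FLAGS[letter] for letter in escapes if letter in POSIX_FLAGS)
    return list(dict.fromkeys(names))  # drop repeats, keep order
```

So posix_flags_for("ABZ") gives REG_NOTBOL and REG_NOTEOL, with \A
silently dropped.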