`pcretest` ( 1 )

a program for testing Perl-compatible regular expressions

DATA LINES

Format

Before each data line is passed to pcre[16|32]_exec(), leading and trailing white space is removed, and the line is then scanned for \ escapes. Some of these escapes are fairly esoteric and exist to exercise the more complicated features of PCRE. If you are testing only "ordinary" regular expressions, you probably do not need any of them. The following escapes are recognized:

  \a          alarm (BEL, \x07)
  \b          backspace (\x08)
  \e          escape (\x1b)
  \f          form feed (\x0c)
  \n          newline (\x0a)
  \qdd        set the PCRE_MATCH_LIMIT limit to dd
                (any number of digits)
  \r          carriage return (\x0d)
  \t          tab (\x09)
  \v          vertical tab (\x0b)
  \nnn        octal character (up to 3 octal digits); always
                a byte unless > 255 in UTF-8 or 16-bit or 32-bit mode
  \o{dd...}   octal character (any number of octal digits)
  \xhh        hexadecimal byte (up to 2 hex digits)
  \x{hh...}   hexadecimal character (any number of hex digits)
  \A          pass the PCRE_ANCHORED option to pcre[16|32]_exec()
                or pcre[16|32]_dfa_exec()
  \B          pass the PCRE_NOTBOL option to pcre[16|32]_exec()
                or pcre[16|32]_dfa_exec()
  \Cdd        call pcre[16|32]_copy_substring() for substring dd
                after a successful match (number less than 32)
  \Cname      call pcre[16|32]_copy_named_substring() for substring
                "name" after a successful match (name terminated
                by next non-alphanumeric character)
  \C+         show the current captured substrings at callout time
  \C-         do not supply a callout function
  \C!n        return 1 instead of 0 when callout number n is reached
  \C!n!m      return 1 instead of 0 when callout number n is
                reached for the mth time
  \C*n        pass the number n (may be negative) as callout data;
                this is used as the callout return value
  \D          use the pcre[16|32]_dfa_exec() match function
  \F          only shortest match for pcre[16|32]_dfa_exec()
  \Gdd        call pcre[16|32]_get_substring() for substring dd
                after a successful match (number less than 32)
  \Gname      call pcre[16|32]_get_named_substring() for substring
                "name" after a successful match (name terminated
                by next non-alphanumeric character)
  \Jdd        set up a JIT stack of dd kilobytes maximum
                (any number of digits)
  \L          call pcre[16|32]_get_substring_list() after a
                successful match
  \M          discover the minimum MATCH_LIMIT and
                MATCH_LIMIT_RECURSION settings
  \N          pass the PCRE_NOTEMPTY option to pcre[16|32]_exec()
                or pcre[16|32]_dfa_exec(); if used twice, pass the
                PCRE_NOTEMPTY_ATSTART option
  \Odd        set the size of the output vector passed to
                pcre[16|32]_exec() to dd (any number of digits)
  \P          pass the PCRE_PARTIAL_SOFT option to pcre[16|32]_exec()
                or pcre[16|32]_dfa_exec(); if used twice, pass the
                PCRE_PARTIAL_HARD option
  \Qdd        set the PCRE_MATCH_LIMIT_RECURSION limit to dd
                (any number of digits)
  \R          pass the PCRE_DFA_RESTART option to
                pcre[16|32]_dfa_exec()
  \S          output details of memory get/free calls during matching
  \Y          pass the PCRE_NO_START_OPTIMIZE option to
                pcre[16|32]_exec() or pcre[16|32]_dfa_exec()
  \Z          pass the PCRE_NOTEOL option to pcre[16|32]_exec()
                or pcre[16|32]_dfa_exec()
  \?          pass the PCRE_NO_UTF[8|16|32]_CHECK option to
                pcre[16|32]_exec() or pcre[16|32]_dfa_exec()
  \>dd        start the match at offset dd (optional "-"; then
                any number of digits); this sets the startoffset
                argument for pcre[16|32]_exec() or
                pcre[16|32]_dfa_exec()
  \<cr>       pass the PCRE_NEWLINE_CR option to pcre[16|32]_exec()
                or pcre[16|32]_dfa_exec()
  \<lf>       pass the PCRE_NEWLINE_LF option to pcre[16|32]_exec()
                or pcre[16|32]_dfa_exec()
  \<crlf>     pass the PCRE_NEWLINE_CRLF option to pcre[16|32]_exec()
                or pcre[16|32]_dfa_exec()
  \<anycrlf>  pass the PCRE_NEWLINE_ANYCRLF option to
                pcre[16|32]_exec() or pcre[16|32]_dfa_exec()
  \<any>      pass the PCRE_NEWLINE_ANY option to pcre[16|32]_exec()
                or pcre[16|32]_dfa_exec()

The \x{hh...} escape is always recognized; it does not depend on the /8 modifier being set on the pattern. There may be any number of hexadecimal digits inside the braces; invalid values provoke error messages.

Note that \xhh specifies one byte rather than one character in UTF-8 mode; this makes it possible to construct invalid UTF-8 sequences for testing purposes. On the other hand, \x{hh} is interpreted as a UTF-8 character in UTF-8 mode, generating more than one byte if the value is greater than 127. When testing the 8-bit library not in UTF-8 mode, \x{hh} generates one byte for values less than 256, and causes an error for greater values.
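The difference between the two hexadecimal forms can be modeled with a short Python sketch. This is not pcretest code: the function name hex_escape_bytes and its parameters are hypothetical, and the sketch covers only the 8-bit library and valid Unicode code points.

```python
def hex_escape_bytes(value: int, braced: bool, utf8: bool) -> bytes:
    # Hypothetical model of the bytes placed in the subject string by
    # \xhh (braced=False) or \x{hh...} (braced=True), 8-bit library only.
    if not braced:
        # \xhh takes at most two hex digits and always yields one byte,
        # even in UTF-8 mode.
        if not 0 <= value <= 0xFF:
            raise ValueError("\\xhh accepts at most two hex digits")
        return bytes([value])
    if utf8:
        # \x{hh...} in UTF-8 mode is encoded as a UTF-8 character, so any
        # value above 127 becomes more than one byte. Surrogates and values
        # above 0x10FFFF are outside the scope of this sketch.
        return chr(value).encode("utf-8")
    # \x{hh...} outside UTF-8 mode: one byte below 256, otherwise an error.
    if value < 256:
        return bytes([value])
    raise ValueError("value too large outside UTF-8 mode")


print(hex_escape_bytes(0xE9, braced=False, utf8=True))  # a lone byte
print(hex_escape_bytes(0xE9, braced=True, utf8=True))   # a two-byte sequence
```

The single byte from the first call is not valid UTF-8 on its own, which is the property that makes \xhh useful for building deliberately malformed subjects.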

In UTF-16 mode, all 4-digit \x{hhhh} values are accepted. This makes it possible to construct invalid UTF-16 sequences for testing purposes.

In UTF-32 mode, all 4- to 8-digit \x{...} values are accepted. This makes it possible to construct invalid UTF-32 sequences for testing purposes.

The escapes that specify line ending sequences are literal strings, exactly as shown. No more than one newline setting should be present in any data line.

A backslash followed by anything else just escapes the anything else. If the very last character is a backslash, it is ignored. This gives a way of passing an empty line as data, since a real empty line terminates the data input.
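The stripping, fallback, and trailing-backslash rules can be illustrated with a minimal Python sketch. It is an assumption-laden model rather than the pcretest implementation: decode_basic is a hypothetical name, only the single-character escapes are handled, and every option-setting or numeric escape is rejected as not modeled.

```python
# Single-character escapes from the table above.
SIMPLE = {"a": "\x07", "b": "\x08", "e": "\x1b", "f": "\x0c",
          "n": "\n", "r": "\r", "t": "\t", "v": "\x0b"}


def decode_basic(line: str) -> str:
    # Hypothetical, partial model of data-line decoding.
    text = line.strip()          # leading and trailing white space removed
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 == len(text):
            break                # a final backslash is ignored
        nxt = text[i + 1]
        if nxt in SIMPLE:
            out.append(SIMPLE[nxt])
        elif nxt.isalnum() or nxt in "?<>":
            # Option-setting, octal, and hex escapes are not modeled here.
            raise NotImplementedError("escape \\" + nxt + " is not modeled")
        else:
            out.append(nxt)      # backslash just escapes "anything else"
        i += 2
    return "".join(out)


print(repr(decode_basic("  abc\\tdef  ")))
print(repr(decode_basic("\\")))  # a lone backslash gives an empty subject
```

The second call shows why a line containing only a backslash is the way to supply an empty subject.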

The \J escape provides a way of setting the maximum stack size that is used by the just-in-time optimization code. It is ignored if JIT optimization is not being used. Providing a stack that is larger than the default 32K is necessary only for very complicated patterns.

If \M is present, pcretest calls pcre[16|32]_exec() several times, with different values in the match_limit and match_limit_recursion fields of the pcre[16|32]_extra data structure, until it finds the minimum numbers for each parameter that allow pcre[16|32]_exec() to complete without error. Because this is testing a specific feature of the normal interpretive pcre[16|32]_exec() execution, the use of any JIT optimization that might have been set up by the /S+ qualifier or the -s+ option is disabled.

The match_limit number is a measure of the amount of backtracking that takes place, and checking it out can be instructive. For most simple matches, the number is quite small, but for patterns with very large numbers of matching possibilities, it can become large very quickly with increasing length of subject string. The match_limit_recursion number is a measure of how much stack (or, if PCRE is compiled with NO_RECURSE, how much heap) memory is needed to complete the match attempt.
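One way to picture the repeated calls made for \M is a search for the smallest limit at which the match completes. The Python sketch below is an assumption, not the strategy pcretest necessarily uses: minimum_limit and the completes callback are hypothetical, with completes(limit) standing in for "run the match with this limit and report whether it finished without a limit error". It relies on the match completing for every limit at or above the minimum.

```python
from typing import Callable


def minimum_limit(completes: Callable[[int], bool],
                  ceiling: int = 10_000_000) -> int:
    # Phase 1: double the limit until the match completes.
    lo, hi = 0, 1                # lo is the largest limit known to fail
    while not completes(hi):
        if hi >= ceiling:
            raise RuntimeError("match does not complete below the ceiling")
        lo = hi
        hi = min(hi * 2, ceiling)
    # Phase 2: bisect between the last failure and the first success.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if completes(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Stand-in for a real match that needs a limit of at least 37.
print(minimum_limit(lambda limit: limit >= 37))
```

The same search would be run once for match_limit and once for match_limit_recursion, since the two fields are reported separately.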

When \O is used, the value specified may be higher or lower than the size set by the -O command line option (or defaulted to 45); \O applies only to the call of pcre[16|32]_exec() for the line in which it appears.

If the /P modifier was present on the pattern, causing the POSIX wrapper API to be used, the only option-setting sequences that have any effect are \B, \N, and \Z, causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to regexec().
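The filtering described above amounts to a three-entry lookup. The following Python sketch is illustrative only: posix_exec_flags is a hypothetical helper, the escapes are assumed to be already tokenized into their letters, and the flags are returned as names rather than the numeric values from the POSIX header.

```python
# Escapes that still have an effect under the POSIX wrapper.
POSIX_FLAGS = {"B": "REG_NOTBOL", "N": "REG_NOTEMPTY", "Z": "REG_NOTEOL"}


def posix_exec_flags(option_escapes):
    # Keep only the escapes that map to a regexec() flag; every other
    # option-setting escape is silently dropped in this model.
    flags = []
    for letter in option_escapes:
        name = POSIX_FLAGS.get(letter)
        if name is not None and name not in flags:
            flags.append(name)
    return flags


print(posix_exec_flags(["A", "B", "P", "Z"]))
```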

Source text on man7.org
