Linux Manual Guide




   pcretest(1)

pcretest - a program for testing Perl-compatible regular expressions

DEFAULT OUTPUT FROM PCRETEST

This section describes the output when the normal matching function, pcre[16|32]_exec(), is being used.

When a match succeeds, pcretest outputs the list of captured substrings that pcre[16|32]_exec() returns, starting with number 0 for the string that matched the whole pattern. Otherwise, it outputs "No match" when the return is PCRE_ERROR_NOMATCH, and "Partial match:" followed by the partially matching substring when pcre[16|32]_exec() returns PCRE_ERROR_PARTIAL. (Note that this is the entire substring that was inspected during the partial match; it may include characters before the actual match start if a lookbehind assertion, \K, \b, or \B was involved.) For any other return, pcretest outputs the PCRE negative error number and a short descriptive phrase. If the error is a failed UTF string check, the offset of the start of the failing character and the reason code are also output, provided that the size of the output vector is at least two. Here is an example of an interactive pcretest run.

  $ pcretest
  PCRE version 8.13 2011-04-30

    re> /^abc(\d+)/
  data> abc123
   0: abc123
   1: 123
  data> xyz
  No match
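The output rules described above can be sketched in C as a function that turns one pcre_exec() return code and its offset vector into pcretest-style text. This is a simplified illustration, not pcretest's own code: render_result() is a hypothetical name, the two error values are repeated from pcre.h so the sketch compiles without the library, and the short phrase pcretest prints after an error number is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return codes as defined in pcre.h, repeated here so that the sketch
   compiles without the library. */
#define PCRE_ERROR_NOMATCH  (-1)
#define PCRE_ERROR_PARTIAL  (-12)

/* Hypothetical helper: write pcretest-style output for one pcre_exec()
   result into out. rc is the return code; ovector holds start/end offset
   pairs, pair 0 being the whole match. */
static void render_result(const char *subject, int rc, const int *ovector,
                          char *out, size_t outsize)
{
    size_t used = 0;

    assert(out != NULL && outsize > 0);
    out[0] = '\0';

    if (rc == PCRE_ERROR_NOMATCH) {
        snprintf(out, outsize, "No match\n");
    } else if (rc == PCRE_ERROR_PARTIAL) {
        /* Pair 0 covers everything inspected, which may start before the
           real match if a lookbehind, \K, \b or \B was involved. */
        snprintf(out, outsize, "Partial match: %.*s\n",
                 ovector[1] - ovector[0], subject + ovector[0]);
    } else if (rc < 0) {
        /* pcretest also prints a short descriptive phrase here. */
        snprintf(out, outsize, "Error %d\n", rc);
    } else {
        /* rc > 0: pairs 0..rc-1 are returned. An unset group has the
           offsets -1,-1. (rc == 0, meaning the vector was too small,
           is not handled in this sketch.) */
        for (int i = 0; i < rc && used < outsize; i++) {
            int n = (ovector[2*i] < 0)
                ? snprintf(out + used, outsize - used, "%2d: <unset>\n", i)
                : snprintf(out + used, outsize - used, "%2d: %.*s\n", i,
                           ovector[2*i+1] - ovector[2*i],
                           subject + ovector[2*i]);
            if (n < 0)
                break;
            used += (size_t)n;
        }
    }
}
```

For the abc123 line in the session above, the vector would be {0,6, 3,6} with a return value of 2, giving the two lines "0: abc123" and "1: 123".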

Unset capturing substrings that are not followed by one that is set are not returned by pcre[16|32]_exec(), and are not shown by pcretest. In the following example, there are two capturing substrings, but when the first data line is matched, the second, unset substring is not shown. An "internal" unset substring is shown as "<unset>", as for the second data line.

    re> /(a)|(b)/
  data> a
   0: a
   1: a
  data> b
   0: b
   1: <unset>
   2: b

If the strings contain any non-printing characters, they are output as \xhh escapes if the value is less than 256 and UTF mode is not set. Otherwise they are output as \x{hh...} escapes. See below for the definition of non-printing characters. If the pattern has the /+ modifier, the output for substring 0 is followed by the rest of the subject string, identified by "0+" like this:

    re> /cat/+
  data> cataract
   0: cat
   0+ aract
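The escaping rule and the /+ output can be sketched together. Because the exact definition of non-printing characters is given elsewhere in this manual, the sketch assumes that only 0x20 to 0x7e count as printing, and that the brace form uses at least two hex digits. The subject is held as an array of code points so that values of 256 and above can be shown; put_char(), render_plus() and appendf() are hypothetical helpers.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Append formatted text to out, never writing past outsize bytes. */
static void appendf(char *out, size_t outsize, size_t *used,
                    const char *fmt, ...)
{
    va_list ap;
    int n;

    if (*used >= outsize)
        return;                                   /* buffer already full */
    va_start(ap, fmt);
    n = vsnprintf(out + *used, outsize - *used, fmt, ap);
    va_end(ap);
    if (n > 0)
        *used += (size_t)n;
}

/* Append code point c, escaped if it is non-printing.
   ASSUMPTION: only 0x20..0x7e are treated as printing characters. */
static void put_char(unsigned int c, int utf, char *out, size_t outsize,
                     size_t *used)
{
    if (c >= 0x20 && c <= 0x7e)
        appendf(out, outsize, used, "%c", (int)c);
    else if (c < 256 && !utf)
        appendf(out, outsize, used, "\\x%02x", c);    /* e.g. \x01 */
    else
        appendf(out, outsize, used, "\\x{%02x}", c);  /* e.g. \x{e9}, \x{100} */
}

/* Hypothetical /+ output: substring 0 is subject[start..end), followed by
   a "0+" line holding everything after the match. */
static void render_plus(const unsigned int *subject, size_t len,
                        size_t start, size_t end, int utf,
                        char *out, size_t outsize)
{
    size_t used = 0;

    assert(out != NULL && outsize > 0);
    assert(start <= end && end <= len);
    out[0] = '\0';
    appendf(out, outsize, &used, " 0: ");
    for (size_t i = start; i < end; i++)
        put_char(subject[i], utf, out, outsize, &used);
    appendf(out, outsize, &used, "\n 0+ ");
    for (size_t i = end; i < len; i++)
        put_char(subject[i], utf, out, outsize, &used);
    appendf(out, outsize, &used, "\n");
}
```

With the subject "cataract" and a match covering offsets 0 to 3, this reproduces the two lines of the example; a byte such as 0x01 in the tail would appear as \x01 without UTF mode and as \x{01} with it.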

If the pattern has the /g or /G modifier, the results of successive matching attempts are output in sequence, like this:

    re> /\Bi(\w\w)/g
  data> Mississippi
   0: iss
   1: ss
   0: iss
   1: ss
   0: ipp
   1: pp
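The /g behaviour amounts to a loop that restarts matching where the previous match ended. The sketch below uses the POSIX <regex.h> API as a stand-in for pcre_exec(), so the pattern is written as the ERE i([a-z][a-z]) instead of \Bi(\w\w). One consequence of that substitution: advancing a pointer hides the characters before the restart point, whereas pcre_exec() takes a start offset and can still look behind it, which \B relies on. The empty-match handling (step one character forward) is a simplification, not what pcretest does, and match_all() is a hypothetical name.

```c
#include <assert.h>
#include <regex.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define MAXGROUPS 10

/* Append formatted text to out, never writing past outsize bytes. */
static void appendf(char *out, size_t outsize, size_t *used,
                    const char *fmt, ...)
{
    va_list ap;
    int n;

    if (*used >= outsize)
        return;
    va_start(ap, fmt);
    n = vsnprintf(out + *used, outsize - *used, fmt, ap);
    va_end(ap);
    if (n > 0)
        *used += (size_t)n;
}

/* Hypothetical /g-style loop: print every match of pattern in subject.
   Returns the number of matches, or -1 if the pattern does not compile. */
static int match_all(const char *pattern, const char *subject,
                     char *out, size_t outsize)
{
    regex_t re;
    regmatch_t m[MAXGROUPS];
    const char *p = subject;
    size_t used = 0;
    int count = 0;

    assert(out != NULL && outsize > 0);
    out[0] = '\0';
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* REG_NOTBOL stops '^' from matching at a restart point. */
    while (regexec(&re, p, MAXGROUPS, m, count > 0 ? REG_NOTBOL : 0) == 0) {
        for (size_t i = 0; i <= re.re_nsub && i < MAXGROUPS; i++) {
            if (m[i].rm_so < 0)
                appendf(out, outsize, &used, "%2zu: <unset>\n", i);
            else
                appendf(out, outsize, &used, "%2zu: %.*s\n", i,
                        (int)(m[i].rm_eo - m[i].rm_so), p + m[i].rm_so);
        }
        count++;
        p += m[0].rm_eo;                  /* resume where this match ended */
        if (m[0].rm_so == m[0].rm_eo) {   /* empty match: avoid looping */
            if (*p == '\0')
                break;
            p++;
        }
    }
    if (count == 0)                       /* only a failed first attempt */
        appendf(out, outsize, &used, "No match\n");
    regfree(&re);
    return count;
}
```

Called with "i([a-z][a-z])" and "Mississippi", it finds the same three matches as the /g example; a failure after the first match just ends the loop.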

"No match" is output only if the first match attempt fails. Here is an example of a failure message (the offset 4 that is specified by \>4 is past the end of the subject string):

    re> /xyz/
  data> xyz\>4
  Error -24 (bad offset value)
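The rule behind this error can be sketched as a simple range check. check_start_offset() is hypothetical; it assumes that a start offset may range from 0 up to and including the subject length, so \>3 would be accepted for "xyz" while \>4 is rejected. The constant is the error number shown in the example.

```c
#include <assert.h>

/* Error number from the example above ("bad offset value"); pcre.h names
   it PCRE_ERROR_BADOFFSET. */
#define PCRE_ERROR_BADOFFSET (-24)

/* Hypothetical check: a start offset must lie in 0..length. An offset equal
   to the length is still valid; only an empty match can occur there. */
static int check_start_offset(int length, int start_offset)
{
    assert(length >= 0);
    if (start_offset < 0 || start_offset > length)
        return PCRE_ERROR_BADOFFSET;
    return 0;
}
```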

If any of the sequences \C, \G, or \L are present in a data line that is successfully matched, the substrings extracted by the convenience functions are output with C, G, or L after the string number instead of a colon. This is in addition to the normal full list. The string length (that is, the return from the extraction function) is given in parentheses after each string for \C and \G.
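For \C, the convenience function involved is pcre_copy_substring(), which copies one captured substring into a caller-supplied buffer and returns its length; that length is the number shown in parentheses. The following is a hypothetical re-implementation of that behaviour for illustration, not the library's code. The two error values are the ones pcre.h assigns to PCRE_ERROR_NOMEMORY and PCRE_ERROR_NOSUBSTRING.

```c
#include <assert.h>
#include <string.h>

/* Values pcre.h assigns to these error codes. */
#define PCRE_ERROR_NOMEMORY    (-6)
#define PCRE_ERROR_NOSUBSTRING (-7)

/* Hypothetical copy-substring helper: copy group n of the last match into
   buf (NUL-terminated) and return its length. stringcount is the value the
   match function returned. */
static int copy_substring(const char *subject, const int *ovector,
                          int stringcount, int n, char *buf, int bufsize)
{
    int len = 0;

    assert(subject != NULL && ovector != NULL && buf != NULL);
    if (n < 0 || n >= stringcount)
        return PCRE_ERROR_NOSUBSTRING;
    if (ovector[2*n] >= 0)                 /* an unset group copies as "" */
        len = ovector[2*n+1] - ovector[2*n];
    if (len >= bufsize)                    /* no room for the final NUL */
        return PCRE_ERROR_NOMEMORY;
    if (len > 0)
        memcpy(buf, subject + ovector[2*n], (size_t)len);
    buf[len] = '\0';
    return len;
}
```

After /^abc(\d+)/ matches abc123, copying group 1 this way yields "123" and a length of 3.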

Note that whereas patterns can be continued over several lines (a plain ">" prompt is used for continuations), data lines cannot. However, newlines can be included in data by means of the \n escape (or \r, \r\n, etc., depending on the newline sequence setting).