Путеводитель по Руководству Linux

  User  |  Syst  |  Libr  |  Device  |  Files  |  Other  |  Admin  |  Head  |



   pcreapi    ( 3 )

Perl-совместимые регулярные выражения (Perl-compatible regular expressions)

  Name  |  Pcre native api basic functions  |  Pcre native api string extraction functions  |  Pcre native api auxiliary functions  |  Pcre native api indirected functions  |  Pcre 8-bit, 16-bit, and 32-bit libraries  |  Pcre api overview  |    Newlines    |  Multithreading  |  Saving precompiled patterns for later use  |  Checking build-time options  |  Compiling a pattern  |  Compilation error codes  |  Studying a pattern  |  Locale support  |  Information about a pattern  |  Reference counts  |  Matching a pattern: the traditional function  |  Extracting captured substrings by number  |  Extracting captured substrings by name  |  Duplicate subpattern names  |  Finding all possible matches  |  Obtaining an estimate of stack usage  |  Matching a pattern: the alternative function  |  See also  |

NEWLINES

PCRE supports five different conventions for indicating line breaks in strings: a single CR (carriage return) character, a single LF (linefeed) character, the two-character sequence CRLF, any of the three preceding, or any Unicode newline sequence. The Unicode newline sequences are the three just mentioned, plus the single characters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS (paragraph separator, U+2029).

Each of the first three conventions is used by at least one operating system as its standard newline sequence. When PCRE is built, a default can be specified. The default default is LF, which is the Unix standard. When PCRE is run, the default can be overridden, either when a pattern is compiled, or when it is matched.

At compile time, the newline convention can be specified by the options argument of pcre_compile(), or it can be specified by special text at the start of the pattern itself; this overrides any other settings. See the pcrepattern page for details of the special character sequences.

In the PCRE documentation the word "newline" is used to mean "the character or pair of characters that indicate a line break". The choice of newline convention affects the handling of the dot, circumflex, and dollar metacharacters, the handling of #-comments in /x mode, and, when CRLF is a recognized line ending sequence, the match position advancement for a non-anchored pattern. There is more detail about this in the section on pcre_exec() options below.

The choice of newline convention does not affect the interpretation of the \n or \r escape sequences, nor does it affect what \R matches, which is controlled in a similar way, but by separate options.