Perl-compatible regular expressions
NEWLINES
PCRE supports five different conventions for indicating line
breaks in strings: a single CR (carriage return) character, a
single LF (linefeed) character, the two-character sequence CRLF,
any of the three preceding, or any Unicode newline sequence. The
Unicode newline sequences are the three just mentioned, plus the
single characters VT (vertical tab, U+000B), FF (form feed,
U+000C), NEL (next line, U+0085), LS (line separator, U+2028),
and PS (paragraph separator, U+2029).
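The five conventions can be illustrated with a small standalone recognizer. This is only a sketch, not PCRE's implementation: the enum names and the function newline_length() are hypothetical, and the subject is assumed to be UTF-8 encoded, so NEL, LS, and PS appear as multi-byte sequences.

```c
#include <stddef.h>

/* Hypothetical names for the five conventions; not PCRE identifiers. */
enum newline_convention { NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF, NL_ANY };

/* Returns the length in bytes of the newline sequence that starts at
   s[pos] under the given convention, or 0 if none starts there.
   Assumption: the subject is UTF-8 encoded. */
size_t newline_length(const char *s, size_t len, size_t pos,
                      enum newline_convention conv)
{
    const unsigned char *p = (const unsigned char *)s + pos;
    size_t left;
    int crlf;

    if (pos >= len) return 0;
    left = len - pos;
    crlf = left >= 2 && p[0] == '\r' && p[1] == '\n';

    switch (conv) {
    case NL_CR:   return p[0] == '\r' ? 1 : 0;
    case NL_LF:   return p[0] == '\n' ? 1 : 0;
    case NL_CRLF: return crlf ? 2 : 0;
    case NL_ANYCRLF:
        if (crlf) return 2;                  /* prefer the pair */
        return (p[0] == '\r' || p[0] == '\n') ? 1 : 0;
    case NL_ANY:
        if (crlf) return 2;
        if (p[0] >= 0x0A && p[0] <= 0x0D)    /* LF, VT, FF, CR */
            return 1;
        if (left >= 2 && p[0] == 0xC2 && p[1] == 0x85)
            return 2;                        /* NEL, U+0085 */
        if (left >= 3 && p[0] == 0xE2 && p[1] == 0x80 &&
            (p[2] == 0xA8 || p[2] == 0xA9))
            return 3;                        /* LS U+2028, PS U+2029 */
        return 0;
    }
    return 0;
}
```

For the subject "a\r\nb" at offset 1, this sketch reports a two-byte newline under NL_CRLF, a one-byte newline under NL_CR, and no newline under NL_LF, which shows how the same bytes can be read as different line structures.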
Each of the first three conventions is used by at least one
operating system as its standard newline sequence. When PCRE is
built, a default can be specified. If none is given, the default is LF,
which is the Unix standard. When PCRE is run, the default can be
overridden, either when a pattern is compiled, or when it is
matched.
At compile time, the newline convention can be specified by the
options argument of pcre_compile(), or it can be specified by
special text at the start of the pattern itself; this overrides
any other settings. See the pcrepattern page for details of the
special character sequences.
In the PCRE documentation the word "newline" is used to mean "the
character or pair of characters that indicate a line break". The
choice of newline convention affects the handling of the dot,
circumflex, and dollar metacharacters, the handling of #-comments
in /x mode, and, when CRLF is a recognized line ending sequence,
the match position advancement for a non-anchored pattern. There
is more detail about this in the section on pcre_exec() options
below.
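The effect on match position advancement can be pictured with the following sketch. It is a simplified model, not PCRE's code: the function next_start_offset() and its crlf_is_newline flag are hypothetical, characters are assumed to be single bytes, and the further conditions given in the pcre_exec() section are ignored.

```c
#include <stddef.h>

/* Returns the next start offset to try after a match attempt for a
   non-anchored pattern has failed at offset pos.
   Assumptions: single-byte characters; crlf_is_newline is nonzero
   when CRLF is a recognized line ending under the active convention. */
size_t next_start_offset(const char *subject, size_t length, size_t pos,
                         int crlf_is_newline)
{
    if (crlf_is_newline && pos + 1 < length &&
        subject[pos] == '\r' && subject[pos + 1] == '\n')
        return pos + 2;   /* step over the CRLF pair as one unit */
    return pos + 1;       /* ordinary one-character advance */
}
```

In this model, a failed attempt at the CR in "a\r\nb" resumes at "b" when CRLF is recognized, so no attempt starts between the CR and the LF.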
The choice of newline convention does not affect the
interpretation of the \n or \r escape sequences, nor does it
affect what \R matches, which is controlled in a similar way, but
by separate options.