pcrepattern(3)

Perl-compatible regular expressions

CHARACTERS AND METACHARACTERS

A regular expression is a pattern that is matched against a subject string from left to right. Most characters stand for themselves in a pattern, and match the corresponding characters in the subject. As a trivial example, the pattern

  The quick brown fox

matches a portion of a subject string that is identical to itself. When caseless matching is specified (the PCRE_CASELESS option), letters are matched independently of case. In a UTF mode, PCRE always understands the concept of case for characters whose values are less than 128, so caseless matching is always possible. For characters with higher values, the concept of case is supported if PCRE is compiled with Unicode property support, but not otherwise. If you want to use caseless matching for characters 128 and above, you must ensure that PCRE is compiled with Unicode property support as well as with UTF support.
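The literal and caseless behaviour described above can be sketched as follows. This is only an illustration: it uses Python's standard re module as a stand-in, not PCRE itself. The re.IGNORECASE flag plays roughly the role of PCRE_CASELESS, and re.ASCII is used here as a loose analogy for a build without Unicode property support; the subject strings are made up for the example.

```python
import re

# Hypothetical subject string, chosen so that the case differs from the pattern.
subject = "Yesterday the quick brown fox jumped"

# Literal pattern: every character stands for itself, so "The" does not
# match "the" when matching is case-sensitive.
literal = re.search(r"The quick brown fox", subject)

# Caseless matching (re.IGNORECASE is Python's rough analogue of PCRE_CASELESS).
caseless = re.search(r"The quick brown fox", subject, re.IGNORECASE)

# Characters above 127: case is only understood when Unicode-aware matching
# is in effect. re.ASCII restricts case folding to the ASCII range.
unicode_aware = re.search("\u00e9", "\u00c9", re.IGNORECASE)
ascii_only = re.search("\u00e9", "\u00c9", re.IGNORECASE | re.ASCII)

print(literal, caseless.group(0), bool(unicode_aware), bool(ascii_only))
```

In PCRE the equivalent distinction is made when the library is built and when the pattern is compiled, not through a per-call flag like re.ASCII.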

The power of regular expressions comes from the ability to include alternatives and repetitions in the pattern. These are encoded in the pattern by the use of metacharacters, which do not stand for themselves but instead are interpreted in some special way.

There are two different sets of metacharacters: those that are recognized anywhere in the pattern except within square brackets, and those that are recognized within square brackets. Outside square brackets, the metacharacters are as follows:

  \      general escape character with several uses
  ^      assert start of string (or line, in multiline mode)
  $      assert end of string (or line, in multiline mode)
  .      match any character except newline (by default)
  [      start character class definition
  |      start of alternative branch
  (      start subpattern
  )      end subpattern
  ?      extends the meaning of (
         also 0 or 1 quantifier
         also quantifier minimizer
  *      0 or more quantifier
  +      1 or more quantifier
         also "possessive quantifier"
  {      start min/max quantifier
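As a rough illustration of the metacharacters that apply outside square brackets, the sketch below runs a handful of small patterns through Python's re module. Python's syntax agrees with PCRE for these basic cases, but it is a different engine, so treat this as an approximation. The patterns and subjects are invented for the example, and the possessive use of + is left out because older Python versions do not accept it.

```python
import re

# Each entry: (pattern, subject, text expected to match, or None).
examples = [
    (r"^fox", "fox trot", "fox"),        # ^ asserts start of string
    (r"fox$", "brown fox", "fox"),       # $ asserts end of string
    (r"f.x", "fix", "fix"),              # . matches any character...
    (r"f.x", "f\nx", None),              # ...except newline, by default
    (r"cat|dog", "hot dog", "dog"),      # | separates alternative branches
    (r"(ab)+", "ababab!", "ababab"),     # ( ) group a subpattern; + is 1 or more
    (r"colou?r", "color", "color"),      # ? as 0 or 1 quantifier
    (r"a*", "aaab", "aaa"),              # * is 0 or more (greedy)
    (r"a+?", "aaab", "a"),               # ? as quantifier minimizer
    (r"\d{2,3}", "x12345", "123"),       # { starts a min/max quantifier
    (r"1\+1", "1+1=2", "1+1"),           # \ removes the special meaning of +
]


def matched_text(pattern, subject):
    """Return the text of the first match, or None if nothing matches."""
    m = re.search(pattern, subject)
    return m.group(0) if m else None


results = [matched_text(p, s) for p, s, _ in examples]

for (pattern, subject, expected), got in zip(examples, results):
    print(f"{pattern!r:12} on {subject!r:12} -> {got!r}")
```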

Part of a pattern that is in square brackets is called a "character class". In a character class the only metacharacters are:

  \      general escape character
  ^      negate the class, but only if the first character
  -      indicates character range
  [      POSIX character class (only if followed by POSIX
           syntax)
  ]      terminates the character class
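The character-class metacharacters can be sketched in the same way. Again this uses Python's re module as an approximation of PCRE, with made-up subjects. Python does not support the POSIX class syntax (such as [[:alpha:]]), so that row of the table is not demonstrated here.

```python
import re

# Each entry: (pattern, subject, text expected to match).
class_examples = [
    (r"[aeiou]", "sky blue", "u"),       # a class matches one listed character
    (r"[^aeiou]", "eat", "t"),           # ^ first in the class negates it
    (r"[a^b]", "x^y", "^"),              # ^ elsewhere in the class is a literal
    (r"[a-f]+", "xx-cafe-", "cafe"),     # - between two characters is a range
    (r"[\]\\]+", r"a]\b", "]\\"),        # \ escapes ] and \ inside the class
]

class_results = []
for pattern, subject, _ in class_examples:
    m = re.search(pattern, subject)
    class_results.append(m.group(0) if m else None)

print(class_results)
```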

The following sections describe the use of each of the metacharacters.