Путеводитель по Руководству Linux

  User  |  Syst  |  Libr  |  Device  |  Files  |  Other  |  Admin  |  Head  |



   pcrepattern    ( 3 )

Perl-совместимые регулярные выражения (Perl-compatible regular expressions)

  Name  |  Pcre regular expression details  |  Special start-of-pattern items  |  Ebcdic character codes  |  Characters and metacharacters  |  Backslash  |  Circumflex and dollar  |  Full stop (period, dot) and \n  |  Matching a single data unit  |  Square brackets and character classes  |  Posix character classes  |  Compatibility feature for word boundaries  |  Vertical bar  |  Internal option setting  |  Subpatterns  |    Duplicate subpattern numbers    |  Named subpatterns  |  Repetition  |  Atomic grouping and possessive quantifiers  |  Back references  |  Assertions  |  Conditional subpatterns  |  Comments  |  Recursive patterns  |  Subpatterns as subroutines  |  Oniguruma subroutine syntax  |  Callouts  |  Backtracking control  |  See also  |

DUPLICATE SUBPATTERN NUMBERS

Perl 5.10 introduced a feature whereby each alternative in a subpattern uses the same numbers for its capturing parentheses. Such a subpattern starts with (?| and is itself a non-capturing subpattern. For example, consider this pattern:

(?|(Sat)ur|(Sun))day

Because the two alternatives are inside a (?| group, both sets of capturing parentheses are numbered one. Thus, when the pattern matches, you can look at captured substring number one, whichever alternative matched. This construct is useful when you want to capture part, but not all, of one of a number of alternatives. Inside a (?| group, parentheses are numbered as usual, but the number is reset at the start of each branch. The numbers of any capturing parentheses that follow the subpattern start after the highest number used in any branch. The following example is taken from the Perl documentation. The numbers underneath show in which buffer the captured content will be stored.

# before ---------------branch-reset----------- after / ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x # 1 2 2 3 2 3 4

A back reference to a numbered subpattern uses the most recent value that is set for that number by any subpattern. The following pattern matches "abcabc" or "defdef":

/(?|(abc)|(def))\1/

In contrast, a subroutine call to a numbered subpattern always refers to the first one in the pattern with the given number. The following pattern matches "abcabc" or "defabc":

/(?|(abc)|(def))(?1)/

If a condition test for a subpattern's having matched refers to a non-unique number, the test is true if any of the subpatterns of that number have matched.

An alternative approach to using this "branch reset" feature is to use duplicate named subpatterns, as described in the next section.