`pcrepattern` ( 3 )

Perl-compatible regular expressions

MATCHING A SINGLE DATA UNIT

       Outside a character class, the escape sequence \C matches any one
       data unit, whether or not a UTF mode is set. In the 8-bit
       library, one data unit is one byte; in the 16-bit library it is a
       16-bit unit; in the 32-bit library it is a 32-bit unit. Unlike a
       dot, \C always matches line-ending characters. The feature is
       provided in Perl in order to match individual bytes in UTF-8
       mode, but it is unclear how it can usefully be used. Because \C
       breaks up characters into individual data units, matching one
       unit with \C in a UTF mode means that the rest of the string may
       start with a malformed UTF character. This has undefined results,
       because PCRE assumes that it is dealing with valid UTF strings
       (and by default it checks this at the start of processing unless
       the PCRE_NO_UTF8_CHECK, PCRE_NO_UTF16_CHECK or
       PCRE_NO_UTF32_CHECK option is used).
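
       The hazard described above can be seen without PCRE at all. The
       following is a minimal sketch in plain C, assuming UTF-8 input.
       The helper names is_utf8_continuation and remainder_is_malformed
       are hypothetical and are not part of the PCRE API. The second
       helper reports whether the text left after consuming a given
       number of bytes begins in the middle of a character, which is the
       situation that \C can create.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE: nonzero if b is a UTF-8
   continuation byte (bit pattern 10xxxxxx). */
int is_utf8_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper, not part of PCRE: nonzero if the text that
   remains after skipping `consumed` bytes of the valid UTF-8 string s
   starts with a continuation byte, that is, in the middle of a
   character. */
int remainder_is_malformed(const char *s, size_t consumed)
{
    assert(s != NULL);
    assert(consumed <= strlen(s));
    return is_utf8_continuation((unsigned char)s[consumed]);
}
```

       For a string that starts with the two-byte character U+00E9
       (bytes C3 A9), consuming one byte leaves a remainder that begins
       with A9, so remainder_is_malformed returns nonzero; consuming
       both bytes leaves a well-formed remainder.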

       PCRE does not allow \C to appear in lookbehind assertions
       (described below) in a UTF mode, because this would make it
       impossible to calculate the length of the lookbehind.

       In general, the \C escape sequence is best avoided. However, one
       way of using it that avoids the problem of malformed UTF
       characters is to use a lookahead to check the length of the next
       character, as in this pattern, which could be used with a UTF-8
       string (ignore white space and line breaks):

         (?| (?=[\x00-\x7f])(\C) |
             (?=[\x80-\x{7ff}])(\C)(\C) |
             (?=[\x{800}-\x{ffff}])(\C)(\C)(\C) |
             (?=[\x{10000}-\x{1fffff}])(\C)(\C)(\C)(\C))

       A group that starts with (?| resets the capturing parentheses
       numbers in each alternative (see "Duplicate Subpattern Numbers"
       below). The assertions at the start of each branch check the next
       UTF-8 character for values whose encoding uses 1, 2, 3, or 4
       bytes, respectively. The character's individual bytes are then
       captured by the appropriate number of groups.
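
       The ranges tested by the four lookaheads correspond to the UTF-8
       encoded length of a code point. As an illustration only, the
       following C sketch computes that length using the same
       boundaries. The name utf8_encoded_length is hypothetical and is
       not a PCRE function; the check against 0x1fffff simply mirrors
       the upper bound written in the pattern.

```c
#include <stdint.h>

/* Hypothetical helper, not part of PCRE: number of bytes in the UTF-8
   encoding of code point cp, which is also the number of \C groups in
   the branch whose lookahead accepts cp. The boundaries are the ones
   used by the lookaheads in the pattern. Returns 0 for values above
   the pattern's upper bound. */
int utf8_encoded_length(uint32_t cp)
{
    if (cp <= 0x7f)     return 1;  /* first branch:  one \C    */
    if (cp <= 0x7ff)    return 2;  /* second branch: two \C    */
    if (cp <= 0xffff)   return 3;  /* third branch:  three \C  */
    if (cp <= 0x1fffff) return 4;  /* fourth branch: four \C   */
    return 0;
}
```

       For example, U+0041 falls in the first range, U+00E9 in the
       second, U+20AC in the third, and U+1F600 in the fourth.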

Source text at man7.org
