Путеводитель по Руководству Linux

  User  |  Syst  |  Libr  |  Device  |  Files  |  Other  |  Admin  |  Head  |



   regexec.3p    ( 3 )

соответствие регулярному выражению (regular expression matching)

Имя (Name)

regcomp, regerror, regexec, regfree — regular expression matching


Синопсис (Synopsis)

#include <regex.h>

int regcomp(regex_t *restrict preg, const char *restrict pattern, int cflags); size_t regerror(int errcode, const regex_t *restrict preg, char *restrict errbuf, size_t errbuf_size); int regexec(const regex_t *restrict preg, const char *restrict string, size_t nmatch, regmatch_t pmatch[restrict], int eflags); void regfree(regex_t *preg);


Описание (Description)

These functions interpret basic and extended regular expressions as described in the Base Definitions volume of POSIX.1‐2017, Chapter 9, Regular Expressions.

The regex_t structure is defined in <regex.h> and contains at least the following member:

┌──────────────┬──────────────┬───────────────────────────┐ │Member Type Member Name Description │ ├──────────────┼──────────────┼───────────────────────────┤ │size_t re_nsub │ Number of parenthesized │ │ │ │ subexpressions. │ └──────────────┴──────────────┴───────────────────────────┘ The regmatch_t structure is defined in <regex.h> and contains at least the following members:

┌──────────────┬──────────────┬───────────────────────────┐ │Member Type Member Name Description │ ├──────────────┼──────────────┼───────────────────────────┤ │regoff_t rm_so │ Byte offset from start of │ │ │ │ string to start of │ │ │ │ substring. │ │regoff_t rm_eo │ Byte offset from start of │ │ │ │ string of the first │ │ │ │ character after the end │ │ │ │ of substring. │ └──────────────┴──────────────┴───────────────────────────┘ The regcomp() function shall compile the regular expression contained in the string pointed to by the pattern argument and place the results in the structure pointed to by preg. The cflags argument is the bitwise-inclusive OR of zero or more of the following flags, which are defined in the <regex.h> header:

REG_EXTENDED Use Extended Regular Expressions.

REG_ICASE Ignore case in match (see the Base Definitions volume of POSIX.1‐2017, Chapter 9, Regular Expressions).

REG_NOSUB Report only success/fail in regexec().

REG_NEWLINE Change the handling of <newline> characters, as described in the text.

The default regular expression type for pattern is a Basic Regular Expression. The application can specify Extended Regular Expressions using the REG_EXTENDED cflags flag.

If the REG_NOSUB flag was not set in cflags, then regcomp() shall set re_nsub to the number of parenthesized subexpressions (delimited by "\(\)" in basic regular expressions or "()" in extended regular expressions) found in pattern.

The regexec() function compares the null-terminated string specified by string with the compiled regular expression preg initialized by a previous call to regcomp(). If it finds a match, regexec() shall return 0; otherwise, it shall return non- zero indicating either no match or an error. The eflags argument is the bitwise-inclusive OR of zero or more of the following flags, which are defined in the <regex.h> header:

REG_NOTBOL The first character of the string pointed to by string is not the beginning of the line. Therefore, the <circumflex> character ('^'), when taken as a special character, shall not match the beginning of string.

REG_NOTEOL The last character of the string pointed to by string is not the end of the line. Therefore, the <dollar-sign> ('$'), when taken as a special character, shall not match the end of string.

If nmatch is 0 or REG_NOSUB was set in the cflags argument to regcomp(), then regexec() shall ignore the pmatch argument. Otherwise, the application shall ensure that the pmatch argument points to an array with at least nmatch elements, and regexec() shall fill in the elements of that array with offsets of the substrings of string that correspond to the parenthesized subexpressions of pattern: pmatch[i].rm_so shall be the byte offset of the beginning and pmatch[i].rm_eo shall be one greater than the byte offset of the end of substring i. (Subexpression i begins at the ith matched open parenthesis, counting from 1.) Offsets in pmatch[0] identify the substring that corresponds to the entire regular expression. Unused elements of pmatch up to pmatch[nmatch-1] shall be filled with -1. If there are more than nmatch subexpressions in pattern (pattern itself counts as a subexpression), then regexec() shall still do the match, but shall record only the first nmatch substrings.

When matching a basic or extended regular expression, any given parenthesized subexpression of pattern might participate in the match of several different substrings of string, or it might not match any substring even though the pattern as a whole did match. The following rules shall be used to determine which substrings to report in pmatch when matching regular expressions:

1. If subexpression i in a regular expression is not contained within another subexpression, and it participated in the match several times, then the byte offsets in pmatch[i] shall delimit the last such match.

2. If subexpression i is not contained within another subexpression, and it did not participate in an otherwise successful match, the byte offsets in pmatch[i] shall be -1. A subexpression does not participate in the match when:

'*' or "\{\}" appears immediately after the subexpression in a basic regular expression, or '*', '?', or "{}" appears immediately after the subexpression in an extended regular expression, and the subexpression did not match (matched 0 times)

or:

'|' is used in an extended regular expression to select this subexpression or another, and the other subexpression matched.

3. If subexpression i is contained within another subexpression j, and i is not contained within any other subexpression that is contained within j, and a match of subexpression j is reported in pmatch[j], then the match or non-match of subexpression i reported in pmatch[i] shall be as described in 1. and 2. above, but within the substring reported in pmatch[j] rather than the whole string. The offsets in pmatch[i] are still relative to the start of string.

4. If subexpression i is contained in subexpression j, and the byte offsets in pmatch[j] are -1, then the pointers in pmatch[i] shall also be -1.

5. If subexpression i matched a zero-length string, then both byte offsets in pmatch[i] shall be the byte offset of the character or null terminator immediately following the zero- length string.

If, when regexec() is called, the locale is different from when the regular expression was compiled, the result is undefined.

If REG_NEWLINE is not set in cflags, then a <newline> in pattern or string shall be treated as an ordinary character. If REG_NEWLINE is set, then <newline> shall be treated as an ordinary character except as follows:

1. A <newline> in string shall not be matched by a <period> outside a bracket expression or by any form of a non-matching list (see the Base Definitions volume of POSIX.1‐2017, Chapter 9, Regular Expressions).

2. A <circumflex> ('^') in pattern, when used to specify expression anchoring (see the Base Definitions volume of POSIX.1‐2017, Section 9.3.8, BRE Expression Anchoring), shall match the zero-length string immediately after a <newline> in string, regardless of the setting of REG_NOTBOL.

3. A <dollar-sign> ('$') in pattern, when used to specify expression anchoring, shall match the zero-length string immediately before a <newline> in string, regardless of the setting of REG_NOTEOL.

The regfree() function frees any memory allocated by regcomp() associated with preg.

The following constants are defined as the minimum set of error return values, although other errors listed as implementation extensions in <regex.h> are possible:

REG_BADBR Content of "\{\}" invalid: not a number, number too large, more than two numbers, first larger than second.

REG_BADPAT Invalid regular expression.

REG_BADRPT '?', '*', or '+' not preceded by valid regular expression.

REG_EBRACE "\{\}" imbalance.

REG_EBRACK "[]" imbalance.

REG_ECOLLATE Invalid collating element referenced.

REG_ECTYPE Invalid character class type referenced.

REG_EESCAPE Trailing <backslash> character in pattern.

REG_EPAREN "\(\)" or "()" imbalance.

REG_ERANGE Invalid endpoint in range expression.

REG_ESPACE Out of memory.

REG_ESUBREG Number in "\digit" invalid or in error.

REG_NOMATCH regexec() failed to match.

If more than one error occurs in processing a function call, any one of the possible constants may be returned, as the order of detection is unspecified.

The regerror() function provides a mapping from error codes returned by regcomp() and regexec() to unspecified printable strings. It generates a string corresponding to the value of the errcode argument, which the application shall ensure is the last non-zero value returned by regcomp() or regexec() with the given value of preg. If errcode is not such a value, the content of the generated string is unspecified.

If preg is a null pointer, but errcode is a value returned by a previous call to regexec() or regcomp(), the regerror() still generates an error string corresponding to the value of errcode, but it might not be as detailed under some implementations.

If the errbuf_size argument is not 0, regerror() shall place the generated string into the buffer of size errbuf_size bytes pointed to by errbuf. If the string (including the terminating null) cannot fit in the buffer, regerror() shall truncate the string and null-terminate the result.

If errbuf_size is 0, regerror() shall ignore the errbuf argument, and return the size of the buffer needed to hold the generated string.

If the preg argument to regexec() or regfree() is not a compiled regular expression returned by regcomp(), the result is undefined. A preg is no longer treated as a compiled regular expression after it is given to regfree().