Путеводитель по Руководству Linux

  User  |  Syst  |  Libr  |  Device  |  Files  |  Other  |  Admin  |  Head  |



   pcreapi    ( 3 )

Perl-совместимые регулярные выражения (Perl-compatible regular expressions)

  Name  |  Pcre native api basic functions  |  Pcre native api string extraction functions  |  Pcre native api auxiliary functions  |  Pcre native api indirected functions  |  Pcre 8-bit, 16-bit, and 32-bit libraries  |  Pcre api overview  |  Newlines  |  Multithreading  |  Saving precompiled patterns for later use  |  Checking build-time options  |  Compiling a pattern  |  Compilation error codes  |  Studying a pattern  |  Locale support  |  Information about a pattern  |  Reference counts  |  Matching a pattern: the traditional function  |    Extracting captured substrings by number    |  Extracting captured substrings by name  |  Duplicate subpattern names  |  Finding all possible matches  |  Obtaining an estimate of stack usage  |  Matching a pattern: the alternative function  |  See also  |

EXTRACTING CAPTURED SUBSTRINGS BY NUMBER

int pcre_copy_substring(const char *subject, int *ovector, int stringcount, int stringnumber, char *buffer, int buffersize);

int pcre_get_substring(const char *subject, int *ovector, int stringcount, int stringnumber, const char **stringptr);

int pcre_get_substring_list(const char *subject, int *ovector, int stringcount, const char ***listptr);

Captured substrings can be accessed directly by using the offsets returned by pcre_exec() in ovector. For convenience, the functions pcre_copy_substring(), pcre_get_substring(), and pcre_get_substring_list() are provided for extracting captured substrings as new, separate, zero-terminated strings. These functions identify substrings by number. The next section describes functions for extracting named substrings.

A substring that contains a binary zero is correctly extracted and has a further zero added on the end, but the result is not, of course, a C string. However, you can process such a string by referring to the length that is returned by pcre_copy_substring() and pcre_get_substring(). Unfortunately, the interface to pcre_get_substring_list() is not adequate for handling strings containing binary zeros, because the end of the final string is not independently indicated.

The first three arguments are the same for all three of these functions: subject is the subject string that has just been successfully matched, ovector is a pointer to the vector of integer offsets that was passed to pcre_exec(), and stringcount is the number of substrings that were captured by the match, including the substring that matched the entire regular expression. This is the value returned by pcre_exec() if it is greater than zero. If pcre_exec() returned zero, indicating that it ran out of space in ovector, the value passed as stringcount should be the number of elements in the vector divided by three.

The functions pcre_copy_substring() and pcre_get_substring() extract a single substring, whose number is given as stringnumber. A value of zero extracts the substring that matched the entire pattern, whereas higher values extract the captured substrings. For pcre_copy_substring(), the string is placed in buffer, whose length is given by buffersize, while for pcre_get_substring() a new block of memory is obtained via pcre_malloc, and its address is returned via stringptr. The yield of the function is the length of the string, not including the terminating zero, or one of these error codes:

PCRE_ERROR_NOMEMORY (-6)

The buffer was too small for pcre_copy_substring(), or the attempt to get memory failed for pcre_get_substring().

PCRE_ERROR_NOSUBSTRING (-7)

There is no substring whose number is stringnumber.

The pcre_get_substring_list() function extracts all available substrings and builds a list of pointers to them. All this is done in a single block of memory that is obtained via pcre_malloc. The address of the memory block is returned via listptr, which is also the start of the list of string pointers. The end of the list is marked by a NULL pointer. The yield of the function is zero if all went well, or the error code

PCRE_ERROR_NOMEMORY (-6)

if the attempt to get the memory block failed.

When any of these functions encounter a substring that is unset, which can happen when capturing subpattern number n+1 matches some part of the subject, but subpattern n has not been used at all, they return an empty string. This can be distinguished from a genuine zero-length substring by inspecting the appropriate offset in ovector, which is negative for unset substrings.

The two convenience functions pcre_free_substring() and pcre_free_substring_list() can be used to free the memory returned by a previous call of pcre_get_substring() or pcre_get_substring_list(), respectively. They do nothing more than call the function pointed to by pcre_free, which of course could be called directly from a C program. However, PCRE is used in some situations where it is linked via a special interface to another programming language that cannot use pcre_free directly; it is for these cases that the functions are provided.