Perl-compatible regular expressions
EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
int pcre_copy_substring(const char *subject, int *ovector,
     int stringcount, int stringnumber, char *buffer,
     int buffersize);
int pcre_get_substring(const char *subject, int *ovector,
     int stringcount, int stringnumber,
     const char **stringptr);
int pcre_get_substring_list(const char *subject,
     int *ovector, int stringcount, const char ***listptr);
Captured substrings can be accessed directly by using the offsets
returned by pcre_exec() in ovector. For convenience, the functions
pcre_copy_substring(), pcre_get_substring(), and
pcre_get_substring_list() are provided for extracting captured
substrings as new, separate, zero-terminated strings. These
functions identify substrings by number. The next section
describes functions for extracting named substrings.
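As an illustration of the direct access mentioned above, the following minimal sketch copies one substring using only the offset pairs in ovector. The helper name copy_by_offsets() is hypothetical and not part of PCRE; it assumes that ovector[2*n] and ovector[2*n+1] hold the start and end offsets of substring n, as filled in by pcre_exec().

```c
#include <string.h>

/* Hypothetical helper, not part of PCRE: copy captured substring `n`
 * out of `subject` using the offset pairs that pcre_exec() left in
 * `ovector`. Returns the substring length, or -1 if the substring is
 * unset or `buffer` is too small. */
static int copy_by_offsets(const char *subject, const int *ovector,
                           int n, char *buffer, int buffersize)
{
    int start = ovector[2 * n];      /* offset of the first character */
    int end   = ovector[2 * n + 1];  /* offset just past the last one */
    int len;

    if (start < 0 || end < start)
        return -1;                   /* substring is unset */
    len = end - start;
    if (len + 1 > buffersize)
        return -1;                   /* no room for text plus the zero */
    memcpy(buffer, subject + start, (size_t)len);
    buffer[len] = '\0';
    return len;
}
```

The convenience functions do essentially this bookkeeping for you and report failures through the PCRE error codes instead of a bare -1.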
A substring that contains a binary zero is correctly extracted
and has a further zero added on the end, but the result is not,
of course, a C string. However, you can process such a string by
referring to the length that is returned by pcre_copy_substring()
and pcre_get_substring(). Unfortunately, the interface to
pcre_get_substring_list() is not adequate for handling strings
containing binary zeros, because the end of the final string is
not independently indicated.
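A short sketch of length-based processing: the hypothetical helper below (not part of PCRE) assumes that data and length came from a successful call to pcre_copy_substring() or pcre_get_substring(), and reports whether the extracted bytes contain a binary zero, in which case strlen() would stop too early.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the `length` bytes at `data`
 * contain a binary zero, 0 otherwise. `length` is the value returned
 * by the extraction function, so the terminating zero is excluded. */
static int has_embedded_zero(const char *data, int length)
{
    return memchr(data, '\0', (size_t)length) != NULL;
}
```

When this returns 1, functions such as memcpy() or fwrite() that take an explicit byte count are the safe way to handle the data.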
The first three arguments are the same for all three of these
functions: subject is the subject string that has just been
successfully matched, ovector is a pointer to the vector of
integer offsets that was passed to pcre_exec(), and stringcount
is the number of substrings that were captured by the match,
including the substring that matched the entire regular
expression. This is the value returned by pcre_exec() if it is
greater than zero. If pcre_exec() returned zero, indicating that
it ran out of space in ovector, the value passed as stringcount
should be the number of elements in the vector divided by three.
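The rule for choosing stringcount can be written down as a small function. This is a sketch only: the name substring_count() is hypothetical, rc stands for the value returned by pcre_exec(), and ovecsize for the number of int elements in ovector. Returning 0 for a negative rc is a choice made for this example.

```c
/* Hypothetical helper: derive the stringcount argument from the
 * pcre_exec() return value `rc` and the size of ovector in ints. */
static int substring_count(int rc, int ovecsize)
{
    if (rc > 0)
        return rc;            /* number of substrings that were set */
    if (rc == 0)
        return ovecsize / 3;  /* ovector was too small: use its capacity */
    return 0;                 /* no match or error: nothing to extract */
}
```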
The functions pcre_copy_substring() and pcre_get_substring()
extract a single substring, whose number is given as
stringnumber. A value of zero extracts the substring that matched
the entire pattern, whereas higher values extract the captured
substrings. For pcre_copy_substring(), the string is placed in
buffer, whose length is given by buffersize, while for
pcre_get_substring() a new block of memory is obtained via
pcre_malloc, and its address is returned via stringptr. The yield
of the function is the length of the string, not including the
terminating zero, or one of these error codes:
  PCRE_ERROR_NOMEMORY (-6)
      The buffer was too small for pcre_copy_substring(), or the
      attempt to get memory failed for pcre_get_substring().
  PCRE_ERROR_NOSUBSTRING (-7)
      There is no substring whose number is stringnumber.
The pcre_get_substring_list() function extracts all available
substrings and builds a list of pointers to them. All this is
done in a single block of memory that is obtained via
pcre_malloc. The address of the memory block is returned via
listptr, which is also the start of the list of string pointers.
The end of the list is marked by a NULL pointer. The yield of the
function is zero if all went well, or the error code
PCRE_ERROR_NOMEMORY (-6) if the attempt to get the memory block
failed.
When any of these functions encounters a substring that is unset,
which can happen when capturing subpattern number n+1 matches
some part of the subject, but subpattern n has not been used at
all, it returns an empty string. This can be distinguished from
a genuine zero-length substring by inspecting the appropriate
offset in ovector, which is negative for unset substrings.
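The check described above can be sketched as follows. The enum and the function classify_capture() are hypothetical names, not part of PCRE. As an illustration, matching the pattern (a)|(b) against "b" leaves substring 1 unset while substring 2 is set.

```c
/* Hypothetical classification of a captured substring. */
enum capture_state { CAPTURE_UNSET, CAPTURE_EMPTY, CAPTURE_NONEMPTY };

/* Inspect the offset pair for substring `n` in `ovector`, as filled
 * in by pcre_exec(). A negative start offset means the substring is
 * unset; equal offsets mean a genuine zero-length substring. */
static enum capture_state classify_capture(const int *ovector, int n)
{
    int start = ovector[2 * n];
    int end   = ovector[2 * n + 1];

    if (start < 0)
        return CAPTURE_UNSET;
    return (end == start) ? CAPTURE_EMPTY : CAPTURE_NONEMPTY;
}
```

Calling this before one of the extraction functions lets a program tell "did not participate in the match" apart from "matched the empty string", which the returned empty string alone cannot convey.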
The two convenience functions pcre_free_substring() and
pcre_free_substring_list() can be used to free the memory
returned by a previous call of pcre_get_substring() or
pcre_get_substring_list(), respectively. They do nothing more
than call the function pointed to by pcre_free, which of course
could be called directly from a C program. However, PCRE is used
in some situations where it is linked via a special interface to
another programming language that cannot use pcre_free directly;
it is for these cases that the functions are provided.