int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
     int what, void *where);
The pcre_fullinfo() function returns information about a compiled
pattern. It replaces the pcre_info() function, which was removed
from the library at version 8.30, after more than 10 years of
obsolescence.
The first argument for pcre_fullinfo() is a pointer to the
compiled pattern. The second argument is the result of
pcre_study(), or NULL if the pattern was not studied. The third
argument specifies which piece of information is required, and
the fourth argument is a pointer to a variable to receive the
data. The yield of the function is zero for success, or one of
the following negative numbers:
  PCRE_ERROR_NULL           the argument code was NULL
                            the argument where was NULL
  PCRE_ERROR_BADMAGIC       the "magic number" was not found
  PCRE_ERROR_BADENDIANNESS  the pattern was compiled with different
                            endianness
  PCRE_ERROR_BADOPTION      the value of what was invalid
  PCRE_ERROR_UNSET          the requested field is not set
The "magic number" is placed at the start of each compiled
pattern as a simple check against passing an arbitrary memory
pointer. The endianness error can occur if a compiled pattern is
saved and reloaded on a different host. Here is a typical call of
pcre_fullinfo(), to obtain the length of the compiled pattern:
  int rc;
  size_t length;
  rc = pcre_fullinfo(
    re,               /* result of pcre_compile() */
    sd,               /* result of pcre_study(), or NULL */
    PCRE_INFO_SIZE,   /* what is required */
    &length);         /* where to put the data */
The possible values for the third argument are defined in pcre.h,
and are as follows:
PCRE_INFO_BACKREFMAX
Return the number of the highest back reference in the pattern.
The fourth argument should point to an int variable. Zero is
returned if there are no back references.
PCRE_INFO_CAPTURECOUNT
Return the number of capturing subpatterns in the pattern. The
fourth argument should point to an int variable.
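The capture count is typically used to size the output vector passed to pcre_exec(). A minimal sketch, assuming the pcre_exec() convention that the vector holds three ints per capturing subpattern plus three for the whole match; the helpers ovector_len() and alloc_ovector() are hypothetical, not part of PCRE:

```c
#include <stdlib.h>

/* Hypothetical helper: length (in ints) of a pcre_exec() output vector
   that can report the whole match plus every capturing subpattern.
   pcre_exec() uses the first two thirds of the vector for start/end
   offset pairs and the last third as workspace, hence the factor 3. */
static int ovector_len(int capture_count)
{
    return (capture_count + 1) * 3;
}

/* Allocate a vector of that length; returns NULL on failure.
   On success, *len_out receives the length to pass to pcre_exec(). */
static int *alloc_ovector(int capture_count, int *len_out)
{
    int len = ovector_len(capture_count);
    int *ov = malloc((size_t)len * sizeof *ov);
    if (ov != NULL)
        *len_out = len;
    return ov;
}
```

For the date pattern used as the name-table example in this section, which has five capturing subpatterns, ovector_len() gives 18.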
PCRE_INFO_DEFAULT_TABLES
Return a pointer to the internal default character tables within
PCRE. The fourth argument should point to an unsigned char *
variable. This information call is provided for internal use by
the pcre_study() function. External callers can cause PCRE to use
its internal tables by passing a NULL table pointer.
PCRE_INFO_FIRSTBYTE (deprecated)
Return information about the first data unit of any matched
string, for a non-anchored pattern. The name of this option
refers to the 8-bit library, where data units are bytes. The
fourth argument should point to an int variable. Negative values
are used for special cases. However, this means that when the
32-bit library is in non-UTF-32 mode, the full 32-bit range of
characters cannot be returned. For this reason, this value is
deprecated; use PCRE_INFO_FIRSTCHARACTERFLAGS and
PCRE_INFO_FIRSTCHARACTER instead.
If there is a fixed first value, for example, the letter "c" from
a pattern such as (cat|cow|coyote), its value is returned. In the
8-bit library, the value is always less than 256. In the 16-bit
library the value can be up to 0xffff. In the 32-bit library the
value can be up to 0x10ffff.
If there is no fixed first value, and if either
(a) the pattern was compiled with the PCRE_MULTILINE option, and
every branch starts with "^", or
(b) every branch of the pattern starts with ".*" and PCRE_DOTALL
is not set (if it were set, the pattern would be anchored),
-1 is returned, indicating that the pattern matches only at the
start of a subject string or after any newline within the string.
Otherwise -2 is returned. For anchored patterns, -2 is returned.
PCRE_INFO_FIRSTCHARACTER
Return the value of the first data unit (non-UTF character) of
any matched string in the situation where
PCRE_INFO_FIRSTCHARACTERFLAGS returns 1; otherwise return 0. The
fourth argument should point to a uint32_t variable.
In the 8-bit library, the value is always less than 256. In the
16-bit library the value can be up to 0xffff. In the 32-bit
library in UTF-32 mode the value can be up to 0x10ffff, and up to
0xffffffff when not using UTF-32 mode.
PCRE_INFO_FIRSTCHARACTERFLAGS
Return information about the first data unit of any matched
string, for a non-anchored pattern. The fourth argument should
point to an int variable.
If there is a fixed first value, for example, the letter "c" from
a pattern such as (cat|cow|coyote), 1 is returned, and the
character value can be retrieved using PCRE_INFO_FIRSTCHARACTER.
If there is no fixed first value, and if either
(a) the pattern was compiled with the PCRE_MULTILINE option, and
every branch starts with "^", or
(b) every branch of the pattern starts with ".*" and PCRE_DOTALL
is not set (if it were set, the pattern would be anchored),
2 is returned, indicating that the pattern matches only at the
start of a subject string or after any newline within the string.
Otherwise 0 is returned. For anchored patterns, 0 is returned.
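The relationship between the two newer options and the deprecated PCRE_INFO_FIRSTBYTE can be sketched as follows. The function legacy_firstbyte() is hypothetical, not part of the PCRE API; it takes values already obtained from pcre_fullinfo() and shows why a single int cannot carry every 32-bit first value:

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: rebuild the deprecated PCRE_INFO_FIRSTBYTE value
   from the results of PCRE_INFO_FIRSTCHARACTERFLAGS (flags) and
   PCRE_INFO_FIRSTCHARACTER (ch):
     flags == 1 -> fixed first value, returned unchanged
     flags == 2 -> match only at start or after a newline: -1
     otherwise  -> no first-unit information: -2
   *ok is set to 0 when ch is too large for a non-negative int; such a
   value would be indistinguishable from the negative special cases. */
static int legacy_firstbyte(int flags, uint32_t ch, int *ok)
{
    *ok = 1;
    if (flags == 1) {
        if (ch > (uint32_t)INT_MAX) {
            *ok = 0;
            return -2;
        }
        return (int)ch;
    }
    return flags == 2 ? -1 : -2;
}
```

Only the 32-bit library in non-UTF-32 mode can produce a first value above INT_MAX, which is the case the flags/value pair was introduced to handle.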
PCRE_INFO_FIRSTTABLE
If the pattern was studied, and this resulted in the construction
of a 256-bit table indicating a fixed set of values for the first
data unit in any matching string, a pointer to the table is
returned. Otherwise NULL is returned. The fourth argument should
point to an unsigned char * variable.
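A caller can use such a table to skip subject positions that cannot start a match. A minimal sketch for the 8-bit library; the bit order (bit c % 8 of byte c / 8 represents value c) is an assumption that matches PCRE's internal convention but is not stated in this documentation, and the helper names are hypothetical:

```c
#include <stddef.h>

/* Return nonzero if byte value c may be the first data unit of a match,
   according to a 32-byte (256-bit) table from PCRE_INFO_FIRSTTABLE.
   Assumption: bit (c % 8) of byte (c / 8) stands for value c.
   A NULL table carries no information, so every value is allowed. */
static int first_table_has(const unsigned char *table, unsigned char c)
{
    if (table == NULL)
        return 1;
    return (table[c / 8] & (1u << (c % 8))) != 0;
}

/* Offset of the first byte in subject[0..len) that could start a match,
   or len if no position qualifies. */
static size_t first_candidate(const unsigned char *table,
                              const unsigned char *subject, size_t len)
{
    size_t i = 0;
    while (i < len && !first_table_has(table, subject[i]))
        i++;
    return i;
}
```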
PCRE_INFO_HASCRORLF
Return 1 if the pattern contains any explicit matches for CR or
LF characters, otherwise 0. The fourth argument should point to
an int variable. An explicit match is either a literal CR or LF
character, or \r or \n.
PCRE_INFO_JCHANGED
Return 1 if the (?J) or (?-J) option setting is used in the
pattern, otherwise 0. The fourth argument should point to an int
variable. (?J) and (?-J) set and unset the local PCRE_DUPNAMES
option, respectively.
PCRE_INFO_JIT
Return 1 if the pattern was studied with one of the JIT options,
and just-in-time compiling was successful. The fourth argument
should point to an int variable. A return value of 0 means that
JIT support is not available in this version of PCRE, or that the
pattern was not studied with a JIT option, or that the JIT
compiler could not handle this particular pattern. See the
pcrejit documentation for details of what can and cannot be
handled.
PCRE_INFO_JITSIZE
If the pattern was successfully studied with a JIT option, return
the size of the JIT compiled code, otherwise return zero. The
fourth argument should point to a size_t variable.
PCRE_INFO_LASTLITERAL
Return the value of the rightmost literal data unit that must
exist in any matched string, other than at its start, if such a
value has been recorded. The fourth argument should point to an
int variable. If there is no such value, -1 is returned. For
anchored patterns, a last literal value is recorded only if it
follows something of variable length. For example, for the
pattern /^a\d+z\d+/ the returned value is "z", but for /^a\dz\d/
the returned value is -1.
Because this option cannot return the full 32-bit range of
characters when the 32-bit library is in non-UTF-32 mode, it is
deprecated; use PCRE_INFO_REQUIREDCHARFLAGS and
PCRE_INFO_REQUIREDCHAR instead.
PCRE_INFO_MATCH_EMPTY
Return 1 if the pattern can match an empty string, otherwise 0.
The fourth argument should point to an int variable.
PCRE_INFO_MATCHLIMIT
If the pattern set a match limit by including an item of the form
(*LIMIT_MATCH=nnnn) at the start, the value is returned. The
fourth argument should point to an unsigned 32-bit integer. If no
such value has been set, the call to pcre_fullinfo() returns the
error PCRE_ERROR_UNSET.
PCRE_INFO_MAXLOOKBEHIND
Return the number of characters (NB not data units) in the
longest lookbehind assertion in the pattern. This information is
useful when doing multi-segment matching using the partial
matching facilities. Note that the simple assertions \b and \B
require a one-character lookbehind. \A also registers a
one-character lookbehind, though it does not actually inspect the
previous character. This is to ensure that at least one character
from the old segment is retained when a new segment is processed.
Otherwise, if there are no lookbehinds in the pattern, \A might
match incorrectly at the start of a new segment.
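Because the value counts characters, not data units, a multi-segment matcher working in UTF-8 mode has to walk backwards over whole characters to decide how many bytes of the old segment to keep. A minimal sketch, assuming the 8-bit library in UTF-8 mode and valid UTF-8 input; retain_from() is a hypothetical helper, not a PCRE function:

```c
#include <stddef.h>

/* Hypothetical helper for multi-segment partial matching in UTF-8 mode:
   return the byte offset in seg[0..len) where the last `chars`
   characters begin, so that those bytes can be kept in front of the
   next segment. `chars` would come from PCRE_INFO_MAXLOOKBEHIND.
   Assumes seg holds valid UTF-8. */
static size_t retain_from(const unsigned char *seg, size_t len, int chars)
{
    size_t pos = len;
    while (chars > 0 && pos > 0) {
        pos--;
        /* Step back over continuation bytes (10xxxxxx) to a lead byte. */
        while (pos > 0 && (seg[pos] & 0xC0) == 0x80)
            pos--;
        chars--;
    }
    return pos;
}
```

In the 8-bit library without UTF-8, characters and bytes coincide, so the same result is simply the larger of 0 and len minus the lookbehind value.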
PCRE_INFO_MINLENGTH
If the pattern was studied and a minimum length for matching
subject strings was computed, its value is returned. Otherwise
the returned value is -1. The value is a number of characters,
which in UTF mode may be different from the number of data units.
The fourth argument should point to an int variable. A
non-negative value is a lower bound to the length of any matching
string. There may not be any strings of that length that actually
match, but every string that does match is at least that long.
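The lower bound allows a cheap rejection test before calling pcre_exec(). A minimal sketch, assuming ordinary (non-partial) matching and that subject_units is the number of data units from the starting offset to the end of the subject; may_match() is a hypothetical helper:

```c
#include <stddef.h>

/* Quick rejection test based on PCRE_INFO_MINLENGTH.
   minlength is a count of characters, or -1 if none was computed.
   A subject of n data units never holds more than n characters (also
   in UTF mode), so fewer data units than minlength means no match. */
static int may_match(int minlength, size_t subject_units)
{
    if (minlength < 0)
        return 1;  /* no information: the match must be attempted */
    return subject_units >= (size_t)minlength;
}
```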
PCRE_INFO_NAMECOUNT
PCRE_INFO_NAMEENTRYSIZE
PCRE_INFO_NAMETABLE
PCRE supports the use of named as well as numbered capturing
parentheses. The names are just an additional way of identifying
the parentheses, which still acquire numbers. Several convenience
functions such as pcre_get_named_substring() are provided for
extracting captured substrings by name. It is also possible to
extract the data directly, by first converting the name to a
number in order to access the correct pointers in the output
vector (described with pcre_exec() below). To do the conversion,
you need to use the name-to-number map, which is described by
these three values.
The map consists of a number of fixed-size entries.
PCRE_INFO_NAMECOUNT gives the number of entries, and
PCRE_INFO_NAMEENTRYSIZE gives the size of each entry; both of
these return an int value. The entry size depends on the length
of the longest name. PCRE_INFO_NAMETABLE returns a pointer to the
first entry of the table. This is a pointer to char in the 8-bit
library, where the first two bytes of each entry are the number
of the capturing parenthesis, most significant byte first. In the
16-bit library, the pointer points to 16-bit data units, the
first of which contains the parenthesis number. In the 32-bit
library, the pointer points to 32-bit data units, the first of
which contains the parenthesis number. The rest of the entry is
the corresponding name, zero terminated.
The names are in alphabetical order. If (?| is used to create
multiple groups with the same number, as described in the section
on duplicate subpattern numbers in the pcrepattern page, the
groups may be given the same name, but there is only one entry in
the table. Different names for groups of the same number are not
permitted. Duplicate names for subpatterns with different
numbers are permitted, but only if PCRE_DUPNAMES is set. They
appear in the table in the order in which they were found in the
pattern. In the absence of (?| this is the order of increasing
number; when (?| is used this is not necessarily the case because
later subpatterns may have lower numbers.
As a simple example of the name/number table, consider the
following pattern after compilation by the 8-bit library (assume
PCRE_EXTENDED is set, so white space, including newlines, is
ignored):
  (?<date> (?<year>(\d\d)?\d\d) -
  (?<month>\d\d) - (?<day>\d\d) )
There are four named subpatterns, so the table has four entries,
and each entry in the table is eight bytes long. The table is as
follows, with non-printing bytes shown in hexadecimal, and
undefined bytes shown as ??:
  00 01 d a t e 00 ??
  00 05 d a y 00 ?? ??
  00 04 m o n t h 00
  00 02 y e a r 00 ??
When writing code to extract data from named subpatterns using
the name-to-number map, remember that the length of the entries
is likely to be different for each compiled pattern.
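A lookup over the 8-bit table layout described above can be sketched as follows. This assumes the table pointer, entry count, and entry size have already been obtained with PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT, and PCRE_INFO_NAMEENTRYSIZE; name_to_number() is a hypothetical helper, not the library's own pcre_get_stringnumber():

```c
#include <string.h>

/* Look up a name in an 8-bit name-to-number table. Each entry holds a
   two-byte group number (most significant byte first) followed by the
   zero-terminated name; entries are entry_size bytes apart.
   Returns the group number, or -1 if the name is not present.
   With PCRE_DUPNAMES, only the first matching entry is reported. */
static int name_to_number(const unsigned char *table, int count,
                          int entry_size, const char *name)
{
    for (int i = 0; i < count; i++) {
        const unsigned char *entry = table + (size_t)i * entry_size;
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}
```

Because the entries are in alphabetical order, a binary search would also work; a linear scan keeps the sketch short. Stepping by entry_size, rather than a fixed constant, is what makes the code work for tables of any width.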
PCRE_INFO_OKPARTIAL
Return 1 if the pattern can be used for partial matching with
pcre_exec(), otherwise 0. The fourth argument should point to an
int variable. From release 8.00, this always returns 1, because
the restrictions that previously applied to partial matching have
been lifted. The pcrepartial documentation gives details of
partial matching.
PCRE_INFO_OPTIONS
Return a copy of the options with which the pattern was compiled.
The fourth argument should point to an unsigned long int
variable. These option bits are those specified in the call to
pcre_compile(), modified by any top-level option settings at the
start of the pattern itself. In other words, they are the options
that will be in force when matching starts. For example, if the
pattern /(?im)abc(?-i)d/ is compiled with the PCRE_EXTENDED
option, the result is PCRE_CASELESS, PCRE_MULTILINE, and
PCRE_EXTENDED.
A pattern is automatically anchored by PCRE if all of its
top-level alternatives begin with one of the following:
  ^     unless PCRE_MULTILINE is set
  \A    always
  \G    always
  .*    if PCRE_DOTALL is set and there are no back
        references to the subpattern in which .* appears
For such patterns, the PCRE_ANCHORED bit is set in the options
returned by pcre_fullinfo().
PCRE_INFO_RECURSIONLIMIT
If the pattern set a recursion limit by including an item of the
form (*LIMIT_RECURSION=nnnn) at the start, the value is returned.
The fourth argument should point to an unsigned 32-bit integer.
If no such value has been set, the call to pcre_fullinfo()
returns the error PCRE_ERROR_UNSET.
PCRE_INFO_SIZE
Return the size of the compiled pattern in bytes (for all three
libraries). The fourth argument should point to a size_t
variable. This value does not include the size of the pcre
structure that is returned by pcre_compile(). The value that is
passed as the argument to pcre_malloc() when pcre_compile() is
getting memory in which to place the compiled data is the value
returned by this option plus the size of the pcre structure.
Studying a compiled pattern, with or without JIT, does not alter
the value returned by this option.
PCRE_INFO_STUDYSIZE
Return the size in bytes (for all three libraries) of the data
block pointed to by the study_data field in a pcre_extra block.
If pcre_extra is NULL, or there is no study data, zero is
returned. The fourth argument should point to a size_t variable.
The study_data field is set by pcre_study() to record information
that will speed up matching (see the section entitled "Studying a
pattern" above). The format of the study_data block is private,
but its length is made available via this option so that it can
be saved and restored (see the pcreprecompile documentation for
details).
PCRE_INFO_REQUIREDCHARFLAGS
Return 1 if there is a rightmost literal data unit that must
exist in any matched string, other than at its start. The fourth
argument should point to an int variable. If there is no such
value, 0 is returned. If returning 1, the character value itself
can be retrieved using PCRE_INFO_REQUIREDCHAR.
For anchored patterns, a last literal value is recorded only if
it follows something of variable length. For example, for the
pattern /^a\d+z\d+/ the returned value is 1 (with "z" returned
from PCRE_INFO_REQUIREDCHAR), but for /^a\dz\d/ the returned value
is 0.
PCRE_INFO_REQUIREDCHAR
Return the value of the rightmost literal data unit that must
exist in any matched string, other than at its start, if such a
value has been recorded. The fourth argument should point to a
uint32_t variable. If there is no such value, 0 is returned.
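Together, PCRE_INFO_REQUIREDCHARFLAGS and PCRE_INFO_REQUIREDCHAR support a quick rejection test: a subject that lacks the required data unit cannot match. A minimal sketch for the 8-bit library, assuming a case-sensitive pattern (for a caseless pattern the other case of the character would also have to be searched for); may_contain_required() is a hypothetical helper:

```c
#include <stdint.h>
#include <string.h>

/* Quick rejection test for the 8-bit library, based on
   PCRE_INFO_REQUIREDCHARFLAGS (flags) and PCRE_INFO_REQUIREDCHAR (req).
   If flags is 1, every match contains req, so a subject without it
   cannot match. Assumption: the pattern is case-sensitive. */
static int may_contain_required(int flags, uint32_t req,
                                const unsigned char *subject, size_t len)
{
    if (flags != 1)
        return 1;  /* nothing recorded: the match must be attempted */
    if (req > 255)
        return 1;  /* defensive: not expected in the 8-bit library */
    return memchr(subject, (int)req, len) != NULL;
}
```

For /^a\d+z\d+/, flags is 1 and req is 'z', so a subject such as "a123" can be rejected without calling pcre_exec().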