pcre_extra *pcre_study(const pcre *code, int options,
     const char **errptr);

If a compiled pattern is going to be used several times, it is worth
spending more time analyzing it in order to speed up matching. The
function pcre_study() takes a pointer to a compiled pattern as its
first argument. If studying the pattern produces additional information
that will help speed up matching, pcre_study() returns a pointer to a
pcre_extra block, in which the study_data field points to the results
of the study.

The returned value from pcre_study() can be passed directly to
pcre_exec() or pcre_dfa_exec(). However, a pcre_extra block also
contains other fields that can be set by the caller before the block is
passed; these are described below in the section on matching a pattern.

If studying the pattern does not produce any useful information,
pcre_study() returns NULL by default. In that circumstance, if the
calling program wants to pass any of the other fields to pcre_exec() or
pcre_dfa_exec(), it must set up its own pcre_extra block. However, if
pcre_study() is called with the PCRE_STUDY_EXTRA_NEEDED option, it
returns a pcre_extra block even if studying did not find any additional
information. It may still return NULL, however, if an error occurs in
pcre_study().

The second argument of pcre_study() contains option bits. There are
three further options in addition to PCRE_STUDY_EXTRA_NEEDED:

  PCRE_STUDY_JIT_COMPILE
  PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
  PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE

If any of these are set, and the just-in-time compiler is available,
the pattern is further compiled into machine code that executes much
faster than the pcre_exec() interpretive matching function. If the
just-in-time compiler is not available, these options are ignored. All
undefined bits in the options argument must be zero.

JIT compilation is a heavyweight optimization. It can take some time
for patterns to be analyzed, and for one-off matches and simple
patterns the benefit of faster execution might be offset by a much
slower study time. Not all patterns can be optimized by the JIT
compiler. For those that cannot be handled, matching automatically
falls back to the pcre_exec() interpreter. For more details, see the
pcrejit documentation.

The third argument for pcre_study() is a pointer for an error message.
If studying succeeds (even if no data is returned), the variable it
points to is set to NULL. Otherwise it is set to point to a textual
error message. This is a static string that is part of the library. You
must not try to free it. You should test the error pointer for NULL
after calling pcre_study(), to be sure that it has run successfully.

When you are finished with a pattern, you can free the memory used for
the study data by calling pcre_free_study(). This function was added to
the API for release 8.20. For earlier versions, the memory could be
freed with pcre_free(), just like the pattern itself. This will still
work in cases where JIT optimization is not used, but it is advisable
to change to the new function when convenient.

This is a typical way in which pcre_study() is used (except that in a
real application there should be tests for errors):

  int rc;
  int erroroffset;
  int ovector[30];
  const char *error;
  pcre *re;
  pcre_extra *sd;

  re = pcre_compile("pattern", 0, &error, &erroroffset, NULL);
  sd = pcre_study(
    re,             /* result of pcre_compile() */
    0,              /* no options */
    &error);        /* set to NULL or points to a message */
  rc = pcre_exec(   /* see below for details of pcre_exec() options */
    re, sd, "subject", 7, 0, 0, ovector, 30);
  ...
  pcre_free_study(sd);
  pcre_free(re);

Studying a pattern does two things: first, a lower bound for the length
of subject string that is needed to match the pattern is computed. This
does not mean that there are any strings of that length that match, but
it does guarantee that no shorter strings match. The value is used to
avoid wasting time by trying to match strings that are shorter than the
lower bound. You can find out the value in a calling program via the
pcre_fullinfo() function.

Studying a pattern is also useful for non-anchored patterns that
do not have a single fixed starting character. A bitmap of
possible starting bytes is created. This speeds up finding a
position in the subject at which to start matching. (In 16-bit
mode, the bitmap is used for 16-bit values less than 256. In
32-bit mode, the bitmap is used for 32-bit values less than 256.)
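
The idea behind the two optimizations can be illustrated without the
library. The sketch below is not PCRE's internal code or data layout;
the type start_bitmap and all function names are hypothetical. It keeps
one bit per byte value and skips positions that cannot start a match or
that leave too little subject for the minimum length.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative 256-bit map, one bit per possible byte value. */
typedef struct {
  unsigned char bits[32];
} start_bitmap;

void bitmap_clear(start_bitmap *bm)
{
  memset(bm->bits, 0, sizeof bm->bits);
}

void bitmap_add(start_bitmap *bm, unsigned char c)
{
  bm->bits[c / 8] |= (unsigned char)(1u << (c % 8));
}

int bitmap_has(const start_bitmap *bm, unsigned char c)
{
  return (bm->bits[c / 8] >> (c % 8)) & 1;
}

/* Return the first offset at or after 'start' whose byte could begin a
   match and which leaves at least 'min_length' bytes of subject, or -1
   if there is no such offset. */
long first_candidate(const start_bitmap *bm, size_t min_length,
                     const char *subject, size_t length, size_t start)
{
  size_t i;

  for (i = start; i < length && length - i >= min_length; i++) {
    if (bitmap_has(bm, (unsigned char)subject[i]))
      return (long)i;
  }
  return -1;
}
```

For a pattern such as cat|dog, the map would hold 'c' and 'd' and the
minimum length would be 3, so a full match attempt is made only at
offsets that pass both tests.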

These two optimizations apply to both pcre_exec() and pcre_dfa_exec(),
and the information is also used by the JIT compiler. The optimizations
can be disabled by setting the PCRE_NO_START_OPTIMIZE option. You might
want to do this if your pattern contains callouts or (*MARK) and you
want to make use of these facilities in cases where matching fails.

PCRE_NO_START_OPTIMIZE can be specified at either compile time or
execution time. However, if PCRE_NO_START_OPTIMIZE is passed to
pcre_exec() (that is, after any JIT compilation has happened), JIT
execution is disabled. For JIT execution to work with
PCRE_NO_START_OPTIMIZE, the option must be set at compile time.

There is a longer discussion of PCRE_NO_START_OPTIMIZE below.