Linux Manual Guide




pcreapi(3)

Perl-compatible regular expressions


STUDYING A PATTERN

pcre_extra *pcre_study(const pcre *code, int options, const char **errptr);

If a compiled pattern is going to be used several times, it is worth spending more time analyzing it in order to reduce the time taken for matching. The function pcre_study() takes a pointer to a compiled pattern as its first argument. If studying the pattern produces additional information that will help speed up matching, pcre_study() returns a pointer to a pcre_extra block, in which the study_data field points to the results of the study.

The returned value from pcre_study() can be passed directly to pcre_exec() or pcre_dfa_exec(). However, a pcre_extra block also contains other fields that can be set by the caller before the block is passed; these are described below in the section on matching a pattern.

If studying the pattern does not produce any useful information, pcre_study() returns NULL by default. In that circumstance, if the calling program wants to pass any of the other fields to pcre_exec() or pcre_dfa_exec(), it must set up its own pcre_extra block. However, if pcre_study() is called with the PCRE_STUDY_EXTRA_NEEDED option, it returns a pcre_extra block even if studying did not find any additional information. It may still return NULL, however, if an error occurs in pcre_study().

The second argument of pcre_study() contains option bits. There are three further options in addition to PCRE_STUDY_EXTRA_NEEDED:

  PCRE_STUDY_JIT_COMPILE
  PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
  PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE

If any of these are set, and the just-in-time compiler is available, the pattern is further compiled into machine code that executes much faster than the pcre_exec() interpretive matching function. If the just-in-time compiler is not available, these options are ignored. All undefined bits in the options argument must be zero.

JIT compilation is a heavyweight optimization. It can take some time for patterns to be analyzed, and for one-off matches and simple patterns the benefit of faster execution might be offset by a much slower study time. Not all patterns can be optimized by the JIT compiler. For those that cannot be handled, matching automatically falls back to the pcre_exec() interpreter. For more details, see the pcrejit documentation.

The third argument of pcre_study() is a pointer for an error message. If studying succeeds (even if no data is returned), the variable it points to is set to NULL. Otherwise it is set to point to a textual error message. This is a static string that is part of the library. You must not try to free it. You should test the error pointer for NULL after calling pcre_study() to be sure that it has run successfully.

When you are finished with a pattern, you can free the memory used for the study data by calling pcre_free_study(). This function was added to the API for release 8.20. For earlier versions, the memory could be freed with pcre_free(), just like the pattern itself. This will still work in cases where JIT optimization is not used, but it is advisable to change to the new function when convenient.

This is a typical way in which pcre_study() is used (except that in a real application there should be tests for errors):

  const char *error;
  int erroroffset;
  int ovector[30];
  int rc;
  pcre *re;
  pcre_extra *sd;

  re = pcre_compile("pattern", 0, &error, &erroroffset, NULL);

  sd = pcre_study(
    re,             /* result of pcre_compile() */
    0,              /* no options */
    &error);        /* set to NULL or points to a message */

  rc = pcre_exec(   /* see below for details of pcre_exec() options */
    re, sd, "subject", 7, 0, 0, ovector, 30);
  ...
  pcre_free_study(sd);
  pcre_free(re);

Studying a pattern does two things. First, it computes a lower bound on the length of subject string that is needed to match the pattern. This does not mean that any string of that length actually matches, but it does guarantee that no shorter string can match. The value is used to avoid wasting time trying to match subjects that are shorter than the lower bound. A calling program can retrieve the value with the pcre_fullinfo() function (the PCRE_INFO_MINLENGTH item).

Studying a pattern is also useful for non-anchored patterns that do not have a single fixed starting character. A bitmap of possible starting bytes is created. This speeds up finding a position in the subject at which to start matching. (In 16-bit mode, the bitmap is used for 16-bit values less than 256. In 32-bit mode, the bitmap is used for 32-bit values less than 256.)
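To illustrate why the start bitmap helps, here is a toy model in plain C. It is not PCRE's internal code: it keeps a 256-bit table with one bit per byte value, and a scan skips directly to the next subject position whose byte is in the table. The type and function names are invented for this sketch, and the bit layout (bit c%8 of byte c/8) is an assumption of the model.

```c
#include <stddef.h>
#include <string.h>

/* Toy model: one bit for each of the 256 possible byte values. */
typedef struct { unsigned char bits[32]; } start_bitmap;

static void bitmap_add(start_bitmap *m, unsigned char c)
{
    m->bits[c / 8] |= (unsigned char)(1u << (c % 8));
}

static int bitmap_has(const start_bitmap *m, unsigned char c)
{
    return (m->bits[c / 8] >> (c % 8)) & 1;
}

/* Return the first offset >= start whose byte could begin a match, so the
   expensive matcher is tried only there; returns len if none qualifies. */
size_t next_candidate(const start_bitmap *m, const unsigned char *subject,
                      size_t start, size_t len)
{
    for (size_t i = start; i < len; i++)
        if (bitmap_has(m, subject[i]))
            return i;
    return len;
}
```

For a pattern such as abc|xyz, the table would contain a and x, so a matcher would be attempted only at those positions instead of at every offset in the subject.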

These two optimizations apply to both pcre_exec() and pcre_dfa_exec(), and the information is also used by the JIT compiler. The optimizations can be disabled by setting the PCRE_NO_START_OPTIMIZE option. You might want to do this if your pattern contains callouts or (*MARK) and you want to make use of these facilities in cases where matching fails.

PCRE_NO_START_OPTIMIZE can be specified at either compile time or execution time. However, if PCRE_NO_START_OPTIMIZE is passed to pcre_exec() (that is, after any JIT compilation has happened), JIT execution is disabled. For JIT execution to work with PCRE_NO_START_OPTIMIZE, the option must be set at compile time.

There is a longer discussion of PCRE_NO_START_OPTIMIZE below.