Perl-compatible regular expressions
PCRE API OVERVIEW
PCRE has its own native API, which is described in this document.
There are also some wrapper functions (for the 8-bit library
only) that correspond to the POSIX regular expression API, but
they do not give access to all the functionality. They are
described in the pcreposix
documentation. Both of these APIs
define a set of C function calls. A C++ wrapper (again for the
8-bit library only) is also distributed with PCRE. It is
documented in the pcrecpp
page.
The native API C function prototypes are defined in the header
file pcre.h, and on Unix-like systems the (8-bit) library itself
is called libpcre. It can normally be accessed by adding -lpcre
to the command for linking an application that uses PCRE. The
header file defines the macros PCRE_MAJOR and PCRE_MINOR to
contain the major and minor release numbers for the library.
Applications can use these to include support for different
releases of PCRE.
In a Windows environment, if you want to statically link an
application program against a non-dll pcre.a file, you must
define PCRE_STATIC before including pcre.h or pcrecpp.h, because
otherwise the pcre_malloc() and pcre_free() exported functions
will be declared __declspec(dllimport), with unwanted results.
The functions pcre_compile(), pcre_compile2(), pcre_study(), and
pcre_exec() are used for compiling and matching regular
expressions in a Perl-compatible manner. A sample program that
demonstrates the simplest way of using them is provided in the
file called pcredemo.c in the PCRE source distribution. A listing
of this program is given in the pcredemo documentation, and the
pcresample documentation describes how to compile and run it.
Just-in-time compiler support is an optional feature of PCRE that
can be built in appropriate hardware environments. It greatly
speeds up matching for many patterns. Simple
programs can easily request that it be used if available, by
setting an option that is ignored when it is not relevant. More
complicated programs might need to make use of the functions
pcre_jit_stack_alloc(), pcre_jit_stack_free(), and
pcre_assign_jit_stack() in order to control the JIT code's memory
usage.
From release 8.32 there is also a direct interface for JIT
execution, which gives improved performance. The JIT-specific
functions are discussed in the pcrejit
documentation.
A second matching function, pcre_dfa_exec(), which is not
Perl-compatible, is also provided. This uses a different
algorithm for the matching. The alternative algorithm finds all
possible matches (at a given point in the subject), and scans the
subject just once (unless there are lookbehind assertions).
However, this algorithm does not return captured substrings. A
description of the two matching algorithms and their advantages
and disadvantages is given in the pcrematching documentation.
In addition to the main compiling and matching functions, there
are convenience functions for extracting captured substrings from
a subject string that is matched by pcre_exec(). They are:
pcre_copy_substring()
pcre_copy_named_substring()
pcre_get_substring()
pcre_get_named_substring()
pcre_get_substring_list()
pcre_get_stringnumber()
pcre_get_stringtable_entries()
pcre_free_substring() and pcre_free_substring_list() are also
provided, to free the memory used for extracted strings.
The function pcre_maketables() is used to build a set of
character tables in the current locale for passing to
pcre_compile(), pcre_exec(), or pcre_dfa_exec(). This is an
optional facility that is provided for specialist use. Most
commonly, no special tables are passed, in which case internal
tables that are generated when PCRE is built are used.
The function pcre_fullinfo() is used to find out information
about a compiled pattern. The function pcre_version() returns a
pointer to a string containing the version of PCRE and its date
of release.
The function pcre_refcount() maintains a reference count in a
data block containing a compiled pattern. This is provided for
the benefit of object-oriented applications.
The global variables pcre_malloc and pcre_free initially contain
the entry points of the standard malloc() and free() functions,
respectively. PCRE calls the memory management functions via
these variables, so a calling program can replace them if it
wishes to intercept the calls. This should be done before calling
any PCRE functions.
The global variables pcre_stack_malloc and pcre_stack_free are
also indirections to memory management functions. These special
functions are used only when PCRE is compiled to use the heap for
remembering data, instead of recursive function calls, when
running the pcre_exec() function. See the pcrebuild documentation
for details of how to do this. It is a non-standard way of
building PCRE, for use in environments that have limited stacks.
Because of the greater use of memory management, it runs more
slowly. Separate functions are provided so that special-purpose
external code can be used for this case. When used, these
functions always allocate memory blocks of the same size. There
is a discussion about PCRE's stack usage in the pcrestack
documentation.
The global variable pcre_callout initially contains NULL. It can
be set by the caller to a "callout" function, which PCRE will
then call at specified points during a matching operation.
Details are given in the pcrecallout
documentation.
The global variable pcre_stack_guard initially contains NULL. It
can be set by the caller to a function that is called by PCRE
whenever it starts to compile a parenthesized part of a pattern.
When parentheses are nested, PCRE uses recursive function calls,
which use up the system stack. This function is provided so that
applications with restricted stacks can force a compilation error
if the stack runs out. The function should return zero if all is
well, or non-zero to force an error.