int pcre_dfa_exec(const pcre *code, const pcre_extra *extra,
     const char *subject, int length, int startoffset,
     int options, int *ovector, int ovecsize,
     int *workspace, int wscount);
The function pcre_dfa_exec() is called to match a subject string
against a compiled pattern, using a matching algorithm that scans
the subject string just once, and does not backtrack. This has
different characteristics to the normal algorithm, and is not
compatible with Perl. Some of the features of PCRE patterns are
not supported. Nevertheless, there are times when this kind of
matching can be useful. For a discussion of the two matching
algorithms, and a list of features that pcre_dfa_exec() does not
support, see the pcrematching documentation.
The arguments for the pcre_dfa_exec() function are the same as
for pcre_exec(), plus two extras. The ovector argument is used in
a different way, and this is described below. The other common
arguments are used in the same way as for pcre_exec(), so their
description is not repeated here.
The two additional arguments provide workspace for the function.
The workspace vector should contain at least 20 elements. It is
used for keeping track of multiple paths through the pattern
tree. More workspace will be needed for patterns and subjects
where there are a lot of potential matches.
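One way to cope with an unknown workspace requirement is to retry with a larger vector whenever the function reports PCRE_ERROR_DFA_WSSIZE (-19, listed under the error returns). The following is only a sketch of that idea. The names dfa_attempt_fn, dfa_run_growing and fake_attempt are hypothetical and not part of PCRE; the callback is assumed to wrap a real pcre_dfa_exec() call, and fake_attempt merely stands in for it so that the sketch can be compiled without the library.

```c
#include <stdlib.h>

/* PCRE_ERROR_DFA_WSSIZE, value as listed in this document. Real code
   would take the constant from pcre.h instead. */
#define DFA_WSSIZE_ERROR (-19)
#define DFA_MIN_WSCOUNT  20    /* documented minimum workspace size */

/* Hypothetical callback: runs one match attempt with the workspace it
   is given. In real use it would call pcre_dfa_exec(). */
typedef int (*dfa_attempt_fn)(int *workspace, int wscount, void *ctx);

/* Retry the attempt, doubling the workspace each time it runs out,
   until it stops reporting DFA_WSSIZE_ERROR or max_wscount is reached. */
int dfa_run_growing(dfa_attempt_fn attempt, void *ctx, int max_wscount)
{
    int wscount = DFA_MIN_WSCOUNT;
    for (;;) {
        int *workspace = malloc((size_t)wscount * sizeof *workspace);
        if (workspace == NULL)
            return DFA_WSSIZE_ERROR;  /* simplification: treat as "no space" */
        int rc = attempt(workspace, wscount, ctx);
        free(workspace);
        if (rc != DFA_WSSIZE_ERROR || wscount >= max_wscount)
            return rc;
        wscount = (wscount > max_wscount / 2) ? max_wscount : wscount * 2;
    }
}

/* Stand-in attempt, for demonstration only: pretends the match needs
   *(int *)ctx workspace elements and reports one match once it has them. */
int fake_attempt(int *workspace, int wscount, void *ctx)
{
    (void)workspace;
    return wscount >= *(int *)ctx ? 1 : DFA_WSSIZE_ERROR;
}
```

Because this sketch frees the workspace after every attempt, it is not suitable as it stands for use with PCRE_DFA_RESTART, which requires the workspace from the previous partial match to be kept.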
Here is an example of a simple call to pcre_dfa_exec():
  int rc;
  int ovector[10];
  int wspace[20];
  rc = pcre_dfa_exec(
    re,             /* result of pcre_compile() */
    NULL,           /* we didn't study the pattern */
    "some string",  /* the subject string */
    11,             /* the length of the subject string */
    0,              /* start at offset 0 in the subject */
    0,              /* default options */
    ovector,        /* vector of integers for substring information */
    10,             /* number of elements (NOT size in bytes) */
    wspace,         /* working space vector */
    20);            /* number of elements (NOT size in bytes) */
Option bits for pcre_dfa_exec()
The unused bits of the options argument for pcre_dfa_exec() must
be zero. The only bits that may be set are PCRE_ANCHORED,
PCRE_NEWLINE_xxx, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY,
PCRE_NOTEMPTY_ATSTART, PCRE_NO_UTF8_CHECK, PCRE_BSR_ANYCRLF,
PCRE_BSR_UNICODE, PCRE_NO_START_OPTIMIZE, PCRE_PARTIAL_HARD,
PCRE_PARTIAL_SOFT, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All
but the last four of these are exactly the same as for
pcre_exec(), so their description is not repeated here.
PCRE_PARTIAL_HARD
PCRE_PARTIAL_SOFT
These have the same general effect as they do for pcre_exec(),
but the details are slightly different. When PCRE_PARTIAL_HARD is
set for pcre_dfa_exec(), it returns PCRE_ERROR_PARTIAL if the end
of the subject is reached and there is still at least one
matching possibility that requires additional characters. This
happens even if some complete matches have also been found. When
PCRE_PARTIAL_SOFT is set, the return code PCRE_ERROR_NOMATCH is
converted into PCRE_ERROR_PARTIAL if the end of the subject is
reached, there have been no complete matches, but there is still
at least one matching possibility. The portion of the string that
was inspected when the longest partial match was found is set as
the first matching string in both cases. There is a more
detailed discussion of partial and multi-segment matching, with
examples, in the pcrepartial documentation.
PCRE_DFA_SHORTEST
Setting the PCRE_DFA_SHORTEST option causes the matching
algorithm to stop as soon as it has found one match. Because of
the way the alternative algorithm works, this is necessarily the
shortest possible match at the first possible matching point in
the subject string.
PCRE_DFA_RESTART
When pcre_dfa_exec() returns a partial match, it is possible to
call it again, with additional subject characters, and have it
continue with the same match. The PCRE_DFA_RESTART option
requests this action; when it is set, the workspace and wscount
arguments must reference the same vector as before because data
about the match so far is left in it after a partial match.
There is more discussion of this facility in the pcrepartial
documentation.
Successful returns from pcre_dfa_exec()
When pcre_dfa_exec() succeeds, it may have matched more than one
substring in the subject. Note, however, that all the matches
from one run of the function start at the same point in the
subject. The shorter matches are all initial substrings of the
longer matches. For example, if the pattern
  <.*>
is matched against the string
  This is <something> <something else> <something further> no more
the three matched strings are
  <something>
  <something> <something else>
  <something> <something else> <something further>
On success, the yield of the function is a number greater than
zero, which is the number of matched substrings. The substrings
themselves are returned in ovector. Each string uses two
elements; the first is the offset to the start, and the second is
the offset to the end. In fact, all the strings have the same
start offset. (Space could have been saved by giving this only
once, but it was decided to retain some compatibility with the
way pcre_exec() returns data, even though the meaning of the
strings is different.)
The strings are returned in reverse order of length; that is, the
longest matching string is given first. If there were too many
matches to fit into ovector, the yield of the function is zero,
and the vector is filled with the longest matches. Unlike
pcre_exec(), pcre_dfa_exec() can use the entire ovector for
returning matched strings.
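The layout of ovector described above can be read back with a few lines of C. The sketch below is illustrative only: dfa_pair_count, dfa_match_length and dfa_print_matches are hypothetical helper names, not PCRE functions, and the treatment of a zero return (reading ovecsize/2 pairs) is an assumption based on the statement that the whole vector is filled with the longest matches.

```c
#include <assert.h>
#include <stdio.h>

/* Number of (start, end) pairs that can be read from ovector after
   pcre_dfa_exec() returned rc. Assumption: a zero return means the
   vector overflowed and holds ovecsize/2 pairs. */
int dfa_pair_count(int rc, int ovecsize)
{
    assert(ovecsize >= 0);
    if (rc > 0) return rc;
    if (rc == 0) return ovecsize / 2;
    return 0;  /* negative: error or no match, nothing to read */
}

/* Length of the i-th match; index 0 is the longest one. */
int dfa_match_length(const int *ovector, int i)
{
    return ovector[2 * i + 1] - ovector[2 * i];
}

/* Print every match, longest first. All pairs share a start offset. */
void dfa_print_matches(const char *subject, const int *ovector, int pairs)
{
    for (int i = 0; i < pairs; i++) {
        printf("%2d: %.*s\n", i, dfa_match_length(ovector, i),
               subject + ovector[2 * i]);
    }
}
```

For the <.*> example above, the offsets worked out by hand from the subject string are (8,56), (8,36) and (8,19), with a return value of 3. Passing those values to dfa_print_matches() prints the three strings, longest first.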
NOTE: PCRE's "auto-possessification" optimization usually applies
to character repeats at the end of a pattern (as well as
internally). For example, the pattern "a\d+" is compiled as if it
were "a\d++" because there is no point even considering the
possibility of backtracking into the repeated digits. For DFA
matching, this means that only one possible match is found. If
you really do want multiple matches in such cases, either use an
ungreedy repeat ("a\d+?") or set the PCRE_NO_AUTO_POSSESS option
when compiling.
Error returns from pcre_dfa_exec()
The pcre_dfa_exec() function returns a negative number when it
fails. Many of the errors are the same as for pcre_exec(), and
these are described above. There are in addition the following
errors that are specific to pcre_dfa_exec():
PCRE_ERROR_DFA_UITEM (-16)
This return is given if pcre_dfa_exec() encounters an item in the
pattern that it does not support, for instance, the use of \C or
a back reference.
PCRE_ERROR_DFA_UCOND (-17)
This return is given if pcre_dfa_exec() encounters a condition
item that uses a back reference for the condition, or a test for
recursion in a specific group. These are not supported.
PCRE_ERROR_DFA_UMLIMIT (-18)
This return is given if pcre_dfa_exec() is called with an extra
block that contains a setting of the match_limit or
match_limit_recursion fields. This is not supported (these fields
are meaningless for DFA matching).
PCRE_ERROR_DFA_WSSIZE (-19)
This return is given if pcre_dfa_exec() runs out of space in the
workspace vector.
PCRE_ERROR_DFA_RECURSE (-20)
When a recursive subpattern is processed, the matching function
calls itself recursively, using private vectors for ovector and
workspace. This error is given if the output vector is not large
enough. This should be extremely rare, as a vector of size 1000
is used.
PCRE_ERROR_DFA_BADRESTART (-30)
When pcre_dfa_exec() is called with the PCRE_DFA_RESTART option,
some plausibility checks are made on the contents of the
workspace, which should contain data about the previous partial
match. If any of these checks fails, this error is given.
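When logging failures it can help to translate the DFA-specific codes back to their names. The helper below is a minimal sketch: dfa_error_name is a hypothetical function, not part of PCRE, and the numeric values are copied from the list above rather than taken from pcre.h, which real code should include instead.

```c
#include <stddef.h>

/* Map a pcre_dfa_exec()-specific error code to its symbolic name.
   Values are those listed in this section. Returns NULL for any
   other code, including the errors shared with pcre_exec(). */
const char *dfa_error_name(int rc)
{
    switch (rc) {
    case -16: return "PCRE_ERROR_DFA_UITEM";      /* unsupported pattern item */
    case -17: return "PCRE_ERROR_DFA_UCOND";      /* unsupported condition */
    case -18: return "PCRE_ERROR_DFA_UMLIMIT";    /* match limit set in extra */
    case -19: return "PCRE_ERROR_DFA_WSSIZE";     /* workspace exhausted */
    case -20: return "PCRE_ERROR_DFA_RECURSE";    /* recursion vector too small */
    case -30: return "PCRE_ERROR_DFA_BADRESTART"; /* bad restart workspace */
    default:  return NULL;
    }
}
```

A caller might check dfa_error_name(rc) first and fall back to its handling of the common pcre_exec() errors when the result is NULL.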