Perl-compatible regular expressions
JIT STACK FAQ
(1) Why do we need JIT stacks?
PCRE (and JIT) is a recursive, depth-first engine, so it needs a
stack where the local data of the current node is pushed before
checking its child nodes. Allocating real machine stack on some
platforms is difficult. For example, on PowerPC the stack chain
must be updated every time the stack is extended. Although this
is possible, the cost of those updates reduces performance.
So we do the recursion in memory.
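As a rough illustration of "recursion in memory" (not PCRE's actual code), the sketch below walks a tree depth-first while keeping the pending nodes on a heap-allocated stack instead of the machine stack. The `struct node` type and the `sum_tree()` function are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tree node, used only for this illustration. */
struct node {
    int value;
    struct node *left;
    struct node *right;
};

/* Sum all values depth-first without machine recursion.
 * Pending nodes live on a heap-allocated stack that grows on demand.
 * Returns 0 on success, -1 if memory runs out. */
int sum_tree(const struct node *root, long *out)
{
    size_t cap = 16, top = 0;
    const struct node **stack;
    long sum = 0;

    assert(out != NULL);
    stack = malloc(cap * sizeof *stack);
    if (stack == NULL)
        return -1;
    if (root != NULL)
        stack[top++] = root;

    while (top > 0) {
        const struct node *n = stack[--top];
        sum += n->value;

        /* Each pop may push two children, so keep two free slots. */
        if (top + 2 > cap) {
            const struct node **bigger =
                realloc(stack, 2 * cap * sizeof *stack);
            if (bigger == NULL) {
                free(stack);
                return -1;
            }
            stack = bigger;
            cap *= 2;
        }
        if (n->right != NULL)
            stack[top++] = n->right;
        if (n->left != NULL)
            stack[top++] = n->left;
    }

    free(stack);
    *out = sum;
    return 0;
}
```

This toy stack may be moved by realloc() because nothing points into it. A stack whose frames are referenced by pointers cannot be relocated that way.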
(2) Why don't we simply allocate blocks of memory with malloc()?
Modern operating systems have a nice feature: they can reserve an
address space instead of allocating memory. We can safely
allocate memory pages inside this address space, so the stack
can grow without moving memory data (this is important because
of pointers). Thus we can reserve 1M of address space and use only
a single memory page (usually 4K) if that is enough. However, we
can still grow up to 1M at any time if needed.
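The reserve-then-commit idea can be sketched on a POSIX system roughly as follows. This is an assumption-laden illustration, not how PCRE itself is implemented: the `struct region` type and the `region_*` functions are hypothetical. The range is reserved with mmap() and PROT_NONE, and pages are made usable later with mprotect(). On Windows the analogous calls would be VirtualAlloc() with MEM_RESERVE and MEM_COMMIT.

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical growable region: address space is reserved up front,
 * pages become usable only on demand. */
struct region {
    char  *base;       /* start of the reserved address range */
    size_t reserved;   /* bytes of address space reserved */
    size_t committed;  /* bytes currently readable and writable */
};

static size_t round_up_to_page(size_t n)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (n + page - 1) / page * page;
}

/* Reserve address space only; no page is accessible yet. */
int region_reserve(struct region *r, size_t max_bytes)
{
    size_t len = round_up_to_page(max_bytes);
    void *p = mmap(NULL, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    r->base = p;
    r->reserved = len;
    r->committed = 0;
    return 0;
}

/* Make the first want_bytes usable. The base address never changes,
 * so pointers into the region stay valid while it grows. */
int region_grow(struct region *r, size_t want_bytes)
{
    size_t want = round_up_to_page(want_bytes);

    assert(r->committed <= r->reserved);
    if (want > r->reserved)
        return -1;                      /* beyond the reservation */
    if (want > r->committed) {
        if (mprotect(r->base + r->committed, want - r->committed,
                     PROT_READ | PROT_WRITE) != 0)
            return -1;
        r->committed = want;
    }
    return 0;
}

/* Give back the whole reservation. */
void region_release(struct region *r)
{
    munmap(r->base, r->reserved);
    r->base = NULL;
    r->reserved = r->committed = 0;
}
```

A caller would reserve 1M once, grow to a single page at first, and grow further only when a match needs more room.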
(3) Who "owns" a JIT stack?
The owner of the stack is the user program, not the JIT-studied
pattern or anything else. The user program must ensure that if a
stack is used by pcre_exec() (that is, it is assigned to the
pattern currently running), that stack is not used by any
other thread (to avoid overwriting the same memory area). The
best practice for multithreaded programs is to allocate a stack
for each thread, and return this stack through the JIT callback
function.
(4) When should a JIT stack be freed?
You can free a JIT stack at any time, as long as it will not be
used by pcre_exec() again. When you assign the stack to a
pattern, only a pointer is set. There is no reference counting or
any other magic. You can free the patterns and stacks in any
order, at any time. Just do not call pcre_exec() with a pattern
pointing to an already freed stack, as that will cause a
segmentation fault. (Also, do not free a stack currently used by
pcre_exec() in another thread.) You can also replace the stack for
a pattern at any time. You can even free the previous stack before
assigning a replacement.
(5) Should I allocate/free a stack every time before/after
calling pcre_exec()?
No, because this is too costly in terms of resources. However,
you could implement a scheme that releases the stack if it has
not been used for, say, two minutes. The JIT callback can
help to achieve this without keeping a list of the currently
JIT-studied patterns.
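One possible shape for such an idle-release scheme is sketched below. It is deliberately independent of the PCRE headers so that it compiles on its own: `struct idle_cache`, `idle_cache_get()`, `idle_cache_reap()` and the malloc-backed `make_buffer()`/`drop_buffer()` hooks are all hypothetical. In a real program the hooks would presumably wrap pcre_jit_stack_alloc() and pcre_jit_stack_free(), and `idle_cache_get()` would be called from the JIT callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical cache around one long-lived resource, such as a
 * per-thread JIT stack. */
struct idle_cache {
    void  *resource;              /* NULL while released */
    time_t last_used;             /* time of the most recent get */
    double max_idle_seconds;      /* e.g. 120.0 for two minutes */
    void *(*create)(void);        /* allocates the resource */
    void  (*destroy)(void *);     /* frees the resource */
};

/* Stand-in hooks so the sketch runs without PCRE. */
static void *make_buffer(void)   { return malloc(4096); }
static void  drop_buffer(void *p) { free(p); }

/* Fetch the resource, creating it again if it was released.
 * Intended to be called from the JIT callback. */
void *idle_cache_get(struct idle_cache *c, time_t now)
{
    assert(c->create != NULL && c->destroy != NULL);
    if (c->resource == NULL)
        c->resource = c->create();
    c->last_used = now;
    return c->resource;
}

/* Release the resource if it has been idle long enough.
 * Must only be called when no match is using the resource.
 * Returns 1 if something was released, 0 otherwise. */
int idle_cache_reap(struct idle_cache *c, time_t now)
{
    if (c->resource != NULL &&
        difftime(now, c->last_used) >= c->max_idle_seconds) {
        c->destroy(c->resource);
        c->resource = NULL;
        return 1;
    }
    return 0;
}
```

Since the callback looks the stack up each time a match starts, the patterns never hold a stale pointer, and no list of studied patterns has to be maintained.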
(6) OK, the stack is for long-term memory allocation. But what
happens if a pattern causes stack overflow with a stack of 1M? Is
that 1M kept until the stack is freed?
Especially on embedded systems, it might be a good idea to release
memory sometimes without freeing the stack. There is no API for
this at the moment. Probably a function call that returns
the memory currently allocated for a stack, and another that
allows releasing memory (shrinking the stack), would be a good
idea if someone needs this.
(7) This is too much of a headache. Isn't there any better
solution for JIT stack handling?
No, thanks to Windows. If POSIX threads were used everywhere, we
could throw out this complicated API.