Guide to the Linux Manual




pcreapi(3)

Perl-compatible regular expressions


LOCALE SUPPORT

PCRE handles caseless matching, and determines whether characters are letters, digits, or whatever, by reference to a set of tables indexed by character code point. When running in UTF-8 mode, or in the 16- or 32-bit libraries, this applies only to characters with code points less than 256. By default, higher-valued code points never match escapes such as \w or \d. However, if PCRE is built with Unicode property support, all characters can be tested with \p and \P; alternatively, the PCRE_UCP option can be set when a pattern is compiled, which causes \w and friends to use Unicode property support instead of the built-in tables.

The use of locales with Unicode is discouraged. If you are handling characters with code points greater than 128, you should either use Unicode support, or use locales, but not try to mix the two.

PCRE contains an internal set of tables that are used when the final argument of pcre_compile() is NULL. These are sufficient for many applications. Normally, the internal tables recognize only ASCII characters. However, when PCRE is built, it is possible to cause the internal tables to be rebuilt in the default "C" locale of the local system, which may cause them to be different.
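The statement that tables built in the "C" locale recognize only ASCII letters can be checked with standard C alone. The sketch below is not PCRE code, and the helper name count_letters_in_locale is hypothetical. It temporarily switches LC_CTYPE, counts the byte values that isalpha() accepts, and restores the previous locale.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdio.h>

/* Count byte values 0..255 that isalpha() accepts in the named locale.
   Returns -1 if the locale is not available on this system. */
int count_letters_in_locale(const char *name) {
    assert(name != NULL);
    char saved[256];
    const char *current = setlocale(LC_CTYPE, NULL);

    /* Copy the current name: a later setlocale() call may overwrite it. */
    snprintf(saved, sizeof saved, "%s", current != NULL ? current : "C");

    /* On failure setlocale() returns NULL and leaves the locale unchanged. */
    if (setlocale(LC_CTYPE, name) == NULL)
        return -1;

    int letters = 0;
    for (int c = 0; c < 256; c++)
        if (isalpha(c))
            letters++;

    setlocale(LC_CTYPE, saved); /* restore the caller's locale */
    return letters;
}
```

In the "C" locale the C standard restricts letters to A to Z and a to z, so count_letters_in_locale("C") returns 52. In an 8-bit locale such as a Latin-1 fr_FR, where one is installed, the count is typically higher because accented letters are included.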

The internal tables can always be overridden by tables supplied by the application that calls PCRE. These may be created in a different locale from the default. As more and more applications change to using Unicode, the need for this locale support is expected to die away.

External tables are built by calling the pcre_maketables() function, which takes no arguments, while the relevant locale is in effect. The result can then be passed to pcre_compile() as often as necessary. For example, to build and use tables that are appropriate for the French locale (where accented characters with values greater than 128 are treated as letters), the following code could be used:

  setlocale(LC_CTYPE, "fr_FR");
  tables = pcre_maketables();
  re = pcre_compile(..., tables);

The locale name "fr_FR" is used on Linux and other Unix-like systems; if you are using Windows, the name for the French locale is "french".
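Because the locale name differs between systems, a program may want to try several names and check whether setlocale() accepted one. The following standard-C sketch does that and builds one 256-entry lower-case map with tolower(), roughly the kind of per-locale data pcre_maketables() gathers; it is an illustration only, PCRE's real tables contain several maps, and the function make_lcc_table is hypothetical. The returned block comes from malloc(), so, as with pcre_maketables(), the caller must keep it alive while any pattern uses it and free it afterwards.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Build a 256-entry lower-case map in the first locale name that
   setlocale() accepts.  On success, *used (if non-NULL) points at the
   accepted name.  Returns a malloc()ed table the caller must free(),
   or NULL if no name was accepted or allocation failed. */
unsigned char *make_lcc_table(const char *const names[], size_t count,
                              const char **used) {
    assert(names != NULL || count == 0);
    char saved[256];
    const char *current = setlocale(LC_CTYPE, NULL);
    snprintf(saved, sizeof saved, "%s", current != NULL ? current : "C");

    for (size_t i = 0; i < count; i++) {
        if (setlocale(LC_CTYPE, names[i]) == NULL)
            continue; /* this name is not known on this system */

        unsigned char *table = malloc(256);
        if (table != NULL)
            for (int c = 0; c < 256; c++)
                table[c] = (unsigned char)tolower(c);

        setlocale(LC_CTYPE, saved); /* restore the caller's locale */
        if (used != NULL && table != NULL)
            *used = names[i];
        return table;
    }
    setlocale(LC_CTYPE, saved);
    return NULL;
}
```

A French-aware caller might pass { "fr_FR", "french" }, covering both the Unix-style and the Windows names; if neither is installed, the function returns NULL and the program can fall back to the internal tables.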

When pcre_maketables() runs, the tables are built in memory that is obtained via pcre_malloc. It is the caller's responsibility to ensure that the memory containing the tables remains available for as long as it is needed.

The pointer that is passed to pcre_compile() is saved with the compiled pattern, and the same tables are used via this pointer by pcre_study() and also by pcre_exec() and pcre_dfa_exec(). Thus, for any single pattern, compilation, studying and matching all happen in the same locale, but different patterns can be processed in different locales.
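The idea that each compiled pattern carries its own tables pointer can be sketched in a few lines of standard C. Everything here is hypothetical (struct pattern, compile_literal, match_caseless) and matches only whole literal strings; the point is that every later operation reads the table stored with the pattern, so two patterns built with different tables behave differently on the same subject.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical compiled form: the tables pointer is stored with the
   pattern, so every later operation uses the same tables. */
struct pattern {
    const char *literal;      /* pattern text, matched literally */
    const unsigned char *lcc; /* 256-entry lower-case map */
};

struct pattern compile_literal(const char *literal, const unsigned char *lcc) {
    assert(literal != NULL && lcc != NULL);
    struct pattern p = { literal, lcc };
    return p;
}

/* Caseless whole-string comparison that consults only p->lcc. */
bool match_caseless(const struct pattern *p, const char *subject) {
    const unsigned char *a = (const unsigned char *)p->literal;
    const unsigned char *b = (const unsigned char *)subject;
    for (; *a != '\0' && *b != '\0'; a++, b++)
        if (p->lcc[*a] != p->lcc[*b])
            return false;
    return *a == *b; /* both strings must end at the same point */
}
```

For example, with a hand-built table that pairs byte 0xC9 with 0xE9 (as a Latin-1 locale would for É and é), a pattern compiled with that table matches "caf\xE9" caselessly against "caf\xC9", while the same pattern compiled with an ASCII-only table does not.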

It is possible to pass a table pointer or NULL (indicating the use of the internal tables) to pcre_exec() or pcre_dfa_exec() (see the discussion below in the section on matching a pattern). This facility is provided for use with pre-compiled patterns that have been saved and reloaded. Character tables are not saved with patterns, so if a non-standard table was used at compile time, it must be provided again when the reloaded pattern is matched. Attempting to use this facility to match a pattern in a different locale from the one in which it was compiled is likely to lead to anomalous (usually incorrect) results.
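In PCRE itself the tables for a reloaded pattern are supplied at match time through the pcre_extra block (its tables field, with PCRE_EXTRA_TABLES set in flags). The rule that the tables must be provided again can be modelled with the hypothetical, standard-C-only sketch below: the saved record stores only a note that custom tables were used, never the tables themselves, and reloading refuses to proceed without them. The structure and function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical saved form: the tables themselves are NOT stored,
   only a note that non-default tables were used at compile time. */
struct saved_pattern {
    const char *source;
    bool used_custom_tables;
};

struct loaded_pattern {
    const char *source;
    const unsigned char *tables; /* NULL means the built-in tables */
};

/* Reload a saved pattern.  Returns false if custom tables were used at
   compile time but none are supplied now. */
bool reload_pattern(const struct saved_pattern *saved,
                    const unsigned char *tables,
                    struct loaded_pattern *out) {
    assert(saved != NULL && out != NULL);
    if (saved->used_custom_tables && tables == NULL)
        return false;
    out->source = saved->source;
    out->tables = tables;
    return true;
}
```

Note what the check cannot do: it has no way to tell whether the supplied tables come from the same locale as at compile time. Passing tables from a different locale is exactly the misuse that leads to the anomalous results described above.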