Путеводитель по Руководству Linux

  User  |  Syst  |  Libr  |  Device  |  Files  |  Other  |  Admin  |  Head  |



   lex.1p    ( 1 )

генерировать программы для лексических задач (РАЗРАБОТКА) (generate programs for lexical tasks (DEVELOPMENT))

Обоснование (Rationale)

Even though the -c option and references to the C language are retained in this description, lex may be generalized to other languages, as was done at one time for EFL, the Extended FORTRAN Language. Since the lex input specification is essentially language-independent, versions of this utility could be written to produce Ada, Modula-2, or Pascal code, and there are known historical implementations that do so.

The current description of lex bypasses the issue of dealing with internationalized EREs in the lex source code or generated lexical analyzer. If it follows the model used by awk (the source code is assumed to be presented in the POSIX locale, but input and output are in the locale specified by the environment variables), then the tables in the lexical analyzer produced by lex would interpret EREs specified in the lex source in terms of the environment variables specified when lex was executed. The desired effect would be to have the lexical analyzer interpret the EREs given in the lex source according to the environment specified when the lexical analyzer is executed, but this is not possible with the current lex technology.

The description of octal and hexadecimal-digit escape sequences agrees with the ISO C standard use of escape sequences.

Earlier versions of this standard allowed for implementations with bytes other than eight bits, but this has been modified in this version.

There is no detailed output format specification. The observed behavior of lex under four different historical implementations was that none of these implementations consistently reported the line numbers for error and warning messages. Furthermore, there was a desire that lex be allowed to output additional diagnostic messages. Leaving message formats unspecified avoids these formatting questions and problems with internationalization.

Although the %x specifier for exclusive start conditions is not historical practice, it is believed to be a minor change to historical implementations and greatly enhances the usability of lex programs since it permits an application to obtain the expected functionality with fewer statements.

The %array and %pointer declarations were added as a compromise between historical systems. The System V-based lex copies the matched text to a yytext array. The flex program, supported in BSD and GNU systems, uses a pointer. In the latter case, significant performance improvements are available for some scanners. Most historical programs should require no change in porting from one system to another because the string being referenced is null-terminated in both cases. (The method used by flex in its case is to null-terminate the token in place by remembering the character that used to come right after the token and replacing it before continuing on to the next scan.) Multi- file programs with external references to yytext outside the scanner source file should continue to operate on their historical systems, but would require one of the new declarations to be considered strictly portable.

The description of EREs avoids unnecessary duplication of ERE details because their meanings within a lex ERE are the same as that for the ERE in this volume of POSIX.1‐2017.

The reason for the undefined condition associated with text beginning with a <blank> or within "%{" and "%}" delimiter lines appearing in the Rules section is historical practice. Both the BSD and System V lex copy the indented (or enclosed) input in the Rules section (except at the beginning) to unreachable areas of the yylex() function (the code is written directly after a break statement). In some cases, the System V lex generates an error message or a syntax error, depending on the form of indented input.

The intention in breaking the list of functions into those that may appear in lex.yy.c versus those that only appear in libl.a is that only those functions in libl.a can be reliably redefined by a conforming application.

The descriptions of standard output and standard error are somewhat complicated because historical lex implementations chose to issue diagnostic messages to standard output (unless -t was given). POSIX.1‐2008 allows this behavior, but leaves an opening for the more expected behavior of using standard error for diagnostics. Also, the System V behavior of writing the statistics when any table sizes are given is allowed, while BSD- derived systems can avoid it. The programmer can always precisely obtain the desired results by using either the -t or -n options.

The OPERANDS section does not mention the use of - as a synonym for standard input; not all historical implementations support such usage for any of the file operands.

A description of the translation table was deleted from early proposals because of its relatively low usage in historical applications.

The change to the definition of the input() function that allows buffering of input presents the opportunity for major performance gains in some applications.

The following examples clarify the differences between lex regular expressions and regular expressions appearing elsewhere in this volume of POSIX.1‐2017. For regular expressions of the form "r/x", the string matching r is always returned; confusion may arise when the beginning of x matches the trailing portion of r. For example, given the regular expression "a*b/cc" and the input "aaabcc", yytext would contain the string "aaab" on this match. But given the regular expression "x*/xy" and the input "xxxy", the token xxx, not xx, is returned by some implementations because xxx matches "x*".

In the rule "ab*/bc", the "b*" at the end of r extends r's match into the beginning of the trailing context, so the result is unspecified. If this rule were "ab/bc", however, the rule matches the text "ab" when it is followed by the text "bc". In this latter case, the matching of r cannot extend into the beginning of x, so the result is specified.