gendict(1)

Compiles a word list into an ICU string trie dictionary

Name

gendict - compiles a word list into an ICU string trie dictionary

Synopsis

gendict [ --uchars | --bytes --transform transform ] [ -h, -?, --help ]
       [ -V, --version ] [ -c, --copyright ] [ -v, --verbose ]
       [ -i, --icudatadir directory ] input-file output-file
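
As an illustration of the synopsis, the following Python sketch assembles a gendict argument list and rejects the combinations the synopsis rules out. The helper name gendict_command and its parameters are hypothetical and not part of ICU. The transform spelling offset-0x0e00 is only an assumed example of the offset-<hex-number> form, and the file names are placeholders.

```python
def gendict_command(input_file, output_file, trie="uchars", transform=None,
                    icu_data_dir=None, verbose=False, embed_copyright=False):
    """Build an argument list for gendict (hypothetical helper, not an ICU API)."""
    if trie not in ("uchars", "bytes"):
        raise ValueError("trie must be 'uchars' or 'bytes'")
    # --transform is required with --bytes and not allowed with --uchars.
    if trie == "bytes" and transform is None:
        raise ValueError("--bytes requires --transform")
    if trie == "uchars" and transform is not None:
        raise ValueError("--transform should only be given with --bytes")

    cmd = ["gendict", "--" + trie]
    if transform is not None:
        cmd += ["--transform", transform]
    if embed_copyright:
        cmd.append("--copyright")
    if verbose:
        cmd.append("--verbose")
    if icu_data_dir is not None:
        cmd += ["--icudatadir", icu_data_dir]
    # The two positional arguments come last, as in the synopsis.
    cmd += [input_file, output_file]
    return cmd


# Placeholder file names; the transform value is an assumed example spelling.
print(gendict_command("words.txt", "words.dict"))
print(gendict_command("words.txt", "words.dict", trie="bytes",
                      transform="offset-0x0e00"))
```

The returned list could be passed to subprocess.run on a system where gendict is installed.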

Description

gendict reads the word list from input-file and creates a string trie
dictionary file in output-file. Normally this data file has the .dict
extension.

Words begin at the beginning of a line and are terminated by the first whitespace. Lines that begin with whitespace are ignored.
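
The line rule above can be sketched in Python as follows. This is only an approximation of the stated rule, not gendict's actual parser: the function name extract_words is hypothetical, and Python's notion of whitespace may differ from the one gendict uses.

```python
def extract_words(lines):
    """Return the leading word of each line, skipping lines that start with whitespace."""
    words = []
    for line in lines:
        # Empty lines and lines that begin with whitespace contribute no word.
        if not line or line[0].isspace():
            continue
        # The word runs from the start of the line to the first whitespace.
        words.append(line.split(None, 1)[0])
    return words


sample = ["hello 10\n", "  ignored line\n", "world\n", "\n"]
print(extract_words(sample))  # prints ['hello', 'world']
```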


Options

-h, -?, --help
              Print help about usage and exit.

-V, --version
              Print the version of gendict and exit.

-c, --copyright
              Embed the standard ICU copyright into the output-file.

-v, --verbose
              Display extra informative messages during execution.

-i, --icudatadir directory
              Look for any necessary ICU data files in directory. For
              example, the file pnames.icu must be located when ICU's
              data is not built as a shared library. The default ICU
              data directory is specified by the environment variable
              ICU_DATA. Most configurations of ICU do not require this
              argument.

--uchars
              Set the output trie type to UChar. Mutually exclusive
              with --bytes.

--bytes
              Set the output trie type to Bytes. Mutually exclusive
              with --uchars.

--transform transform
              Set the transform type. Specify this only together with
              --bytes. The currently supported transform is
              offset-<hex-number>, which specifies an offset to
              subtract from all input characters. The offset transform
              also maps U+200D to 0xFF and U+200C to 0xFE, for
              compatibility with languages that require these
              characters. A bytes trie requires a transform, and
              applying it to the non-value characters in the input-file
              must produce output between 0x00 and 0xFF.

input-file
              The source file to read.

output-file
              The file to write the output dictionary to.
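
The offset transform described under --transform can be illustrated with a short Python sketch. It follows one reading of the description above (subtract the offset, map U+200D and U+200C to fixed bytes, require the result to fit in one byte) and is not taken from the gendict source. The function name offset_transform is hypothetical, and the offset 0x0E00 with Thai letters is only an assumed example.

```python
ZWJ = 0x200D   # ZERO WIDTH JOINER, mapped to 0xFF
ZWNJ = 0x200C  # ZERO WIDTH NON-JOINER, mapped to 0xFE


def offset_transform(word, offset):
    """Map each character of word to one byte, per the description above."""
    out = bytearray()
    for ch in word:
        cp = ord(ch)
        if cp == ZWJ:
            b = 0xFF
        elif cp == ZWNJ:
            b = 0xFE
        else:
            # Every other character has the offset subtracted.
            b = cp - offset
        # The transformed value must lie between 0x00 and 0xFF.
        if not 0x00 <= b <= 0xFF:
            raise ValueError(f"U+{cp:04X} does not fit in a byte after the transform")
        out.append(b)
    return bytes(out)


# U+0E01 and U+0E02 with an assumed offset of 0x0E00, followed by U+200D.
print(offset_transform("\u0e01\u0e02\u200d", 0x0E00))  # prints b'\x01\x02\xff'
```

A character such as ASCII "a" would raise ValueError with this offset, because subtracting 0x0E00 yields a negative value.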


Caveats

The input-file is assumed to be encoded in UTF-8. The integers in the
input-file that are used as values must be made up of ASCII digits.
They may be specified either in hex, by using a 0x prefix, or in
decimal. Either --bytes or --uchars must be specified.
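
The value rule can be sketched in Python as below. The function name parse_value is hypothetical. Accepting an uppercase 0X prefix and uppercase hex digits is an assumption, since the text above only mentions a 0x prefix.

```python
import re

# Hex with a 0x prefix, or plain decimal; [0-9] matches ASCII digits only.
_VALUE = re.compile(r"0[xX][0-9A-Fa-f]+|[0-9]+")


def parse_value(text):
    """Parse a dictionary value written in hex (0x prefix) or in decimal."""
    if not _VALUE.fullmatch(text):
        raise ValueError(f"not a valid value: {text!r}")
    if text[:2].lower() == "0x":
        return int(text, 16)
    return int(text, 10)


print(parse_value("0x1F"))  # prints 31
print(parse_value("42"))    # prints 42
```

The explicit pattern matters because Python's int() on its own also accepts non-ASCII decimal digits, which the rule above excludes.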

Environment

ICU_DATA
              Specifies the directory containing ICU data. Defaults to
              ${prefix}/share/icu/70.0.1/.  Some tools in ICU depend on
              the presence of the trailing slash. It is thus important
              to make sure that it is present if ICU_DATA is set.
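
Because of the trailing-slash requirement, a wrapper script might normalize ICU_DATA before launching an ICU tool. The following Python sketch shows one way to do that. The function name icu_data_dir is hypothetical, and the path in the usage line is only an example; a POSIX-style "/" separator is assumed.

```python
import os


def icu_data_dir(environ=os.environ):
    """Return ICU_DATA with a trailing slash, or None if it is unset or empty."""
    value = environ.get("ICU_DATA")
    if not value:
        return None
    # Some ICU tools depend on the trailing slash, so add it when missing.
    return value if value.endswith("/") else value + "/"


print(icu_data_dir({"ICU_DATA": "/usr/share/icu/70.0.1"}))  # prints /usr/share/icu/70.0.1/
```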