SFK Commands



Section 4. Text Processing

Command: xreplace
sfk xreplace dirName "/searchtext/totext/"

   replace in text and binary files using wildcards * and ?
   as well as SFK Simple Expressions in brackets [].

   demo notice
      this command is commercial and part of SFK Plus or XE.
      this binary contains a demo of xreplace that gives a full
      output preview but cannot write changes to files.

   Multiple search patterns are executed in the given sequence. Mind this
   if they overlap, e.g. /foo/bar/ /foosys/thesys/ makes no sense (foo is
   replaced by the first expression, so the 2nd one will fail to match).
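
   The ordering effect described above can be modelled in a few lines of Python. This is only a sketch of the idea, not SFK code: the helper name `apply_in_sequence` is hypothetical, plain `str.replace` stands in for the pattern engine, and the default case-insensitive comparison is ignored.

```python
def apply_in_sequence(text, patterns):
    """Apply (src, dst) pairs one after another and count hits per pattern.

    Simplified model: literal text only, case-sensitive.
    """
    hits = []
    for src, dst in patterns:
        hits.append(text.count(src))   # hits seen by this pattern
        text = text.replace(src, dst)  # later patterns see the changed text
    return text, hits


# /foo/bar/ /foosys/thesys/ : the second pattern never matches
result, hits = apply_in_sequence("foosys", [("foo", "bar"), ("foosys", "thesys")])
print(result, hits)    # barsys [1, 0]

# listing the longer pattern first avoids the overlap in this model
swapped, swapped_hits = apply_in_sequence("foosys", [("foosys", "thesys"), ("foo", "bar")])
print(swapped, swapped_hits)    # thesys [1, 0]
```

   In this model, putting the longer, more specific pattern first is what lets it match.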

   by default, replace functions run in SIMULATION mode,
      previewing hits without changing anything. add -yes to apply changes.
      Changing binaries may lead to unpredictable results, therefore keep
      backups of your files in any case.

   subdirectories are included by default
      the sfk default for most commands is to process the given directories,
      as well as all subdirs within them. specify -nosub to disable this.

   options
      -nosub        do not include files in subdirectories.
      -nobin[ary]   skip binary files.
      -case         case-sensitive text comparison. default is insensitive.
                    for details type: sfk help nocase
      -pat          starts a list of search or replace patterns of the form
                    xsrcxdstx where x is the separator char, src the source
                    to search for, and dst the destination to replace it with.
                    e.g. /foo/bar/ or _foo_bar_ both replace foo by bar.
                    -pat is not required if a single filename is given.
      -text         the same as -pat, starting a text pattern list.
      -bylist x.txt read search patterns from a file x.txt, supporting
                    multiple lines per pattern. (add -full for more.)
      -bylinelist x read /from/to/ or just /from/ patterns from a file x
                    with one pattern per line. (add -full for more.)
                    -by(line)list does not support sfk variables.
                    to use variables in patterns create an sfk script
                    with patterns as parameters. "sfk script" for more.
      -recsize      with same length replacements: set input record size
                    for processing (default=100k)
      -firsthit     process only first found pattern match per file.
      -quiet        do not show progress info.
      -stat         show statistics like hits per pattern and no. of files.
      -perf         show performance statistics.
      -memlimit=nm  with different-length replacements, files must be loaded
                    into memory for processing. the default limit for memory
                    use is 300 MB. set -memlimit=500m to select 500 MB.
      -full         print full help text telling about -bylist pattern files,
                    special character case sensitivity and nested or repeated
                    replace behaviour.

   output options
      -dump         create hexdump of search hits or replaced text.
       -wide        with -dump: show 16 bytes per line.
       -lean        with -dump: show  8 bytes per line.
      -dumpfrom     always dump search hits but not replaced text.
      -dumpall      dump search text and replaced text.
      -nodump       do not create a hexdump, list only matching files.
      -astext       no hexdump, but print search hits as plain text.
                    use this only with plain text files, not binary.
      -showle       highlight CR/LF line endings in hex dump output
      -context=n    with hexdump: show additional n bytes of context.
      -reldist      with hexdump: tell relative distances to previous hits.
      -to dir\$file write output files to given path. for details about
                    output file masks, type "sfk help opt" or "sfk run".
      -tofile x     write output data to a single output filename x
                    (which is not interpreted as a mask but taken as is).
      -more[n]      pause output every 30 or n lines.

   return codes for batch files
      0 = no matches, 1 = matches found, >1 = major error occurred.
      see also "sfk help opt" on how to influence error processing.
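
   A calling script may want to turn these return codes into something readable. The sketch below assumes the three ranges listed above are the only ones that matter; the function name `classify_return_code` is made up for illustration and is not part of SFK.

```python
def classify_return_code(rc):
    """Map an sfk exit code to the meaning given in the help text."""
    if rc == 0:
        return "no matches"
    if rc == 1:
        return "matches found"
    if rc > 1:
        return "major error"
    # negative values are not described in the help text
    raise ValueError(f"unexpected return code: {rc}")


for code in (0, 1, 9):
    print(code, classify_return_code(code))
```

   A caller could feed it the `returncode` attribute of a `subprocess.run(...)` result, assuming sfk is installed and on the PATH.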

   about nested replacement patterns
      sfk replace myfile.dat /foo/bar/ /bar/goo/
      with SFK base, "foo" will be replaced by "bar" and then
      immediately "bar" is replaced again by "goo".
      with SFK Plus or XE, a replaced part of text is not replaced
      again in the same command, so "foo" stays replaced by "bar".
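
   The difference between the two behaviours can be illustrated with a small Python model. It is an approximation under stated assumptions: both function names are hypothetical, only literal text is handled, and a regex alternation stands in for "a replaced part is not replaced again".

```python
import re


def replace_chained(text, pairs):
    """Model of SFK base: each pattern runs over the output of the previous one."""
    for src, dst in pairs:
        text = text.replace(src, dst)
    return text


def replace_single_pass(text, pairs):
    """Model of SFK Plus/XE: replaced text is never scanned again."""
    table = dict(pairs)
    # earlier patterns come first in the alternation, so they win on overlap
    pattern = re.compile("|".join(re.escape(src) for src, _ in pairs))
    return pattern.sub(lambda m: table[m.group(0)], text)


pairs = [("foo", "bar"), ("bar", "goo")]
print(replace_chained("foo", pairs))      # goo
print(replace_single_pass("foo", pairs))  # bar
```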

   unexpected repeat replace behaviour
      depending on the input data and search/replace expressions,
      it can happen that running the same replace multiple times
      on the same file produces further hits that didn't exist
      in the first run. add option -full to read more on this.

   quoted multi line parameters are supported in scripts
      using full trim. type "sfk script" for details.

   wildcards and SFK expressions
      SFK Expressions are simple patterns containing literal text,
      wildcards * and ? and character classes in square brackets [].
      basically, the syntax provides extended wildcards but no
      further logic and is not related to regular expressions.

      search patterns are surrounded by a separator character which
      can be anything not contained in the search text, like / or _
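
   How a pattern string splits into its from and to text can be sketched as follows. This is a hypothetical helper, not SFK's parser: `split_pattern` simply takes the first character as the separator and relies on the rule that the separator does not occur inside the text.

```python
def split_pattern(pattern):
    """Split 'xsrcxdstx' into (src, dst); dst is None for a search-only 'xsrcx'."""
    if len(pattern) < 3:
        raise ValueError("pattern too short")
    sep = pattern[0]
    if not pattern.endswith(sep):
        raise ValueError("pattern must end with its separator character")
    fields = pattern[1:-1].split(sep)
    if len(fields) == 1:
        return fields[0], None          # /from/ only, no totext given
    if len(fields) == 2:
        return fields[0], fields[1]     # /from/to/
    raise ValueError("separator character appears inside the text")


print(split_pattern("/foo/bar/"))   # ('foo', 'bar')
print(split_pattern("_foo_bar_"))   # ('foo', 'bar')
print(split_pattern("/foo*bar/"))   # ('foo*bar', None)
```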

      within a pattern /fromtext/totext/ the fromtext may contain:

        *                       - 0 to 4000 characters in the same
                                  text line or paragraph, i.e. all
                                  bytes not being CR, LF or NULL.
                                  4000 is just a default maximum
                                  that can be changed by:
        [0.100000 chars]        - 0 to 100000 characters in the same
                                  text line or paragraph, i.e. the
                                  same as * but with a larger range.
        ?                       - one character.
        ?????                   - same as [5.5 chars] or [5 chars]
        [bytes]                 - 0 to 4000 bytes (with CR,LF,NULL)
                                  i.e. it collects stream text
                                  across lines, even in binary data
        **                      - the same as [bytes].
        [0.100 bytes]           - 0 to 100 bytes
        [.100000 bytes]         - up to 100000 bytes
        [1.* bytes]             - 1 to default maximum bytes
        [2 chars]               - exactly 2 chars
        [30 bytes]              - exactly 30 bytes
        [byte of aeiou]         - one vowel (a OR A OR e OR ...),
                                  case insensitive by default.
                                  "aeiou" is a character list.
        [byte of \\\x2f]        - a backslash \ or forw. slash /
        [bytes of \r\n \t]      - whitespace incl. line ends
        [bytes of (\r\n \t)]    - the same, () are optional
        [bytes not \r\n\0]      - up to 4000 bytes as long as no
                                  CR, LF or NULL byte appears
        [chars]                 - the same as [bytes not \r\n\0],
                                  i.e. collect text in a line
        [char not ( \t)]        - same as [byte not ( \r\n\0\t)],
                                  everything not blanks and tabs
        [char not )( \t]        - not brackets, blanks and tabs,
                                  same as not (\(\) \t)
        [chars of a-z0-9]       - means a-zA-Z0-9 as search is
                                  case insensitive by default
        [chars of \x61-\x7A]    - search a-z but not A-Z, or use
                                  option -case for case search
        [eol]                   - end of line by characters:
                                  CRLF or LF or CR

        [white]     = chars of (\t )     - 0 or more whitespaces
        [xwhite]    = bytes of (\t \r\n) - same but across lines
        [1 white]   = byte  of (\t )     - 1 whitespace
        [digit]     = byte  of (0-9)     - 1 digit
        [digits]    = bytes of (0-9)     - 0 or more digits
        [hexdigit]  = byte  of (0-9a-f)  - 1 hexadecimal digit
        [hexdigits] = bytes of (0-9a-f)  - 0 or more hex digits
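
   For readers who know regular expressions, the shorthand classes above correspond roughly to the Python character classes below. The mapping is an approximation for orientation only: SFK expressions are not regular expressions, their expansion rules differ, and the table name `SHORTHAND_AS_REGEX` is an invention of this sketch.

```python
import re

# rough regex equivalents of the SFK shorthand classes (assumption, not SFK code)
SHORTHAND_AS_REGEX = {
    "[white]":     r"[\t ]*",       # 0 or more blanks/tabs
    "[xwhite]":    r"[\t \r\n]*",   # same, but across lines
    "[1 white]":   r"[\t ]",        # exactly one blank or tab
    "[digit]":     r"[0-9]",
    "[digits]":    r"[0-9]*",
    "[hexdigit]":  r"[0-9a-f]",
    "[hexdigits]": r"[0-9a-f]*",
}


def matches_fully(shorthand, text):
    """True if the whole text fits the class; case-insensitive like the sfk default."""
    regex = SHORTHAND_AS_REGEX[shorthand]
    return re.fullmatch(regex, text, re.IGNORECASE) is not None


print(matches_fully("[hexdigits]", "C0FFEE"))  # True
print(matches_fully("[xwhite]", " \r\n\t"))    # True
print(matches_fully("[white]", " \n"))         # False
print(matches_fully("[digit]", "42"))          # False
```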

        special keywords that do not count as tokens:
        [skip]   - at the start of a pattern: skip such text
                   completely, do not count it as a search hit.
        [keep]   - search also the following text but keep it
                   in the input data, without consuming it.
        [ortext] - foo[ortext]bar searches word foo or bar.
                   [ortext] is allowed only between literals.

        anchors that have no length of their own:
        [start]  - start of file
        [end]    - end of file
        [lstart] - line start, i.e. start or CRLF or CR or LF
        [lend]   - logical line end, i.e. eol or end of file.
                   to replace line ends use [eol] instead.

        how to search or replace special characters:
        -  to search or replace text containing the literal characters
           * ? \ [ ] then these must be escaped like \* \? \\ \[ \]
        -  ( ) are escaped only within character lists, like \( \)
        -  to search or replace the forward slash '/' type \x2f or use
           another char around from/to text, e.g. _fromtext_totext_
        -  parameters with blanks and non trivial characters need double
           quotes "", see also "about Shell Command Characters" below.

        expansion priorities: (highest first)
        if two search parts are side by side, and the same input
        character matches both, then these priorities apply:

          5:  start, end, lstart, lend
          4:  literal text, eol
          3:  whitelist classes: byte of, bytes of
          2:  blacklist classes: chars not, bytes not
          1:  plain wildcards: ?, *, **, byte, bytes, chars

        this means in "/[bytes]foo/" the [bytes] will stop collecting
        characters as soon as "foo" is found, as "foo" is a literal.
        on same or higher priority the right side stops the left side.

      the totext may contain:

        [part 1]            use first text part of the fromtext.
                            e.g. the fromtext /*foo[.100 chars]bar*/
                            contains parts :   1 2         3    4 5
        [part1]             the same (blank is optional).
        [parts 1,2,3]       use parts 1, 2 and 3.
        [parts 1-10]        use parts 1 to 10.
        [strip(part1,\0)]   use part 1 but remove zero bytes.
                            only zero bytes "\0" can be removed.
        [file.name]         full input filename with path
        [file.relname]      input filename without path
        [file.path]         input file's path
        [file.base]         relname without last .extension
        [file.ext]          input filename extension
        [all]               use all parts from fromtext.

        [setvar name]...[endvar]   set variable "name" with data
                                   between setvar and endvar.
        [getvar name]              fill in data from variable "name"

        although anchors like lstart, lend count as a separate part
        they need NOT be specified in the totext. this means that
        /[lstart]foo[lend]/bar/ just changes the word "foo".
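
   Part numbers can be worked out by hand by walking through the fromtext from left to right. The Python sketch below mimics that counting for simple patterns. It is an assumption-laden model, not SFK's tokenizer: `split_into_parts` is a hypothetical name, [skip] and [keep] are dropped without a number, and treating [ortext] as staying inside one literal part is a guess. Use -showparts to see the real numbering.

```python
def split_into_parts(fromtext):
    """Split an SFK fromtext into numbered parts (simplified model)."""
    parts, literal, i = [], "", 0
    n = len(fromtext)
    while i < n:
        ch = fromtext[i]
        token = None
        if ch == "\\" and i + 1 < n:
            literal += fromtext[i:i + 2]      # escaped char stays in the literal
            i += 2
            continue
        if ch == "[":
            j = i + 1
            while j < n and fromtext[j] != "]":
                j += 2 if fromtext[j] == "\\" else 1
            if j >= n:
                raise ValueError("unclosed [ in pattern")
            token = fromtext[i:j + 1]
        elif ch == "*":
            token = "**" if fromtext.startswith("**", i) else "*"
        elif ch == "?":
            j = i
            while j < n and fromtext[j] == "?":
                j += 1
            token = fromtext[i:j]             # a run like ????? is one part
        if token is None:
            literal += ch
            i += 1
            continue
        i += len(token)
        if token == "[ortext]":
            literal += token                  # guess: alternatives share one part
            continue
        if literal:
            parts.append(literal)
            literal = ""
        if token not in ("[skip]", "[keep]"): # keywords get no part number
            parts.append(token)
    if literal:
        parts.append(literal)
    return parts


for number, part in enumerate(split_into_parts("*foo[.100 chars]bar*"), start=1):
    print(number, repr(part))
```

   For the pattern shown in the text this lists five parts, and for "[lstart][9 bytes]1001*" it yields four, which agrees with the [part2] and [part4] references used in the fixed-position example of this help text.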

   supported slash patterns
      \t    = TAB
      \r    = CR
      \n    = LF
      \x00  = one byte with code 00 hexadecimal
      \0    = short form for \x00
      \q    = a double quote "
      \\    = the backslash character \ itself
      \[    = the bracket open character [
      \]    = the bracket close character ]
      \*    = the literal star character *
      \?    = the literal question mark  ?
      \-    = to use literal "-" in a command
      Within multi line -bylist files:
      \     = slash+blank is changed to a single blank
      Only within "char of" or "byte not" lists:
      \(    = to use literal character "("
      \)    = to use literal character ")"
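
   To check which bytes a literal with slash patterns stands for, the table above can be turned into a small decoder. This is a sketch for plain literal text only: the name `decode_slash_patterns` is hypothetical, wildcards and bracket expressions are not interpreted, and the list-only forms \( and \) are deliberately left out.

```python
import string

SIMPLE_ESCAPES = {
    "t": b"\t", "r": b"\r", "n": b"\n", "0": b"\x00", "q": b'"',
    "\\": b"\\", "[": b"[", "]": b"]", "*": b"*", "?": b"?", "-": b"-",
}


def decode_slash_patterns(text):
    """Translate slash patterns in a literal into the bytes they stand for."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out += ch.encode("latin-1")   # one byte per char; fails above U+00FF
            i += 1
            continue
        if i + 1 >= len(text):
            raise ValueError("dangling backslash at end of text")
        code = text[i + 1]
        if code == "x":
            digits = text[i + 2:i + 4]
            if len(digits) != 2 or any(c not in string.hexdigits for c in digits):
                raise ValueError("\\x needs exactly two hex digits")
            out.append(int(digits, 16))
            i += 4
        elif code in SIMPLE_ESCAPES:
            out += SIMPLE_ESCAPES[code]
            i += 2
        else:
            raise ValueError(f"unsupported slash pattern: \\{code}")
    return bytes(out)


print(decode_slash_patterns(r"\x66\x6f\x6f"))   # b'foo'
print(decode_slash_patterns(r"a\tb\q"))         # b'a\tb"'
```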

   SFK expression options
      -showpart(s)  print /from/ part numbers, range statistics
                    and expansion priority points per part.
                    done automatically if a required /to/ text
                    is not given with a command.
      -showbest     if a /from/ pattern finds nothing, use this to
                    see how many parts would match so far, and with
                    up to how many bytes per part. anchors like [lstart]
                    may show a non zero length when matching (CR)LF.
      -showlist     with -bylist, show the internal joined list if
                    commands are spread across multiple lines.
      -showall      show all of the above.
      -xmaxlen=n    set default maximum length for chars or bytes commands,
                    e.g. -xmaxlen=10000 means /foo*bar/ matches with up to
                    10000 characters between foo and bar. the default max
                    length without this option is 4000 characters.

   performance notes
    - always use a string literal, or single byte or char, at the start
      of your search expressions, like in /foo*bar/ starting with 'f'.
      Do not use a wildcard like * at the start like in /*foobar/
      when searching huge input data, as your search will slow down by
      a factor of 256. Use /[lstart]*foobar/ instead.
    - the system may cache output file(s), writing to disk in background
      after sfk has finished. subsequent batch commands may execute slower.

   office file support
      sfk ofind        search in .xml text file contents of
                       office files like .docx .xlsx .ods .odt.
      sfk help office  for more info and options

   see also
      --- open source commands ---
      sfk xfind     search  wildcard text in   plain text files
      sfk ofind     search  in office files    .docx .xlsx .ods
      sfk xfindbin  search  wildcard text in   text/binary files
      sfk xhexfind  search  in text/binary with hex dump output
      sfk extract   extract wildcard data from text/binary files
      sfk filter    filter  and edit text with simple wildcards
      sfk find      search  fixed    text in   text        files
      sfk findbin   search  fixed    text in   text/binary files
      sfk hexfind   search  fixed    text in        binary files
      sfk replace   replace fixed    text in   text/binary files
      --- freeware commands ---
      sfk view      GUI tool to search text as you type
      --- xe commercial commands ---
      sfk replace   replace fixed    text with high performance
      sfk xreplace  replace wildcard text in   text/binary files
      sfk help xe   about SFK XE and xreplace with SFK Expressions.

   beware of Shell Command Characters.
      to find or replace text patterns containing spaces or special
      characters like <>|!&?* you must add quotes "" around parameters
      or the shell environment will destroy your command. for example,
      pattern /foo bar/other/ must be written like "/foo bar/other/"
      within a .bat or .cmd file the percent % must be escaped like %%
      even within quotes: sfk echo -spat "percent %% is a percent \x25"
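
   When command lines for a .bat file are generated by a program, the two rules above (quote parameters with blanks or special characters, double every percent sign) can be applied mechanically. The following Python sketch shows one way to do that. The helper name `batch_line` is hypothetical, and the set of special characters is just the one listed above, so other shells may need more.

```python
SPECIAL_CHARS = set(" <>|!&?*")   # blank plus the characters named in the help text


def batch_line(args):
    """Render a parameter list as one line for a .bat or .cmd file."""
    rendered = []
    for arg in args:
        if '"' in arg:
            raise ValueError('use \\q instead of a literal double quote')
        arg = arg.replace("%", "%%")              # needed even within quotes
        if any(c in SPECIAL_CHARS for c in arg):
            arg = '"' + arg + '"'
        rendered.append(arg)
    return " ".join(rendered)


print(batch_line(["sfk", "xreplace", "in.txt", "/foo bar/other/"]))
# sfk xreplace in.txt "/foo bar/other/"
print(batch_line(["sfk", "echo", "-spat", "percent % is a percent \\x25"]))
# sfk echo -spat "percent %% is a percent \x25"
```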

   about example numbers with [brackets]
      if you see [1] type "sfk cmd 1" for the whole command in one line.

   bad examples with corrections
      if input text contains:
         bool bClFoo;
         bool bClBar   ;
      sfk xfind in.txt "/bool[xwhite]bCl*[xwhite];/"
         does NOT match "bool bClFoo;" because * eats the
         whole input line including ";" so no input is left
         for "[xwhite];" and the whole expression fails.
      sfk xfind in.txt "/bool[xwhite]bCl[* not ;][xwhite];/"
         matches both "bool bClFoo;" and "bool bClBar   ;".
         this means whenever your search fails to work, spell out
         in detail which characters to collect (or not) at each point.
      sfk xex in.txt "/[lstart]foo/[lstart]goo/"
         there is no need to write an anchor like [lstart]
         within totext as it contains no data. use instead:
            sfk xex in.txt "/[lstart]foo/goo/"
      sfk xex in.txt "/foo[lend]bar/goo[part2]bar/"
         anchors like [lend] must be at start or end of fromtext
         and cannot be referenced within totext. use instead:
            sfk xex in.txt "/foo[eol]bar/goo[part2]bar/"

   working examples
      sfk xrep mydir "/foo*bar/"
         an incomplete command (missing "totext" part in pattern).
         sfk shows an info text telling about part numbers
         and runs a search for "foo*bar" in all files of mydir.
         nothing is changed so far.
      sfk xrep mydir "/foo*bar/[part1]goo[part3]/"
         same as above, but now the /fromtext/totext/ is complete.
         again sfk runs a search for "foo*bar", but now it displays
         the changed output text (totext), with everything between
         "foo" and "bar" being changed to "goo". add option
         -dumpfrom to display the original found text instead.
      sfk sel mydir .txt +xrep "/foo*bar/[part1]goo[part3]/"
         similar to above, replace in all .txt files of mydir.
      sfk xrep -text "/class* CFoo/[part1][part3]/" -dir mydir -file .hpp
         search only .hpp files within mydir, and replace for example
         "class IMPORT CFoo" by "class CFoo".
      sfk xrep -pat "/[byte not \n][end]/[part1]\n/"
       -dir mydir -file .cpp .hpp -dumpall
         find all .cpp or .hpp files in mydir whose last line is not
         ending with a linefeed, and add the linefeed. to check exactly
         what is changed dump both input and output text. [23]
      sfk xrep -dir mydir -file .hpp -enddir
       -text "/[byte not \n][end]/[part1]\n/" -dumpall
         same as above but with dir parameters first. [25]
      sfk xrep io.txt "/[lstart][20 chars]*/[part3]/"
         cut first 20 characters in every line of io.txt.
      sfk xrep io.txt "/[lstart][9 bytes]1001*/[part2]9009[part4]/"
         in fixed position text file data like:
            rec. 001:5318 aef3 2751 1001
            rec. 002:1001 aef5 275a 1001
            rec. 003:ef49 aef7 2763 1001
         replace "1001" where it appears in columns 10 to 13,
         in this example only the first "1001" in record 2.
      sfk xrep in.dat "/\xFF\xFE[1 byte]\x80\x81/\xFF\xFE\x00\x80\x81/"
         replace byte sequences (not ASCII text strings) in binary data.
         searches byte groups starting with values 0xFF 0xFE, then any
         single byte, then 0x80 0x81, and always replaces the variable
         byte by a binary 0x00 value.
      sfk xreplace in.txt "/foo*bar/other/"
         replace phrases starting with "foo" and ending with "bar"
         by word "other" in single file in.txt
      sfk xreplace -text "/foo*bar/===[part2]===/" -dir mydir -file .txt
         replace foo*bar in all .txt files of folder mydir
         with a new pattern containing the text between foo and bar
         surrounded by "===".
      sfk xrep -text "/\x66\x6f\x6f[0.100 bytes]\x62\x61\x72/---/"
       -dir mydir -file .dat
         replace binary data starting with bytes 0x66, 0x6f, 0x6f,
         ending with 0x62, 0x61, 0x72 and up to 100 bytes in between
         by "---" within all .dat files of folder mydir. [24]
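
   To see what the fixed-position example with "/[lstart][9 bytes]1001*/[part2]9009[part4]/" is expected to do, the same edit can be reproduced with a Python regular expression on the sample records. This is only a comparison aid: the regex is an approximate equivalent chosen for this sketch, not how sfk works internally.

```python
import re

records = (
    "rec. 001:5318 aef3 2751 1001\n"
    "rec. 002:1001 aef5 275a 1001\n"
    "rec. 003:ef49 aef7 2763 1001\n"
)

# ^        ~ [lstart]   (part 1, no data)
# (.{9})   ~ [9 bytes]  (part 2)
# 1001     ~ literal    (part 3)
# (.*)     ~ *          (part 4, rest of the line)
fixed_column = re.compile(r"^(.{9})1001(.*)$", re.MULTILINE)
changed = fixed_column.sub(r"\g<1>9009\g<2>", records)
print(changed)
```

   Only the second record changes, to "rec. 002:9009 aef5 275a 1001". The "1001" at the end of each line stays as it is because it does not start in column 10.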