SFK Commands



Section 4. Text Processing


Command: xed
sfk ... +xed /from/to/ [/from2/to2/]

   a stream text editor using SFK Simple Expressions.

   - takes text stream input from a previous command,
     or a binary stream from sfk extract.
   - joins all lines into one large block that can be
     changed as a whole.
   - splits output again into lines for further use,
     or passes output as binary to another +xed.
   - may also read and write a single file.

   xed/xex is designed to post process small to medium sized
   data streams or files. it is not suitable to edit large files
   beyond 100 MB, as the whole content must fit into memory
   multiple times. use "sfk xreplace" to process large files.

   wildcards and SFK expressions
      SFK Expressions are simple patterns containing literal text,
      wildcards * and ? and character classes in square brackets [].
      basically, the syntax provides extended wildcards but no
      further logic and is not related to regular expressions.

      search patterns are surrounded by a separator character which
      can be anything not contained in the search text, like / or _

      within a pattern /fromtext/totext/ the fromtext may contain:

        *                       - 0 to 4000 characters in the same
                                  text line or paragraph, i.e. all
                                  bytes not being CR, LF or NULL.
                                  4000 is just a default maximum
                                  that can be changed by:
        [0.100000 chars]        - 0 to 100000 characters in the same
                                  text line or paragraph, i.e. the
                                  same as * but with a larger range.
        ?                       - one character.
        ?????                   - same as [5.5 chars] or [5 chars]
        [bytes]                 - 0 to 4000 bytes (with CR,LF,NULL)
                                  i.e. it collects stream text
                                  across lines, even in binary data
        **                      - the same as [bytes].
        [0.100 bytes]           - 0 to 100 bytes
        [.100000 bytes]         - up to 100000 bytes
        [1.* bytes]             - 1 to default maximum bytes
        [2 chars]               - exactly 2 chars
        [30 bytes]              - exactly 30 bytes
        [byte of aeiou]         - one vowel (a OR A OR e OR ...),
                                  case insensitive by default.
                                  "aeiou" is a character list.
        [byte of \\\x2f]        - a backslash \ or forw. slash /
        [bytes of \r\n \t]      - whitespace incl. line ends
        [bytes of (\r\n \t)]    - the same, () are optional
        [bytes not \r\n\0]      - up to 4000 bytes as long as no
                                  CR, LF or NULL byte appears
        [chars]                 - the same as [bytes not \r\n\0],
                                  i.e. collect text in a line
        [char not ( \t)]        - same as [byte not ( \r\n\0\t)],
                                  everything not blanks and tabs
        [char not )( \t]        - not brackets, blanks and tabs,
                                  same as not (\(\) \t)
        [chars of a-z0-9]       - means a-zA-Z0-9 as search is
                                  case insensitive by default
        [chars of \x61-\x7A]    - search a-z but not A-Z, or use
                                  option -case for case search
        [eol]                   - end of line by characters:
                                  CRLF or LF or CR

        [white]     = chars of (\t )     - 0 or more whitespaces
        [xwhite]    = bytes of (\t \r\n) - same but across lines
        [1 white]   = byte  of (\t )     - 1 whitespace
        [digit]     = byte  of (0-9)     - 1 digit
        [digits]    = bytes of (0-9)     - 0 or more digits
        [hexdigit]  = byte  of (0-9a-f)  - 1 hexadecimal digit
        [hexdigits] = bytes of (0-9a-f)  - 0 or more hex digits
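
For readers who know regular expressions, a few of the tokens above can be approximated in Python's `re` syntax. This is only an analogy under stated assumptions, not how SFK works internally: the table `APPROX` and the helper `find` are hypothetical names, matching is done on bytes with `re.IGNORECASE` to mimic the default nocase search, and lazy quantifiers are used so that a following literal ends the wildcard.

```python
import re

# Hypothetical lookup table: SFK token -> roughly comparable regex (bytes).
APPROX = {
    "*":        rb"[^\r\n\x00]{0,4000}?",   # same line, default maximum 4000
    "?":        rb"[^\r\n\x00]",            # one character
    "**":       rb"[\x00-\xff]{0,4000}?",   # any bytes, also across lines
    "[digits]": rb"[0-9]*",                 # 0 or more digits
    "[white]":  rb"[\t ]*",                 # 0 or more blanks or tabs
    "[eol]":    rb"(?:\r\n|\n|\r)",         # CRLF or LF or CR
}

def find(parts, data):
    """Join literals and APPROX tokens into one regex and search data."""
    regex = b"".join(
        APPROX[p] if p in APPROX else re.escape(p.encode()) for p in parts
    )
    m = re.search(regex, data, re.IGNORECASE)
    return m.group(0) if m else None

data = b"x FOO 1\nbar foo 22 bar\n"
print(find(["foo", "*", "bar"], data))    # stays within one line
print(find(["foo", "**", "bar"], data))   # may cross the line end
```

With the sample data, the `*` variant only matches inside the second line, while the `**` variant starts at the first FOO and runs across the line end.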

        special keywords that do not count as tokens:
        [skip]   - at the start of a pattern: skip such text
                   completely, do not count it as a search hit.
        [keep]   - search also the following text but keep it
                   in the input data, without consuming it.
        [ortext] - foo[ortext]bar searches word foo or bar.
                   [ortext] is allowed only between literals.

        anchors that have no length of their own:
        [start]  - start of file
        [end]    - end of file
        [lstart] - line start, i.e. start or CRLF or CR or LF
        [lend]   - logical line end, i.e. eol or end of file.
                   to replace line ends use [eol] instead.

        how to search or replace special characters:
        -  to search or replace text containing the literal characters
           * ? \ [ ] then these must be escaped like \* \? \\ \[ \]
        -  ( ) are escaped only within character lists, like \( \)
        -  to search or replace the forward slash '/' type \x2f or use
           another char around from/to text, e.g. _fromtext_totext_
        -  parameters with blanks and non trivial characters need double
           quotes "", see also "about Shell Command Characters" below.

        expansion priorities: (highest first)
        if two search parts are side by side, and the same input
        character matches both, then these priorities apply:

          5:  start, end, lstart, lend
          4:  literal text, eol
          3:  whitelist classes: byte of, bytes of
          2:  blacklist classes: chars not, bytes not
          1:  plain wildcards: ?, *, **, byte, bytes, chars

        this means in "/[bytes]foo/" the [bytes] will stop collecting
        characters as soon as "foo" is found, as "foo" is a literal.
        on same or higher priority the right side stops the left side.
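
The rule that a literal stops a preceding wildcard resembles a lazy quantifier in regular expressions. A minimal Python sketch of that analogy follows. It does not model the SFK priority table, and the sample data is made up.

```python
import re

data = b"aa foo bb foo cc"

# Lazy: the wildcard stops at the first "foo", as described for /[bytes]foo/.
lazy = re.match(rb"([\x00-\xff]{0,4000}?)foo", data)

# Greedy, for contrast only: the wildcard runs on to the last "foo".
greedy = re.match(rb"([\x00-\xff]{0,4000})foo", data)

print(lazy.group(1))    # b'aa '
print(greedy.group(1))  # b'aa foo bb '
```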

      the totext may contain:

        [part 1]            use first text part of the fromtext.
                            e.g. the fromtext /*foo[.100 chars]bar*/
                            contains parts :   1 2         3    4 5
        [part1]             the same (blank is optional).
        [parts 1,2,3]       use parts 1, 2 and 3.
        [parts 1-10]        use parts 1 to 10.
        [strip(part1,\0)]   use part 1 but remove zero bytes.
                            only zero bytes "\0" can be removed.
        [file.name]         full input filename with path
        [file.relname]      input filename without path
        [file.path]         input file's path
        [file.base]         relname without last .extension
        [file.ext]          input filename extension
        [all]               use all parts from fromtext.

        [setvar name]...[endvar]   set variable "name" with data
                                   between setvar and endvar.
        [getvar name]              fill in data from variable "name"

        although anchors like lstart, lend count as a separate part
        they need NOT be specified in the totext. this means that
        /[lstart]foo[lend]/bar/ just changes the word "foo".
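
The part numbering can be pictured as capture groups, one per literal or wildcard of the fromtext. The Python sketch below is an analogy only: regex groups stand in for SFK parts, the input line is invented, and the ordering 1,4,3,2,5 is chosen just for this illustration. It splits a line the way /*foo[.100 chars]bar*/ is described above and rebuilds it like a totext of [parts 1,4,3,2,5].

```python
import re

line = "see foo-123-bar here"

# One group per part of /*foo[.100 chars]bar*/ :
#   1 = *   2 = foo   3 = [.100 chars]   4 = bar   5 = *
pattern = re.compile(
    r"([^\r\n\x00]{0,4000}?)(foo)([^\r\n\x00]{0,100}?)(bar)([^\r\n\x00]{0,4000})",
    re.IGNORECASE,
)

m = pattern.search(line)
parts = {i: m.group(i) for i in range(1, 6)}
print(parts)

# Comparable to a totext of [parts 1,4,3,2,5]: swap "foo" and "bar".
swapped = "".join(parts[i] for i in (1, 4, 3, 2, 5))
print(swapped)  # see bar-123-foo here
```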

   supported slash patterns
      \t    = TAB
      \r    = CR
      \n    = LF
      \x00  = one byte with code 00 hexadecimal
      \0    = short form for \x00
      \q    = a double quote "
      \\    = the backslash character \ itself
      \[    = the bracket open character [
      \]    = the bracket close character ]
      \*    = the literal star character *
      \?    = the literal question mark  ?
      \-    = to use literal "-" in a command
      Within multi line -bylist files:
      \     = slash+blank is changed to a single blank
      Only within "char of" or "byte not" lists:
      \(    = to use literal character "("
      \)    = to use literal character ")"

   SFK expression options
      -showpart(s)  print /from/ part numbers, range statistics
                    and expansion priority points per part.
                    done automatically if a required /to/ text
                    is not given with a command.
      -showbest     if a /from/ pattern finds nothing, use this to
                    see how many parts would match so far, and with
                    up to how many bytes per part. anchors like [lstart]
                    may show a non zero length when matching (CR)LF.
      -showlist     with -bylist, show the internal joined list if
                    commands are spread across multiple lines.
      -showall      show all of the above.
      -xmaxlen=n    set default maximum length for chars or bytes commands,
                    e.g. -xmaxlen=10000 means /foo*bar/ matches with up to
                    10000 characters between foo and bar. the default max
                    length without this option is 4000 characters.

   performance notes
    - always use a string literal, or single byte or char, at the start
      of your search expressions, like in /foo*bar/ starting with 'f'.
      Do not use a wildcard like * at the start like in /*foobar/
      when searching huge input data, as your search will slow down by
      a factor of 256. Use /[lstart]*foobar/ instead.
    - the system may cache output file(s), writing to disk in background
      after sfk has finished. subsequent batch commands may execute slower.

   options
      -case        compare case sensitive, default is nocase.
                   for further options see: sfk help nocase
      -bylist x    read /from/to/ patterns from a file x,
                   supporting multiple lines per pattern.
                   for details type: sfk rep -full
      -bylinelist x  read /from/to/ or just /from/ patterns
                   from a file with one pattern per line.
                   best for searching many phrases with
                   simple or no output reformatting.
      -i           process text stream from standard input
      -tolines     force output as text lines. use this
                   if you get unexpected hex data.
      -nomark      do not highlight changes in output
      -nocol       no colors at all to allow more memory
      -write       if input filename is given, rewrite file
                   with the changed data.
      -tofile f    write output to file f. do not use +tofile
                   chaining as it splits data into text lines.
      -rawterm     on output to terminal do not strip codes
                   below 32. Null bytes are always stripped.
      -dump[raw]   create hex dump [raw = w/o eol highlight]
      -crlf, -lf   for file headers and default totext: force
                   crlf or lf line endings instead of default
      -justrc      print no output, just set return code.
      -firsthit    use only first matching result.

   chaining I/O support
      extract ... +xed   supports binary data transfer.
      xed ... +xed       supports binary data transfer.
      In all other cases like xed ... +filter data is passed
      as text lines without zero bytes and up to 4000 chars
      per line. Binary transfer needs four times as much free
      memory as the actual number of bytes passed.

   unexpected hex data with xed chaining
      if you use xed and get an unexpected hex output
      like 746573746... it means a following command
      cannot handle stream data. use option -tolines then.

   unexpected line breaks with +tofile
      happen if lines are longer than 4096 chars.
         use -tofile instead.
      happen if data contains carriage return chars.
         add "/\r//" to remove them.

   see also
      sfk swap     change single line character order

   web access support
      extracting the head section from a web page can be done like:
      sfk xex http://192.168.1.100/ "_<head>**</head>_"
      sfk xex http://.100/ "_<head>**</head>_"
      sfk web .100 +xex "_<head>**</head>_"

   archive file reading
      xed may directly read archive file entries like
      src.zip\\sub1.bz2\\sub2.tar.gz. for details and
      limitations type "sfk help xe".

   beware of Shell Command Characters.
      to find or replace text patterns containing spaces or special
      characters like <>|!&?* you must add quotes "" around parameters
      or the shell environment will destroy your command. for example,
      pattern /foo bar/other/ must be written like "/foo bar/other/"
      within a .bat or .cmd file the percent % must be escaped like %%
      even within quotes: sfk echo -spat "percent %% is a percent \x25"

   unexpected repeat replace behaviour
      depending on the input data and search/replace expressions,
      it can happen that running the same replace multiple times
      on the same stream produces further hits that didn't exist
      in the first run. read the sfk replace extended help text
      by "sfk replace -full" for details.
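
The effect is not specific to sfk and can be reproduced with any non-overlapping, left-to-right replace. The sketch below uses Python's `str.replace` as a stand-in (an assumption for illustration, not sfk itself) with made-up input.

```python
text = "a--b---c"

once = text.replace("--", "-")    # non-overlapping hits, left to right
twice = once.replace("--", "-")   # the first run left a new "--" behind

print(once)   # a-b--c
print(twice)  # a-b-c
```

The three dashes yield one hit plus a leftover dash in the first run, and together these form a hit that only the second run can find.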

   quoted multi line parameters are supported in scripts
      using full trim. type "sfk script" for details.

   return codes for batch files
      0 = no matches, 1 = matches found, >1 = major error occurred.
      see also "sfk help opt" on how to influence error processing.
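
A calling script can branch on these codes. The following Python sketch assumes that sfk is on the PATH. The helper names `describe_rc` and `run_xed` are hypothetical, and the file name and pattern in the usage comment are placeholders.

```python
import subprocess

def describe_rc(rc):
    """Map an sfk return code to a label, following the list above."""
    if rc == 0:
        return "no matches"
    if rc == 1:
        return "matches found"
    return "major error"

def run_xed(args):
    """Run 'sfk xed' with the given arguments; assumes sfk is on the PATH."""
    result = subprocess.run(["sfk", "xed", *args])
    return describe_rc(result.returncode)

# Hypothetical usage, with -justrc as listed under options:
#   run_xed(["in.txt", "/foo*bar/goo/", "-justrc"])
print(describe_rc(0), "|", describe_rc(1), "|", describe_rc(9))
```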

   about example numbers with [brackets]
      if you see [1] type "sfk cmd 1" for whole command in one line.

   web reference
      http://stahlworks.com/sfk-xed

   more in the SFK Book
      the SFK Book contains a 60 page tutorial, including
      detailed xed examples with input, script and output.
      type "sfk book" for details.

   examples
      Note: also see "sfk xex" for further examples.
      sfk xed in.txt "/foo*bar/goo/" -tofile out.txt
         read from file in.txt, replace "foo" and "bar" with
         up to 4000 characters in between, in the same line,
         by the word "goo". write output to a file out.txt.
      sfk xed in.txt "/foo*bar/goo/" -write
         same as above, but replace within file in.txt
      sfk xed in.html "/<!--**-->//" -tofile out.html
         remove all remark blocks starting with "<!--" and ending
         with "-->", across any number of lines, with up to 4000 bytes,
         from the HTML code.
      sfk xex in.zip\\sub1.tar.bz2\\sub2.tar.gz\\Trace.hpp "/class*/"
         XE: extract phrases starting with "class" from a
             .tar.gz within a .tar.bz2 within a .zip file.
         XD: demo reads first 1000 bytes from sub2.tar.gz
      sfk xed in.txt /foo12/foo34/ /foo34/foo12/ -tofile out.txt
         swaps foo12 and foo34. with xed, replaced text is not
         replaced again by further patterns in the same command.
      sfk xed in.dat -dump
       "/\x66\x6f\x6f[0.100 bytes]\x62\x61\x72/---/"
         replace binary data starting with bytes 0x66, 0x6f, 0x6f,
         ending with 0x62, 0x61, 0x72 and up to 100 bytes in between
         by "---" and show a hex dump of the output data. [5]
         add -tofile out.dat to write the output data to a file.
      sfk xed in.csv "/*\t*\t*Genway Rd*/[parts 1,2,5,6,7,2,3]/"
         a tab separated CSV file with name, road, city like
           Bemond Furn. Ltd    147 Elney Rd      Hertford NY 83058
           Candale Design Ltd  Seattle KS 51028  868 Genway Rd
           Betree Furn. Ltd    311 Napton Rd     Portland NC 97702
         contains wrong records with "Genway Rd" in the 3rd column.
         fix only these records by swapping column 2 and 3.
         part 2 is just a tab character, used twice in output.
      sfk xed in.txt "/\r//" +xed "_[lstart]\* [bytes]
       [keep]\n\* [ortext]\n\n_<li>[part3]</li>_"
         change a plain text enumeration like                  [6]
            * first item
              is a double line text
            * second
            * third
         followed by an empty line to HTML code like
            <li>first item
              is a double line text</li>
            <li>second</li>
            <li>third</li>
         things to consider:
         - each enum paragraph ends at another line starting
           with * or at an empty line \n\n
         - windows text files use \r\n line endings, so to allow
           a convenient \n\n search for empty lines use /\r// first.
           this must be done as a separate +xed, otherwise the
           edited line ends are skipped in further searches.
         - we cannot search the line ends by [eol] as [ortext]
           requires pure literals like \n.
         - the [keep] tells to search until \n* but not to consume
           this, i.e. further searches can re-find and replace it
           as [lstart]*
      sfk version -own +filter -stabform "$col5" +setvar ver
       +then xed info.xml
       "=<program_version>**</program_version>
        =[part1][getvar ver][part3]="
         get the version number from sfk, store it in an sfk variable
         "ver" and fill this into info.xml by changing the text within
         the program_version tag. because both / and _ chars are used
         in the xml data we use another delimiter character "=".   [8]
      sfk xed in.txt "/[eol]/, /" +xed "/[60 chars]*, /[all]\n/"
         if in.txt contains only one short word per line
         reformat this as a comma separated text using
         at least 60 characters per line.
      sfk xed in.txt "/*[eol]/\q[part1]\q, /" +xed "/[60 chars]*, /[all]\n/"
         same as above, but surrounding words by quotes.
      sfk xex foo.h +setvar a +then xed bar.c
       "/[lstart]#include \qfoo.h\q*[eol]/[getvar a]/"
         replace a text line: #include "foo.h"
         within file bar.c by the file content of foo.h
      sfk echo aabbccdd +xed "/[2 chars][2 chars]
       [2 chars][2 chars]/[parts 4,3,2,1]/"
         produces ddccbbaa, i.e. it swaps 4 blocks
         of 2 chars each. (little endian conversion)
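
The result of the block swap in the last example (aabbccdd becomes ddccbbaa) can be cross-checked outside sfk. The Python sketch below is a plain re-implementation for verification, not the xed mechanism. The helper `swap_blocks` is a hypothetical name and, unlike the fixed four-part pattern, accepts any number of blocks.

```python
def swap_blocks(text, size=2):
    """Split text into blocks of `size` characters and reverse their order."""
    blocks = [text[i:i + size] for i in range(0, len(text), size)]
    return "".join(reversed(blocks))

print(swap_blocks("aabbccdd"))  # ddccbbaa
```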