Aug 20, 2012

Bash: Let me bash that for you, Part 1

Let's reach the next dan of bash with text processing.

First note: the greatest idea in all of Unix is that everything is a file - regular files, directories, devices, file systems and so on.

Many commands expect a file as their input, or they can read from standard input, also known as stdin (referred to as stream #0).

Likewise, many commands write their output to standard output, aka stdout (referred to as stream #1), or to standard error, aka stderr (referred to as stream #2).

There are several key text processing commands and several techniques, such as stream redirection.

And once more: the most powerful command is man, which stands for manual. If you'd like to know more about some command, just type, e.g.
$ man echo
ECHO(1)    User Commands         ECHO(1)

NAME


       echo - display a line of text


echo message

As just mentioned, it displays a line of text:
$ echo Hello
Hello
$ echo Hello world
Hello world
$ echo "Hello world"
Hello world
Note: echo adds a newline character (\n) at the end of the line, so "Hello" costs 5 bytes + 1 byte for the newline character.
Hint: echo -n doesn't add the trailing newline, so the next prompt appears on the same line
$ echo -n "Hello world"
Hello world$ 

wc [option] [file]

word count - prints the newline, word, and byte counts of a file
$ wc myfile
 2  2 12 myfile
Hint: option -c counts bytes
$ wc -c myfile
12 myfile
Hint: option -l (lines) counts newlines
$ wc -l myfile
2 myfile
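The newline that echo appends can be checked with wc -c. Here is a minimal sketch, assuming bash and that wc reads stdin when no file is given; the variable names are only for illustration, and printf is shown as a more portable alternative to echo -n.

```shell
# Count the bytes each command writes; wc reads stdin when no file is given.
with_newline=$(echo "Hello" | wc -c)        # 5 letters + 1 newline
without_newline=$(echo -n "Hello" | wc -c)  # 5 letters, no newline

# printf never adds a newline unless the format string asks for one.
printf_bytes=$(printf '%s' "Hello" | wc -c)

echo "echo: $with_newline, echo -n: $without_newline, printf: $printf_bytes"
```
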

cat file

prints out file content
$ cat myfile
Hello
world
Hint: gzcat (Solaris) and zcat (GNU/Linux) print the content of a gzipped file. Moreover, with the -f option GNU zcat passes files that are not gzipped through unchanged, so it can handle both kinds.

grep [option] [pattern] file

prints the lines that match the pattern (case-sensitive by default)
$ grep He myfile
Hello
$ grep z myfile
# nothing
Hint: option -i matches lines case-insensitively
$ grep he myfile
# nothing
$ grep -i he myfile
Hello
Hint: option -v selects non-matching lines
$ grep -v He myfile
world
Hint: option -c counts the matching lines
$ grep -c He myfile
1
$ grep -c he myfile
0
$ grep -ci he myfile
1
$ grep -cv z myfile
2
Hint: gzgrep (Solaris) and zgrep (GNU/Linux) grep through a gzipped file. Moreover, gzgrep/zgrep can handle gzipped files as well as ungzipped ones.

stream redirection

  • >
    overwrites the file: the old content is discarded and the output is written from the start (creates the file if not present).
    $ echo hello > file_
    # truncate file "file_" and put "hello" into it.
    $ ls -l file_
    # check its size - "hello" is 5 bytes + the newline character
    -rw-r--r-- 1 user user 6 Jul  7 10:38 file_
    $ echo q > file_
    $ ls -l file_ 
    -rw-r--r-- 1 user user 2 Jul  7 10:38 file_

  • >>
    appends to the end of the file without touching the existing content (creates the file if not present).
    $ echo Hello >> myfile
    $ ls -l myfile
    -rw-r--r-- 1 user user 6 Jul  7 10:38 myfile
    $ echo world >> myfile
    $ ls -l myfile
    -rw-r--r-- 1 user user 12 Jul  7 10:38 myfile

  • Note 1: by default > redirects stdout (stream 1) to the file; stderr is stream 2 and has to be redirected explicitly.
    $ ls -l myfile 1> list.txt
    # redirects stdout into list.txt file
    $ cat list.txt
    -rw-r--r-- 1 user user 12 Jul  7 10:38 myfile
    $ ls -l smthfile 1> list.err
    # there is no file smthfile - error should happen
    # redirects stdout into list.err file, stderr will be printed
    ls: cannot access smthfile: No such file or directory
    $ cat list.err 
    # list.err is empty
    $ ls -l smthfile 2> list.err
    # redirects stderr into list.err file
    $ cat list.err 
    ls: cannot access smthfile: No such file or directory
    

  • Note 2: It's possible to redirect one stream into another:
    $ ls -l smthfile 1> list.err 2>&1
    # redirects stdout into list.err file, then redirects stderr to the same place as stdout
    $ cat list.err 
    ls: cannot access smthfile: No such file or directory
    

  • Note 3: There is a special file, /dev/null, that works like a black hole - it discards all data written to it.

  • Note 4: The examples below keep referring to myfile with this content (the lines Hello and world).
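The redirection notes above can be tried out together. This is a small sketch, assuming a scratch directory created with mktemp and a file name (smthfile) that does not exist; the output file names are hypothetical. It also shows that the order of redirections matters, since 2>&1 copies wherever stdout points at that moment.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)

# Discard the error message entirely: stderr goes to /dev/null.
ls -l "$workdir/smthfile" 2> /dev/null
status=$?   # ls still reports the failure through its exit status

# stdout goes to the file first, then stderr follows it: both end up in both.txt.
ls -l "$workdir/smthfile" > "$workdir/both.txt" 2>&1

# Reversed order: stderr is pointed at the *old* stdout (captured here by $(...)),
# and only afterwards is stdout sent to the file, which therefore stays empty.
captured=$(ls -l "$workdir/smthfile" 2>&1 > "$workdir/only_out.txt")

echo "exit status: $status"
echo "captured from stderr: $captured"
```
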

    command1 | command2

    the output of command1 becomes the input of command2, i.e. data is piped through the commands
    $ echo "Hello" | grep o
    Hello
    $ cat myfile | grep o
    Hello
    world
    $ cat myfile | grep o | grep w
    world
    It's possible to use a sequence of pipes - the output of one command is the input for the next.
    Note: keep common sense when piping. There are several bad and good practices:
    • cat & grep
      to filter lines that match the pattern
      Bad
      $ cat myfile | grep e
      Hello
      Good
      $ grep e myfile
      Hello
    • cat & grep & wc
      to count the lines that match the pattern
      Very Bad
      $ cat myfile | grep o | wc -l
      2
      Bad
      $ grep o myfile | wc -l
      2
      Good
      $ grep -c o myfile 
      2

    tr [from-symbol-set] [to-symbol-set]

    translate characters
    Example: shift every character to the next one in the set
    $ echo 12ABC | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 BCDEFGHIJKLMNOPQRSTUVWXYZ@123456789_
    23BCD
    Hint: tr knows several predefined symbol classes:
    • [:alpha:] - alphabetic characters
    • [:digit:] - numeric characters
    • [:alnum:] - alphanumeric (alphabetic + numeric) characters
    • [:lower:] - lower-case alphabetic characters
    • [:upper:] - upper-case alphabetic characters
    • and others...
    $ echo xABCDEF12 | tr "[:upper:]" "[:lower:]"
    xabcdef12
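Besides translating, tr can also delete characters (-d) and squeeze runs of a repeated character (-s); both are standard tr options. A short sketch with the character classes listed above (the sample strings and variable names are made up for illustration):

```shell
# Translate: lower-case letters to upper-case.
upper=$(echo "Hello world" | tr '[:lower:]' '[:upper:]')

# Delete: drop every digit.
no_digits=$(echo "a1b22c333" | tr -d '[:digit:]')

# Squeeze: collapse runs of spaces into a single space.
squeezed=$(echo "a   b    c" | tr -s ' ')

echo "$upper"
echo "$no_digits"
echo "$squeezed"
```
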

    sed [options] script [file]

    The stream editor
    • s for replacement by pattern
      The rule is
      s separator pattern separator replacement separator options
      $ echo "Hello world" | sed "s/o/z/"
      Hellz world
      
      Note: the pattern is a regular expression. Be careful, and read more about sed's regexp dialect.
      $ echo "Hello world" | sed "s/./z/g"
      # . (dot) is special symbol in terms of regular expression
      # means any character
      zzzzzzzzzzz
      
      Hint #1: use option g to replace globally
      $ echo "Hello world" | sed "s/o/z/g"
      Hellz wzrld
      
      Hint #2: use option i to ignore case (this flag is a GNU sed extension)
      $ echo "HellO world" | sed "s/o/z/gi"
      Hellz wzrld
      
      Hint #3: the separator doesn't have to be / (slash) - use any character that suits you, as long as the rule is followed
      $ echo "HellO world" | sed "s,o,z,gi"
      Hellz wzrld
      
    • chain of seds
      Instead of chaining several sed calls through pipes, it's much better to separate the commands with ; (semicolon)
      # it's not good:
      $ echo "Hello world" | sed "s,o,z,g" | sed "s,z,o,g"
      Hello world
      # the best way:
      $ echo "Hello world" | sed "s,o,z,g ; s,z,o,g"
      Hello world
    • y by-char replacement, similar to tr
      $ echo 12ABC | sed "y,ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,BCDEFGHIJKLMNOPQRSTUVWXYZ@123456789_,"
      23BCD
    • multiline and file operations
      Suppose we have file somefile with these lines:
      foo
      1234
      1234-foo-bar
      bar
      foo-bar
      $ sed "/foo/d" somefile 
      # delete all lines that match the pattern
      1234
      bar
      $ sed "s,foo,bar," somefile 
      bar
      1234
      1234-bar-bar
      bar
      bar-bar
      $ sed '3s,foo,bar,' somefile 
      # change only 3rd line of somefile
      foo
      1234
      1234-bar-bar
      bar
      foo-bar
      $ sed '1,3s,foo,bar,' somefile 
      # change from 1st up to 3rd lines of somefile
      bar
      1234
      1234-bar-bar
      bar
      foo-bar
      $ sed '3!s,foo,bar,' somefile 
      # change all but 3rd line of somefile
      bar
      1234
      1234-foo-bar
      bar
      bar-bar
      
      And a somewhat dangerous but useful one - in-place file replacement
      $ sed -i.orig '3!s,foo,bar,' somefile 
      # creates backup file somefile.orig
      # replaces foo on all lines but the 3rd
      $ cat somefile
      bar
      1234
      1234-foo-bar
      bar
      bar-bar
      

    diff [options] from-file to-file

    shows the difference between two files
    $ diff -u somefile.orig somefile
    #  the unified output format:
    # - (minus) marks lines that have been removed
    # + (plus) marks lines that have been added
    --- somefile.orig 2012-08-19 12:56:58.000000000 +0400
    +++ somefile 2012-08-19 13:19:21.000000000 +0400
    @@ -1,5 +1,5 @@
    -foo
    +bar
     1234
     1234-foo-bar
     bar
    -foo-bar
    +bar-bar

    awk script

    awk is a rich scripting language in its own right, and it's present on nearly every unix-like machine.
    That's why we only look at the most frequently used cases:
    $ echo "This is world" | awk '{print $3;}'
    # prints 3rd column
    # $N refers to Nth column (of stream or from file)
    world
    $ echo "This,is,world" | awk '{split($1,X,",");print X[3]" "X[2];}'
    # splits 1st column using , (comma) as separator
    # result is X array
    # prints 3rd and 2nd items of X array
    world is
    $ awk '{c++}END{print c;}' somefile
    # increase counter c on each line by one
    # at the end prints out its value = number of lines in file
    5
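A few more everyday awk idioms, as a sketch: the -F option sets the field separator (an alternative to split for simple cases), NF is the number of fields on the current line, and a variable can accumulate a column sum. The sample input and variable names are made up for illustration.

```shell
# -F, makes the comma the field separator, so $3 and $2 are ready to use.
swapped=$(echo "This,is,world" | awk -F, '{print $3 " " $2}')

# $NF is the last field of the line, whatever the number of fields.
last=$(echo "This is world" | awk '{print $NF}')

# Sum the 2nd column over all lines and print the total at the end.
total=$(printf '1 10\n2 20\n3 30\n' | awk '{sum += $2} END {print sum}')

echo "$swapped"
echo "$last"
echo "$total"
```
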

    bc

    An arbitrary precision calculator language.

    A powerful tool with an impressive set of math functions: powers, exponent, sine, cosine, arctangent, natural logarithm, square root, and conversion from one numeral system into another (e.g. from decimal into binary or hexadecimal and vice versa).

    It's too rich to cover fully, so here is just a brief tour:
    $ echo "2*3/7" | bc -l
    .85714285714285714285
    $ echo "10^6/4" | bc -l
    # quarter of one million
    250000.00000000000000000000
    $ echo "l(10)" | bc -l
    # natural logarithm of 10
    2.30258509299404568401
    # keep in mind the relation log_a(b) = ln(b)/ln(a)
    # calculate log_2(10)
    $ echo "l(10)/l(2)" | bc -l
    3.32192809488736234789
    
    # converting to/from different numerical systems
    
    $ echo "obase=16;254" | bc -l
    # convert 254 to hexadecimal (output base is 16)
    FE
    $ echo "ibase=2;101" | bc -l
    # convert from binary (input base is 2) to decimal (default)
    5
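The calls above can be wrapped into small helpers. This is a sketch under a few assumptions: log_base is a hypothetical function name, scale=10 is an arbitrary choice of precision, and bc expects hexadecimal digits in upper case. When both bases are changed, obase is set first, because after ibase=16 every following number (including a new obase value) would be read as hexadecimal.

```shell
# log_base BASE VALUE - logarithm of VALUE in the given BASE,
# using the relation log_a(b) = ln(b)/ln(a); scale=10 keeps 10 decimal digits.
log_base() {
    echo "scale=10; l($2)/l($1)" | bc -l
}

log2_of_10=$(log_base 2 10)

# Hexadecimal to decimal: only the input base changes.
hex_to_dec=$(echo "ibase=16; FF" | bc)

# Hexadecimal to binary: set obase before ibase.
hex_to_bin=$(echo "obase=2; ibase=16; FF" | bc)

echo "$log2_of_10"
echo "$hex_to_dec"
echo "$hex_to_bin"
```
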
    
  • 1 comment:

    Dmitry Baranov comments...

    Regarding stream redirection - it is possible to redirect input the same way, e.g.:
    while read line; do echo "text: $line"; done < file.txt

    Regarding piping - a lot of tools accept "-" as a file argument that stands for STDIN. E.g.:
    # Will not work, diff needs two file operands and does not read STDIN on its own
    grep INFO info1.log | diff info2.log

    # Will work perfectly
    grep INFO info1.log | diff - info2.log
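Both points from the comment can be tried on throwaway data. A minimal sketch: the file names follow the comment, but their content is made up here and they are created in a scratch directory.

```shell
workdir=$(mktemp -d)
printf 'INFO start\nWARN disk\nINFO stop\n' > "$workdir/info1.log"
printf 'INFO start\nINFO stop\n' > "$workdir/info2.log"

# Input redirection: the whole loop reads its stdin from the file.
prefixed=$(while read -r line; do echo "text: $line"; done < "$workdir/info2.log")

# "-" tells diff to take one side of the comparison from stdin.
# diff exits with status 0 when the two inputs are identical.
if grep INFO "$workdir/info1.log" | diff - "$workdir/info2.log" > /dev/null; then
    same=yes
else
    same=no
fi

echo "$prefixed"
echo "identical: $same"
```
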