Крылья, ноги... Хвост!: Bash: Let me bash that for you, Part 1

Let's reach the next dan of bash: text processing.

First note: the greatest idea in all of Unix is that everything is a file: regular files, directories, devices, file systems and so on.

Many commands expect a file as input, or they can read from standard input, also known as stdin (stream #0).

Likewise, many commands write their output to standard output, aka stdout (stream #1), or to standard error, aka stderr (stream #2).
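To see the two input paths side by side, here is a minimal sketch. The scratch directory and the file name sample.txt are made up for the illustration; grep -c and the pipe are both covered further down in this post.

```shell
# Scratch directory and file name are assumptions made for this sketch.
workdir=$(mktemp -d)
printf 'Hello\nworld\n' > "$workdir/sample.txt"

# Input taken from a file named on the command line.
from_file=$(grep -c o "$workdir/sample.txt")

# The same data arriving on stdin (stream #0) through a pipe.
from_stdin=$(printf 'Hello\nworld\n' | grep -c o)

echo "from file: $from_file, from stdin: $from_stdin"
rm -r "$workdir"
```

Both counts should come out the same, since grep does not care where the lines come from.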

There are several key text processing commands and several techniques, such as stream redirection.

And once more: the most powerful command is man, which stands for manual. If you'd like to know more about a command, just type, e.g.

$ man echo
ECHO(1)    User Commands         ECHO(1)

NAME
       echo - display a line of text

echo message

As just mentioned, it displays a line of text.

$ echo Hello
Hello
$ echo Hello world
Hello world
$ echo "Hello world"
Hello world

Note: echo adds a newline character (\n) at the end of the line, so "Hello" costs 5 bytes + 1 byte for the newline character.
Hint: echo -n doesn't add the trailing newline

$ echo -n "Hello world"
Hello world$

wc [option] [file]

word count - prints the newline, word, and byte counts of a file

$ wc myfile
 2  2 12 myfile

Hint: option -c counts bytes

$ wc -c myfile
12 myfile

Hint: option -l (line) counts newlines

$ wc -l myfile
2 myfile
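The note about echo's trailing newline is easy to check with wc. A small sketch, assuming bash's builtin echo (where -n is supported); the variable names are only for illustration:

```shell
# echo appends a newline, so "Hello" takes 5 bytes + 1.
with_newline=$(echo "Hello" | wc -c)

# echo -n leaves the newline out.
without_newline=$(echo -n "Hello" | wc -c)

# wc -l counts newline characters, so text without one counts as 0 lines.
lines_without_newline=$(echo -n "Hello" | wc -l)

echo "with newline: $with_newline bytes"
echo "without newline: $without_newline bytes"
echo "lines without newline: $lines_without_newline"
```

When wc reads from stdin there is no file name to print, so only the number is shown.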

cat file

prints out file content

$ cat myfile
Hello
world

Hint: there are gzcat for Solaris and zcat for GNU/Linux to print the content of a gzipped file. Moreover, with the -f option gzcat/zcat also prints files that are not gzipped.

grep [option] pattern [file]

filters the lines that match the pattern (case-sensitive by default)

$ grep He myfile
Hello
$ grep z myfile
# nothing

Hint: option -i matches lines case-insensitively

$ grep he myfile
# nothing
$ grep -i he myfile
Hello

Hint: option -v selects non-matching lines

$ grep -v He myfile
world

Hint: option -c counts the number of matching lines

$ grep -c He myfile
1
$ grep -c he myfile
0
$ grep -ci he myfile
1
$ grep -cv z myfile
2

Hint: there are gzgrep for Solaris and zgrep for GNU/Linux to grep through a gzipped file. Moreover, gzgrep/zgrep handles gzipped files as well as ungzipped ones.

stream redirection

>
overwrites the file: clears it, then writes to it (creates the file if not present).

$ echo hello > file_
# clear file "file_" and put "hello" into it.
$ ls -l file_
# check its size - "hello" is 5 bytes + the newline character
-rw-r--r-- 1 user user 6 Jul  7 10:38 file_
$ echo q > file_
$ ls -l file_ 
-rw-r--r-- 1 user user 2 Jul  7 10:38 file_

>>
appends to the end of the file without touching what is already there (creates the file if not present).

$ echo Hello >> myfile
$ ls -l myfile
-rw-r--r-- 1 user user 6 Jul  7 10:38 myfile
$ echo world >> myfile
$ ls -l myfile
-rw-r--r-- 1 user user 12 Jul  7 10:38 myfile

Note 1: by default, redirection takes stdout, which is stream 1; stderr is stream 2.

$ ls -l myfile 1> list.txt
# redirects stdout into list.txt file
$ cat list.txt
-rw-r--r-- 1 user user 12 Jul  7 10:38 myfile
$ ls -l smthfile 1> list.err
# there is no file smthfile - an error should happen
# redirects stdout into list.err file; stderr is still printed to the terminal
ls: cannot access smthfile: No such file or directory
$ cat list.err
# list.err is empty
$ ls -l smthfile 2> list.err
# redirects stderr into list.err file
$ cat list.err 
ls: cannot access smthfile: No such file or directory

Note 2: It's possible to redirect one stream into another:

$ ls -l smthfile 1> list.err 2>&1
# redirects stdout into list.err file, then redirects stderr into stdout
$ cat list.err 
ls: cannot access smthfile: No such file or directory
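The order of the two redirections matters, because they are applied left to right. A minimal sketch; emit_both is a hypothetical helper written just for this example, and the log file names are made up:

```shell
workdir=$(mktemp -d)

# Hypothetical helper: writes one line to stdout and one to stderr.
emit_both() {
    echo "to stdout"
    echo "to stderr" 1>&2
}

# stdout goes to the file first, then stderr follows it:
# both lines land in the file.
emit_both 1> "$workdir/both.log" 2>&1

# Reversed order: stderr is pointed at the current stdout (the terminal),
# and only afterwards is stdout sent to the file.
emit_both 2>&1 1> "$workdir/stdout_only.log"

both_lines=$(grep -c to "$workdir/both.log")
stdout_only_lines=$(grep -c to "$workdir/stdout_only.log")
echo "both.log: $both_lines lines, stdout_only.log: $stdout_only_lines lines"
rm -r "$workdir"
```

In the second call the "to stderr" line still shows up on the terminal, since stderr was attached to it before stdout moved to the file.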

Note 3: There is a special file, /dev/null, that acts like a black hole: it discards all data written to it.
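A typical use of /dev/null is asking a command a yes/no question, where only its exit status is interesting. A sketch under the assumption that myfile holds the lines Hello and world; the scratch directory is made up for the example:

```shell
workdir=$(mktemp -d)
printf 'Hello\nworld\n' > "$workdir/myfile"

# Only grep's exit status matters here, so the matched line goes to /dev/null.
if grep He "$workdir/myfile" > /dev/null; then
    has_he="yes"
else
    has_he="no"
fi

# Discard stdout and stderr together: the file is missing, yet nothing is printed.
if grep He "$workdir/no-such-file" > /dev/null 2>&1; then
    missing_has_he="yes"
else
    missing_has_he="no"
fi

echo "myfile: $has_he, missing file: $missing_has_he"
rm -r "$workdir"
```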

Note 4: The examples in this post rely on this content of myfile (the two lines Hello and world).

command1 | command2

the output of command1 becomes the input of command2; in other words, data is piped through the commands

$ echo "Hello" | grep o
Hello
$ cat myfile | grep o
Hello
world
$ cat myfile | grep o | grep w
world

It's possible to use a sequence of pipes: the output of one command is the input for the next.
Note: keep common sense when piping. There are several bad and good practices:

cat & grep
to filter the lines that match the pattern
Bad
```
$ cat myfile | grep e
Hello
```
Good
```
$ grep e myfile
Hello
```
cat & grep & wc
to count the lines that match the pattern
Very Bad
```
$ cat myfile | grep o | wc -l
2
```
Bad
```
$ grep o myfile | wc -l
2
```
Good
```
$ grep -c o myfile 
2
```

tr [from-symbol-set] [to-symbol-set]

translates characters
Example: replace each letter and digit with the next one

$ echo 12ABC | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 BCDEFGHIJKLMNOPQRSTUVWXYZ@123456789_
23BCD

Hint: tr knows several predefined symbol classes:

[:alpha:] - alphabetic characters
[:digit:] - numeric characters
[:alnum:] - alphanumeric (alphabetic + numeric) characters
[:lower:] - lower-case alphabetic characters
[:upper:] - upper-case alphabetic characters

and others...

$ echo xABCDEF12 | tr "[:upper:]" "[:lower:]"
xabcdef12
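Long character sets can also be written as ranges. A short sketch; note that, unlike the full sets in the example above, these ranges leave Z and 9 untouched, and the last line relies on tr padding a shorter second set with its last character (GNU tr does this; other implementations may differ):

```shell
# Ranges are a shorter way to write long character sets.
shifted=$(echo 12ABC | tr "A-Y0-8" "B-Z1-9")

# Classes work in the other direction too.
shouted=$(echo "Hello world 42" | tr "[:lower:]" "[:upper:]")

# Assumption: the second set is extended by repeating its last character.
masked=$(echo "card 1234" | tr "[:digit:]" "#")

echo "$shifted"
echo "$shouted"
echo "$masked"
```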

sed [options] script [file]

The stream editor

s: replaces text that matches a pattern
The rule is
s separator pattern separator replacement separator options
```
$ echo "Hello world" | sed "s/o/z/"
Hellz world
```
Note: the pattern is a regular expression. Be careful and read more about sed's regexp dialect.
```
$ echo "Hello world" | sed "s/./z/g"
# . (dot) is a special symbol in regular expressions
# it means any character
zzzzzzzzzzz
```
Hint #1: use option g to replace globally
```
$ echo "Hello world" | sed "s/o/z/g"
Hellz wzrld
```
Hint #2: use option i to ignore case (a GNU sed extension)
```
$ echo "HellO world" | sed "s/o/z/gi"
Hellz wzrld
```
Hint #3: the separator doesn't have to be / (slash): any character that suits you works, as long as the rule keeps its shape
```
$ echo "HellO world" | sed "s,o,z,gi"
Hellz wzrld
```

chain of seds
Instead of chaining seds through pipes, it's much better to separate the commands with ; (semicolon)

# it's not good:
$ echo "Hello world" | sed "s,o,z,g" | sed "s,z,o,g"
Hello world
# the best way:
$ echo "Hello world" | sed "s,o,z,g ; s,z,o,g"
Hello world
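The commands in one script still run one after another on each line, exactly as the piped seds would. "Hello world" came back unchanged only because it had no z to begin with; a sketch with a made-up input that does contain z shows the difference, together with the equivalent -e form:

```shell
# The second command also rewrites every z that was in the input,
# not only the ones the first command produced.
round_trip=$(echo "jazz tool" | sed "s,o,z,g ; s,z,o,g")

# The same script passed as two -e expressions.
with_e=$(echo "jazz tool" | sed -e "s,o,z,g" -e "s,z,o,g")

echo "$round_trip"
echo "$with_e"
```

Both variants print jaoo tool, not the original text.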

y: per-character replacement, similar to tr

$ echo 12ABC | sed "y,ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,BCDEFGHIJKLMNOPQRSTUVWXYZ@123456789_,"
23BCD

multiline and file operations
Suppose we have a file somefile with these lines:

foo
1234
1234-foo-bar
bar
foo-bar

$ sed "/foo/d" somefile 
# delete all lines that match the pattern
1234
bar
$ sed "s,foo,bar," somefile 
bar
1234
1234-bar-bar
bar
bar-bar
$ sed '3s,foo,bar,' somefile 
# change only 3rd line of somefile
foo
1234
1234-bar-bar
bar
foo-bar
$ sed '1,3s,foo,bar,' somefile 
# change lines 1 through 3 of somefile
bar
1234
1234-bar-bar
bar
foo-bar
$ sed '3!s,foo,bar,' somefile 
# change all but 3rd line of somefile
bar
1234
1234-foo-bar
bar
bar-bar

and now something dangerous but useful: in-place replacement in a file

$ sed -i.orig '3!s,foo,bar,' somefile 
# creates backup file somefile.orig
# replaces foo on all lines but the 3rd
$ cat somefile
bar
1234
1234-foo-bar
bar
bar-bar
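Since the edit happens in place, the backup is the only way back. A minimal sketch of the whole round trip in a scratch directory (the directory and variable names are assumptions for the example):

```shell
workdir=$(mktemp -d)
printf 'foo\n1234\n1234-foo-bar\nbar\nfoo-bar\n' > "$workdir/somefile"

# Edit in place, keeping the original as somefile.orig.
sed -i.orig '3!s,foo,bar,' "$workdir/somefile"

# Only line 3 still contains foo; the backup has all three.
foo_left=$(grep -c foo "$workdir/somefile")
foo_in_backup=$(grep -c foo "$workdir/somefile.orig")

# Undo the edit by copying the backup over the edited file.
cp "$workdir/somefile.orig" "$workdir/somefile"
foo_restored=$(grep -c foo "$workdir/somefile")

echo "after edit: $foo_left, backup: $foo_in_backup, restored: $foo_restored"
rm -r "$workdir"
```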

diff [options] from-file to-file

shows the difference between two files

$ diff -u somefile.orig somefile
# the unified output format:
# - (minus) marks lines that have been removed
# + (plus) marks lines that have been added
--- somefile.orig 2012-08-19 12:56:58.000000000 +0400
+++ somefile 2012-08-19 13:19:21.000000000 +0400
@@ -1,5 +1,5 @@
-foo
+bar
 1234
 1234-foo-bar
 bar
-foo-bar
+bar-bar
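diff also makes a handy dry run for sed: it accepts - in place of a file name to read stdin, so the would-be result can be compared with the file before anything is edited in place. A sketch, with preview.diff as a made-up file name:

```shell
workdir=$(mktemp -d)
printf 'foo\n1234\n1234-foo-bar\nbar\nfoo-bar\n' > "$workdir/somefile"

# sed writes the edited text to stdout; diff reads it as "-" (stdin).
# diff exits with status 1 when the inputs differ, hence the if.
if sed '3!s,foo,bar,' "$workdir/somefile" \
    | diff -u "$workdir/somefile" - > "$workdir/preview.diff"; then
    verdict="nothing would change"
else
    verdict="changes pending"
fi

# Count added lines, skipping the "+++" header line.
added=$(grep -c '^+[^+]' "$workdir/preview.diff")

# The file itself is untouched: all three foo lines are still there.
untouched=$(grep -c foo "$workdir/somefile")

echo "$verdict: $added lines would be replaced"
rm -r "$workdir"
```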

awk script

awk is a rich scripting language in its own right, present on nearly every unix-like machine.
That's why we look only at the most frequently used cases:

$ echo "This is world" | awk '{print $3;}'
# prints 3rd column
# $N refers to Nth column (of stream or from file)
world
$ echo "This,is,world" | awk '{split($1,X,",");print X[3]" "X[2];}'
# splits 1st column using , (comma) as separator
# result is X array
# prints 3rd and 2nd items of X array
world is
$ awk '{c++}END{print c;}' somefile
# increase counter c on each line by one
# at the end prints out its value = number of lines in file
5
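A few more everyday variations on the same cases, as a sketch: -F sets the column separator, a /pattern/ in front of a block limits it to matching lines, and NF holds the number of columns. The scratch copy of somefile is created only for this example.

```shell
workdir=$(mktemp -d)
printf 'foo\n1234\n1234-foo-bar\nbar\nfoo-bar\n' > "$workdir/somefile"

# -F sets the column separator, so split() is not needed.
reordered=$(echo "This,is,world" | awk -F, '{print $3" "$2}')

# The block runs only for lines matching /foo/.
# c+0 prints 0 instead of an empty string when nothing matched.
foo_lines=$(awk '/foo/{c++} END{print c+0}' "$workdir/somefile")

# NF is the number of columns in the current line.
columns=$(echo "This is world" | awk '{print NF}')

echo "$reordered / $foo_lines / $columns"
rm -r "$workdir"
```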

bc

An arbitrary-precision calculator language.

A powerful tool with an amazing set of math functionality: powers, exponent, sine, cosine, arctangent, natural logarithm, square root, conversion from one numeral system into another (e.g. from decimal into binary or hexadecimal and vice versa).

It's too rich to cover fully, so I'll describe it only briefly:

$ echo "2*3/7" | bc -l
.85714285714285714285
$ echo "10^6/4" | bc -l
# quarter of one million
250000.00000000000000000000
$ echo "l(10)" | bc -l
# natural logarithm of 10
2.30258509299404568401
# keep in mind the relation log_a(b) = ln(b) / ln(a)
# calculate log2(10)
$ echo "l(10)/l(2)" | bc -l
3.32192809488736234789

# converting to/from different numerical systems

$ echo "obase=16;254" | bc -l
# convert 254 to hexadecimal (output base is 16)
FE
$ echo "ibase=2;101" | bc -l
# convert from binary (input base is 2) to decimal (default)
5
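A few more one-liners in the same spirit, as a sketch. bc is not installed on every machine, so the example checks for it first; the variable names are only for illustration.

```shell
if command -v bc > /dev/null 2>&1; then
    # Without -l the scale is 0; set it explicitly to pick the precision.
    three_digits=$(echo "scale=3; 2*3/7" | bc)

    # decimal -> binary and hexadecimal -> decimal
    to_binary=$(echo "obase=2; 5" | bc)
    from_hex=$(echo "ibase=16; FF" | bc)

    # Set obase first: once ibase changes, later numbers are read in the new base.
    binary_to_hex=$(echo "obase=16; ibase=2; 11111110" | bc)

    echo "$three_digits $to_binary $from_hex $binary_to_hex"
else
    echo "bc is not installed on this machine"
fi
```

Hexadecimal digits in the input must be upper case, as in FF.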

1 comment:

Unknown said: Regarding stream redirection: it is possible to redirect input the same way, i.e.:
while read line; do echo "text: $line"; done < file.txt

Regarding piping: a lot of tools accept STDIN in place of a file argument via the "-" option, i.e.:
# Will not work, diff does not accept data for comparison on STDIN
grep INFO info1.log | diff info2.log

# Will work perfectly
grep INFO info1.log | diff - info2.log

August 21, 2012, 20:00

Aug 20, 2012
