
Apr 2, 2010

One more note: sort, uniq and merging into a string

This is just one more note to myself. We have a list of strings, for example:
bg, cal, cat, cd, chmod, clear, cmp, cp, date, df, diff, du, echo, exit, fg, file, find, grep,
groups, gzip, head, history, id, info, jobs, kill, ln, locate, ls, man , mc, mkdir, more, mv, 
passwd, ps, pwd, rm, rmdir, scp, sed, set, sleep, slogin, sort, ssh, tail, tar, touch, uname,
vi, wc, which, whoami.

I need the same list, sorted and without duplicate items.



echo "bg, cal, cat, cd, chmod, clear, cmp, cp, date, df, diff, du, echo, exit, fg, file, find, grep,
groups, gzip, head, history, id, info, jobs, kill, ln, locate, ls, man , mc, mkdir, more, mv,
passwd, ps, pwd, rm, rmdir, scp, sed, set, sleep, slogin, sort, ssh, tail, tar, touch, uname, vi,
wc, which, whoami, , " | sed "s| *, *|\n|g" | grep -v "^ *$" | sort --unique |
paste --delimiters=, --serial | sed "s|,|, |g"

Special attention should be paid to paste - I always forget about this command :)
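
The long options used above (--unique, --delimiters, --serial) are GNU spellings. As a minimal sketch, assuming a system where only the short POSIX options are available, the same pipeline could look like this. The function name normalize_list is made up for illustration, and tr replaces the "\n" in the sed replacement, which not every sed understands:

```shell
# normalize_list is a hypothetical helper name, not part of the original note.
normalize_list() {
    tr ',' '\n' |                  # one item per line
        sed 's/^ *//; s/ *$//' |   # trim spaces around each item
        grep -v '^$' |             # drop empty items
        sort -u |                  # sort and remove duplicates
        paste -s -d, - |           # join all lines with commas
        sed 's/,/, /g'             # put a space after each comma
}

echo "ls, cat, ls, bg , cat, , " | normalize_list    # prints: bg, cat, ls
```

The trailing "-" tells paste to read standard input, which some implementations require.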
Actually, the last part can be rewritten with sed only and without paste:
... | sort --unique | sed ':a;N;s/\n/, /;ta'
Let's look at the sed program:
# define a label named 'a'
:a
# append the next line to the pattern space
N
# replace the newline with ', '
s/\n/, /
# if the replacement succeeded, go to label 'a'
ta
# on the last line there is no next line for N to append, so GNU sed prints the pattern space and exits
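
The loop above relies on how GNU sed treats N on the last line. A hedged variant, assuming the sed in use may not be GNU, guards N with $! so that it is skipped on the last line, and passes each command in its own -e so labels are not followed by semicolons. The function name join_lines is only for this sketch:

```shell
# join_lines is a hypothetical helper name used only for this sketch.
join_lines() {
    # $!N appends the next line on every line except the last one,
    # so the loop never depends on what N does at end of input.
    sed -e ':a' -e '$!N' -e 's/\n/, /' -e 'ta'
}

printf 'bg\ncat\nls\n' | join_lines    # prints: bg, cat, ls
```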

If we want to do the replacement in a single pass, we can use a more complicated sed script:
... | sort --unique | sed -e ':x;$by;N;bx' -e ':y;s/\n/, /g'
In detail:
{
# declare label 'x'
    :x
# if this is the last line, go to label 'y'
    $by
# append the next line to the pattern space
    N
# go to label 'x'
    bx
}
# declare label 'y'
:y
# replace all newlines with ', ' in one pass
s/\n/, /g
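
To try the commented form directly, one option is to save it to a file and run it with sed -f. This is only a sketch: the temporary file comes from mktemp, the three-line input is made up, and the variable names script and joined are illustrative:

```shell
# Write the annotated sed program to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
# declare label x
:x
# on the last line, jump to label y
$by
# otherwise append the next line to the pattern space
N
# and loop back to x
bx
# declare label y
:y
# replace every embedded newline with ", " in a single pass
s/\n/, /g
EOF

# Run it on a small sample: all lines are collected first, then joined.
joined=$(printf 'bg\ncat\nls\n' | sed -f "$script")
echo "$joined"    # prints: bg, cat, ls
rm -f "$script"
```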

2 comments:

gavenkoa said...

I wrote about my solution at

http://brain-break.blogspot.com/2010/08/sort-with-uniq-comma-separated-items.html

http://brain-break-ru.blogspot.com/2010/08/sort.html

Beggy said...

Yes, thank you. It was interesting, and I was actually surprised that someone else had the same task.
You are absolutely right: sed is hard to read and understand. It was used here partly for fun and partly to remind myself how it works, which is why the sed scripts are documented in such detail above. By the way, the sed documentation is smaller than awk's :)
As for me, both solutions (sed and awk) are too "hard", and the "paste" approach (the first example) is probably the one to use.