Tuesday, October 29, 2013

How to apply different sorting methods to different columns in a CSV file in BASH?

I have a CSV file like this:

fish,4cat,1elephant,1tree,2dog,8car,10

awk -F',' '{print length($1),$0}' file.csv | sort -k1nr | cut -d' ' -f 2- will sort the file by word length, for all words appearing in the first column:

elephant,1fish,4tree,2cat,1dog,8car,10

sort -t, -k+2 -n -r file.csv will sort the file from greatest to least according to the number appearing in the second column:

car,10dog,8fish,4tree,2elephant,1cat,1

How can I use these two commands together such that the CSV file is first sorted by word length, according to the words appearing in the first column, then any rows containing words of equal length within the first column are sorted according to the number appearing in the second column from greatest to least. The resulting output would look like this:

elephant,1fish,4tree,2car,10dog,8cat,1

How can these two sorting methods be used together?

If you are using then you can use the asort function to perform sort, so no other utility has to be called. You can try something like this:

awk -F, 'function cmp(i1,v1,i2,v2) {split(v1,a1); split(v2,a2)  l1=length(a1[1]); l2=length(a2[1])  return l1 > l2 ? -1 : l1 < l2 ? 1 : a1[2] > a2[2] ? -1 : a1[2] < a2[2]}{a[n++]=$0}END{asort(a,a,"cmp"); for(i in a) print a[i]}' infile

Output:

elephant,1fish,4tree,2car,10dog,8cat,1

This script reads all the lines first then it sorts the array called a with the function cmp. The only trick I used that a > b returns the usual 1 or 0 for true or false.

A little bit shorter version in :

perl -F, -ane 'push @a,[@F];   END{for $i(sort {length $b->[0]<=>length $a->[0] or $b->[1]<=>$a->[1]} @a) {printf "%s,%d\n", @$i}}' infile

This is not 100% correct as $F[1] contains the \n, yet printf handles it properly.

No comments:

Post a Comment