I have a CSV file like this:
fish,4cat,1elephant,1tree,2dog,8car,10
awk -F',' '{print length($1),$0}' file.csv | sort -k1nr | cut -d' ' -f 2-
will sort the file by word length, for all words appearing in the first column:
elephant,1fish,4tree,2cat,1dog,8car,10
sort -t, -k+2 -n -r file.csv
will sort the file from greatest to least according to the number appearing in the second column:
car,10dog,8fish,4tree,2elephant,1cat,1
How can I use these two commands together such that the CSV file is first sorted by word length, according to the words appearing in the first column, then any rows containing words of equal length within the first column are sorted according to the number appearing in the second column from greatest to least. The resulting output would look like this:
elephant,1fish,4tree,2car,10dog,8cat,1
How can these two sorting methods be used together?
If you are using gawk then you can use the asort function to perform sort, so no other utility has to be called. You can try something like this:
awk -F, 'function cmp(i1,v1,i2,v2) {split(v1,a1); split(v2,a2) l1=length(a1[1]); l2=length(a2[1]) return l1 > l2 ? -1 : l1 < l2 ? 1 : a1[2] > a2[2] ? -1 : a1[2] < a2[2]}{a[n++]=$0}END{asort(a,a,"cmp"); for(i in a) print a[i]}' infile
Output:
elephant,1fish,4tree,2car,10dog,8cat,1
This script reads all the lines first then it sorts the array called a
with the function cmp
. The only trick I used that a > b
returns the usual 1 or 0 for true
or false
.
A little bit shorter version in perl:
perl -F, -ane 'push @a,[@F]; END{for $i(sort {length $b->[0]<=>length $a->[0] or $b->[1]<=>$a->[1]} @a) {printf "%s,%d\n", @$i}}' infile
This is not 100% correct as $F[1]
contains the \n
, yet printf
handles it properly.
No comments:
Post a Comment