Monday, April 25, 2011

Applying AWK on Set of Files with Same Extension

I want to apply the following awk command to every file with the extension ".txt":

awk '$4 ~ /NM/{ sum += $2 } END{ print sum }'

But why doesn't this command work?

for i in *.txt do echo awk '$4 ~ /NM/{ sum += $2 } END{ print sum }' $i; done

Normally,

awk '$4 ~ /NM/{ sum += $2 } END{ print sum }' file1.txt

would work.

From Stack Overflow
  • What is that "ech" after the do?

    neversaint : @unwind: echo - corrected. Thanks for pointing it.
    unwind : That was sort of a rhetorical question, but I guess I understated it. As other answers point out, you don't want to echo the awk command, you want to run it, so remove that echo. :)
    neversaint : @unwind, the reason I use echo is that I also want to print which file is being processed at the time and what result it yields.
    unwind : @foolishbrat: But you can't do it like that ... If you say "echo this", the shell will just print out the word "this", not execute it, too.
  • Not sure if you copy-pasted this or if it's a typo.

    for i in *.txt do echo awk '$4 ~ /NM/{ sum += $2 } END{ print sum }' $i; done

    With echo corrected, the command above will echo your awk script and the filename, but not run it.
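
The difference between echoing a command and running it can be seen on a small made-up input. This is only a sketch: the scratch directory and the contents of file1.txt are invented for illustration and are not the asker's data.

```shell
# Hypothetical sample data in a scratch directory (not from the question).
tmpdir=$(mktemp -d)
printf 'a 10 x NM_1\nb 5 x XX\nc 7 x NM_2\n' > "$tmpdir/file1.txt"

# With echo: the shell prints the words "awk", the script text and the
# file name. awk itself is never started.
echoed=$(echo awk '$4 ~ /NM/{ sum += $2 } END{ print sum }' "$tmpdir/file1.txt")

# Without echo: awk runs and sums column 2 of the rows whose column 4
# matches NM (10 + 7 here).
summed=$(awk '$4 ~ /NM/{ sum += $2 } END{ print sum }' "$tmpdir/file1.txt")

printf 'echoed: %s\nsummed: %s\n' "$echoed" "$summed"
rm -rf "$tmpdir"
```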

  • echo is not required.

    Try

    for i in *.txt; do awk '$4 ~ /NM/{ sum += $2 } END{ print sum }' "$i"; done

    This should work. Put the command directly after "do"; writing "do;" is a syntax error.

  • Once you've removed the echo (and added the missing ';' before do), it should work:

    for i in *.txt; do awk '$4 ~ /NM/{ sum += $2 } END{ print sum }' $i; done

    Because $i is unquoted, this will fail if any of the file names contain spaces. You could try this instead:

    find . -name '*.txt' -print0 | xargs --null -n 1 awk '$4 ~ /NM/{ sum += $2 } END{ print sum }'
    

    An alternative for printing out names:

    find . -name '*.txt' -print -exec awk '$4 ~ /NM/{ sum += $2 } END{ print sum }' {} \;
    

    (Basically, this makes find execute awk directly and also print out the file names.)

    neversaint : Thanks so much. Is there a way I can print out the filename also?
  • for i in *.txt; do echo "$i"; awk '$4 ~ /NM/{ sum += $2 } END{ print sum }' "$i"; done
    

    This will print the names of the processed files together with the output of the awk command.

    Porges : "for i in *.txt" is bad style. Won't work with spaces.
    x-way : In Bash the 'for i in *.txt' works correctly also with spaces ($i contains the whole space-containing filename), but you are correct that this creates a problem when passing $i as an argument to awk. I've added the needed quotes around $i now, thanks.
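
The quoting point discussed above can be checked with a file name that contains a space. The sketch below uses invented file names and contents. It also prints "sum + 0" instead of "sum", a small change that is not in the answers, so that a file without matching rows shows 0 rather than an empty line.

```shell
# Hypothetical sample files; one name contains a space on purpose.
tmpdir=$(mktemp -d)
printf 'a 3 x NM_1\nb 4 x NM_2\n' > "$tmpdir/my data.txt"
printf 'a 1 x NM_1\n' > "$tmpdir/plain.txt"

report=$(
  cd "$tmpdir" || exit 1
  for i in *.txt; do
    # The glob itself keeps "my data.txt" as one word; the quotes around
    # "$i" stop the shell from splitting it again when it is passed on.
    printf '%s: ' "$i"
    awk '$4 ~ /NM/{ sum += $2 } END{ print sum + 0 }' "$i"
  done
)
printf '%s\n' "$report"
rm -rf "$tmpdir"
```

With $i left unquoted, awk would be handed "my" and "data.txt" as two separate file names and would report that it cannot open them.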
  • Try this (use nawk or /usr/xpg4/bin/awk on Solaris):

    awk 'END {
      printf "%s: %.2f\n", fn, sum
      }
    FNR == 1 {
      if (fn) printf "%s: %.2f\n", fn, sum
      fn = FILENAME
      sum = 0
      }
    $4 ~ /NM/ { 
      sum += $2 
      }' *.txt
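
For a feel of what the single awk invocation above prints, here is a run on two made-up files. The file names and numbers are assumptions chosen for illustration; the awk logic is the same as in the answer, only with the END block moved to the bottom.

```shell
# Hypothetical sample files in a scratch directory.
tmpdir=$(mktemp -d)
printf 'a 10 x NM_1\nb 5 x XX\nc 7 x NM_2\n' > "$tmpdir/a.txt"
printf 'd 2.5 x NM_3\n' > "$tmpdir/b.txt"

per_file=$(
  cd "$tmpdir" || exit 1
  awk 'FNR == 1 {
         # First line of a new file: report the previous file, then reset.
         if (fn) printf "%s: %.2f\n", fn, sum
         fn = FILENAME
         sum = 0
       }
       $4 ~ /NM/ { sum += $2 }
       END { printf "%s: %.2f\n", fn, sum }' *.txt
)
printf '%s\n' "$per_file"
rm -rf "$tmpdir"
```

One process handles every file, and each sum is labelled with its file name. Since the report is triggered by FNR == 1, a completely empty file never gets a line of its own.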
    
  • You need to add a ';' :

    for i in *.txt; do ...
    

    instead of

    for i in *.txt do ...
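
One way to see why the ';' matters is to ask the shell to parse both forms without running them. Without the separator, "do" is read as just another word in the list after "in", so the later "done" has nothing to close. This is a sketch that assumes an sh which accepts -n together with -c, as bash and dash do.

```shell
broken='for i in *.txt do echo "$i"; done'
fixed='for i in *.txt; do echo "$i"; done'

# -n asks the shell to read the command string and check its syntax
# without executing it; all output is discarded, only the status is used.
if sh -n -c "$broken" >/dev/null 2>&1; then
  broken_result=parses
else
  broken_result=syntax-error
fi
if sh -n -c "$fixed" >/dev/null 2>&1; then
  fixed_result=parses
else
  fixed_result=syntax-error
fi

printf 'without ";": %s\nwith ";": %s\n' "$broken_result" "$fixed_result"
```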
    
