Sum numbers in a CSV file via bash command line with awk

Often I am working with and manipulating text files on the terminal, so I'm constantly doinking around with command line tools and chaining them to get what I need. In one such case I needed to massage and sum the values from a specific column in a CSV file to get a total number of terabytes.

The CSV file was like so and had hundreds of lines:

Name,Data1,Data2
EntryA,1.00 TB,0.50 GB
EntryB,3.02 TB,23.77 GB
EntryC,2.73 TB,0.00 GB
EntryD,5.50 TB,79.22 GB
EntryE,2.25 TB,0.84 GB
...and so on for hundreds of lines...

I needed to skip the header row and sum just the numbers in the second column (i.e. "3.02 TB"). This command sequence gets the job done:

awk -F "," '{print $2}' data.csv | awk -F " " '{print $1}' | tail -n +2 | awk '{s+=$1}END{print s}'

awk -F "," '{print $2}' data.csv use a “,” as the field separator and pipe the second field which is the only field I need to
awk -F " " '{print $1}' use whitespace as field separator and pipe only the first field which is the number portion of the field so now I have just have the number I need to
tail -n +2 tail the data from the CSV file starting from line 2 to ensure that header row is gone and does not muck things up for me and pipe that to
awk '{s+=$1}END{print s}' which sets variable s to the first field (there is only one field now which is a number) and sum it on itself with each successive line and then print s when done which will be the summed total.
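
The condensed version leans on two awk behaviors: a string like "3.02 TB" is read as 3.02 when used in arithmetic, and NR > 1 skips the header row. Something like this should print the same total, though I'd treat it as a rough equivalent rather than a drop-in replacement:

# NR > 1 skips the header row; s += $2 relies on awk reading "3.02 TB" as 3.02
awk -F "," 'NR > 1 { s += $2 } END { print s }' data.csv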
