I often work with and manipulate text files on the terminal, so I’m constantly doinking around with command line tools and chaining them together to get what I need. In one such case I needed to massage and sum the values from a specific column in a CSV file to get a total number of terabytes.
The CSV file was like so and had hundreds of lines:
Name,Data1,Data2
EntryA,1.00 TB,0.50 GB
EntryB,3.02 TB,23.77 GB
EntryC,2.73 TB,0.00 GB
EntryD,5.50 TB,79.22 GB
EntryE,2.25 TB,0.84 GB
...and so on for hundreds of lines...
I needed to skip the header row and sum just the numbers in the second column (e.g. “3.02 TB”). This command sequence gets the job done:
awk -F "," '{print $2}' data.csv | awk -F " " '{print $1}' | tail -n +2 | awk '{s+=$1}END{print s}'
awk -F "," '{print $2}' data.csv
use a “,” as the field separator and pipe the second field which is the only field I need to
awk -F " " '{print $1}'
use whitespace as field separator and pipe only the first field which is the number portion of the field so now I have just have the number I need to tail -n +2
tail the data from the CSV file starting from line 2 to ensure that header row is gone and does not muck things up for me and pipe that to awk '{s+=$1}END{print s}'
which sets variable s
to the first field (there is only one field now which is a number) and sum it on itself with each successive line and then print s
when done which will be the summed total.
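For what it’s worth, the whole pipeline could probably be collapsed into a single awk call. Here is a rough sketch of that, assuming the same data.csv layout as above:

# skip the header row, split the second CSV field on whitespace,
# and keep a running total of the numeric part
awk -F ',' 'NR > 1 { split($2, parts, " "); total += parts[1] } END { print total }' data.csv

Run against just the five sample rows above, either version prints 14.5. I still like building it up one pipe at a time, though, since each stage is easy to eyeball on its own.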