Generating Average-Related Stats with Awk (BASH)

Awk can be pretty handy for quickly pulling out various statistics.

This snippet details how to pull out

  • Mean (Average)
  • Median
  • 95th Percentile

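For simplicity I've used $0 (the whole input line) rather than a column number, so the snippets expect one number per line. Swap in $N to read column N; see the examples for more details.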

Details

  • Language: BASH

Snippet

# Calculate Average (Mean)
awk 'BEGIN{t=0}{t=t+$0}END{print t/NR}'

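# 95th percentile (nearest-rank method) - input must be pre-sorted numerically.
# int((NR*95 + 99) / 100) rounds NR*0.95 up, so the index is always between 1 and NR
awk '{all[NR] = $0} END{if (NR) print all[int((NR*95 + 99) / 100)]}'

# Median, also known as the 50th percentile. Input must be pre-sorted numerically.
# For an even number of lines this picks the lower of the two middle values
awk '{all[NR] = $0} END{if (NR) print all[int((NR + 1) / 2)]}'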
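The median one-liner above returns a single element of the sorted input. As a rough sketch of the conventional definition, which averages the two middle values when the line count is even, something like the following could be used. The function name median is only illustrative, and it sorts its input itself, so pre-sorting is not needed.

```shell
# Hypothetical helper (the name is illustrative): conventional median of the
# numbers on stdin, one per line. Sorting happens inside the function.
median() {
  sort -n | awk '
    { all[NR] = $0 }
    END {
      if (NR == 0) exit 1                                    # no input: print nothing
      if (NR % 2) print all[(NR + 1) / 2]                    # odd count: middle value
      else        print (all[NR / 2] + all[NR / 2 + 1]) / 2  # even count: mean of the two middle values
    }'
}

printf '%s\n' 4 1 3 2 | median   # prints 2.5
```

With an odd number of lines the result is simply the middle value, so this agrees with a single-element lookup in that case.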

Usage Example

# Calculate average based on the 4th column in a tab-separated file
cat file.csv | awk -F'\t' 'BEGIN{t=0}{t=t+$4}END{print t/NR}'

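# Same as above, but 95th percentile. $'\t' is bash syntax for a literal tab (a bare \t would reach sort as the letter t)
cat file.csv | sort -t$'\t' -k4,4n | awk -F'\t' '{all[NR] = $4} END{if (NR) print all[int((NR*95 + 99) / 100)]}'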

# Calculate the median, but assume it's comma-separated this time and use column 2
cat file.csv | sort -t, -k2,2n | awk -F, '{all[NR] = $2} END{if (NR) print all[int((NR + 1) / 2)]}'
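The pipelines above have to keep sort and awk in agreement about the separator and the column. One possible way around that, sketched here on the assumption that the percentile is a whole number from 1 to 100, is to extract the column first and sort only that. percentile is a hypothetical helper and its argument order is an arbitrary choice.

```shell
# Hypothetical helper: percentile P COL SEP
# Reads delimited text on stdin and prints the P-th percentile (nearest-rank
# method) of column COL. Assumes P is a whole number between 1 and 100.
percentile() {
  awk -F"$3" -v col="$2" '{ print $col }' |   # keep only the wanted column
    sort -n |                                 # numeric sort, no -t/-k needed
    awk -v p="$1" '
      { all[NR] = $0 }
      END {
        if (NR == 0) exit 1                   # no input: print nothing
        i = int((NR * p + 99) / 100)          # NR*p/100 rounded up
        print all[i]
      }'
}

# Median of column 2 in comma-separated input
printf '%s\n' 'a,30' 'b,10' 'c,20' 'd,40' | percentile 50 2 ','   # prints 20
```

Because only one column reaches sort, the sort key options disappear entirely, at the cost of running awk twice.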