Generating Average-related Stats with Awk



Published: 2020-01-21 16:04:39 +0000
Categories: BASH

Language

BASH

Description

Awk is handy for quickly pulling basic statistics out of a stream of numbers.

This snippet details how to pull out:

  • Mean (Average)
  • Median
  • 95th Percentile

For simplicity the snippets use $0 (the whole input line) as the value, which assumes one number per line. See the usage examples for reading a specific column instead.
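
As a quick illustration of the difference between $0 and a numbered column, here is a minimal sketch using made-up inline data. The values and the variable names mean_line and mean_col are purely illustrative.

```shell
# One value per line: $0 is the whole line, which here is just the number
mean_line=$(printf '10\n20\n30\n' | awk '{t += $0} END {print t/NR}')

# Several whitespace-separated columns per line: $2 selects the second one
mean_col=$(printf 'a 10\nb 20\nc 30\n' | awk '{t += $2} END {print t/NR}')

echo "$mean_line $mean_col"
```

Both commands see the same three numbers, so both means come out as 20.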

Snippet

# Calculate Average (Mean)
awk 'BEGIN{t=0}{t=t+$0}END{print t/NR}'

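# 95th percentile (nearest rank) - input should be pre-sorted numerically.
# The index is ceil(NR*0.95), done in integer arithmetic so it is always between 1 and NR
awk '{all[NR] = $0} END{print all[int((NR*95 + 99) / 100)]}'

# Median, also known as the 50th percentile. Input should be pre-sorted numerically.
# For an even number of rows this prints the lower of the two middle values
awk '{all[NR] = $0} END{print all[int((NR + 1) / 2)]}'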
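
The median one-liner picks a single row from the sorted input. If you would rather have the conventional median, which averages the two middle values when the row count is even, something like the following sketch should work. The function name median is my own invention, and it sorts the input itself so the caller doesn't have to.

```shell
# median: read one number per line on stdin and print the median
median() {
    sort -n | awk '
        { all[NR] = $0 }
        END {
            if (NR == 0) exit 1    # no input: print nothing, return non-zero
            if (NR % 2) {
                # odd count: the single middle row
                print all[(NR + 1) / 2]
            } else {
                # even count: average of the two middle rows
                print (all[NR / 2] + all[NR / 2 + 1]) / 2
            }
        }'
}

printf '5\n1\n3\n' | median       # odd number of rows
printf '4\n1\n3\n2\n' | median    # even number of rows
```

The first call prints 3 and the second prints 2.5.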

Usage Example

# Calculate average based on the 4th column in a tab-separated file
cat file.csv | awk -F'\t' 'BEGIN{t=0}{t=t+$4}END{print t/NR}'

# Same as above, but 95th percentile. $'\t' (bash) passes a literal tab to sort
cat file.csv | sort -n -t$'\t' -k4,4 | awk -F'\t' '{all[NR] = $4} END{print all[int((NR*95 + 99) / 100)]}'

# Calculate the median, but assume it's comma-separated this time and use column 2
cat file.csv | sort -n -t, -k2,2 | awk -F, '{all[NR] = $2} END{print all[int((NR + 1) / 2)]}'
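
If you want all three figures for one column in a single go, the pieces can be combined. This is only a sketch: col_stats is a hypothetical helper name, the argument order (file, column, optional delimiter defaulting to a comma) is my own choice, and it assumes the file has no header row. It extracts the column first and sorts afterwards, which sidesteps sort's field-separator options entirely.

```shell
# col_stats FILE COLUMN [DELIM]: print mean, median and 95th percentile of a column
col_stats() {
    file=$1 col=$2 delim=${3:-,}
    # Pull out just the wanted column, sort it numerically, then compute the stats
    awk -F"$delim" -v c="$col" '{print $c}' "$file" | sort -n | awk '
        { all[NR] = $0; t += $0 }
        END {
            if (NR == 0) exit 1                  # empty input: nothing to report
            p95 = int((NR * 95 + 99) / 100)      # ceil(NR * 0.95), nearest rank
            med = int((NR + 1) / 2)              # lower middle row for even counts
            printf "mean=%s median=%s p95=%s\n", t / NR, all[med], all[p95]
        }'
}
```

Usage would look like `col_stats file.csv 2` for a comma-separated file, or `col_stats file.csv 4 '\t'` for a tab-separated one (awk interprets the `\t` escape itself).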

Keywords

awk, stats, percentile, mean, median

Copyright © 2020 Ben Tasker
Available at snippets.bentasker.co.uk, yr4pnhounvdybotb.onion and snippets.6zdgh5a5e6zpchdz.onion