Section: User Commands (1)
Updated: July 2014
datamash - command-line calculations
datamash [OPTION] op [col] [op col ...]
Performs numeric/string operations on input from stdin.
'op' is the operation to perform;
For grouping operations 'col' is the input field to use.
File operations:
transpose, reverse
Numeric Grouping operations:
sum, min, max, absmin, absmax
Textual/Numeric Grouping operations:
count, first, last, rand
unique, collapse, countunique
Statistical Grouping operations:
mean, median, q1, q3, iqr, mode, antimode
pstdev, sstdev, pvar, svar, mad, madraw
pskew, sskew, pkurt, skurt, dpo, jarque
Grouping Options:
- -f, --full
print entire input line before op results
(default: print only the grouped keys)
- -g, --group=X[,Y,Z]
group via fields X,[Y,Z]
- --header-in
first input line is column headers
- --header-out
print column headers as first line
- -H, --headers
same as '--header-in --header-out'
- -i, --ignore-case
ignore upper/lower case when comparing text;
this affects grouping, and string operations
- -s, --sort
sort the input before grouping; this removes the
need to manually pipe the input through 'sort'
File Operation Options:
- --no-strict
allow lines with varying number of fields
- --filler=X
fill missing values with X (default N/A)
General Options:
- -t, --field-separator=X
use X instead of TAB as field delimiter
- -W, --whitespace
use whitespace (one or more spaces and/or tabs)
for field delimiters
- -z, --zero-terminated
end lines with 0 byte, not newline
- --help
display this help and exit
- --version
output version information and exit
File operations:
- transpose
transpose rows, columns of the input file
- reverse
reverse field order in each line
Numeric Grouping operations
- sum
sum of the values
- min
minimum value
- max
maximum value
- absmin
minimum of the absolute values
- absmax
maximum of the absolute values
Textual/Numeric Grouping operations
- count
count number of elements in the group
- first
the first value of the group
- last
the last value of the group
- rand
one random value from the group
- unique
comma-separated sorted list of unique values
- collapse
comma-separated list of all input values
- countunique
number of unique/distinct values
Statistical Grouping operations
- mean
mean of the values
- median
median value
- q1
1st quartile value
- q3
3rd quartile value
- iqr
inter-quartile range
- mode
mode value (most common value)
- antimode
anti-mode value (least common value)
- pstdev
population standard deviation
- sstdev
sample standard deviation
- pvar
population variance
- svar
sample variance
- mad
median absolute deviation, scaled by constant 1.4826 for normal distributions
- madraw
median absolute deviation, unscaled
- sskew
skewness of the (sample) group
- pskew
skewness of the (population) group
values x reported by 'sskew' and 'pskew' operations:
x > 0 - positively skewed / skewed right
0 > x - negatively skewed / skewed left
x > 1 - highly skewed right
1 > x > 0.5 - moderately skewed right
0.5 > x > -0.5 - approximately symmetric
-0.5 > x > -1 - moderately skewed left
-1 > x - highly skewed left
- skurt
excess Kurtosis of the (sample) group
- pkurt
excess Kurtosis of the (population) group
- jarque
p-value of the Jarque-Bera test for normality
- dpo
p-value of the D'Agostino-Pearson Omnibus test for normality;
for 'jarque' and 'dpo' operations:
null hypothesis is normality;
low p-Values indicate non-normal data;
high p-Values indicate null-hypothesis cannot be rejected.
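The statistical operations above can be illustrated with a short Python sketch. It uses textbook definitions (moment-based population skewness and excess kurtosis, and the Jarque-Bera statistic compared against a chi-squared distribution with 2 degrees of freedom); it is not taken from the datamash source, so results may differ in detail. The function names are hypothetical, and the handling of the exact boundary values (such as x = 0.5) in describe_skew is an assumption, since the table above leaves them unspecified.

```python
import math
import statistics


def mad(values, scale=1.4826):
    """Median absolute deviation; scale=1.0 gives the unscaled ('madraw') form."""
    med = statistics.median(values)
    return scale * statistics.median(abs(v - med) for v in values)


def _central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def population_skewness(values):
    """Textbook population skewness: m3 / m2^1.5."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def population_excess_kurtosis(values):
    """Textbook population excess kurtosis: m4 / m2^2 - 3."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3


def describe_skew(x):
    """Map a skewness value to the labels in the table above.

    Boundary values are assigned to the milder category (an assumption).
    """
    if x > 1:
        return "highly skewed right"
    if x > 0.5:
        return "moderately skewed right"
    if x >= -0.5:
        return "approximately symmetric"
    if x >= -1:
        return "moderately skewed left"
    return "highly skewed left"


def jarque_bera_pvalue(values):
    """p-value of the Jarque-Bera statistic JB = n/6 * (S^2 + K^2/4).

    For a chi-squared distribution with 2 degrees of freedom the
    upper-tail probability is exactly exp(-JB / 2).
    """
    n = len(values)
    s = population_skewness(values)
    k = population_excess_kurtosis(values)
    jb = n / 6 * (s ** 2 + k ** 2 / 4)
    return math.exp(-jb / 2)
```

For example, mad([1, 2, 3, 4, 5], scale=1.0) returns 1.0, because the median is 3 and the median of the absolute deviations 2, 1, 0, 1, 2 is 1.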
Print the sum and the mean of values from column 1:
- $ seq 10 | datamash sum 1 mean 1
55 5.5
Group input based on field 1, and sum values (per group) on field 2:
- $ cat example.txt
A 10
A 5
B 9
B 11
$ datamash -g 1 sum 2 < example.txt
A 15
B 20
Unsorted input must be sorted (with '-s'):
- $ cat example.txt
A 10
C 4
B 9
C 1
A 5
B 11
$ datamash -s -g1 sum 2 < example.txt
A 15
B 20
C 5
Which is equivalent to:
- $ cat example.txt | sort -k1,1 | datamash -g 1 sum 2
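Why sorting matters can be sketched in Python. The sketch assumes that grouping merges only adjacent lines with equal keys, which is how itertools.groupby works; that this matches datamash's behavior without '-s' is an assumption here, not something taken from its source. The function name group_sum is hypothetical.

```python
from itertools import groupby


def group_sum(rows, sort=False):
    """Sum the second field for each run of equal first-field keys.

    sort=True sorts by the key first, mimicking '-s' (or a manual 'sort -k1,1').
    """
    if sort:
        rows = sorted(rows, key=lambda row: row[0])
    return [
        (key, sum(value for _, value in run))
        for key, run in groupby(rows, key=lambda row: row[0])
    ]


# The unsorted example.txt from above, as (key, value) pairs.
rows = [("A", 10), ("C", 4), ("B", 9), ("C", 1), ("A", 5), ("B", 11)]

print(group_sum(rows, sort=True))  # [('A', 15), ('B', 20), ('C', 5)]
print(group_sum(rows))             # six separate one-line groups
```

Without sorting, no two adjacent lines share a key, so every line forms its own group and nothing is summed.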
Use -H (--headers) if the input file has a header line:
- # Given a file with student name, field, test score...
$ head -n5 scores_h.txt
Name Major Score
Shawn Engineering 47
Caleb Business 87
Christian Business 88
Derek Arts 60
# Calculate the mean and standard deviation for each major
$ datamash --sort --headers --group 2 mean 3 pstdev 3 < scores_h.txt
(or use short form)
$ datamash -sH -g2 mean 3 pstdev 3 < scores_h.txt
GroupBy(Major) mean(Score) pstdev(Score)
Arts 68.9 10.1
Business 87.3 4.9
Engineering 66.5 19.1
Health-Medicine 90.6 8.8
Life-Sciences 55.3 19.7
Social-Sciences 60.2 16.6
Reverse field order in each line:
- $ seq 6 | paste - - | datamash reverse
2 1
4 3
6 5
Transpose rows, columns:
- $ seq 6 | paste - - | datamash transpose
1 3 5
2 4 6
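The two file operations can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the strict flag loosely mirrors the absence of '--no-strict', and the "N/A" placeholder is an assumed default for '--filler'.

```python
from itertools import zip_longest


def reverse_fields(rows):
    """Reverse the field order within each line."""
    return [list(reversed(row)) for row in rows]


def transpose(rows, strict=True, filler="N/A"):
    """Swap rows and columns.

    With strict=True, lines with differing field counts are rejected;
    with strict=False, short lines are padded with 'filler' (assumed default).
    """
    if strict and len({len(row) for row in rows}) > 1:
        raise ValueError("lines have varying numbers of fields")
    return [list(column) for column in zip_longest(*rows, fillvalue=filler)]


# Same data as 'seq 6 | paste - -' in the examples above.
rows = [["1", "2"], ["3", "4"], ["5", "6"]]

print(reverse_fields(rows))  # [['2', '1'], ['4', '3'], ['6', '5']]
print(transpose(rows))       # [['1', '3', '5'], ['2', '4', '6']]
```

Padding with zip_longest keeps the output rectangular when lines are uneven, which is the situation the '--no-strict' and '--filler' options are described as covering.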
GNU Datamash Website (
Written by Assaf Gordon.
Copyright © 2014 Assaf Gordon
License GPLv3+: GNU GPL version 3 or later <>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
The full documentation for datamash
is maintained as a Texinfo manual. If the
programs are properly installed at your site, the command
info datamash
should give you access to the complete manual.