Hey everyone,
Can someone please explain to me why the mean is greater than the median for positively skewed data, and vice versa for negatively skewed data?
Thanks very much!
Let's consider two separate samples:
Sample 1: 1 3 7 3 2 3 6 3 2 4 3 5 7 3 1
Sample 2: 7 7 6 3 8 9 7 6 4 2 2 6 8 9 6
Organising the samples in ascending order, we get:
Sample 1: 1 1 2 2 3 3 3 3 3 3 4 5 6 7 7
Sample 2: 2 2 3 4 6 6 6 6 7 7 7 8 8 9 9
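If you want to double-check the ordering by machine, here is a minimal Python sketch (the variable names are just for illustration). It uses the built-in `sorted()` and also prints how many values each sample has, since the median is found by its position in the ordered list.

```python
# The two samples exactly as listed in the post
sample1 = [1, 3, 7, 3, 2, 3, 6, 3, 2, 4, 3, 5, 7, 3, 1]
sample2 = [7, 7, 6, 3, 8, 9, 7, 6, 4, 2, 2, 6, 8, 9, 6]

# sorted() returns a new ascending list; the original lists are left unchanged
sorted1 = sorted(sample1)
sorted2 = sorted(sample2)

print(len(sorted1), sorted1)  # 15 [1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 4, 5, 6, 7, 7]
print(len(sorted2), sorted2)  # 15 [2, 2, 3, 4, 6, 6, 6, 6, 7, 7, 7, 8, 8, 9, 9]
```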
We can see that for Sample 1, most values are bunched at the lower end of the distribution, with a few larger values (5, 6, 7, 7) trailing off to the right.
Hence, the distribution has its peak on the left and a longer tail on the right: it is positively skewed.
From the ordered sample, the median for Sample 1 is the middle (8th) of the 15 values, which is 3.
The mean is the sum of the values divided by how many there are: 53 / 15 ≈ 3.53, which is greater than the median.
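For Sample 2, most values sit at the higher end, with a few smaller values (2, 2, 3, 4) trailing off to the left, so its tail points the other way (negative skew).
The median for Sample 2 is the 8th ordered value, 6, and the mean is 90 / 15 = 6. In this small sample the deviations below 6 (a total of -13) exactly balance those above it (+13), so here the mean is not actually pulled below the median.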
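These figures can be checked with Python's standard `statistics` module. This is just a quick sketch: `statistics.median` returns the middle value of an odd-length list, and `statistics.mean` returns the arithmetic mean.

```python
from statistics import mean, median

sample1 = [1, 3, 7, 3, 2, 3, 6, 3, 2, 4, 3, 5, 7, 3, 1]
sample2 = [7, 7, 6, 3, 8, 9, 7, 6, 4, 2, 2, 6, 8, 9, 6]

for name, data in [("Sample 1", sample1), ("Sample 2", sample2)]:
    # median: middle (8th) of the 15 ordered values
    # mean: sum of all values divided by the number of values
    print(f"{name}: median = {median(data)}, mean = {mean(data):.2f}")

# Sample 1: median = 3, mean = 3.53
# Sample 2: median = 6, mean = 6.00
```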
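If we think about it, in a positively skewed sample most values occur at the lower end of the scale, and a few much larger values form a tail to the right.
The median only depends on which value sits in the middle position once the data are ordered, so it is unaffected by how extreme those tail values are. The mean, on the other hand, averages
all the values, so it is affected by outliers: the large tail values pull it upwards, above the median. For a negatively skewed sample the same argument runs in reverse, and the small tail values drag the mean below the median.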
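To see the tail pulling on the mean directly, the sketch below makes one hypothetical change to Sample 1 that is not part of the original data: it replaces the largest value (7) with 30. The median stays where it is, while the mean jumps.

```python
from statistics import mean, median

sample1 = [1, 3, 7, 3, 2, 3, 6, 3, 2, 4, 3, 5, 7, 3, 1]

# Hypothetical tweak for illustration only: stretch the right tail by
# replacing the largest value (one of the 7s) with 30
stretched = sorted(sample1)
stretched[-1] = 30

print(f"original : median = {median(sample1)}, mean = {mean(sample1):.2f}")
print(f"stretched: median = {median(stretched)}, mean = {mean(stretched):.2f}")
# original : median = 3, mean = 3.53
# stretched: median = 3, mean = 5.07
```

Only the position of the middle value matters to the median, so any change confined to the tail leaves it alone, whereas every unit added to a tail value shifts the mean by 1/15.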
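The mean is pulled in the direction of the extreme scores, i.e. towards the tail (the same as the direction of the skew).
So, as a general rule of thumb, the mean lies further towards the tail than the median does. It is only a rule of thumb, though: it usually holds for unimodal distributions, but it can fail, especially for discrete data.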
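One way to check the rule of thumb on the two samples is to compare the sign of a skewness measure with the sign of mean minus median. The sketch below assumes the population moment coefficient of skewness, m3 / m2^1.5 (one of several definitions; the adjusted sample version many packages report differs by a positive factor, so it has the same sign), and `moment_skewness` is a hypothetical helper name. Sample 1 comes out positive on both counts, while Sample 2 comes out negatively skewed even though its mean equals its median.

```python
from statistics import mean, median

def moment_skewness(data):
    """Population moment coefficient of skewness: m3 / m2**1.5."""
    n = len(data)
    m = mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment (variance)
    m3 = sum((x - m) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

samples = {
    "Sample 1": [1, 3, 7, 3, 2, 3, 6, 3, 2, 4, 3, 5, 7, 3, 1],
    "Sample 2": [7, 7, 6, 3, 8, 9, 7, 6, 4, 2, 2, 6, 8, 9, 6],
}

for name, data in samples.items():
    skew = moment_skewness(data)
    gap = mean(data) - median(data)  # > 0 means the mean lies to the right of the median
    print(f"{name}: skewness = {skew:+.2f}, mean - median = {gap:+.2f}")
```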