## Wilcoxon Rank Sum Test

A Wilcoxon Rank-Sum Test is a nonparametric test of the null hypothesis (\( H_0 \)) that a randomly selected value from one population is equally likely to be less than or greater than a randomly selected value from a second population. It is often described as the nonparametric counterpart of the two-sample t-test.

The test is often used with ordinal data (e.g. ranks), and when the sample size is too small to justify assuming a normal distribution.

The Wilcoxon Rank-Sum Test is also known as:

- Mann-Whitney U Test
- Mann-Whitney-Wilcoxon Test

The basic assumptions in using this test are the following:

- All observations from both groups are independent of each other.
- Responses are ordinal (for any two responses, one can say which is "greater").
- \( H_0 \) states that the distributions are equal.
- \( H_1 \) states that the distributions are not equal.
- Observations do not have to come from a normal distribution (making this test "nonparametric").

This differs from the standard two-sample t-test, which assumes that both populations are normally distributed and have equal variances.

## Example Case - Morning Egg Consumption by State

As an example, let's look at how many eggs residents of two states (Oklahoma and Wisconsin) eat every morning. We have a sample of \( n=10 \) residents from each state, which is far too few to assume a normal distribution.

Our null hypothesis is that there is no difference in the number of eggs eaten based on which state a citizen resides in.

Oklahoma (eggs) | Wisconsin (eggs) |
---|---|
1 | 3 |
2 | 4 |
2 | 5 |
3 | 3 |
5 | 3 |
1 | 5 |
0 | 2 |
0 | 3 |
1 | 3 |
2 | 4 |

We first pool both samples and rank all 20 values from smallest to largest. When two or more values are tied, each of them receives the average of the ranks they occupy.

Eggs Eaten | State | Absolute Rank | Average Rank |
---|---|---|---|
0 | Oklahoma | 1 | 1.5 |
0 | Oklahoma | 2 | 1.5 |
1 | Oklahoma | 3 | 4 |
1 | Oklahoma | 4 | 4 |
1 | Oklahoma | 5 | 4 |
2 | Oklahoma | 6 | 7.5 |
2 | Oklahoma | 7 | 7.5 |
2 | Oklahoma | 8 | 7.5 |
2 | Wisconsin | 9 | 7.5 |
3 | Oklahoma | 10 | 12.5 |
3 | Wisconsin | 11 | 12.5 |
3 | Wisconsin | 12 | 12.5 |
3 | Wisconsin | 13 | 12.5 |
3 | Wisconsin | 14 | 12.5 |
3 | Wisconsin | 15 | 12.5 |
4 | Wisconsin | 16 | 16.5 |
4 | Wisconsin | 17 | 16.5 |
5 | Oklahoma | 18 | 19 |
5 | Wisconsin | 19 | 19 |
5 | Wisconsin | 20 | 19 |
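The ranking step can be sketched in Python. This is a minimal sketch, not part of the original walkthrough, and the helper name `average_ranks` is our own. It pools both samples, sorts them, and gives every run of tied values the mean of the absolute ranks that run spans, which reproduces the Average Rank column. SciPy's `scipy.stats.rankdata` does the same with its default `method='average'`.

```python
# Egg counts from the table above
oklahoma = [1, 2, 2, 3, 5, 1, 0, 0, 1, 2]
wisconsin = [3, 4, 5, 3, 3, 5, 2, 3, 3, 4]


def average_ranks(values):
    """Rank values from smallest to largest; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` over the run of sorted positions holding the same value
        end = start
        while end < len(order) and values[order[end]] == values[order[start]]:
            end += 1
        # Absolute ranks in this run are start+1 .. end; all ties share their mean
        shared = (start + 1 + end) / 2
        for i in order[start:end]:
            ranks[i] = shared
        start = end
    return ranks


pooled = oklahoma + wisconsin
ranks = average_ranks(pooled)
for value in sorted(set(pooled)):
    print(value, ranks[pooled.index(value)])
```

Each distinct egg count prints next to its shared rank: 1.5, 4.0, 7.5, 12.5, 16.5 and 19.0 for 0 through 5 eggs.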

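We can now take the rank sum of each population, where \( R_i \) is the average rank of observation \( i \). First, Oklahoma:

$$ W_{Oklahoma} = \sum_{i \in Oklahoma} R_i = 1.5+1.5+4+4+4+7.5+7.5+7.5+12.5+19 = 69 $$

Now Wisconsin:

$$ W_{Wisconsin} = \sum_{i \in Wisconsin} R_i = 7.5+12.5+12.5+12.5+12.5+12.5+16.5+16.5+19+19 = 141 $$

As a check, the two rank sums must add up to the sum of all ranks, \( N(N+1)/2 = 20 \cdot 21/2 = 210 \), and indeed \( 69 + 141 = 210 \).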

If the two populations were similar (as \( H_0 \) claims), we'd expect their mean ranks to be approximately equal, both close to the overall mean rank of \( (N+1)/2 = 10.5 \).

$$ \overline{R}(Oklahoma) = \frac{69}{10} = 6.9 $$

$$ \overline{R}(Wisconsin) = \frac{141}{10} = 14.1 $$
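A short Python sketch can compute the rank sums and mean ranks directly. The helper `midrank` is our own illustrative name. It relies on the fact that a value's average rank equals the number of pooled values below it plus its mean position within its tie group, \( (t+1)/2 \) for a tie group of size \( t \).

```python
oklahoma = [1, 2, 2, 3, 5, 1, 0, 0, 1, 2]
wisconsin = [3, 4, 5, 3, 3, 5, 2, 3, 3, 4]
pooled = oklahoma + wisconsin


def midrank(value, data):
    """Average rank of `value` in `data`: count of smaller values plus its mean tie position."""
    below = sum(x < value for x in data)
    tied = sum(x == value for x in data)
    return below + (tied + 1) / 2


w_ok = sum(midrank(v, pooled) for v in oklahoma)
w_wi = sum(midrank(v, pooled) for v in wisconsin)
n = len(pooled)
print(w_ok, w_wi)                                   # 69.0 141.0
print(w_ok / len(oklahoma), w_wi / len(wisconsin))  # 6.9 14.1
print((n + 1) / 2)                                  # mean rank expected under H0: 10.5
```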

There seems to be a difference between the mean ranks of the two groups, but how can we quantify it? Is it statistically significant? There are several ways to measure significance, depending mainly on how large our sample is, and therefore on whether a normal approximation to the rank-sum distribution is reasonable.

- If our sample size is large, we can use a Z-test (a normal approximation) with the standard \( \alpha = 0.05 \).
- If our sample size is small, we can use a Wilcoxon Rank-Sum Table of Critical Values.

## Z-test for Ranks

Typically, if a sample size is less than 20, it's recommended *not* to use a Z-test for ranks. However, for the sake of example, let's see how the calculation would go if our sample were large enough.

The formula for the Z-test for ranks is:

$$ z = \frac{W - \mu_W}{\sigma_W} $$

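where \( W \) is the smaller of the two rank sums (here \( W_{Oklahoma} = 69 \)), \( n_1 \) is the size of the group \( W \) comes from and \( n_2 \) the size of the other group, \( \mu_W \) is the expected rank sum under \( H_0 \), and \( \sigma_W \) is its standard error.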

Under the null hypothesis, the two groups have equal mean ranks. To compute the expected sum of ranks, use the following equation:

$$ \mu_{W} = \frac{n_1(n_1+n_2+1)}{2} = \frac{10(10+10+1)}{2} = 105 $$

The standard error is:

$$ \sigma_{W} = \sqrt{\frac{n_1n_2(n_1+n_2+1)}{12}} = \sqrt{\frac{(10)(10)(10+10+1)}{12}}=13.2287565 $$
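Both formulas are easy to check numerically. The Python sketch below uses our own function name, `rank_sum_null_moments`, and, like the formulas above, assumes there are no ties.

```python
import math


def rank_sum_null_moments(n1, n2):
    """Mean and standard error of group 1's rank sum under H0, assuming no ties."""
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return mu, sigma


mu_w, sigma_w = rank_sum_null_moments(10, 10)
print(mu_w, round(sigma_w, 4))  # 105.0 13.2288
```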

Now we can go back to our equation and plug in our values!

$$ z = \frac{W - \mu_W}{\sigma_W} = \frac{69-105}{13.2287565} = -2.72 $$

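Looking up \( z = -2.72 \) in a Z table gives a lower-tail probability of about 0.00326. Because our alternative hypothesis is two-sided (the distributions are not equal), we double it: \( p \approx 2 \times 0.00326 \approx 0.0065 \). This is well below \( \alpha = 0.05 \), so we reject the null hypothesis that there is no difference between the two populations.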
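Instead of a printed Z table, the standard normal CDF can be written with `math.erfc` from Python's standard library. This sketch takes the uncorrected \( \mu_W \) and \( \sigma_W \) from above. It prints both the one-tailed value, which is what a Z table shows, and the two-tailed p-value that the two-sided \( H_1 \) calls for.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


w, mu_w, sigma_w = 69, 105, math.sqrt(175)  # values from the calculation above
z = (w - mu_w) / sigma_w
lower_tail = normal_cdf(z)                       # what a Z table reports for z < 0
two_sided = 2 * min(lower_tail, 1 - lower_tail)  # H1 is "not equal", so use both tails
print(round(z, 2), round(lower_tail, 4), round(two_sided, 4))  # -2.72 0.0033 0.0065
```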

## Using a Wilcoxon Rank-Sum Table of Critical Values

If your sample size is below 20, you would use a Wilcoxon Rank-Sum Table of Critical Values instead. The table provides values of \( W_{critical} \) such that if our \( W_{observed} \) is more extreme than these thresholds, we can reject the null hypothesis. Since we have no a priori knowledge of whether one group is greater or less than the other, we use a two-tailed test. In Section J.1 of the table, the two-tailed critical values at \( \alpha = 0.05 \) for \( m=10, n=10 \) are 78 and 132.

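Our \( W_{observed} = W_{Oklahoma} = 69 \) is below the lower critical value of 78 (equivalently, \( W_{Wisconsin} = 141 \) is above 132). So \( p < 0.05 \), and we can reject the null hypothesis.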
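Where do critical values such as 78 and 132 come from? Under \( H_0 \) with no ties, every set of 10 ranks drawn from 1 to 20 is equally likely to be Oklahoma's, so the exact null distribution of \( W \) can be obtained by counting. The Python sketch below is our own construction and not necessarily how any particular published table was built. It counts subsets by rank sum with dynamic programming and takes the largest lower-tail value whose probability is at most \( \alpha/2 \). Like printed tables, it ignores ties, so applying it to our tied data is an approximation.

```python
from math import comb


def rank_sum_null_counts(m, n):
    """counts[s] = number of ways m ranks drawn from 1..m+n (no ties) can sum to s."""
    total_ranks = m + n
    max_sum = sum(range(n + 1, total_ranks + 1))  # sum of the m largest ranks
    # table[k][s] = number of k-element subsets of the ranks seen so far with sum s
    table = [[0] * (max_sum + 1) for _ in range(m + 1)]
    table[0][0] = 1
    for r in range(1, total_ranks + 1):
        # Go downward in k so that each rank r is used at most once per subset
        for k in range(min(r, m), 0, -1):
            for s in range(max_sum, r - 1, -1):
                table[k][s] += table[k - 1][s - r]
    return table[m]


def critical_values(m, n, alpha=0.05):
    """Two-tailed critical rank sums: largest w with P(W <= w) <= alpha/2, and its mirror."""
    counts = rank_sum_null_counts(m, n)
    total = comb(m + n, m)
    cumulative = 0
    lower = None
    for w, count in enumerate(counts):
        cumulative += count
        if cumulative / total > alpha / 2:
            break
        lower = w
    # The null distribution is symmetric about m(m+n+1)/2, so the upper value mirrors the lower
    upper = m * (m + n + 1) - lower
    return lower, upper


print(critical_values(10, 10))  # (78, 132)
```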

## Wilcoxon Rank-Sum Test in R

We can also use R to calculate a p-value for our sample set.

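```r
# Egg counts for each sample
ok.eggs = c(0,0,1,1,1,2,2,2,3,5)
wi.eggs = c(2,3,3,3,3,3,4,4,5,5)
# Group labels: the first 10 values are Oklahoma, the next 10 Wisconsin
group = c(rep('Oklahoma Residents', 10), rep('Wisconsin Residents', 10))
# Two-sided rank-sum test of egg counts by group
wilcox.test(c(ok.eggs, wi.eggs) ~ group)
```

This prints:

```
Wilcoxon rank sum test with continuity correction

data:  c(ok.eggs, wi.eggs) by group
W = 14, p-value = 0.006129
alternative hypothesis: true location shift is not equal to 0
```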

Here, R reports \( W = 14 \), which differs from the rank sums we calculated above. The R documentation for `wilcox.test` explains:

> The literature is not unanimous about the definitions of the Wilcoxon rank sum and Mann-Whitney tests. The two most common definitions correspond to the sum of the ranks of the first sample with the minimum value subtracted or not: R subtracts and S-PLUS does not, giving a value which is larger by m(m+1)/2 for a first sample of size m. (It seems Wilcoxon's original paper used the unadjusted sum of the ranks but subsequent tables subtracted the minimum.)

In other words, R reports \( W = W_{Oklahoma} - n_1(n_1+1)/2 = 69 - 55 = 14 \), where \( n_1 = 10 \) is the size of the first (Oklahoma) sample. This adjusted statistic is the Mann-Whitney \( U \).
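A quick Python check, for illustration only, confirms that R's \( W \) equals the Mann-Whitney \( U \) for Oklahoma. That is the number of (Oklahoma, Wisconsin) pairs in which the Oklahoma resident ate more eggs, with ties counting one half. The two groups' \( U \) values always add up to \( n_1 n_2 \).

```python
oklahoma = [0, 0, 1, 1, 1, 2, 2, 2, 3, 5]
wisconsin = [2, 3, 3, 3, 3, 3, 4, 4, 5, 5]
n_ok, n_wi = len(oklahoma), len(wisconsin)

# Rank sums from the worked example above
w_ok, w_wi = 69, 141

# R's W: subtract the smallest possible rank sum for a group of that size
u_ok = w_ok - n_ok * (n_ok + 1) // 2
u_wi = w_wi - n_wi * (n_wi + 1) // 2

# The same quantity, counted directly over all (Oklahoma, Wisconsin) pairs
u_pairs = sum(1.0 if a > b else 0.5 if a == b else 0.0
              for a in oklahoma for b in wisconsin)

print(u_ok, u_wi, u_ok + u_wi, u_pairs)  # 14 86 100 14.0
```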

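The p-value here, 0.006129, also differs from our Z-test result, for three reasons. First, the 0.00326 from the Z table covers only one tail. Since \( H_1 \) is two-sided, the comparable p-value is about 0.0065. Second, our data contain many ties, and R shrinks the standard error to account for them:

$$ \sigma_W = \sqrt{\frac{n_1 n_2}{12}\left[(N+1) - \frac{\sum_j (t_j^3 - t_j)}{N(N-1)}\right]} = \sqrt{\frac{100}{12}\left[21 - \frac{330}{380}\right]} \approx 12.95 $$

where \( N = n_1 + n_2 = 20 \) and \( t_j \) is the size of the \( j \)-th group of tied values (here 2, 3, 4, 6, 2 and 3). Third, R applies a continuity correction, moving \( W - \mu_W \) half a unit toward zero. Together these give

$$ z = \frac{69 - 105 + 0.5}{12.95} \approx -2.74 $$

and a two-sided p-value of 0.006129, matching R's output. If you spot another discrepancy, contact me at bugs@snipcademy.com.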
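The gap between R's p-value and the hand calculation can be reproduced in Python. This sketch assumes that R's large-sample approximation, labelled "with continuity correction" in its output, combines a two-sided test, a tie-corrected variance, and a 0.5 continuity correction. The function name `rank_sum_normal_approx` is our own, and the code is not R's source.

```python
import math
from collections import Counter


def normal_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def rank_sum_normal_approx(x, y, correct=True):
    """Two-sided rank-sum test via a tie-corrected normal approximation.

    Returns (U, z, p), where U is the rank sum of x minus its minimum possible value.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    counts = Counter(list(x) + list(y))
    # Average rank of each distinct value: values below it plus its mean tie position
    midrank, below = {}, 0
    for value in sorted(counts):
        midrank[value] = below + (counts[value] + 1) / 2
        below += counts[value]
    u = sum(midrank[v] for v in x) - n1 * (n1 + 1) / 2
    diff = u - n1 * n2 / 2
    # Each tie group of size t reduces the variance through the term t^3 - t
    tie_adjust = sum(t**3 - t for t in counts.values()) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_adjust))
    if correct and diff != 0:
        diff -= math.copysign(0.5, diff)  # continuity correction: shift 0.5 toward zero
    z = diff / sigma
    p = 2 * min(normal_cdf(z), 1 - normal_cdf(z))
    return u, z, p


oklahoma = [0, 0, 1, 1, 1, 2, 2, 2, 3, 5]
wisconsin = [2, 3, 3, 3, 3, 3, 4, 4, 5, 5]
u, z, p = rank_sum_normal_approx(oklahoma, wisconsin)
print(u, round(z, 2), round(p, 6))  # 14.0 -2.74 0.006129
```

Calling it with `correct=False` drops the continuity correction, which moves \( z \) slightly further from zero and gives a smaller p-value.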