Hypothesis testing

This applet simulates fixed-level and significance testing of the means of various distributions.

There is a tabbed panel for each of the distributions. Choose a tab by clicking on it. The panels have similar layouts. The population is represented in the top-left of each panel by a plot of the distribution and fields for its parameters. You can edit these fields to change the parameters. The mean μ and standard deviation σ of the population are displayed in boxes beside the parameter fields (except where these are already displayed in the parameter fields). The null and alternative hypotheses are displayed in two controls below the parameter fields, labelled Test mean: and Alternative: (similar to controls in MINITAB dialogues for tests). Initially the test mean is set to the mean of the population. The alternative hypothesis is selected from the drop-down list. In the bottom-left area of each panel are controls for running the simulation. These consist of fields for specifying the significance level, sample size and number of samples, a check box for switching single-step mode on and off, and buttons for starting the simulation and resetting the screen.

The right half of the window, consisting of two tabbed panels labelled Fixed-level tests and Significance tests, displays the simulation results. The distribution of the test statistic under the null hypothesis is represented by a plot at the top of each of these panels. The Normal panel uses the t-test statistic so the plot is that of Student's t distribution - the exact null distribution for this test and population. All the other population panels use the z-test statistic and the plot is that of the standard normal distribution - an approximate null distribution that requires large samples. The blue filled areas on the plot in the Fixed-level tests panel represent the rejection region. The red filled areas on the plot in the Significance tests panel represent the significance probability.

When you click on the Take samples button, the specified number of samples of the specified size are randomly generated from the population. For each sample, the test statistic and significance probability are calculated. In the Fixed-level tests panel the value of the test statistic is counted if it lies in the rejection region. Summary results about the simulation appear below the null distribution plot, consisting of the number and percentage of test statistic values that fell in the rejection region. In the Significance tests panel the values of the significance probabilities are plotted in a histogram below the null distribution plot.

In single-step mode, the values of the sample data and the value of the test statistic are plotted as small vertical red lines on the horizontal axes of the population distribution plot and on the null distribution plot in the Fixed-level tests panel respectively. The values of each simulated test statistic and significance probability are displayed in boxes below the null distribution plot in the Fixed-level tests and Significance tests panel respectively.

In the Your population panel you can directly manipulate the population distribution curve by dragging the beads up and down. Changing the shape of the distribution is equivalent to changing the population parameters in other panels: i.e. the simulation results are cleared and the mean of the distribution (displayed in a non-editable field below the plot) changes.

Accessibility

This applet can be controlled without the use of a mouse.

A button or check box can be selected by holding down the ALT key and pressing the letter key indicated by the underlined letter on the button or check box.

Focus can be moved around the tabs and controls by pressing the Tab key. A dashed rectangle around a button or tab indicates that it has focus. In the Your population panel, each bead in the population plot can be selected in this way and moved using keys as if for a slider: i.e. the up and down arrow, Page Up, Page Down, Home and End keys.

When a tab has focus, the different tabs can be selected by pressing the left and right arrow keys.

When fields have focus, they display a flashing cursor and numbers can be typed into them. The focus of fields can be obtained directly by holding down the ALT key and pressing the letter key indicated by the underlined letter on the field's label. The focus for parameter fields can be obtained directly by holding down the ALT key and pressing the key suggested by the first letter of the name of the parameter: i.e. M for μ, S for σ, L for lambda, A for a, B for b, P for p.

When the Alternative: drop-down list has focus you can select "not equal", "less than" or "greater than" from the list by pressing the #, < or > keys respectively; or you can cycle through the list using the up and down arrow keys.

When a button has focus it can be selected by pressing the space bar or Enter key. If none of the buttons has focus, then pressing the Enter key closes the applet.

Suggested activities and questions

Choose the Normal tab.

Begin by taking random samples one at a time. Make sure, to begin with, that the Test mean is set to the value of the population mean μ, which should be zero. Thus you will be choosing values from a population for which the null hypothesis is actually true. Leave the Alternative at its default setting of not equal, so that you are performing two-sided tests. Also leave the Significance level and the Sample size at their default values of 5% and 25 respectively.

Do not click yet on the Take samples button. Instead, look at the null distribution in the right-hand panel. This graph has two areas marked in blue, one in each tail. These denote the rejection region for the test. Together they contain 5% of the probability in the null distribution (because the significance level has been fixed at 5%). Since the sample size is 25, the null distribution is a t distribution with 25 − 1 = 24 degrees of freedom. The edges of the rejection region are therefore at the 0.025 and 0.975 quantiles of this distribution, so they are at ±2.064.
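The quantiles quoted above can be checked directly. The sketch below (not part of the applet, which does this calculation internally) uses scipy to find the edges of the two-sided rejection region for a 5% test with 24 degrees of freedom:

```python
# Edges of the rejection region for a two-sided t-test at the 5%
# significance level with sample size 25, i.e. the 0.025 and 0.975
# quantiles of the t distribution with 24 degrees of freedom.
from scipy.stats import t

df = 25 - 1
lower = t.ppf(0.025, df)   # 0.025 quantile
upper = t.ppf(0.975, df)   # 0.975 quantile
print(round(lower, 3), round(upper, 3))  # -2.064 2.064
```

The same call with 0.95 in place of 0.975 gives the one-sided cut-off discussed in the next paragraph.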

Investigate how the rejection region changes when you change the way the test is set up. First, change the Alternative to greater than. This has the effect of changing the alternative hypothesis for the test from H1: μ ≠ μ0 to H1: μ > μ0. This changes the test from two-sided to one-sided. Only positive values of the test statistic provide support for the alternative hypothesis, so that the rejection region consists of points only in the right-hand tail of the null distribution. However, the total probability in the rejection region still has to be 5% (the significance level), so the limit of the rejection region must be closer to zero than it was for the two-sided test. It is now the 0.95 quantile of t(24), so it is 1.711.

What do you think will happen to the rejection region if you change the Alternative to less than? Try it, to see if you were right.

Now investigate how the rejection region changes if you change the significance level of the test. First, change the Alternative back to not equal. Predict what will happen if you change the significance level from 5% to 10%. Try it, by typing 10 in the Significance level field, and pressing Enter. What do you see?

Set the fields that define the test back to their default values. You can do this by clicking on the Reset button. Now let us actually get round to drawing a sample. Click on Single steps. The Number of samples field should change to its default value, for single steps, of 100. Now click (just once) on the Take samples button.

What happens is that the applet draws a single sample of the size specified (25) from the population distribution. The 25 individual values in the sample are marked with short red lines on the graph of the population distribution. The applet calculates the observed value of the test statistic t, based on these 25 sample values, and it marks it on the graph of the null distribution, as well as giving the value below the graph. The applet also notes how many samples you have taken so far (1), and the number and percentage of those samples that were in the rejection region. (These last two quantities will both be zero if your value of the test statistic was not in the rejection region, and 1 and 100% if it was.)
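What the applet does for each single step can be mimicked in a few lines. The following is a sketch only (the applet's own code is not available): it draws one sample of 25 values from the standard normal population and computes the one-sample t statistic for H0: μ = 0.

```python
# One "single step": draw a sample of size 25 from N(0, 1) and compute
# the one-sample t statistic, t = x-bar / (s / sqrt(n)), for H0: mu = 0.
import numpy as np

rng = np.random.default_rng(1)       # fixed seed, for reproducibility
sample = rng.normal(loc=0.0, scale=1.0, size=25)
t_obs = sample.mean() / (sample.std(ddof=1) / np.sqrt(25))
print(t_obs)  # one observed value of the test statistic
```

Running this repeatedly with different seeds gives a different observed t each time, just as repeated clicks on Take samples do.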

Draw some more samples, one at a time, by clicking repeatedly on the Take samples button. You should find that, for most samples, the observed test statistic is not in the rejection region, but that, occasionally, it is. This should come as no surprise; the population mean is actually 0 and you are testing the hypothesis that the population mean is 0. The null hypothesis is therefore true, and so it should be rejected only 5% of the time, in the long run, because the significance level is fixed at 5%. So, as you draw more and more samples, you should find the percentage in the rejection region beginning to settle down in the region of 5%. In other words, the probability of a Type I error is 5%.

The settling-down process is rather slow. When you are bored with drawing samples one at a time, click on Single steps again to uncheck it. The Number of samples field changes to its default of 1000 for the case where the samples are not drawn singly. Click on Take samples and see what happens. The software then (without picturing what it is doing) draws 1000 samples, each of size 25, and for each one it computes the observed test statistic and notes whether it falls in the rejection region. You should find that something not too far from 5% of the resulting samples falls into the rejection region. That is, in repeated sampling, the null hypothesis will be rejected approximately the proportion of the time indicated by the chosen significance level.
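The 1000-sample experiment is easy to reproduce outside the applet. This is a sketch of the same Monte Carlo idea (assumed to mirror what the applet does, since its code is not shown): generate 1000 samples under a true null hypothesis and count how often the t statistic lands in the rejection region.

```python
# Monte Carlo check: 1000 samples of size 25 from N(0, 1), two-sided
# t-test of H0: mu = 0 at the 5% level. The rejection rate should be
# close to the significance level, since the null hypothesis is true.
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(0)
n, reps, alpha = 25, 1000, 0.05
crit = t.ppf(1 - alpha / 2, n - 1)     # about 2.064

samples = rng.normal(size=(reps, n))
t_stats = samples.mean(axis=1) / (samples.std(axis=1, ddof=1) / np.sqrt(n))
rejected = np.abs(t_stats) > crit
print(rejected.mean())  # close to 0.05
```

Changing `alpha` to 0.10 lets you check the 10% case suggested below in the same way.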

Check this out for other values of the significance level. For instance, if you set the significance level to 10%, then about 10% of the samples should be in the rejection region.

Try changing the sample size for the test. Note what happens to the graph of the rejection region. For very small samples, is it still the case that the percentage of samples in the rejection region is about equal to the significance level?

So far, you have explored only what happens when the null hypothesis is true. What happens when it is false?

Click on the Reset button to change everything back to the default. Imagine now that you are still testing the null hypothesis H0: μ = 0 (so the test mean is still set at 0), but that actually the true population mean μ is not 0, but 0.5. In this case, you would expect that the sample mean would tend to be greater than 0, and that the test statistic t was in the upper (right-hand) tail of its null distribution quite a lot of the time. Check this out by changing the value of μ from 0 to 0.5, selecting Single steps and drawing a few samples. Now deselect Single steps and draw 1000 samples of size 25. What do you find?
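Before (or after) trying this in the applet, the same experiment can be sketched in code. The setup below is an assumption about what the applet does, not its actual code: the true mean is 0.5 but the test mean stays at 0, so the rejection rate now estimates the power of the test rather than its significance level.

```python
# Null hypothesis false: data come from N(0.5, 1) but we still test
# H0: mu = 0 at the 5% level with samples of size 25. The proportion
# rejected estimates the power of the test.
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(0)
n, reps = 25, 1000
crit = t.ppf(0.975, n - 1)

samples = rng.normal(loc=0.5, scale=1.0, size=(reps, n))
t_stats = samples.mean(axis=1) / (samples.std(axis=1, ddof=1) / np.sqrt(n))
print(np.mean(np.abs(t_stats) > crit))  # well above 0.05
```

Replacing `loc=0.5` with 1.0 answers the next question in the same way.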

What do you think would happen if you changed the value of μ from 0.5 to 1, and again drew 1000 samples of size 25? Try it and see if you were right.

Experiment more with changing the way the tests and simulations are set up, to see the effect on the proportion of results that fall in the rejection region.

Click on the Exponential tab. The default value of the parameter λ of this distribution is 1. (You can change that later if you like; for now, leave it at 1.) Initially the panel is set up to perform a two-sided test of the null hypothesis that the population mean is 1, with sample size of 25 and fixed significance level of 5%, and it will perform 1000 such tests. Since for the exponential distribution, the population mean μ is 1/λ, and since λ has been set as 1, this means that we are performing simulations for which the null hypothesis is true. The null, standard normal, distribution of the z-test statistic appears at the right of the window. Perform the 1000 tests. What percentage of the resulting test statistics fall in the rejection region? Try a few more sets of 1000 samples. Can you explain what is going on?
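The exponential experiment can also be sketched outside the applet. The code below is an assumed reconstruction: it draws samples from an exponential distribution with λ = 1 (so the population mean really is 1) and applies the approximate z-test with a standard normal rejection region. Because the population is skewed and n = 25 is not very large, the observed rejection rate typically differs noticeably from the nominal 5%.

```python
# Approximate z-test on exponential data: lambda = 1 so mu = 1; test
# H0: mu = 1 at the 5% level using the standard normal critical value.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 25, 1000
samples = rng.exponential(scale=1.0, size=(reps, n))
z = (samples.mean(axis=1) - 1.0) / (samples.std(axis=1, ddof=1) / np.sqrt(n))
rate = np.mean(np.abs(z) > 1.96)
print(rate)  # generally not very close to 0.05 at this sample size
```

Re-running with `n = 250` shows the central limit theorem at work, as the next activity asks you to check.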

Change the sample size from 25 to 250, leaving everything else unchanged. Repeat the above experiment. What happens?

Now we will investigate significance tests.

Click on the Normal tab. Click on Reset to set everything back to the default values. This will set the applet up to test the null hypothesis that the population mean is 0, in the situation where the data do really come from a normally distributed population with mean 0 (and standard deviation 1). The sample size is 25 and the significance level is 5%. Check that these values appear in the left-hand part of the window. Make sure (by clicking on its tab if necessary) that the Fixed-level tests panel is visible at the right of the window.

Click on Single steps so that the samples of size 25 will be drawn one at a time. Draw one such sample by clicking (once) on Take samples. The sample values appear on the left, and the observed value of the test statistic appears on the graph of the null distribution at the right.

Now click on the Significance tests tab at the top right, to bring the corresponding panel into view. At the top, you will see again the null distribution of the test statistic. However, this time no rejection region is marked in blue, because there is no fixed rejection region in significance testing. Instead, the tails of the distribution that are at least as extreme as the observed value of the test statistic are marked in red. I cannot predict what value you obtained for the test statistic; when I did this, I got −0.3269, so the red tails marked in the distribution are those below −0.3269 and above 0.3269. (Both tails are used, because this is a two-sided test.) The significance probability (or p value) for the test is just the probability in those tails. In my case, this probability was 0.74661, and this is given in the Significance tests panel just below the upper graph.
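The p value quoted above can be reproduced from the observed test statistic. For a two-sided test, the significance probability is twice the upper-tail probability beyond |t|:

```python
# Two-sided significance probability for the observed statistic quoted
# in the text: t = -0.3269 with 24 degrees of freedom.
from scipy.stats import t

t_obs = -0.3269
p = 2 * t.sf(abs(t_obs), 24)   # probability in both tails
print(round(p, 4))  # about 0.7466, matching the value in the text
```

Substituting your own observed t from the applet should reproduce the p value it displays.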

Below, in the same panel, the applet will build up a histogram of the p values that are obtained, when the process of drawing the sample is repeated. Click on Take samples several times, to see how the histogram builds up. If you like, you can switch between the Significance tests and Fixed-level tests after you have drawn each sample, to see how the two different versions of the test relate to one another. Can you predict how the histogram is going to look after many samples are drawn?

To check whether you were right, click on Single steps again to deselect it, so that samples will be drawn 1000 at a time. Click on Take samples to draw 1000 samples at once. How does the histogram look? Click on Take samples a few more times to see the general kind of pattern that emerges. Can you explain why it looks as it does?
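A hint towards the explanation: when the null hypothesis is true and the test is exact, the p value is uniformly distributed on [0, 1], so the histogram should be roughly flat. This sketch (not the applet's code) checks that claim by computing 1000 p values under a true null:

```python
# Under a true null hypothesis, p values from an exact test are
# uniform on [0, 1]: the histogram of 1000 of them is roughly flat.
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
p_values = [ttest_1samp(rng.normal(size=25), 0.0).pvalue
            for _ in range(1000)]
counts, _ = np.histogram(p_values, bins=10, range=(0.0, 1.0))
print(counts)  # roughly 100 p values in each of the 10 bins
```

This also connects the two panels: the proportion of p values below 0.05 is exactly the proportion of fixed-level tests at the 5% level that reject.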

Check what happens when the null hypothesis is not true. First, change the population mean μ to 0.5, leaving the Test mean as zero (and not changing anything else). You saw above that in this case, a fixed level test at 5% significance level rejected the null hypothesis about 67% of the time. What do you think will happen with the histogram of p values for a significance test? Draw 1000 samples a few times to see if your prediction was correct.

Try the same experiment with the population mean μ set at 0.25, and then at 1.0. What do you observe, and does it make sense?

Now try the same sort of investigation with an approximate test using a non-normal population distribution. Choose the Exponential distribution. Click on Reset to use the defaults. (The significance level does not matter, since we are going to perform significance tests with no fixed level.) Choose the Significance tests panel at the right-hand side, and draw 1000 samples of size 25. How does the histogram look? Try a few more sets of 1000 samples.

Now change the sample size to 250 and repeat the investigation. What do you find? Can you explain your findings?

You may well wish to continue these investigations for other distributions, other sample sizes, other parameter values and so on.