Introduction to Hypothesis Testing

A hypothesis test is a statistical procedure by which a statement about a population parameter is rejected or not rejected based on sample data.

Example: A cereal box packaging plant manager suspects that the machine that packages 15-ounce boxes of corn flakes may be malfunctioning. Since it is impossible to weigh the contents of every box of cereal packaged by this machine, a sample of n = 10 boxes is selected and carefully weighed. The sample mean is 14.90 ounces, with a sample standard deviation of 0.14142 ounces. The manager wants to decide from these data whether or not there is cause for concern.

Every hypothesis test has two hypotheses:

1. The null hypothesis (H0), and
2. The alternative hypothesis (H1).

The null hypothesis is variously referred to as the "hypothesis of no change" or the "status quo" hypothesis or the "conventional wisdom" hypothesis. The null hypothesis is presumed to be correct unless there is overwhelming evidence to the contrary.

For example, in our criminal justice system, the defendant is presumed not guilty, unless there is overwhelming evidence to the contrary. Thus, the null hypothesis is that a defendant is not guilty.

In our cereal example, the null hypothesis is that the true mean weight of the cereal boxes is 15 ounces. This is denoted

H0: μ = 15

The alternative hypothesis is also called the "research hypothesis." It is the hypothesis that the researcher is trying to gather information in favor of. In order to reject the null hypothesis in favor of the alternative, overwhelming evidence must be demonstrated.

In our cereal box example, the alternative hypothesis is

H1: μ ≠ 15

So, our hypotheses are formulated as follows:

H0: μ = 15
H1: μ ≠ 15

There are two possible situations: either the null hypothesis is true, or it is false.

We don't know which is the case, and we will never know for sure. All we can do is make our most reasonable guess and hope for the best.

There are two possible decisions we can make: either we reject the null hypothesis, or we do not reject it.

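So, there are four possible outcomes:

1. H0 is true and we do not reject it (a correct decision),
2. H0 is true and we reject it (a type I error),
3. H0 is false and we do not reject it (a type II error), and
4. H0 is false and we reject it (a correct decision).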

We denote the probability of a type I error by α (yes, this is the same α that we saw with confidence intervals) and the probability of a type II error by β.

Consider the type I and type II errors and their consequences for the cereal example, and the criminal justice example.

Now, the best estimator for μ is the sample mean, which in this case is 14.90. Since 14.90 ≠ 15, we have some evidence against the null hypothesis and in favor of the alternative, but is this enough evidence to reject the null hypothesis? In other words, 14.90 does not equal 15, but is 14.90 different enough from 15 to consider this overwhelming evidence against the null hypothesis?

In order to decide whether or not to reject H0 in an objective fashion, we define a test statistic.

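T.S. = (x̄ - μ0) / (s / √n)

Here x̄ is the sample mean, μ0 is the value of μ stated in the null hypothesis, s is the sample standard deviation, and n is the sample size.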

The test statistic tells us by how many standard errors the sample mean differs from the hypothesized value of μ.
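As a quick check of the formula, the following sketch computes the test statistic for the cereal example. The numbers come from the example above; the function name t_statistic is only an illustrative choice.

```python
import math

def t_statistic(x_bar, mu_0, s, n):
    """Number of standard errors by which the sample mean differs from mu_0."""
    standard_error = s / math.sqrt(n)
    return (x_bar - mu_0) / standard_error

# Values from the cereal example: sample mean, hypothesized mean,
# sample standard deviation, and sample size.
t_stat = t_statistic(x_bar=14.90, mu_0=15.0, s=0.14142, n=10)
print(round(t_stat, 3))  # -2.236
```

So the sample mean lies a little more than two standard errors below the hypothesized value of 15 ounces.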

We set α, the probability of making a type I error, which is also called the significance level of the hypothesis test, and compare the test statistic to the critical value(s) that correspond to that level. If the test statistic falls beyond the critical value(s), we reject H0; otherwise we do not reject it.
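The comparison can be sketched for the cereal example with hypotheses H0: μ = 15 versus H1: μ ≠ 15. Two things here are assumptions rather than part of the example: the significance level α = 0.05, and the use of the t distribution with n - 1 = 9 degrees of freedom as the reference distribution. Under those assumptions the critical values from a standard t table are ±2.262. The function name is hypothetical.

```python
import math

ALPHA = 0.05               # assumed significance level (not given in the example)
T_CRIT_TWO_SIDED = 2.262   # t table: 9 degrees of freedom, 0.025 in each tail

def two_sided_decision(t_stat, t_crit):
    """Reject H0 when the statistic lies beyond either critical value."""
    return "reject H0" if abs(t_stat) > t_crit else "fail to reject H0"

# Test statistic for the cereal example.
t_stat = (14.90 - 15.0) / (0.14142 / math.sqrt(10))
decision = two_sided_decision(t_stat, T_CRIT_TWO_SIDED)
print(round(t_stat, 3), decision)  # -2.236 fail to reject H0
```

Under these assumptions the statistic falls just inside the critical values, so the sample does not provide enough evidence to reject H0 at the 5% level.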

The hypothesis test formulated above is called a two-tailed test because the null hypothesis would be rejected for sufficiently large or sufficiently small values of the sample mean. The test was formulated this way because the manager was concerned with the consequences of having too much or too little cereal in the boxes, on average.
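One way to see what "two-tailed" means is a small simulation in which H0 is actually true. The sketch below repeatedly draws samples of 10 boxes from a normal population with mean 15 and counts how often the test statistic falls in each rejection region. Several things are assumptions made only for illustration: normally distributed box weights, a population standard deviation equal to the sample value 0.14142, α = 0.05, and the t-table critical value 2.262 for 9 degrees of freedom.

```python
import math
import random

MU_0 = 15.0       # hypothesized mean from the cereal example
SIGMA = 0.14142   # assumption: population standard deviation set to the sample value
N = 10            # sample size from the example
T_CRIT = 2.262    # t table: 9 degrees of freedom, 0.025 in each tail

def t_statistic(sample, mu_0):
    """Test statistic (x_bar - mu_0) / (s / sqrt(n)) for one sample."""
    n = len(sample)
    x_bar = sum(sample) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
    return (x_bar - mu_0) / (s / math.sqrt(n))

def tail_rejection_rates(trials=20000, seed=1):
    """Fraction of samples rejected in the lower and upper tails when H0 is true."""
    rng = random.Random(seed)
    low = high = 0
    for _ in range(trials):
        sample = [rng.gauss(MU_0, SIGMA) for _ in range(N)]
        t_stat = t_statistic(sample, MU_0)
        if t_stat < -T_CRIT:
            low += 1
        elif t_stat > T_CRIT:
            high += 1
    return low / trials, high / trials

low_rate, high_rate = tail_rejection_rates()
print(f"rejected in lower tail: {low_rate:.3f}")
print(f"rejected in upper tail: {high_rate:.3f}")
print(f"total type I error rate: {low_rate + high_rate:.3f}")
```

Each tail should account for roughly half of α, so the two rates should each be near 0.025 and their sum near 0.05. Every rejection counted here is a type I error, since the samples were generated with H0 true.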

Someone else, say a consumer advocate, would have a different agenda. The consumer advocate would only be interested in demonstrating that there was too little cereal, on average, in the 15-ounce boxes. In this case, we would formulate a one-tailed test, and in particular a left-tailed test, where the null hypothesis would only be rejected for sufficiently small values of the sample mean.

H0: μ ≥ 15
H1: μ < 15

A right-tailed test would look like this:

H0: μ ≤ 15
H1: μ > 15
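The one-tailed decision rules can be sketched with the cereal data as follows. As before, α = 0.05 and a t reference distribution with 9 degrees of freedom are assumptions, not part of the example. With all of α placed in one tail, the t-table critical value is 1.833. The function names are hypothetical.

```python
import math

T_CRIT_ONE_SIDED = 1.833   # t table: 9 degrees of freedom, 0.05 in one tail

def left_tailed_decision(t_stat, t_crit):
    """Reject H0: mu >= mu_0 only for sufficiently small statistics."""
    return "reject H0" if t_stat < -t_crit else "fail to reject H0"

def right_tailed_decision(t_stat, t_crit):
    """Reject H0: mu <= mu_0 only for sufficiently large statistics."""
    return "reject H0" if t_stat > t_crit else "fail to reject H0"

# Test statistic for the cereal example (same formula in every case).
t_stat = (14.90 - 15.0) / (0.14142 / math.sqrt(10))

left = left_tailed_decision(t_stat, T_CRIT_ONE_SIDED)
right = right_tailed_decision(t_stat, T_CRIT_ONE_SIDED)
print("left-tailed: ", left)   # reject H0
print("right-tailed:", right)  # fail to reject H0
```

Under these assumptions the consumer advocate's left-tailed test rejects H0, while a two-tailed test of the same data at the same α (critical values ±2.262) would not. The formulation of the hypotheses therefore matters and should be chosen before looking at the data.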
