Review on Hypothesis Testing

The objective of the hypothesis testing procedure is to find out whether we can reject the null hypothesis. To reject or not to reject the null hypothesis is a decision made by the researcher. This decision must be based upon an explicitly stated criterion: one has to specify exactly under what conditions one is willing to reject the null hypothesis. The evidence comes from the research result, that is, from a critical statistic computed from the observed sample. It could be an individual score (if there is one person in the sample, N = 1) or a mean (if there are multiple persons in the sample, N > 1). One wants to create a dilemma between the sample statistic (i.e., the observation) and the null hypothesis. This dilemma, or conflict, can be represented in terms of probability or likelihood. The researcher wants to show that if the null hypothesis were true, a sample statistic (score or mean) as extreme as the one observed would be highly unlikely. How can we find this probability?

Now let’s think about why we need the comparison distribution. What is the comparison distribution? What information does it give us? What kind of use do we put it to? The comparison distribution is the distribution that the sample must belong to if the null hypothesis is true. By pretending that the sample had been drawn from the population described by the comparison distribution, we are assuming that the null hypothesis is true. The distribution is a distribution of probability. Some scores on the distribution are more likely, such as those around the mean. We are interested in those rare scores, those extreme scores at the tails of the distribution. If we can show that the sample statistic falls in the region of the distribution that is extremely rare, for instance less than 5% or 1% out of the entire distribution, then we can say that the observed sample statistic is unlikely if the null hypothesis is true. This satisfies the condition we set up for rejecting the null hypothesis.
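To make the idea of a "rare region" concrete, here is a minimal Python sketch for the simpler case where the comparison distribution is normal with a known mean and standard deviation. The numbers (a mean of 100, a standard deviation of 15, an observed score of 130) are hypothetical and do not come from any example in these notes, and the function name is only illustrative.

```python
from statistics import NormalDist


def upper_tail_probability(observed, null_mean, null_sd):
    """Probability of a score at least as high as `observed`,
    assuming it was drawn from Normal(null_mean, null_sd)."""
    return 1.0 - NormalDist(mu=null_mean, sigma=null_sd).cdf(observed)


# Hypothetical values, chosen only for illustration.
p = upper_tail_probability(observed=130, null_mean=100, null_sd=15)

# Reject the null hypothesis if the score falls in the top 5% of the
# comparison distribution.
reject_null = p < 0.05
print(round(p, 4), reject_null)
```

The same logic carries over to the t tests below; only the shape of the comparison distribution changes.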

Another issue is why we have to deal with different distributions concerning the same population. We have a distribution of individual scores and a distribution of means. Which one should we use? It depends on the sample statistic. If the sample statistic is a single score (N = 1), we need to compare it with the distribution of single scores. If the sample statistic is a sample mean (N > 1), then we have to compare it against a distribution of means.

T-Test for a Single Sample

We have learnt how to do hypothesis testing in situations where we know the mean and variance (and therefore the standard deviation) of the comparison distribution. But in real-world research we rarely have that much information about the population distributions. We have to estimate the means and standard deviations of the populations we are interested in from the data we collect in the experiment. This process is part of inferential statistics because we are making inferences about populations, which we cannot actually measure, from actual observations made on samples drawn from the populations.

This time, we learn hypothesis testing in a situation where we know the mean of the comparison distribution, but don’t know the variance. Therefore we have to estimate the variance and standard deviation from the sample statistics.

Example 1:

I believe that single parents spend more time with their children than parents on average. Census data show that parents on average spend 2 hours each day with their children. I interviewed 6 single parents and found that on average they spend 3 hours each day with their children (see following data). Do single parents actually spend more time with their children than parents on average?

Data

Name     Hours spent with child
Jim      4
Mary     2
Joe      3
Kim      3
Kevin    2
Judy     4

To summarize the raw data:

X    (X - M)    (X - M)²
4     1          1
2    -1          1
3     0          0
3     0          0
2    -1          1
4     1          1

M = (4 + 2 + 3 + 3 + 2 + 4)/6 = 3

SS = (1 + 1 + 1 + 1) = 4

S² = Σ(X - M)²/(N - 1) = SS/(N - 1) = 4/5 = 0.80
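The summary above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are my own. Note that `statistics.variance` divides by N - 1, so it should agree with SS/(N - 1).

```python
from statistics import mean, variance

# Hours per day spent with children: Jim, Mary, Joe, Kim, Kevin, Judy.
hours = [4, 2, 3, 3, 2, 4]

m = mean(hours)                          # sample mean M
ss = sum((x - m) ** 2 for x in hours)    # sum of squared deviations SS
s2 = ss / (len(hours) - 1)               # estimated population variance

print(m, ss, s2)

# The library function uses the same N - 1 denominator.
assert abs(variance(hours) - s2) < 1e-12
```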

Step 1: state the research and null hypotheses in terms of populations.

Pop. 1: single parents

Pop. 2: average parents

Research/alternative hypothesis (H_A): single parents spend more time with their children than average parents.

Null hypothesis (H₀): single parents spend the same amount of time with their children as average parents.

	Reality: H0 true	H0 false
Decision: reject H0	Type I error	Correct decision
Do not reject H0	Correct decision	Type II error

What would be a Type I error in this context?

What would be statistical power in this context?

According to the null hypothesis, what is the average amount of time spent by single parents?

μ for single parents would be 2 hours, the same as for average parents.

Step 2: Determine the characteristics of the comparison distribution, the distribution that the sample statistic would have come from if the null hypothesis is true. It is called the comparison distribution because you are going to compare the research result (the mean time spent with children by the 6 single parents who were actually observed) against this distribution to find out how likely such a result would be.

To represent the distribution of means of time spent with children with a sample size of 6 among average parents, one has to use a t distribution, not a z or normal distribution, because one does not know the variance of this comparison distribution and has to estimate it. The t distribution resembles the z distribution but has heavier tails, and its exact shape depends on the degrees of freedom.

How do you estimate the variance of the comparison distribution? You have to start with the research data, because these are all you know about the population. The research results are informative about the population because you assume the sample observed in the experiment has been randomly drawn from the population. If there is a lot of variance in the population, you would find a lot of variance in the sample. If there is little variance in the population, you would also find little variance in the sample. First you estimate the variance of the distribution of time spent with children by parents on average (this is the distribution of individual scores, not means).

S² = Σ(X - M)²/(N - 1) = SS/(N - 1) = 4/5 = 0.8

Here (N - 1) is called the degrees of freedom (df) for estimating the population variance.

The df gives the number of data points free to vary once the mean is determined.

Our comparison distribution is the distribution of means, rather than distribution of single scores. So we have to estimate the variance of the distribution of means.

S_M² = S²/N = 0.8/6 = 0.133

S_M = √S_M² = √0.133 = 0.365

And we know the mean of the distribution of individual scores for average parents (2 hours), which is also the mean of the distribution of means for this population. So we have determined the necessary characteristics of the comparison distribution.

Step 3: determine the cut off points.

We will use a 5% level of significance cutoff point. If the likelihood for getting a sample mean as extreme as or more extreme than the one we get in the experiment is less than 5 percent, then we will reject the null hypothesis. We need to figure out the lowest t score for the highest 5% of the distribution.

Cutoff: t(5) ≥ 2.015

Step 4. Now we need to calculate the position of the sample statistic on the comparison distribution. In this particular case, we need to transform the sample mean (M) into a t score using the mean and standard deviation of the comparison distribution. Correct selection of this standard deviation is critical, since we have several standard deviations available: S for individual scores and S_M for means. Here we need S_M.

t = (M - μ)/S_M

= (3-2)/0.365 = 2.74

Step 5. Deciding whether or not to reject the null hypothesis.
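Steps 2 through 5 for this example can be sketched in Python as follows. This is only an illustration under the assumptions already made in the text: a one-tailed test at the 5% level, with the cutoff of 2.015 copied from Step 3 rather than computed. The function name `single_sample_t` is my own.

```python
from math import sqrt


def single_sample_t(scores, null_mean):
    """Return (t, df) for a single-sample t test against null_mean."""
    n = len(scores)
    m = sum(scores) / n
    ss = sum((x - m) ** 2 for x in scores)
    s2 = ss / (n - 1)       # estimated population variance, S²
    s_m = sqrt(s2 / n)      # SD of the distribution of means, S_M
    return (m - null_mean) / s_m, n - 1


hours = [4, 2, 3, 3, 2, 4]
t, df = single_sample_t(hours, null_mean=2)

cutoff = 2.015              # one-tailed, 5% level, df = 5 (from Step 3)
reject_null = t >= cutoff
print(round(t, 2), df, reject_null)
```

With these data the t score is about 2.74, which is beyond the cutoff, so under this decision rule the null hypothesis would be rejected.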

Example 2:

In order to show that the workshops offered in "Project Impact" (the anti-alcoholism program) have effectively reduced the number of beers students consume at each party, a researcher measured the number of beers drunk at a particular party by 5 students who had participated in the "Project Impact" workshops (see the following data). The researcher knows that beer consumption at UW-Stout is normally distributed with a mean of 4, but does not know the standard deviation or variance of this population. Can we reject the null hypothesis at the .05 level?

Name     # of beers
Jim      1
Mary     2
Joe      3
Kim      1
Kevin    1

Descriptive statistics:

Name     X    (X - M)    (X - M)²
Jim      1    -0.6       0.36
Mary     2     0.4       0.16
Joe      3     1.4       1.96
Kim      1    -0.6       0.36
Kevin    1    -0.6       0.36

M = (1 + 2 + 3 + 1 + 1)/5 = 1.6

SS = 3.2

S² = SS/(N-1) = 3.2/4 = 0.8

S_M² = S² /N = 0.8/5 = 0.16

S_M = √0.16 = 0.4

H_A: Participants in the workshops drink less on average than the average student.

H₀: Participants in the workshops drink the same amount on average as the average student.

T-Test for Dependent Means

In some situations, even though we do not know the population mean, we are quite ready to assume what the population mean would be if the null hypothesis is true. One such situation is when we are dealing with difference scores. We deal with difference scores when we study change, e.g., change in behavior, thoughts or physiological functioning. For instance, after we give patients a particular therapy, we want to find out how effective the therapy is, in other words, how much the patients have benefited from the therapy. How can we measure this change? The answer is “by calculating the difference between the patients’ performance before and after the therapy.” We are trying to show that the therapy is useful (the research hypothesis). According to the null hypothesis it is not. What would be the difference between before and after the therapy if the null hypothesis were true? The answer is “Zero.”

Example 1

A researcher wants to test the effectiveness of a new therapy for snake phobia, a condition where persons have an irrational fear of harmless snakes. He believes that this therapy would effectively reduce people's fear of snakes. He recruited 6 snake-phobic subjects. Before giving the treatment, he measured each subject's fear of snakes on a 0 to 10 scale, from not afraid at all to extremely afraid, with intermediate levels of fearfulness in between. Then he gave each subject the treatment. Immediately after the treatment, he measured the subjects' fear of snakes again, on the same 0 to 10 scale. The following are the data the researcher has collected. Do the appropriate statistical test with a significance level of p < .05. Show the five steps of hypothesis testing.

Subject #    Fear before therapy    Fear after therapy
1            9                      2
2            10                     3
3            9                      2
4            8                      3
5            10                     4
6            9                      1

Summarize the data:

Subject #    Fear before    Fear after    Difference    Deviation from mean    Squared deviation
1            9              2             7             7 - 6.67 = 0.33        0.11
2            10             3             7             7 - 6.67 = 0.33        0.11
3            9              2             7             7 - 6.67 = 0.33        0.11
4            8              3             5             5 - 6.67 = -1.67       2.79
5            10             4             6             6 - 6.67 = -0.67       0.45
6            9              1             8             8 - 6.67 = 1.33        1.77

M_diff (mean of the difference scores) = (7 + 7 + 7 + 5 + 6 + 8)/6 = 6.67

SS of difference = 0.11 + 0.11 + 0.11 + 2.79 + 0.45 + 1.77 = 5.34

SD² of difference = 5.34/(6 - 1) = 1.068

SD of difference = √1.068 = 1.033

Step 1: restate research and null hypothesis in terms of populations

Pop 1

Pop 2

Step 2: determine the characteristics of the comparison distribution.

Since we are using an estimated population variance, the comparison distribution is a t distribution.

The Mean of the comparison distribution: 0.

Determine the variance of the comparison distribution: estimate from the sample

S² = SS/(N - 1) = 5.34/(6 - 1) = 1.068

S² is the estimated population variance. It is a Roman character rather than Greek because it is estimated from the sample statistics. This is a convention among statisticians.

N - 1 is the degrees of freedom for estimating the population variance.

Have we got the variance of the comparison distribution? No. S² is the estimated variance of the distribution of individual difference scores. But our comparison distribution is the distribution of means of difference scores.

S_M² = S²/N = 1.068/6 = 0.178

S_M = √S_M² = √0.178 = 0.42

Now we have determined that the comparison distribution is a t distribution with 5 degrees of freedom, with a mean of 0 and a standard deviation of 0.42.

Step 3: determine the cutoff points.

One-tailed test.

P < .05

Df = 5

t score cutoff = 2.015

Step 4: convert the experimental result into a t score

t = (6.67 - 0)/0.42 = 15.88

Step 5: decide whether to reject null hypothesis or not
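The whole procedure for dependent means can be sketched in Python as below. This is a minimal illustration, not a replacement for working the five steps by hand; the function name `dependent_means_t` is my own, and the cutoff of 2.015 is copied from Step 3. Because the code does not round intermediate values, it gives a t of about 15.81 rather than the 15.88 obtained above from the rounded 6.67 and 0.42.

```python
from math import sqrt


def dependent_means_t(before, after):
    """Return (t, df) for a t test on before-minus-after difference scores.

    Under the null hypothesis the mean difference is 0.
    """
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    m_diff = sum(diffs) / n
    ss = sum((d - m_diff) ** 2 for d in diffs)
    s2 = ss / (n - 1)       # estimated variance of the difference scores
    s_m = sqrt(s2 / n)      # SD of the distribution of mean differences
    return (m_diff - 0) / s_m, n - 1


fear_before = [9, 10, 9, 8, 10, 9]
fear_after = [2, 3, 2, 3, 4, 1]
t, df = dependent_means_t(fear_before, fear_after)

cutoff = 2.015              # one-tailed, p < .05, df = 5 (from Step 3)
reject_null = t >= cutoff
print(round(t, 2), df, reject_null)
```

Either way the t score is far beyond the cutoff, so the rounding does not affect the decision here.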

Example 2

A researcher believes that testosterone increases aggressiveness in male animals. He obtained 3 male rats as his experimental subjects and gave them a large dosage of testosterone. He measured the number of times that each rat attacked other rats before and after the testosterone injection. The following are the data the researcher has collected. Do the appropriate statistical test with a significance level of p < .01. Show the five steps of hypothesis testing.

Subject #    # of attacks before injection    # of attacks after injection    Difference    Deviation    (Deviation)²
1            2                                9                               -7            -2           4
2            2                                6                               -4             1           1
3            5                                9                               -4             1           1

_______________________________________________________________________________
Descriptive Stats:

M_diff = - 15/3 = - 5

SS_diff = 6

S² = SS/(N - 1) = 6/2 = 3

S_M² = S²/N = 3/3 = 1, S_M = √S_M² = 1

t = [(-5) – 0]/1 = -5

df = 2

T-Test for Independent Means

Example 1:

A researcher is interested in the effect of experience on people's fearfulness when engaging in scary activities (such as parachute jumping). He thought experienced individuals would be less fearful than novices when engaging in scary activities. So he did the following experiment. He selected 6 people who were about to make the first parachute jump of their lives (the first-timers) and 6 people who had trained in a parachuting club for more than one year (experienced parachuters). The experimenter measured each subject's fearfulness just before he or she jumped from the plane on a scale of 1 to 10, where 1 means not fearful at all and 10 means extremely fearful. The following are the data the researcher has collected from the experiment.

Subject #    Experience (0 = first-timer, 1 = experienced)    Fearfulness
1            0                                                9
2            0                                                10
3            0                                                9
4            0                                                8
5            0                                                10
6            0                                                9
7            1                                                2
8            1                                                3
9            1                                                2
10           1                                                3
11           1                                                4
12           1                                                1

Step 1: restate the research and null hypotheses in terms of populations.

Pop1:

Pop 2:

Step 2: determine the characteristics of the comparison distribution.

We have to decide which distribution shall be used as the comparison distribution on the basis of what kind of data we have.

Here we have a measure of the same variable (fearfulness) from two different samples, representing two different populations (experienced vs. inexperienced). We are trying to show that the two populations have different mean fearfulness. We show this by rejecting the H0, which says that the two populations do not have different fearfulness.

This means that if the null hypothesis is true, the difference between the means of the two populations would be 0. In our research, the data consist of two sample means. If the null hypothesis is true, the difference between the sample means should be 0 in the long run. Therefore our comparison distribution should be a distribution of differences between two sample means, the sample mean from the experienced population and the sample mean from the inexperienced population. Under the null hypothesis we assume this distribution of differences between the two sample means has a mean of 0. However, we don't have the information about the variance of this distribution. We have to estimate it from what we know about the populations.

First of all, when we have a sample of scores, we can estimate the mean and the variance of the population the sample comes from. Here we have two samples of scores, one from the experienced population, one from the inexperienced population. According to the research hypothesis, these two samples should have come from different populations, the distributions of which have different means and maybe different standard deviations. However, we are working under the null hypothesis, so it is assumed that the two samples have been drawn randomly from the same population. The null hypothesis says that the two populations should have the same mean and standard deviation on fearfulness. Therefore the estimates of means and variances from the two samples should apply to the same distribution, i.e. a distribution with the same mean and same variance. What this means is that using the two samples, we can make two separate estimates of the same distribution. The two estimates will not yield identical estimated means and variances, because of the random errors involved in the scores in the sample. However, under the null hypothesis, the samples should have been drawn from the same population, and any differences in the estimates are only due to random error.

Let's do this estimating first. What information do we have already, and starting from what we already know, what population parameters can we estimate first? These first estimates may not be the parameters that we want eventually, but they get us closer to them. First we can get two separate estimates of the population variance from the two samples. Is each of these an estimate of the variance of the distribution of individual scores or of means? Individual scores, because the estimates are based on individual scores from the samples.

The inexperienced group:

Fearfulness
(X)    (X - M)    (X - M)²
9      -0.17      0.03
10      0.83      0.69
9      -0.17      0.03
8      -1.17      1.37
10      0.83      0.69
9      -0.17      0.03
___________________________________________

SS₁ = 2.84

M₁ = (9 + 10 + 9 + 8 + 10 + 9)/6 = 55/6 = 9.17

S₁² = SS₁/(N₁ - 1) = 2.84/5 = 0.57

The experienced group

Fearfulness
(X)    (X - M)    (X - M)²
2      -0.5       0.25
3       0.5       0.25
2      -0.5       0.25
3       0.5       0.25
4       1.5       2.25
1      -1.5       2.25
_________________________________________

SS₂ = 5.5

M₂ = (2 + 3 + 2 + 3 + 4 + 1)/6 = 15/6 = 2.5

S₂² = SS₂/(N₂ - 1) = 5.5/5 = 1.1

So we have made separate estimates of the population variance from the two samples. According to the null hypothesis, these should be estimates of the same distribution, because the two populations have the same distribution, i.e. with the same mean and same variance. Why do we need two separate estimates from two samples rather than just one estimate from one sample? Because estimates based on limited information involve a lot of random error. Any sample we use may not be representative of the population. We need to use as much information as possible to reduce random error and arrive at an estimate as precise as possible. We pool the two estimates to get an average. However, this is not a simple average (adding them up and dividing by two), where each estimate contributes half of the value. The pooled estimate is a weighted average, with each estimate weighted by its share of the total degrees of freedom. If the two samples differ in size, the larger sample contains more information useful for the estimate, and should be given a larger weight in the pooled estimate. If the two samples have the same size, then they should contribute equally to the pooled estimate.

S_pooled² = (df₁/df_total) × S₁² + (df₂/df_total) × S₂²

In this formula, df_total is the total degrees of freedom of the variance estimates from the two samples. So df_total = df₁ + df₂ = 5 + 5 = 10

df₁/df_total and df₂/df_total are called the weights for the estimated variance from each sample. We need to weight them because we want the estimate from the sample that has more subjects, i.e. larger df, to contribute more to the pooled estimate of variance.

Going back to our study, S_pooled² = (df₁/df_total) × S₁² + (df₂/df_total) × S₂² = (5/10)(0.57) + (5/10)(1.1) = 0.84
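A short Python sketch of the pooling formula may help show what the weights do. The function name `pooled_variance` is my own, and the second call uses a hypothetical df of 15 for the first sample (not part of this study) only to show how a larger sample pulls the pooled estimate toward its own variance.

```python
def pooled_variance(s2_1, df_1, s2_2, df_2):
    """Weighted average of two variance estimates, weighted by df."""
    df_total = df_1 + df_2
    return (df_1 / df_total) * s2_1 + (df_2 / df_total) * s2_2


# The parachuting study: equal sample sizes, so equal weights.
equal_sizes = pooled_variance(0.57, 5, 1.1, 5)

# Hypothetical: if the first sample had df = 15, its estimate would
# receive a weight of 15/20 and dominate the pooled value.
unequal_sizes = pooled_variance(0.57, 15, 1.1, 5)

print(round(equal_sizes, 3), round(unequal_sizes, 4))
```

With equal sample sizes the pooled estimate is just the simple average of the two variances (0.835, which rounds to the 0.84 used above).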

But this pooled estimate of variance is the variance of a distribution of single scores, which does not match our data, which consist of two sample means (more precisely, the difference between a pair of sample means). Fortunately we already know how to get the mean and variance of the distribution of means for a given sample size.

Remember that according to H₀, the two populations have the same distribution. We have estimated the variance of this distribution, S_pooled². Now we need to estimate the variance of the distribution of means. This estimate depends upon the sample size. We have to make two estimates, one for each sample, to allow for the possibility that the two samples have different sizes. In this case, we have the same sample size for both samples. Therefore:

S_M1² = S_pooled²/N₁ = 0.84/6 = 0.14

S_M2² = S_pooled²/N₂ = 0.84/6 = 0.14

Now we are still one step away from the estimated variance of the comparison distribution, which is a distribution of differences between two sample means.

S_difference² = S_M1² + S_M2² = 0.14 + 0.14 = 0.28

S_difference = √0.28 = 0.53

Step 3: determine the cut off points.

One-tailed, p < 0.01

Use the total df for determining the cutoff point. Find t needed for rejecting H₀ from the t table.

Step 4: calculate the t score

t = (M₁ - M₂)/S_difference = (9.17 - 2.5)/0.53 = 12.58

Step 5: Reject Null Hypothesis?
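The chain of estimates in Step 2 (separate variances, pooled variance, variance of each distribution of means, variance of the distribution of differences) can be sketched in Python as follows. The function name `independent_means_t` is my own, and this is only an illustration of the steps described above. Because the code keeps full precision, it gives a t of about 12.65 instead of the 12.58 obtained from the rounded intermediate values; looking up the cutoff for df_total in the t table is still left to the reader.

```python
from math import sqrt


def independent_means_t(sample_1, sample_2):
    """Return (t, df_total) for a t test for independent means."""

    def mean_and_variance(xs):
        m = sum(xs) / len(xs)
        s2 = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
        return m, s2

    n1, n2 = len(sample_1), len(sample_2)
    m1, s2_1 = mean_and_variance(sample_1)
    m2, s2_2 = mean_and_variance(sample_2)

    df1, df2 = n1 - 1, n2 - 1
    df_total = df1 + df2

    # Pooled variance: weighted by each sample's share of df_total.
    s2_pooled = (df1 / df_total) * s2_1 + (df2 / df_total) * s2_2

    # Variance of the distribution of differences between means.
    s2_difference = s2_pooled / n1 + s2_pooled / n2

    return (m1 - m2) / sqrt(s2_difference), df_total


first_timers = [9, 10, 9, 8, 10, 9]
experienced = [2, 3, 2, 3, 4, 1]
t, df_total = independent_means_t(first_timers, experienced)
print(round(t, 2), df_total)
```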

Example 2:

We have encountered this research situation before. A researcher believes that single mothers spend more time with their children than married mothers. He recruited 7 single mothers and measured how many hours they spent with their children over a two-day period. He recruited 7 married mothers and measured how many hours each of them spent with their children during the same period. The data are listed below. Carry out the hypothesis testing steps and state the results.

The calculations:

Single mothers (hours):              Married mothers (hours):
(X₁)    (X₁ - M₁)    (X₁ - M₁)²      (X₂)    (X₂ - M₂)    (X₂ - M₂)²
6        0           0               6        3           9
4       -2           4               1       -2           4
9        3           9               5        2           4
7        1           1               3        0           0
7        1           1               1       -2           4
3       -3           9               1       -2           4
6        0           0               4        1           1

________________________________________________________________

SS₁ = 24, SS₂ = 26

M₁ = 6

S₁² = 24/6 = 4

N₁ = 7, df₁ = 7 - 1 = 6

M₂ = 3

S₂² = 26/6 = 4.33

N₂ = 7, df₂ = 7 - 1 = 6

df_total = df₁ + df₂ = 6 + 6 = 12

S_pooled² = (df₁/df_total) × S₁² + (df₂/df_total) × S₂² = (6/12)(4) + (6/12)(4.33) = 4.17

We will figure out the rest in class.
