**Statistical Analysis [6-7]**

**Chi-square Test of Independence**

The chi-square test of independence is used to test whether two categorical variables are associated, that is, whether the distribution of one variable differs across the levels of the other.

Examples of applications of the chi-square test:

- Compare smoking behavior between males and females
- Compare education level across racial groups
- Compare voting preference across income levels

__Example__

The SMOKE data set contains survey responses from participants about their smoking behavior.

The data set has 40 observations, each recording the participant's gender (Male vs. Female) and smoker type, which is one of 3 categories:

- Non-smoker
- Occasional (<=5 per week)
- Frequent (>5 per week)

A study of smoking behavior was conducted to find out whether men smoke significantly more than women.

The following hypotheses are tested with the chi-square test:

H0: Smoking behavior is independent of gender

H1: Smoking behavior is dependent on gender

__Example__

```sas
/* Cross-tabulate Gender by Smoker type and request the chi-square test */
Proc Freq Data=Smoke;
    Table Gender * Smoker / Chisq;
Run;
```

To run a chi-square test, use PROC FREQ and add the CHISQ option to the TABLE statement.
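For reference, the "Chi-Square" row in the PROC FREQ output is Pearson's chi-square statistic. Using notation introduced here only for illustration, with $O_{ij}$ the observed count in row $i$ and column $j$, $R_i$ and $C_j$ the row and column totals, $N$ the total number of observations, and $r$ and $c$ the numbers of rows and columns, the standard definition is:

$$
\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i\,C_j}{N},
\qquad
\text{df} = (r-1)(c-1)
$$

For the Gender * Smoker table, $r = 2$ and $c = 3$, so $\text{df} = (2-1)(3-1) = 2$. The further the observed counts are from the counts expected under independence, the larger $\chi^2$ becomes and the smaller the p-value.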

Run the code in SAS Studio and the following results are generated:

**1. Two-way Crosstabulation Table**

A standard two-way crosstabulation table is created, showing the frequency count (along with the overall, row and column percentages) for each combination of gender and smoker type.

**2. Chi-square Test Results**

The p-value is 0.7241. Since this is greater than the usual 0.05 significance level, there is not sufficient evidence to reject the null hypothesis.

We conclude that there is no significant difference in smoking behavior between the two genders.

**Assumption for Chi-square Test**

One of the major assumptions of the chi-square test is that the expected count in each cell should be at least 5.
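The expected count of a cell depends only on its row and column totals, $E_{ij} = R_i C_j / N$. Because the expected counts over all cells add up to $N$, a quick calculation with the numbers given for the SMOKE data (40 observations in a 2 × 3 table) shows why this assumption is easy to violate here:

$$
\bar{E} = \frac{N}{r\,c} = \frac{40}{2 \times 3} \approx 6.67
$$

With an average expected count of only about 6.67, any cell whose row or column total is small can easily fall below 5. To display the expected count of every cell, the EXPECTED option can be added to the TABLE statement, for example `Table Gender * Smoker / Chisq Expected;`.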

In our example, some of the cells have expected counts less than 5, so SAS prints a warning with the chi-square results.

**Fisher's Exact Test**

When some of the expected cell counts are less than 5, Fisher's exact test is the better choice, because it computes the p-value exactly instead of relying on the large-sample chi-square approximation.
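Fisher's exact test treats the row and column totals as fixed. Using illustrative notation ($n_{ij}$ the count in row $i$ and column $j$; $R_i$ and $C_j$ the row and column totals; $N$ the grand total), the probability of one particular table under H0 is given by the multivariate hypergeometric distribution:

$$
P(\{n_{ij}\}) = \frac{\prod_{i} R_i!\;\prod_{j} C_j!}{N!\;\prod_{i}\prod_{j} n_{ij}!}
$$

The p-value is the sum of these probabilities over all tables with the same row and column totals whose probability is no greater than that of the observed table. Because nothing here depends on a large-sample approximation, small expected counts do not invalidate the result; the trade-off is that enumerating tables can become slow for large samples or many categories.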

__Example__

```sas
/* Request Fisher's exact test for the Gender by Smoker table */
Proc Freq Data=Smoke;
    Table Gender * Smoker / Fisher;
Run;
```

As with the chi-square test, simply add the FISHER option to the TABLE statement to get the results of Fisher's exact test.

The p-value is well above 0.05, so there is no significant difference in smoking behavior between the two genders.

**Exercise**

Copy and run the code for the VOTER data set from the yellow box below.

The VOTER data set records voting preference (the Party variable) across 3 income groups (the Income variable).

The income groups are "<70K", "70K to 120K" and ">120K".

Perform a statistical hypothesis test to find out whether there is any significant difference in voting preference among the 3 income groups.

**HINT:**

A simple chi-square test should do.

**SOLUTION:**

```sas
/* Test whether voting preference (Party) is independent of income group */
Proc Freq Data=Voter;
    Table Party*Income / chisq;
Run;
```

The p-value is 0.7375, which is greater than 0.05. There is no significant difference in voting preference among the 3 income groups.
