Statistical Analysis 1.4

Statistical Analysis [4-7]

Two-sample t-test

A two-sample t-test can be used to compare the means of two independent populations.

Examples of applications of the two-sample t-test:

Compare treatment efficacy between a treatment group and a placebo group
Compare the effectiveness of a marketing campaign between two groups of customers
Compare income between the two genders

Example

Proc Format;
Value Trt 0 = "Placebo"
		  1 = "Treatment";
Run;

Data Vital;
Input Subject $ Trt SBP;
Format Trt Trt.;
Datalines;
SBJ9001 0 107
SBJ9002 0 88
SBJ9003 1 79
SBJ9004 0 126
SBJ9005 1 109
SBJ9006 0 137
SBJ9007 0 94
SBJ9008 0 99
SBJ9009 1 78
SBJ9010 1 83
SBJ9011 0 88
SBJ9012 1 91
SBJ9013 1 99
SBJ9014 0 128
SBJ9015 0 88
SBJ9016 0 98
SBJ9017 1 128
SBJ9018 1 92
SBJ9019 0 84
SBJ9020 1 101
SBJ9021 0 92
SBJ9022 1 102
SBJ9023 0 103
SBJ9024 0 103
SBJ9025 1 95
SBJ9026 1 123
SBJ9027 0 110
SBJ9028 0 101
SBJ9029 1 122
SBJ9030 0 109
SBJ9031 0 94
SBJ9032 1 83
SBJ9033 1 78
SBJ9034 0 82
SBJ9035 1 100
SBJ9036 1 114
SBJ9037 1 124
SBJ9038 0 97
SBJ9039 1 100
SBJ9040 1 89
SBJ9041 0 107
SBJ9042 1 96
SBJ9043 1 103
SBJ9044 0 94
SBJ9045 1 93
SBJ9046 1 94
SBJ9047 0 102
SBJ9048 1 89
SBJ9049 0 118
SBJ9050 1 128
;
Run;

A biostatistician is hoping to find out whether a newly developed treatment raised the systolic blood pressure (mmHg) of the targeted patients.

A study was conducted to compare systolic blood pressure between the treatment group and the placebo group.

The data are captured in the VITAL data set.

Let's take a look at how you can compare the mean systolic blood pressure between the two groups of patients.

Example

Proc ttest Data=Vital;
Class Trt;
Var SBP;
Run;

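The CLASS statement names the variable that identifies the two populations for the two-sample t-test (here, Trt). The VAR statement names the variable whose means are compared (here, SBP).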

The following results are generated:

1. Summary statistics and confidence limits for the two populations

The mean SBP (systolic blood pressure) is 99.7 for the treatment group and 102 for the placebo group, and the two groups have similar standard errors (2.87 vs. 3.02).
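As a cross-check on the summary table, the same group-level statistics can be requested from PROC MEANS. This is a minimal sketch rather than part of the original example; it assumes the VITAL data set and the Trt format created above are available in the current session.

```sas
/* Sketch: summary statistics of SBP by treatment group.            */
/* N, MEAN, STD, STDERR and CLM are PROC MEANS statistic keywords;  */
/* MAXDEC=2 only limits the number of decimals printed.             */
proc means data=Vital n mean std stderr clm maxdec=2;
    class Trt;   /* one row of statistics per group */
    var SBP;     /* analysis variable               */
run;
```

The group means should agree with those reported by PROC TTEST (about 99.7 for the treatment group and 102 for the placebo group).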

2. P-value for Pooled and Satterthwaite methods

Important!

Two p-values are computed when performing a two-sample t-test:

0.5835 (Pooled Method)
0.5821 (Satterthwaite Method)

Which p-value should be used?

That depends on whether the two populations have an equal variance.

If the two populations have equal variances, use the Pooled method; otherwise, use the Satterthwaite method.
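For reference, the two methods differ in how they estimate the standard error of the difference and in the degrees of freedom. The formulas below are the standard textbook forms, written in notation introduced here and not taken from the original page: the group means are \bar{x}_1 and \bar{x}_2, the sample variances are s_1^2 and s_2^2, and the group sizes are n_1 and n_2.

```latex
% Pooled method (assumes equal population variances)
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}, \qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad
\mathrm{df} = n_1 + n_2 - 2

% Satterthwaite method (does not assume equal variances)
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}, \qquad
\mathrm{df} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
                   {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
```

In the VITAL data there are 24 placebo and 26 treatment subjects, so the Pooled method uses 24 + 26 - 2 = 48 degrees of freedom.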

How to tell if the variances are equal?

Proc ttest also reports the result of a test for equality of variances (the Folded F test).

H0: σ₁² = σ₂²
H1: σ₁² ≠ σ₂²

The p-value for the equality of variances test is 0.6611, so we fail to reject the null hypothesis that the variances are equal.

As a result, the p-value from the Pooled method (0.5835) should be used.
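To apply this decision rule in a program rather than by reading the printed output, the relevant tables can be saved as data sets with ODS OUTPUT. This is a hedged sketch: Equality and TTests are the ODS table names documented for PROC TTEST, while the output data set names VarTest and TTestResults are arbitrary names chosen for this illustration.

```sas
/* Sketch: save the equality-of-variances test and the t-test results */
/* VarTest and TTestResults are hypothetical data set names.          */
ods output Equality=VarTest TTests=TTestResults;
proc ttest data=Vital;
    class Trt;
    var SBP;
run;

/* Inspect the saved tables, including their column names */
proc print data=VarTest;
run;

proc print data=TTestResults;
run;
```

VarTest should hold the Folded F result (p-value 0.6611 for this data), and TTestResults should hold one row per method, from which the Pooled or Satterthwaite p-value can be picked.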

3. Histogram and Q-Q Plot

Related graphs, such as the histogram and the Q-Q plot, are also produced.

One of the main assumptions of the two-sample t-test is that the response is approximately normally distributed within each group.

The roughly linear pattern in the Q-Q plot suggests that SBP follows a normal distribution reasonably well.
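If a formal check is wanted alongside the graphs, PROC UNIVARIATE can run normality tests and draw Q-Q plots separately for each group. This is a minimal sketch that is not part of the original example; it again assumes the VITAL data set from above.

```sas
/* Sketch: normality checks of SBP within each treatment group.     */
/* The NORMAL option requests tests for normality (for example,     */
/* Shapiro-Wilk); QQPLOT draws a Q-Q plot with a reference line     */
/* based on the estimated mean and standard deviation.              */
proc univariate data=Vital normal;
    class Trt;
    var SBP;
    qqplot SBP / normal(mu=est sigma=est);
run;
```

With only about 25 subjects per group, the tests have limited power, so the plots and the tests are best read together.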

Exercise

Copy and run the code below to create the CAMPAIGN data set:

Data Campaign;
Input Segment Group Purchase;
Datalines;
23 1 1118
23 1 1326
23 0 1774
23 0 1483
23 0 1226
23 0 1021
23 1 982
23 1 1647
23 1 1535
23 1 442
23 0 81
23 0 232
23 1 1175
23 0 2596
23 1 1743
23 0 1251
23 1 1454
23 1 69
23 0 224
23 0 1442
23 1 1177
23 0 1483
23 0 1910
23 0 69
23 0 255
23 0 855
23 1 259
23 1 360
23 1 337
23 0 1805
23 0 867
23 0 1487
23 0 173
23 1 1446
23 0 1447
23 1 1703
23 0 491
23 0 1970
23 0 1290
23 1 1787
23 0 368
23 0 1018
23 0 1881
23 1 1137
23 1 693
23 0 1097
23 1 3157
23 1 1427
23 0 1436
23 0 681
23 0 1095
23 1 1848
23 0 547
23 0 527
23 1 1143
23 1 1446
23 1 1053
23 0 2212
23 0 2066
23 1 1057
23 1 1326
23 1 0
23 1 1261
23 1 1998
23 0 1416
23 1 759
23 0 1346
23 0 766
23 1 1378
23 1 246
;
Run;

A marketing campaign has been launched for a premium retail outlet.

During this campaign, a segment of its regular customers was divided into two groups.

Group 1 received a $100 gift card and Group 0 did not receive any gift card.

An analyst is hoping to find out whether the campaign was effective by comparing the purchase behavior of the two groups.

Conduct a two-sample t-test and find out whether the purchases of one group differ significantly from those of the other.

Need some help?

HINT:
The Group variable should be used as the classification variable.

SOLUTION:
Proc ttest data=campaign;
class group;
var purchase;
run;

The p-value (from either the Pooled or the Satterthwaite method) is 0.8273. There is not sufficient evidence that the purchases of one group differ significantly from those of the other group.
