SASCRUNCH TRAINING
Data Analysis [2-15]


Normality Test
Many statistical analyses assume that the data follow a normal distribution, so you often need to check this assumption before running them.

This can be done using either of two methods:
  • (1) Numerical Method
  • (2) Graphical Method

Example
[Image: partial listing of the TICKET data set]

Note: there are 82 observations in this data set. Not all of them are shown in the image.


The TICKET data set contains 3 variables:
  • Team: Raptors (GO RAPS GO)
  • Game: The nth game of the season
  • MPrice: Median Ticket Price
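
Before testing for normality, it can help to confirm the structure of the data set. The sketch below assumes the TICKET data set has already been loaded into the WORK library under the name Ticket; adjust the name if yours differs.

```sas
/* Assumption: the TICKET data set is available as WORK.Ticket */

/* List the variables and the number of observations */
proc contents data=Ticket;
run;

/* Print only the first 10 rows as a quick preview */
proc print data=Ticket (obs=10);
run;
```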

An analyst is interested in checking whether the median selling price (i.e. MPrice) follows a normal distribution.


1. Numerical Method

The numerical method relies on the results of 4 normality tests.

Example

/* The NORMAL option requests the tests for normality */
Proc Univariate Data=Ticket normal;
Var Mprice;
Run;

Adding the NORMAL option to Proc Univariate adds a "Tests for Normality" table to the output.
Note: this is the 4th table in the output.

The numerical method looks at 4 normality tests:
  • Shapiro-Wilk
  • Kolmogorov-Smirnov
  • Cramer-von Mises
  • Anderson-Darling

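Each test has the null hypothesis that the data come from a normal distribution. Since all of the p-values are greater than 0.05, we fail to reject that hypothesis at the 5% significance level, and the median ticket price can be treated as normally distributed.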
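
The 0.05 decision rule can also be applied in code rather than by reading the table. The following is a minimal sketch, assuming the ODS table is named TestsForNormality and that it contains the columns Test, Stat and pValue; the data set names NormTests and NormDecision are made up for illustration.

```sas
/* Save the normality test table as a data set */
/* Assumption: ODS table name TestsForNormality with columns Test, Stat, pValue */
ods output TestsForNormality=NormTests;
proc univariate data=Ticket normal;
   var Mprice;
run;

/* Compare each p-value against the 5% significance level */
data NormDecision;
   set NormTests;
   length Decision $ 24;
   if pValue > 0.05 then Decision = 'Fail to reject normality';
   else Decision = 'Reject normality';
run;

proc print data=NormDecision noobs;
   var Test Stat pValue Decision;
run;
```

Some of the tests report p-values as bounds (for example ">0.150"), so the stored numeric value may be the bound itself rather than an exact p-value.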


Quick Resources to learn about:
  • p-value
  • normality test

2. Graphical Method

The graphical method looks at the stem-and-leaf plot, box plot, normal probability plot and histogram.

Example

/* PLOTS requests the summary plots; HISTOGRAM adds a histogram of Mprice */
Proc Univariate Data=Ticket plots;
Var Mprice;
Histogram;
Run;

The PLOTS option generates 3 plots:
  1. Stem-and-leaf plot (or a horizontal bar chart)
  2. Box plot
  3. Normal probability plot

The HISTOGRAM statement adds a histogram of MPrice to the output.

All of the graphs show a distribution that is fairly close to a normal distribution.
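
To make the visual comparison easier, the plots can be drawn with a fitted normal distribution as a reference. This is a sketch of one way to do it, using the NORMAL option on the HISTOGRAM and QQPLOT statements. It is an optional variation on the example above, not a required step.

```sas
proc univariate data=Ticket;
   var Mprice;
   /* Overlay a normal curve fitted to the data on the histogram */
   histogram Mprice / normal;
   /* Q-Q plot with a reference line based on the estimated mean and std. dev. */
   qqplot Mprice / normal(mu=est sigma=est);
run;
```

If the data are close to normal, the histogram bars should follow the fitted curve and the Q-Q plot points should fall near the reference line.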

Quick Resources to learn about:
  • normal probability plot

Exercise

Locate the CARS data set from the SASHelp library.

Analyze whether the MSRP follows a normal distribution using both the numerical and graphical methods.

What would be your conclusion and why?

Need some help? 


HINT:
What are the p-values from the normality tests? What do the plots look like?


SOLUTION:
Proc Univariate Data=SASHelp.cars normal plots;
Var MSRP;
histogram;
Run;

The data does not seem to be normally distributed.

The p-values are all less than 0.05, so we reject the null hypothesis that the data is normally distributed.

Both the bar chart and the box plot show an asymmetric distribution, with the majority of the MSRP values falling in the range of $15,000 to $45,000.

The normal probability plot bends up to the left of the normal line. This shows a long tail to the right.
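
As an optional cross-check on the long right tail, summary statistics can be compared with the plots. This sketch is an addition to the solution above and uses Proc Means. A mean noticeably larger than the median, together with a positive skewness, would be consistent with a right-skewed distribution.

```sas
/* Compare mean and median, and report skewness, for MSRP */
proc means data=SASHelp.cars n mean median skewness maxdec=2;
   var MSRP;
run;
```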

