Search the site...

SASCRUNCH TRAINING
  • Home
  • Member's Area
  • How to Start
  • SAS Interface
  • Creating a Data Set
  • Practical SAS Training Course
  • SAS Certified Specialist Training Program
  • Proc SQL Course
  • Introduction to Time Series Analysis
  • SAS Project Training Course
  • Full Training / Membership
  • Sign up
  • About us
  • Contact us
  • Home
  • Member's Area
  • How to Start
  • SAS Interface
  • Creating a Data Set
  • Practical SAS Training Course
  • SAS Certified Specialist Training Program
  • Proc SQL Course
  • Introduction to Time Series Analysis
  • SAS Project Training Course
  • Full Training / Membership
  • Sign up
  • About us
  • Contact us
Sentry Page Protection
Please Wait...
Data Manipulation [4-18]


Flagging Extreme Values
Sometimes, you might want to flag the extreme values after sorting a data set.

For example, you might want to find the least expensive product after sorting a product list by price.
Picture
​
Or you might want to find the most powerful car after sorting the data set by horsepower.
Picture

You can do so by using a common SAS programming technique demonstrated below.

Example
Picture

The HOUSE data set above contains 3 variables:
  • Region
  • House
  • Price

Since the data set is small, you can easily identify the cheapest house in each region.

For example, House 3 is the cheapest house in Region A and it is selling for $430,000.
Picture

However, when you have a larger data set, say, 100 observations, identifying the lowest values wouldn't be as easy, if not impossible.

Instead, you can use a special technique that systematically identifies the lowest house price in each region.

Example

Proc Sort Data=House;
By Region Price;
Run;


Data House2;
Set House;
By Region Price;
If first.region then i=1;

Run;
Picture

The code systematically flags the house with the lowest price in each region.

When (i=1), the observation is the house that has the cheapest price in the region.

(try it!)


This technique involves two steps:

1. The code first sort the data by REGION and PRICE. 
Picture
Within each region, the house price is sorted in ascending order.
Picture

2. It then creates a new variable "i" that flags the first observation from each region.
Picture
Within each region, the first observation is flagged.
Picture

Because the price is sorted in ascending order, the first observation happens to be the one that contains the lowest house price in each region:
Picture

(try it!)
​
The variable i correctly flags the cheapest house in each neighborhood.


Flagging the last observation

Similarly, you can flag the house that has the highest selling price by flagging the last observation from each region.
​

Example

​Proc Sort Data=House;
By Region Price;
Run;

Data House2;
Set House;
By Region Price;
If last.region then i=2;

Run;
Picture

The variable "i" again correctly flags the house with the highest selling price.

​
Key Requirements
​​
There are 3 key requirements for this technique to work:

Requirement #1:
​
Data set must be first sorted by the grouping variable(s) and the result variable(s).
Picture

Grouping variable(s) is the variable that defines the groups in the data set.

In our example, the grouping variable is REGION.

Result variable(s) is the variable that determines the order within the grouping variable(s).

In our example, the result variable is PRICE. 
Picture

Requirement #2:

​The data step must include a BY statement with the grouping variable(s) and the result variable(s).
Picture

In our example, the data step includes a BY statement with REGION and PRICE.

Note: the BY statement is crucial when using this technique. Without the BY statement, this technique wouldn't work.

​
Requirement #3:

​Use the IF.GROUP condition to identify the first and last observation within the grouping variable (i.e. REGION). 
Picture
In our example, we have the following conditional statement:

If first.region then i=1;

This tells SAS to flag the observation (i.e. i=1) if it is the very first observation from each region.

Similar for the second conditional statement:

If last.region then i=2;

This tells SAS to flag the observation (i.e. i=2) if it is the last observation from each region.
Picture

As a result, all of the observations with extreme values (highest and lowest price) are flagged.
​

​Note: use this technique with caution and make sure your code meets all 3 requirements stated above.

Exercise

Locate the FISH data set from the SASHelp library. 

Create a new data set that contains only the heaviest fish from each species.
Next

Need some help? 

Get Hint
Get Solution

Fill out my online form.
Already a member? Go to member's area.