Search the site...

SASCRUNCH TRAINING
  • Home
  • Member's Area
  • How to Start
  • SAS Interface
  • Creating a Data Set
  • Practical SAS Training Course
  • SAS Certified Specialist Training Program
  • Proc SQL Course
  • Introduction to Time Series Analysis
  • SAS Project Training Course
  • Full Training / Membership
  • Sign up
  • About us
  • Contact us
  • Home
  • Member's Area
  • How to Start
  • SAS Interface
  • Creating a Data Set
  • Practical SAS Training Course
  • SAS Certified Specialist Training Program
  • Proc SQL Course
  • Introduction to Time Series Analysis
  • SAS Project Training Course
  • Full Training / Membership
  • Sign up
  • About us
  • Contact us
Sentry Page Protection
Please Wait...
Advanced Data Manipulation [2-5]


Proc Format
The FORMAT procedure, as discussed in Variable [13-17], allows you to create your own format.

In this module, we will learn how to create different types of formats.

In addition, we will look at how to save the format in a permanent library.


​Copy and run the code from the yellow box below:

The ID_LIST data set contains five columns:
  • ID
  • GENDER
  • EDU
  • AGE
  • CITY
Picture

Numeric Format

We will first revisit how to create a standard numeric format on the GENDER variable:
Picture

The GENDER column has two distinct values:
  • 1 = M
  • 2 = F

We will create a format for gender using Proc Format:
proc format;
value gender 1 = "M"
             2 = "F";

run;

data ID_list2;
set ID_list;
format gender gender. ;
run;

A new format called GENDER is created and assigned to the GENDER variable:
Picture

The GENDER column is now formatted. The values are displayed as the letter "M" and "F".
Picture


Character Format

In addition to the numeric format, we can also create a character format.

Let's look at the EDU column in the ID_LIST data set.

The EDU column has three distinct values:
  • H
  • U
  • P
Picture

For EDU, each letter stands for the following:
  • H = High School
  • U = Undergraduate
  • P = Postgraduate

​We can create a character format so that the values are displayed properly.
proc format;
value gender 1 = "M"
             2 = "F";
             
value $edu   "H" = "High School"
             "U" = "Undergraduate"
             "P" = "Postgraduate";

run;

data ID_list2;
set ID_list;
format gender gender. 
       edu $edu.;
run;

The code above creates a new format called ($edu.). 

The leading dollar sign is needed when specifying a character format.

It is then assigned to the EDU variable:
Picture

The EDU column is now formatted:
Picture


Specifying a Range of Values

The FORMAT procedure also allows you to specify a range of values in the format definition.

Let's look at the AGE column:
Picture

The AGE column simply contains the age of the subject.

Now, we want to group the age into three age groups:
  • Age Group 1: Up to and including 30
  • Age Group 2: 32 (inclusive) to 45 (exclusive)
  • Age Group 3: 45 or older.

​Below is the code:
proc format;
value age  low - 30 = "Age group 1"
           32 -< 45 = "Age group 2"
           45 - high = "Age group 3";

run;

data ID_list2;
set ID_list;
format age age.;
run;

Let's look at how each age group is specified in the FORMAT procedure.

Age Group 1: 30 or below

The first age group is defined as 30 or below.

The range is specified as (low - 30).
Picture

"Low" is a SAS keyword that represents the lower limit of the range.

When specifying (low - 30) as the range, it includes any values up to and including 30.


​Age Group 2: 32 (inclusive) to 45 (exclusive)

Defining this age group is a little tricky.

This age group should contain ages between 32 (inclusive) to 45 (exclusive). 

A 32-year old subject should be in this age group. 

​However, a 45-year old should not.

The range in the FORMAT procedure is specified as (32 -< 45). 
Picture
The arrow sign (<) indicates less-than but not equal-to.

​In our example, the range (32 -< 45) means 32 or older, but less than 45.


​Age Group 3: 45 and older

The Age Group 3 is more straightforward. 

It includes subjects who are aged 45 years or older.

The range in the FORMAT procedure is specified as (45 - high), where the keyword "high" represents the upper limit of the range.
Picture

The code also assigns the (age.) format to the AGE variable:
Picture

​Let's look at the results:
Picture

The majority of the values are formatted, except for two values:
Picture

The value (31) and a missing value are not formatted.

In the FORMAT procedure, we have defined the age for 30 and below, and above 32.

​Age 31 is actually not defined in any group!
  • Age Group 1: 30 or below
  • Age Group 2: 32 (inclusive) to 45 (exclusive)
  • Age Group 3: 45 or older

As a result, the value is displayed without an age group.​


The missing value is also not defined in the age group definition.

Please note that, even though we have the range (low - 30) defined for Age Group 1, the keyword "low" does not include the missing value.

To fix these two issues, we will do the following:
  • Include age 31 in the definition of Age Group 2.
  • Define a range for "Other", which is a SAS keyword for values that are not defined (i.e. missing values).
proc format;
value age  low - 30 = "Age group 1"
           31 -< 45 = "Age group 2"
           45 - high = "Age group 3"
           other = "Unknown";
run;

data ID_list2;
set ID_list;
format age age.;
run;

The range for Age Group 2 is revised in the code.

In addition, the "Other" definition is added to the FORMAT procedure:
Picture

The values are now formatted properly for the AGE column:
Picture


Saving the SAS Format

When defining a format, SAS stores the format in a SAS format catalog file.

The format catalog file is stored in the WORK library, by default, and it can be saved in a permanent library to be re-used in the future.

In order to save the format catalog file in a permanent library, you need to do the following:
  1. Create a permanent library
  2. Define the format in the permanent library

​Let's look at the example below .
libname newlib '/home/your_user_name';

proc format lib=newlib;
value city 1 = "Chicago"
           2 = "New York"
           3 = "Toronto";
run;

In this example, we have created a new library called NEWLIB.

The library is associated with the shared folder ('/folders/myfolders'):
Picture

​The FORMAT procedure defines a new format called (city.).

This format is created for the CITY column in the ID_LIST data set:
Picture

In the Proc Format procedure, the LIB=NEWLIB option is specified:
Picture

This tells SAS to define the format in the NEWLIB library.

Since the library NEWLIB is associated with shared folder, you can find it in the directory (/folders/myfolders):
Picture

Important

Even though you have created the format in a permanent library, this does not mean you can use it right away.

Before we can use this format, we need to check one thing.

Run the code below in your SAS Studio:
proc options option=fmtsearch;run;

The Proc Options procedure writes the list of libraries it uses to search for the format, to the SAS log.

If the library that stores the format is not on the list, you have to add it back in.

​For example, you can see the highlighted note in the SAS log:
Picture

It lists all the libraries that SAS searches for the formats.

In this example, the NEWLIB library is NOT listed.

As a result, the (city.) format will not be loaded when being assigned to a variable.

Let's look at the example below:
proc format lib=newlib;
value city 1 = "Chicago"
           2 = "New York"
           3 = "Toronto";
run;

data ID_list2;
set ID_list;
format city city.;
run;

In this example, the (city.) format is assigned to the CITY variable.

However, the values for CITY are not formatted:
Picture

You will also see the below note in the SAS log:
Picture

In order to use the (city.) format, you need to specify the FMTSEARCH list:
proc format lib=newlib;
value city 1 = "Chicago"
           2 = "New York"
           3 = "Toronto";
run;​

options fmtsearch=(WORK newlib); 


data ID_list2;
set ID_list;
format city city.;
run;

The FMTSEARCH option allows you to specify the list of libraries that SAS uses to search for the format.

The NEWLIB library is now included in the FMTSEARCH list:
Picture

The CITY variable now displays the formatted values:
Picture

Exercise

The SASHELP.CARS data set contains a list of cars and their MSRP.

Create a new format called MSRP that defines the pricing group as follow:
  • MSRP < 30000 = Low
  • 30000 <= MSRP < 70000 = Medium
  • 70000 <= MSRP < 100000 = High
  • MSRP >= 100000 = Extremely High

Create a new data set called CARS in the work library, and assign the (msrp.) format to the MSRP variable.

Count how many cars are in each pricing group.
Next

Need some help? 


HINT:
You can use the LOW and HIGH keyword to specify the lower and upper limit of the values.


SOLUTION:

proc format;
value msrp low -< 30000 = "Low"
30000 -< 70000 = "Medium"
70000 -< 100000 = "High"
100000 - high = "Extremely High";
run;

data cars;
set sashelp.cars;
format msrp msrp.;
run ;

proc freq data=cars;
table msrp;
run;

Low: 238
Medium: 167
High: 19
Extremely High: 4


Fill out my online form.
Already a member? Go to member's area.