Sentry Page Protection
Advanced Data Manipulation [2-5]
Proc Format The FORMAT procedure, as discussed in Variable [13-17], allows you to create your own format.
In this module, we will learn how to create different types of formats. In addition, we will look at how to save the format in a permanent library. Copy and run the code from the yellow box below: The ID_LIST data set contains five columns:
Numeric Format We will first revisit how to create a standard numeric format on the GENDER variable: The GENDER column has two distinct values:
We will create a format for gender using Proc Format: proc format; value gender 1 = "M" 2 = "F"; run; data ID_list2; set ID_list; format gender gender. ; run; A new format called GENDER is created and assigned to the GENDER variable: The GENDER column is now formatted. The values are displayed as the letter "M" and "F". Character Format In addition to the numeric format, we can also create a character format. Let's look at the EDU column in the ID_LIST data set. The EDU column has three distinct values:
For EDU, each letter stands for the following:
We can create a character format so that the values are displayed properly. proc format; value gender 1 = "M" 2 = "F"; value $edu "H" = "High School" "U" = "Undergraduate" "P" = "Postgraduate"; run; data ID_list2; set ID_list; format gender gender. edu $edu.; run; The code above creates a new format called ($edu.). The leading dollar sign is needed when specifying a character format. It is then assigned to the EDU variable: The EDU column is now formatted: Specifying a Range of Values The FORMAT procedure also allows you to specify a range of values in the format definition. Let's look at the AGE column: The AGE column simply contains the age of the subject. Now, we want to group the age into three age groups:
Below is the code: proc format; value age low - 30 = "Age group 1" 32 -< 45 = "Age group 2" 45 - high = "Age group 3"; run; data ID_list2; set ID_list; format age age.; run; Let's look at how each age group is specified in the FORMAT procedure. Age Group 1: 30 or below The first age group is defined as 30 or below. The range is specified as (low - 30). "Low" is a SAS keyword that represents the lower limit of the range. When specifying (low - 30) as the range, it includes any values up to and including 30. Age Group 2: 32 (inclusive) to 45 (exclusive) Defining this age group is a little tricky. This age group should contain ages between 32 (inclusive) to 45 (exclusive). A 32-year old subject should be in this age group. However, a 45-year old should not. The range in the FORMAT procedure is specified as (32 -< 45). The arrow sign (<) indicates less-than but not equal-to.
In our example, the range (32 -< 45) means 32 or older, but less than 45. Age Group 3: 45 and older The Age Group 3 is more straightforward. It includes subjects who are aged 45 years or older. The range in the FORMAT procedure is specified as (45 - high), where the keyword "high" represents the upper limit of the range. The code also assigns the (age.) format to the AGE variable: Let's look at the results: The majority of the values are formatted, except for two values: The value (31) and a missing value are not formatted. In the FORMAT procedure, we have defined the age for 30 and below, and above 32. Age 31 is actually not defined in any group!
As a result, the value is displayed without an age group. The missing value is also not defined in the age group definition. Please note that, even though we have the range (low - 30) defined for Age Group 1, the keyword "low" does not include the missing value. To fix these two issues, we will do the following:
proc format; value age low - 30 = "Age group 1" 31 -< 45 = "Age group 2" 45 - high = "Age group 3" other = "Unknown"; run; data ID_list2; set ID_list; format age age.; run; The range for Age Group 2 is revised in the code. In addition, the "Other" definition is added to the FORMAT procedure: The values are now formatted properly for the AGE column: Saving the SAS Format When defining a format, SAS stores the format in a SAS format catalog file. The format catalog file is stored in the WORK library, by default, and it can be saved in a permanent library to be re-used in the future. In order to save the format catalog file in a permanent library, you need to do the following:
Let's look at the example below . libname newlib '/folders/myfolders'; proc format lib=newlib; value city 1 = "Chicago" 2 = "New York" 3 = "Toronto"; run; In this example, we have created a new library called NEWLIB. The library is associated with the shared folder ('/folders/myfolders'): The FORMAT procedure defines a new format called (city.). This format is created for the CITY column in the ID_LIST data set: In the Proc Format procedure, the LIB=NEWLIB option is specified: This tells SAS to define the format in the NEWLIB library. Since the library NEWLIB is associated with shared folder, you can find it in the directory (/folders/myfolders): Important Even though you have created the format in a permanent library, this does not mean you can use it right away. Before we can use this format, we need to check one thing. Run the code below in your SAS Studio: proc options option=fmtsearch;run; The Proc Options procedure writes the list of libraries it uses to search for the format, to the SAS log. If the library that stores the format is not on the list, you have to add it back in. For example, you can see the highlighted note in the SAS log: It lists all the libraries that SAS searches for the formats. In this example, the NEWLIB library is NOT listed. As a result, the (city.) format will not be loaded when being assigned to a variable. Let's look at the example below: proc format lib=newlib; value city 1 = "Chicago" 2 = "New York" 3 = "Toronto"; run; data ID_list2; set ID_list; format city city.; run; In this example, the (city.) format is assigned to the CITY variable. However, the values for CITY are not formatted: You will also see the below note in the SAS log: In order to use the (city.) format, you need to specify the FMTSEARCH list: proc format lib=newlib; value city 1 = "Chicago" 2 = "New York" 3 = "Toronto"; run; options fmtsearch=(WORK newlib); data ID_list2; set ID_list; format city city.; run; The FMTSEARCH option allows you to specify the list of libraries that SAS uses to search for the format. The NEWLIB library is now included in the FMTSEARCH list: The CITY variable now displays the formatted values: Exercise
The SASHELP.CARS data set contains a list of cars and their MSRP. Create a new format called MSRP that defines the pricing group as follow:
Create a new data set called CARS in the work library, and assign the (msrp.) format to the MSRP variable. Count how many cars are in each pricing group. Need some help?
HINT: SOLUTION: proc format; data cars; proc freq data=cars; Low: 238
Fill out my online form.
|