Create temporary and permanent SAS data sets

Use a DATA step to create a SAS data set from an existing SAS data set.


libname perm_dir '/folders/myfolders/datasets';

/* Creating a temporary dataset using the DATA step */
data work.shoesales;
	set sashelp.shoes;
run;

/* Creating a permenant dataset using the DATA step */
data perm_dir.shoesales;
	set sashelp.shoes;
run;

Comments:

  • Line 1 – Assign the directory where the permanent dataset will go and use the alias ‘perm_dir’ for the path.
  • Line 4-6 – Create the temporary dataset as shown by the ‘work’ library assignment where all the temporary datasets are stored.
  • Line 9-11 – Create the permanent dataset to be stored in the ‘perm_dir’ created using the libname statement.
  • The existing SAS dataset used is the shoes dataset from the ‘sashelp’ library which is a SAS permanent library. Here, there is useful SAS datasets for learning purposes and to practice on.

Investigate SAS data libraries using base SAS utility procedures

Use a LIBNAME statement to assign a library reference name to a SAS library


libname perm_dir '/folders/myfolders/datasets';

Comments:

  • Assigning the relative directory ‘/folders/myfolders/datasets’ to the
    library reference name ‘perm_dir’.

Investigate a library programmatically using the CONTENTS procedure


proc contents data=sashelp._all_ nods;
run;

Comments:

  • This statement retrieves all the datasets listed in the sashelp library as denoted by the ‘_all_’ after the sashelp library keyword. The ‘nods’ option in the statement omits the description of each datasets such as the dataset’s variables, formats and size etc.

Access data

Access SAS data sets with the SET statement


data work.shoesales;
	set sashelp.shoes;
run;

Comments:

  • The set statement here imports the SAS dataset ‘shoes’ from the sashelp library.

Use PROC IMPORT to access non-SAS data sources

Note: We will be using the UN World Population dataset that is converted to a Microsoft Excel file (.xlsx) and we will be keeping the csv format. Remove the first row for the proc import command to work.

Read delimited and Microsoft Excel (.xlsx) files with PROC IMPORT

For delimited files:


/* Reading a delimeted file (csv format) */
proc import out=work.population datafile="/folders/myfolders/datasets/population.csv" dbms=csv;
			getnames=yes;
run;

/* Print out work.population dataset to check if imported successfully */
proc print data=work.population;
run;

Partial Print Output:

Click to enlarge

For Microsoft Excel files (same output as above because same file, but in .xlxs form):


/* Reading a Microsoft Excel file */
proc import out=work.population datafile="/folders/myfolders/datasets/population.xlsx" dbms=xlsx;
			getnames=yes;
run;

/* Print out work.population dataset to check if imported successfully */
proc print data=work.population;
run;

SAS Log Output:

Click to enlarge

Use PROC IMPORT statement options (OUT=, DBMS=, REPLACE)


proc import out=work.population datafile="/folders/myfolders/datasets/population.xlsx" dbms=xlsx replace;
			getnames=yes;
run;

Comments:

  • Here, the OUT= option outputs the imported Microsoft Excel files (population.xlsx) into a SAS dataset named work.population.
  • The DBMS= option specifies the type of data to import and here it is a Microsoft Excel file so the type of data is in ‘xlsx’ form.
  • The REPLACE option just replaces the existing OUT= file if it exists (i.e. If work.population already exists then replace it will the new work.population that will be created with this PROC IMPORT statement).

Use the GUESSINGROWS statement


proc import out=work.population datafile="/folders/myfolders/datasets/population.csv" dbms=csv replace;
	guessingrows=10;
run;

Comments:

  • GUESSINGROWS statement scans 10 rows here to determine each variable’s length and data type.

Use the SAS/ACCESS XLSX engine to read a Microsoft Excel workbook.xlsx file


libname orion xlsx '/folders/myfolders/datasets/population.xlsx';

/* Check if the population.xlsx loads sucessfully */
proc print data=orion.population;
run;

Comments:

  • Adding the XLSX engine option in the LIBNAME statement allows you to use the SAS/ACCESS XLSX engine to read Microsoft Excel files.
  • If no engine is specified the default engine is used.
  • Here, the Excel file is treated as the directory and each sheet is treated as a SAS dataset.

Combine SAS data sets

Concatenate data sets


/* First Dataset */
data a;
	input number $ letter $ codename $;
	datalines;
	1 a alpha
	2 b beta
	;
run;

/* Print out the 'a' dataset */
title 'a';
proc print data=a;
run;

/* Second Dataset */
data b;
	input number $ letter $ codename $;
	datalines;
	3 c charlie
	4 d delta
	;
run;

/* Print out the 'b' dataset */
title 'b';
proc print data=b;
run;

/* Concatenate First and Second Datasets */
title 'Concatenated a and b datasets';
data concatenated;
	set a b;
run;

/* Print out the resultant dataset */
proc print data=concatenated;
run;

Output:

Concatenating datasets using the SET statement

Comments:

  • The SET statement concatenates datasets together and creates a new one in the DATA step.

Merge data sets one-to-one


/* First Dataset */
data a;
	input number $ letter $ codename $;
	datalines;
	1 a alpha
	2 b beta
	3 c charlie
	4 d delta
	5 e echo
	;
run;

/* Print out the 'a' dataset */
title 'a dataset';
proc print data=a;
run;

/* Second Dataset */
data b;
	input number $ capital $;
	datalines;
	1 A
	2 B
	3 C
	4 D
	5 E
	;
run;

/* Print out the 'b' dataset */
title 'b dataset';
proc print data=b;
run;

/* Merged one to one with First and Second Datasets */
title 'Merged one to one with a and b';
data merged;
	merge a b;
	by number;
run;

/* Print out the resultant dataset */
proc print data=merged;
run;
Merged datasets one-to-one using the Merge and BY statement

Comments:

  • To merge datasets one-to-one you have to use the MERGE statement and the BY statement to follow that states which common variable/s to merge the datasets by.

Merge data sets one-to-many

Merged datasets one-to-many using the Merge and BY statements

/* Second Dataset */
data b;
	input number $ capital $;
	datalines;
	3 A
	3 B
	3 C
	;
run;

Comments:

  • Same code as above but the ‘b’ dataset is changed

Create and manipulate SAS date values

Explain how SAS stores date and time values

SAS stores dates as integers, starting with a value 0 from the date 1, January, 1960. Therefore, any day before 1, January, 1960 will be a negative integer and any day after 1, January, 1960 will be a positive integer.


data dates;
	/* Date constants */
	before_date = '31Dec1959'd;
	base_date = '1Jan1960'd;
	after_date = '2Jan1960'd;
run;

proc print data=dates;
run;
before_date = 31, Dec, 1959 || base_date = 1, Jan, 1960 || after_date = 2, Jan, 1960

Similarly, the time values are a real number starting at the value 0 from the midnight of the day to the value 86,399.9999 where it resets to 0 for the next day. Basically it goes up one for every second in the day until it goes to midnight for which it resets to 0.

Use SAS informats to read common date and time expressions

Common SAS informats to read in common date and time expressions are listed below. Due to many variations of the date and time expressions, there is a link to the SAS documentation to explain them all in detail.

Date informats:

Time informats:

Use SAS date and time formats to specify how the values are displayed.

All possible cases of SAS data and time formats with examples with how their are displayed in the SAS output can be found in the documentation here.

Control which observations and variables in a SAS data set are processed and output

Use the WHERE statement in the DATA step to select observations to be processed

We will use the fish dataset from the sashelp library (i.e. sashelp.fish).

Partial output of the fish dataset:

sashelp.fish dataset

Here, we will use the WHERE statement in the DATA step to select the observations where the Species is ‘Parkki’.


data work.fish;
	/* We will use the fish dataset from the sashelp library */
	set sashelp.fish;
	
	/* Find all observations where the species of the fish is a Parkki */
	where Species='Parkki';
run;

proc print data=work.fish;
run;
All observations of the Parkki species

Subset variables to be output by using the DROP and KEEP statements

Here, we will be using the same sashelp.fish dataset from above. So, we would like to use the DROP statement in order to drop the columns ‘Length1’, ‘Length2’, ‘Length3’.


data work.fish;
	/* We will use the fish dataset from the sashelp library */
	set sashelp.fish;
	
	/* Drop columns Length1 Length2 Length3 */
	drop Length1 Length2 Length3;
run;

proc print data=work.fish;
run;

Partial print output:

Partial output of sashelp.fish where Length columns are dropped

Now, suppose we would like to use the KEEP statement and would like to keep only the Species and Weight columns in the sashelp.fish dataset.


data work.fish;
	/* We will use the fish dataset from the sashelp library */
	set sashelp.fish;
	
	/* Keep the Species and Weight columns only */
	keep Species Weight;
run;

proc print data=work.fish;
run;
Partial output of sashelp.fish where only the Species and Weight columns are kept

Use the DROP= and KEEP= data set options to specify columns to be processed and/or output

Note: These options have the same affect as the above DROP and KEEP statements so the output of the datasets will be the same. Also, we will be using the same sashelp.fish sas dataset.

Using the DROP= dataset option:


data work.fish (drop=Length1 Length2 Length3);
	/* We will use the fish dataset from the sashelp library */
	set sashelp.fish;
run;

proc print data=work.fish;
run;

This will produce the same output as the DROP statement as shown:

Using the KEEP= dataset option:


data work.fish (keep=Species Weight);
	/* We will use the fish dataset from the sashelp library */
	set sashelp.fish;
run;

proc print data=work.fish;
run;

This will produce the same output as the KEEP statement as shown: