Index
- Julie Kezik, Melissa Hill
-
- Book:
- Data Management Essentials Using SAS and JMP
- Published online:
- 05 June 2016
- Print publication:
- 20 June 2016, pp 133-134
-
- Chapter
Chapter 7 - More about Common Procedures
- Julie Kezik, Melissa Hill
-
- Book:
- Data Management Essentials Using SAS and JMP
- Published online:
- 05 June 2016
- Print publication:
- 20 June 2016, pp 94-107
-
- Chapter
Summary
In this chapter, we delve deeper into the simple statistics procedures introduced in Chapter 6 and discuss how these procedures can be employed to accomplish even more tasks. Using BY and CLASS statements, output can be stratified by groups of interest. Various options for reporting missing values (MISSPRINT, MISSING, NMISS) are appropriate and useful depending on the purpose of the output. Simple statistical tests can be performed and results featured in the output while new datasets can be created containing the information provided by this output.
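As a minimal sketch of the missing-value options named above (it is not one of the book's own examples), the following assumes the sashelp dataset 'cars' and its numeric variable 'cylinders'; the choice of variable is an assumption made for illustration. NMISS is requested as a statistic in PROC MEANS, while MISSING is a TABLES option in PROC FREQ that treats missing values as a valid category.

```sas
/* Report the count of nonmissing (N) and missing (NMISS) values with the mean */
proc means data = sashelp.cars n nmiss mean;
  var cylinders;   /* variable chosen for illustration only */
run;

/* MISSING includes missing values as a category in the frequency table */
proc freq data = sashelp.cars;
  tables cylinders / missing;
run;
```

Which option is appropriate depends on whether missing values should count toward the reported percentages or only be noted.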
STRATIFIED OUTPUT USING THE BY AND CLASS STATEMENTS
Frequently, data needs to be reviewed in a stratified format. Variables like height and weight typically need to be reviewed separately for males and females, and in a clinical trial, patient outcomes must often be reviewed separately for each treatment group. Using a BY or CLASS statement allows the programmer to access the full functionality of a procedure in a stratified setting. A BY statement can be used in the MEANS, FREQ, UNIVARIATE, and CORR procedures. Before a BY statement can be invoked, the dataset must first be sorted according to the variable to be specified in that statement. Among these four procedures, the CLASS statement can be used only in MEANS and UNIVARIATE; it does not require that the dataset be presorted.
Let's continue with the sashelp dataset ‘cars.’ Suppose we are curious about any differences in the distribution of ‘msrp’ based on ‘origin.’ Since we will be using a BY statement, we must first sort the dataset according to the variable we will use in that statement. SAS does not allow us to sort a dataset in the permanent sashelp library, so first we will create a work dataset and then sort it. Finally, we use the MEANS procedure with a BY statement (Example 7.1).
EXAMPLE 7.1. MEANS Procedure Syntax with BY Statement.
data cars; set sashelp.cars; run;
proc sort data = cars; by origin; run;
proc means data = cars;
var msrp ;
by origin;
run;
Figure 7.1 shows the resulting output. It is exactly like the output that would result from executing a PROC MEANS without a BY statement, but the output is stratified by the variable ‘origin.’
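For comparison, the same stratification can be sketched with a CLASS statement instead of a BY statement. This is an illustrative variant rather than the book's own example; it reads sashelp.cars directly because CLASS does not require a sorted dataset.

```sas
/* No PROC SORT is needed when CLASS is used instead of BY */
proc means data = sashelp.cars;
  class origin;   /* stratify by origin */
  var msrp;       /* analysis variable */
run;
```

With CLASS, the statistics for each level of 'origin' appear as rows of a single table rather than as separate tables per group.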
Acknowledgments
- Julie Kezik, Melissa Hill
-
- Book:
- Data Management Essentials Using SAS and JMP
- Published online:
- 05 June 2016
- Print publication:
- 20 June 2016, pp ix-x
-
- Chapter
Chapter 1 - Navigation
- Julie Kezik, Melissa Hill
-
- Book:
- Data Management Essentials Using SAS and JMP
- Published online:
- 05 June 2016
- Print publication:
- 20 June 2016, pp 1-13
-
- Chapter
Summary
Working in SAS puts a cornucopia of resources literally at our fingertips; a thorough tour of the nooks and crannies of this dynamic software will promote efficient navigation of the product and help us identify the tools that are best suited to any given task. In this chapter we begin with a guided tour of the five Base SAS windows, exploring their purpose and utility from the programmer's perspective. We then further explore those windows in the context of data migration. Finally, we touch upon SAS Enterprise Guide – comparing it with the SAS Windowing Environment and describing how the user can benefit from working in Enterprise Guide.
SAS WINDOWING ENVIRONMENT
To begin our tour of the SAS Windows, let's open the application. A toolbar can easily be identified near the top of the screen (Figure 1.1) as well as two main work spaces (Figure 1.2a).
Figure 1.1 shows the standard SAS drop-down menus and toolbar. The File and Edit menus include common commands such as new, open, save, print, copy, and paste. File and Edit menu tasks can also be executed from the toolbar using standard icons. The View and Run menus are more specific to SAS. The View menu contains a list of SAS windows and folders; it allows instant access to these windows (an important piece of information to remember should you accidentally close a window). The Run menu allows the user to submit syntax. The more customary method for submitting programs is the Run icon, also known as the running person, which appears on the SAS toolbar.
Figure 1.2a includes the Editor, Log, and Output window tabs (bottom right-hand side of the screen). The Editor is where syntax (also referred to as code) is written; this is the way the programmer communicates with SAS software. Executing code is the primary way to accomplish all data manipulation and analysis in SAS. The Log window is where SAS software communicates with the user about its work; as syntax is processed, SAS prints an ongoing commentary in the log with respect to the progress of the tasks. Traditionally, the Output window is where any printed product of fully executed syntax (also known as output), such as tables and lists, can be found.
Chapter 8 - Data Visualization
- Julie Kezik, Melissa Hill
-
- Book:
- Data Management Essentials Using SAS and JMP
- Published online:
- 05 June 2016
- Print publication:
- 20 June 2016, pp 108-121
-
- Chapter
Summary
A SAS programmer is rarely limited to displaying data in only one way; various paths can be taken to produce similar tables and plots. Determining which procedure and graphical output are best for the task is a decision specific to the user and the needs of the project. When sharing results with a group, it may be necessary to include more than one representation in order to help everyone understand the data. The purpose of this chapter is to (1) illustrate a sampling of methods for creating output that is visually diverse yet conveys identical information, and (2) describe how the Output Delivery System (ODS) can be used to maximize the utility of SAS data visualization tools by exporting SAS output to other software. In Chapter 6 we discussed creating tables and list-style output for both data management and processing purposes. In this chapter we take these tasks one step further using some of the same procedures in combination with ODS statistical graphics to create graphical-style output. We will also discuss additional SAS procedures such as GCHART and GPLOT, which are designed specifically for plotting data.
USING THE OUTPUT DELIVERY SYSTEM (ODS)
In this chapter we will introduce the Output Delivery System (ODS) and the two ways we use this indispensable SAS feature. First, ODS statements can be used to export SAS output to specific destinations. The programmer can choose how and where SAS output is stored and can control its format. Second, ODS Statistical Graphics can be used to create visually appealing and printer-friendly graphics.
EXAMPLE 8.1. Basic ODS Syntax.
ods rtf file = 'C:\stats.rtf';   /* open the RTF destination */
proc print data = sashelp.class;
run;
ods rtf close;                   /* close the destination to write the file */
To ensure results are created in HTML format (output to the Results Viewer), go to Tools > Options > Preferences, open the Results tab, and make sure the “create HTML” box is checked. Viewing in HTML is especially important when using the ODS graphics statement to create a plot, explained in Example 8.3.
CREATING PLOTS FROM PROCS
The FREQUENCY Procedure
The FREQUENCY procedure produces tables and listings. Output can be requested for one variable or for a combination of variables. Procedure options (e.g., TREND, MEASURES, CL) and additional statements (e.g., PLOTS) are used to create detailed output and graphics.
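A minimal sketch of requesting a plot from PROC FREQ follows. It assumes the sashelp dataset 'cars' and requests a frequency plot through the PLOTS= option of the TABLES statement; the choice of variable and plot type is illustrative and is not taken from the book.

```sas
ods graphics on;                    /* enable ODS Statistical Graphics */

proc freq data = sashelp.cars;
  tables type / plots = freqplot;   /* bar chart of frequencies for 'type' */
run;

ods graphics off;
```

The frequency table is still produced; the plot is added to the same output destination.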
Chapter 2 - Preliminary Data Exploration
- Julie Kezik, Melissa Hill
-
- Book:
- Data Management Essentials Using SAS and JMP
- Published online:
- 05 June 2016
- Print publication:
- 20 June 2016, pp 14-28
-
- Chapter
Summary
Once a dataset is available in SAS, the next logical step is for the programmer to maximize his/her familiarity with that new information in order to optimally clean, maintain, manipulate, or report on the data. The purpose of this chapter is to become comfortable exploring data in SAS. We will build on the navigation skills we learned in Chapter 1 and lay the foundation for understanding SAS datasets and all of their components. Smaller sets of data can be explored visually using the SAS Viewtable; however, most datasets are too large for this method and require a more systematic approach. For this purpose, we recommend the CONTENTS procedure, which provides useful information with respect to the dimensions of a data table and variable characteristics. In later chapters we will discuss other procedures such as MEANS and FREQUENCY, which provide more in-depth information about variable values.
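As a brief illustration of the CONTENTS procedure mentioned above, the following sketch assumes the sashelp dataset 'cars'; the use of the VARNUM option in the second step is an illustrative choice.

```sas
/* Report the number of observations and variables and each variable's type, length, and format */
proc contents data = sashelp.cars;
run;

/* VARNUM lists variables in their order in the dataset rather than alphabetically */
proc contents data = sashelp.cars varnum;
run;
```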
EXPLORER AND VIEWTABLE WINDOWS
The most straightforward approach to examining a new dataset is to open it in the Viewtable and visually examine the data. Once the Viewtable is open, SAS offers a variety of tools that can help the programmer familiarize themselves with the data.
Navigating in the Explorer and Opening a Dataset in the Viewtable
The Explorer window allows the user to navigate folders and locate datasets using standard mouse-click techniques. Within the View menu, the programmer can choose to view available datasets in list or thumbnail form. When the Explorer window is active, the Up One Level button offers the ability to climb out of one library and into another (see Figure 1.4). Double-clicking on a dataset opens that dataset in a separate window called the Viewtable, which opens in the space on the right-hand side of the screen. In this section we will use a dataset from the sashelp library. Let's begin by opening the ‘cars’ dataset in the Viewtable. From within the Explorer window click sequentially Libraries>Sashelp>Cars.
Investigating a Dataset in the Viewtable
Once a dataset is open in the Viewtable, there are a number of ways to investigate its elements. Using the scrollbars along the bottom and right side of the window brings the user to the unseen edges of the dataset.
Chapter 4 - Advanced Concepts in Dataset and Variable Manipulation
- Julie Kezik, Melissa Hill
-
- Book:
- Data Management Essentials Using SAS and JMP
- Published online:
- 05 June 2016
- Print publication:
- 20 June 2016, pp 54-63
-
- Chapter
Summary
This chapter focuses on some advanced topics in dataset and variable manipulation. We begin with a discussion of common errors that arise when combining datasets and then move on to some advanced topics in variable creation. As we execute more complex tasks, the challenges become more dynamic. In this chapter we discuss more sophisticated topics including merge errors, calendar dates, DO groups and loops, and ARRAYs. The examples included in this section are by no means all inclusive, but rather are intended to cover some common issues that come up when the concepts introduced up to this point are employed in the real world.
MERGE ERRORS
While merging datasets is a useful and commonly employed technique, it is not always straightforward. There are many ways that a data merge can result in errant data; here we focus on two of the most common issues.
When merging two or more datasets, SAS requires that variables listed in the BY statement be found in all of the datasets to be merged. Further, SAS requires that these BY variables have identical characteristics in each dataset. Consider again the datasets ‘people1_10’ and ‘people11_20’; in the case where the variable ID has more than one length, as defined in Chapter 2, SAS warns that the merge may result in errant data (Output 4.1). Although this will not stop the merge, it could create an inaccurate dataset.
Another common issue when merging datasets is the overwriting of variables with the same name. If a variable of the same name exists in each of the datasets being merged, the resulting dataset will have only one variable with that name and no indication of which dataset the values were taken from. Additionally, issues with respect to variable characteristics, similar to the length complication previously described in this section, can arise when merging datasets with the same variable in more than one dataset. Therefore, it is wise to be sure that each variable not listed in the BY statement has a unique name in each dataset. This will prevent variables from being unintentionally overwritten during the merge process.
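One way to guard against unintended overwriting can be sketched with the RENAME= dataset option. The datasets 'visit1' and 'visit2' and the variable 'score' below are hypothetical and created only for illustration; they are not the book's 'people1_10' and 'people11_20' datasets.

```sas
/* Hypothetical datasets: both contain ID and a variable named SCORE */
data visit1;
  input id score;
  datalines;
1 10
2 20
3 30
;
run;

data visit2;
  input id score;
  datalines;
1 15
2 25
4 45
;
run;

/* Both datasets must be sorted by the BY variable before merging */
proc sort data = visit1; by id; run;
proc sort data = visit2; by id; run;

/* Rename SCORE on the way in so that each source keeps its own column */
data combined;
  merge visit1 (rename = (score = score_v1))
        visit2 (rename = (score = score_v2));
  by id;
run;

proc print data = combined;
run;
```

Without the RENAME= options, the merged dataset would hold a single 'score' variable with no indication of which dataset supplied each value.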
CALENDAR DATES IN SAS
Dates are frequently used to record when data were collected, when a relevant event occurred, or when data were entered into the dataset.
Contents
- Julie Kezik, Melissa Hill
-
- Book:
- Data Management Essentials Using SAS and JMP
- Published online:
- 05 June 2016
- Print publication:
- 20 June 2016, pp v-viii
-
- Chapter
About This Book
- Julie Kezik, Melissa Hill
-
- Book:
- Data Management Essentials Using SAS and JMP
- Published online:
- 05 June 2016
- Print publication:
- 20 June 2016, pp xi-xii
-
- Chapter
Summary
In research groups around the world, SAS is used not only by statisticians and investigators for data analysis but by programmers and data managers to handle seemingly endless libraries of priceless data. These data management teams often include support staff who have been selected and hired for their attention to detail and patience with meticulous tasks, but who are not necessarily fluent in SAS. The typical solution is for more advanced users to do excessive and simplistic programming to provide the output necessary for whatever task the assistant will be handling. This cycle creates extra work for advanced users and limits the independent effectiveness of the support team – a frustrating arrangement for all parties involved.
In the face of this conundrum we sought a training program for our support team. We found that no training program existed that met our specific needs; available resources were either too costly, too time consuming, or too statistically driven. Eventually we developed and initiated our own basic SAS training program for our programming and research assistants. Spending some structured time with employees while they explored SAS was the most economical way to teach basic users the skills they needed to complete daily tasks. Since training our own staff, we have experienced increased productivity by an empowered support team. This book is a result of that successful endeavour, which inspired us to share our curriculum with other groups who are undoubtedly faced with the same challenge.
SAS programming is a creative and iterative process designed to empower the user. The purpose of this text is not to instruct users on how to complete specific tasks, but to provide a toolkit of essentials for new and infrequent users. When used appropriately, this book will enable these users to explore, interpret, process, and summarize data independently.
Frontmatter
- Julie Kezik, Melissa Hill
-
- Book:
- Data Management Essentials Using SAS and JMP
- Published online:
- 05 June 2016
- Print publication:
- 20 June 2016, pp i-iv
-
- Chapter
Chapter 3 - Storing and Manipulating Data
- Julie Kezik, Melissa Hill
-
- Book:
- Data Management Essentials Using SAS and JMP
- Published online:
- 05 June 2016
- Print publication:
- 20 June 2016, pp 29-53
-
- Chapter
Summary
In this chapter we will learn about storing and manipulating data in SAS, by considering the structure of datasets and the characteristics of variables. We begin with a description of how to access existing data using the SAS specific library convention, including an overview of the different types of storage SAS offers. The Data step is introduced as the primary method for data manipulation in SAS, punctuated with plenty of sample syntax and illustrative figures providing the user with a thorough understanding of what the Data step can do for a programmer. We then move to variable creation and manipulation including an overview of functions and different ways to transform numeric and character variables.
LIBRARIES, LIBRARY REFERENCES, AND THE LIBNAME STATEMENT
SAS stores and creates data using a library convention. Libraries are a way to call upon a location within the computer by labeling it with a single letter or phrase for the duration of the SAS session. This allows the user a time-saving alternative to repeating a lengthy path name in multiple places. When pulling data from a shared server, the path can include several folders to navigate through before locating the file, for example ‘Z:\JK\SAS 2014\Paper2019\Datasets\final’.
In Example 3.1, a LIBNAME statement allows the programmer to write the path once, assigning it the library reference (libref) ‘x.’ This libref can be used to locate any SAS dataset in that particular folder for the entirety of the SAS session. A libref can consist of one to eight characters and must begin with a letter (a through z) or an underscore (_); remaining characters may consist of letters, numbers, or underscores.
EXAMPLE 3.1. LIBNAME Statement Syntax.
libname x 'Z:\JK\SAS 2014\Paper2019\Datasets\final';
One way to view a dataset within an active library is to toggle to the Explorer tab and click on the ‘Libraries’ folder (Figure 3.1). SAS will always show three active libraries: ‘Sashelp’, ‘Sasuser’, and ‘Work.’ The library ‘X’ was created in Example 3.1 using a LIBNAME statement. While the libref ‘X’ is a temporary shortcut for accessing a specific location, any dataset written to that location using the libref is stored there permanently.
By double-clicking on the desired folder we are able to view its contents. When we open a library in the Explorer window, SAS automatically displays all SAS version 8 or higher datasets stored in that location.
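To make the distinction between a temporary libref and permanent storage concrete, here is a minimal sketch. The folder path and the dataset name 'class_copy' are hypothetical; substitute a folder that exists on your machine.

```sas
/* Hypothetical path: replace with an existing folder */
libname x 'C:\myproject\data';

/* The two-level name x.class_copy writes a permanent dataset to that folder */
data x.class_copy;
  set sashelp.class;
run;

/* The same libref is used to read the dataset back */
proc print data = x.class_copy (obs = 5);
run;
```

In a later SAS session the libref must be assigned again with a LIBNAME statement, but the dataset file itself remains in the folder.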
Chapter 5 - Introduction to Common Procedures
- Julie Kezik, Melissa Hill
-
- Book:
- Data Management Essentials Using SAS and JMP
- Published online:
- 05 June 2016
- Print publication:
- 20 June 2016, pp 64-78
-
- Chapter
Summary
Once a dataset has been created and stored in SAS, the choices are limitless. Most tasks are completed using SAS-defined procedures, often referred to as PROCs. Each PROC allows the user to manipulate and/or view their data in a new way. SAS offers a variety of different procedures; within each one, a number of options can be employed, allowing the user ultimate flexibility in their final product. Two common procedures are described in this chapter, the SORT procedure and the PRINT procedure, presented along with some of their most useful options.
THE SORT PROCEDURE
What It Does and How It Works
PROC SORT orders a SAS dataset according to the value of the variable that is listed in the BY statement. This procedure is a prerequisite for invoking a BY statement in any subsequent data steps or procedures. For syntax to run properly, the data must be sorted by the variables listed in the BY statement. Datasets can be sorted by multiple variables to further specify the order of observations. When a PROC SORT is performed, the variable(s) specified in the BY statement will be placed in ascending order. For example, the dataset featured on the left-hand side of Figure 5.1 is unsorted, and on the right it is sorted. The sorted view displays all observations with the smallest value of ‘id’ first, followed by all other observations in ascending numerical order.
Let's try an example. First, open the sashelp dataset ‘Class.’ Notice that there are 5 variables and 19 observations. By visually assessing the data we can see that it is sorted alphabetically by name. Let's assume that for our analysis it is more appropriate to sort this information by age. Example 5.1 provides syntax for changing the location and name of the existing sashelp dataset ‘Class’ and then sorting the new dataset ‘SortAge’ by the single variable ‘age.’ Enter the syntax in the editor window and click the running person icon.
EXAMPLE 5.1. Syntax for Sorting by One Variable Using PROC SORT.
data SortAge;
set sashelp.class;
run;
proc sort data = SortAge; by age; run;
When opened in the Viewtable, the observations in this dataset (SortAge) should now be in ascending order according to their value in the variable age.
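Sorting by more than one variable can be sketched as follows. This is an illustrative variant, not the book's example: it uses the OUT= option to write the sorted result to a new work dataset (the name 'SortSexAge' is hypothetical) and the DESCENDING keyword to reverse the order of one variable.

```sas
/* OUT= writes the sorted copy to the work library, leaving sashelp.class untouched */
proc sort data = sashelp.class out = SortSexAge;
  by sex descending age;   /* DESCENDING applies only to the variable that follows it */
run;
```

Observations are ordered first by 'sex' in ascending order and then, within each value of 'sex', by 'age' from highest to lowest.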
Chapter 6 - Procedures for Simple Statistics
- Julie Kezik, Melissa Hill
-
- Book:
- Data Management Essentials Using SAS and JMP
- Published online:
- 05 June 2016
- Print publication:
- 20 June 2016, pp 79-93
-
- Chapter
Summary
Most of the time, regular data reporting consists of simple statistics. Answers to questions like “how many?”, “what proportion?”, “what's the highest value?”, and “are those variables related?” commonly form the basis of such reporting. SAS provides a selection of procedures that are best used to answer these most basic questions in a way that is accurate, succinct, and simple. In this chapter, we discuss four procedures that are commonly used to create user-friendly reports that answer these questions: FREQUENCY, MEANS, UNIVARIATE, and CORR.
THE FREQUENCY PROCEDURE
Perhaps the simplest way to answer the question “how many” is to use the FREQUENCY procedure, typically referred to as PROC FREQ. This procedure tells the user how many observations carry each value of a particular variable by producing tabular or list-style frequency counts for all variables listed in the TABLES statement.
Let's use the sashelp dataset ‘cars’ to explore the ins and outs of PROC FREQ. Example 6.1 shows simple syntax for executing PROC FREQ with one variable. In cases where more than one variable is included in the TABLES statement (Example 6.2), the output includes separate frequency listings for each variable, like the ones in Figure 6.1.
EXAMPLE 6.1. FREQUENCY Procedure Syntax.
proc freq data = sashelp.cars;
tables type;
run;
EXAMPLE 6.2. FREQUENCY Procedure Syntax with Two Variables.
proc freq data = sashelp.cars;
tables type drivetrain;
run;
The output shown in Figure 6.1 provides information about the variable ‘type.’ The first column of information tells us that there are six distinct values for this variable: ‘Hybrid,’ ‘SUV,’ and so on. The Frequency column tells us how many observations carry each variable value, while the Percent column indicates the percentage of observations that carry each variable value. For instance, when we look at the row of information for variable value ‘SUV’, we see that there are 60 observations with this value and that those observations make up 14.02% of our data. The last two columns provide cumulative information; there are 63 observations with variable values ‘Hybrid’ or ‘SUV’, which make up 14.72% of our data. This information is especially useful in instances where the categorical values are sequential in some way, such as levels of disease severity, income, and so forth.
As with most procedures, there are a handful of options that allow the user to manipulate the appearance of the output.
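A few such options can be sketched as follows. The selection shown here (NOCUM, NOPERCENT, NOROW, NOCOL) is illustrative and may differ from the options the chapter goes on to discuss.

```sas
/* Suppress the cumulative and percent columns in a one-way table */
proc freq data = sashelp.cars;
  tables type / nocum nopercent;
run;

/* Cross-tabulate two variables, suppressing row and column percentages */
proc freq data = sashelp.cars;
  tables type * drivetrain / norow nocol;
run;
```

Options are placed after a slash in the TABLES statement and apply to the tables requested in that statement.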
How to Use This Book
- Julie Kezik, Melissa Hill
-
- Book:
- Data Management Essentials Using SAS and JMP
- Published online:
- 05 June 2016
- Print publication:
- 20 June 2016, pp xiii-xiv
-
- Chapter
Summary
This book can be used from cover to cover as a hands-on training manual or simply as a desk reference. The content is directed at first-time or infrequent users who seek immediate applicability in order to navigate, clean, and report data. In an effort to truly teach the most basic SAS skills essential to data management, this text uses a multitude of examples and screenshots to walk the reader through step-by-step instructions for executing commonly used techniques and procedures. Beginning with Chapter 2, there is a ‘Test Your Skills’ section with practice tasks and full solution sets. You will find that in SAS there is more than one way to accomplish many of the tasks; the solutions provided should in no way be perceived as exhaustive. All of the examples and practice tasks are based on datasets found in the sashelp library or created by you, the user, and require no additional software or downloading. The versions of software used for examples include SAS 9.4, SAS Enterprise Guide 4.3, and JMP Pro 10.
Chapter 9 - JMP as an Alternative
- Julie Kezik, Melissa Hill
-
- Book:
- Data Management Essentials Using SAS and JMP
- Published online:
- 05 June 2016
- Print publication:
- 20 June 2016, pp 122-132
-
- Chapter
Summary
In any data-reporting project, the finished product is only useful if the programmer has taken time to familiarize themselves with the data and determine the optimal way to represent it. JMP software from SAS is a visual and interactive tool with a focus on “statistical discovery.” Using JMP, the programmer can easily explore the data, allowing curiosity to be their guide. JMP has a family of products that include Pro, Clinical, and Genomics. Detailed product information can be found at www.jmp.com/software. In this chapter we will use JMP Pro 10 with a tight focus on the most useful pieces of this product in relation to data management and using JMP in combination with SAS.
ABOUT JMP
JMP can be a great alternative for non-SAS users to easily view, explore, understand, and summarize data. This visual and interactive product allows anyone to explore data without the burden of heavy-duty programming. JMP incorporates both statistical and graphical techniques; it offers point, click, and drag capabilities, and since it is built for discovery, options allow the user to create graphs, refine their properties, or simply begin again with just a few mouse clicks. This product can also be used as a complement to larger statistical analysis packages for ease of data importing as well as for the visually dynamic ways JMP can help the user truly get inside their data.
ACCESSING DATA
JMP, like SAS, offers sample data that can be accessed through the Help menu. However, we would like to use a file from SAS that we have already created. Let's take a look at the data file ‘Ages’, previously created in the Test Your Skills section of Chapter 8. Using the syntax in Example 9.1, produce the dataset ‘ages’ and save it to the location of your choice.
EXAMPLE 9.1. Create SAS Dataset to Open in JMP.
libname x 'Desktop';
data x.ages; set sashelp.class;
run;
Next we are going to locate the ages.sas7bdat file and open it in JMP. To do this, open JMP and in the top left corner click File > Open. Then change “Files of Type” to “All Files”, select your file, and click Open. The dataset should now be visible in the JMP window (Figure 9.1).
Data Management Essentials Using SAS and JMP
- Julie Kezik, Melissa Hill
-
- Published online:
- 05 June 2016
- Print publication:
- 20 June 2016
-
SAS programming is a creative and iterative process designed to empower you to make the most of your organization's data. This friendly guide provides you with a repertoire of essential SAS tools for data management, whether you are a new or an infrequent user. Most useful to students and programmers with little or no SAS experience, it takes a no-frills, hands-on tutorial approach to getting started with the software. You will find immediate guidance in navigating, exploring, visualizing, cleaning, formatting, and reporting on data using SAS and JMP. Step-by-step demonstrations, screenshots, handy tips, and practical exercises with solutions equip you to explore, interpret, process and summarize data independently, efficiently and effectively.