SAS Style Guide
Table of Contents
- Guiding Principles
- Code spacing
- Equal signs (Assignment)
- Loop structures
- Line length
- Quotation Marks
- Logical Operators
- Titles and footnotes
- Missing values
- Escape Characters
- SQL Syntax
- File paths
- Output destinations
- GOTO Statement
- Plot Size
- General Coding References
The style/practice given here does not claim to generalize across all SAS applications and problem domains. It has been designed and implemented within the following considerations:
- Data sets are not large (measured in MB, not GB).
- Programs are short (less than 2000 lines).
- Processing occurs on a single computer.
- The problem domain is producing standardized scientific studies.
- The code/programs must undergo a QA process.
- Programs are run primarily in Interactive Mode.
- All users are running SAS version 9.2 or newer.
- Users are editing in the Enhanced Editor.
The following list gives the criteria by which each decision was made in this guide.
- Fits within version constraints
- Clarifies presentation
- Adheres to the 'Single Responsibility Principle'1
- Assists verification of code, data, and analysis
- Is required for keyboard macros or KEYS shortcuts
- Aids code maintenance
- Allows for standardization
- Executes quickly
- Follows SAS recommendations or community traditions
Use lower case2.
Indent by two spaces. The default tab behavior can be changed in the Enhanced Editor to insert two spaces instead of a tab character.
Tools > Options... > Enhanced Editor Options
List items vertically. Indent items on the list by one level relative to the keyword. Place the closing semi-colon at the same level as the keyword3.
data good;
  infile ages dlm = '09'x notab missover;
  length
    first_name $ &defLength.
    last_name  $ &defLength.
    age          8
  ;
  input (_all_) (:);
run;
Data Steps and Macros
The body of a PROC or DATA step should be indented by two spaces. The body of a macro should be indented by two spaces. If an option contains more than one item, form those items as a list.
%macro ImportBaseballData();
  %put NOTE: [MACRO] Executing: ImportBaseballData();

  data example;
    set sashelp.baseball;
  run;
%mend;

%macro CalculateMeanNHits();
  %put NOTE: [MACRO] Executing: CalculateMeanNHits();

  proc means noprint data = example;
    class
      league
      division
    ;
    var nHits;
    types league * division;
    output
      out  = example_means
      mean = mean_nHits
    ;
  run;
%mend;
Logical Structures (IF and CASE)
CASE statements, indent the
%if &SomeCondition1. %then %do;
  %let Variable1 = value;
  %let Var2      = value2;
%end;
%else %if &SomeCondition2. %then %do;
  %let Variable1 = value;
  %let Var2      = value2;
%end;

proc sql noprint;
  create table example as
    select
        name
      , age
      , case
          when age < 13 then 'Child'
          when age between 13 and 19 then 'Teenager'
          else 'Adult'
        end as age_group
      , case sex
          when 'M' then 'Boy'
          when 'F' then 'Girl'
          else ''
        end as boy_girl
    from
      sashelp.class
  ;
quit;
Loop Structures (DO)
END statement two spaces relative to corresponding
%do %until (&linkConnection. > 0);
  %if (%sysfunc(datetime()) >= &stopTime.) %then %do;
    %put ERROR: [&SYSMACRONAME] Operation timed out.;
    %sysexec(taskkill /F /IM EXCEL.EXE);
    %SetSystemOptions(&originalNOTES., &originalXWAIT., &originalXSYNC.);
    %abort cancel;
  %end;
  %let linkConnection = %sysfunc(fopen(xlDDE, S));
%end;
Include a single blank line between each step.
data example;
  set sashelp.baseball;
run;

proc means noprint data = example;
  class
    league
    division
  ;
  var nHits;
  types league * division;
  output
    out  = example_means
    mean = mean_nHits
  ;
run;
For macros, place a blank line after the execution indicator (the %put statement). If the last line is a macro call, include a blank line between the last line and the %mend statement.
%macro Main();
  %put NOTE: [MACRO] Executing: Main();

  %ImportBaseballData();
  %CalculateMeanNHits();

%mend;
If the last line of a macro is not a macro call, do not insert a blank line between the last line and the %mend statement.
%macro ImportBaseballData();
  %put NOTE: [MACRO] Executing: ImportBaseballData();

  data example;
    set sashelp.baseball;
  run;
%mend;
Equal signs (Assignment)
Align multiple statements by the equal sign. It often makes the code more readable to surround equal signs by a space on either side for single line assignments. However, this consideration is not always possible within the 80 character line limit. It may be necessary to group assignment statements according to their equal sign alignments.
%let treatment1 = 0.0;
%let treatment2 = 1.2;
%let treatment3 = 2.9;
%let treatment4 = 7.2;
%let treatment5 = 18;

%let figureNumber_bw_f  = 1;
%let figureNumber_bw_m  = 2;
%let figureNumber_bl_f  = 3;
%let figureNumber_bl_m  = 4;
%let figureNumber_vtg_f = 5;
%let figureNumber_vtg_m = 6;
%let figureNumber_gsi_f = 7;
%let figureNumber_gsi_m = 8;
%let figureNumber_nts_m = 9; /*No F data exists for NTS*/

%let goptions =
  csymbol = black
  xpixels = 1294
  ypixels = 800
  rotate  = landscape
;
%let device = png;
Do not insert tab characters. Instead, use two spaces. The Enhanced Editor has an option to replace tab characters with spaces and the ability to change the default from four to two spaces.
Avoid reusing identical iterator variable names. Whenever possible, use a descriptive name for the iterator. For example, if the loop is iterating through days in a month, use day rather than d. See "How to avoid iteration errors".
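A minimal sketch of descriptive, non-repeating iterator names in nested loops. The data set name calendar_days and the fixed 28-day month are assumptions made only for illustration.

```sas
/*Each loop has its own descriptive iterator, so the inner
  loop cannot silently overwrite the outer loop's counter.*/
data calendar_days;
  do month = 1 to 12;
    do day = 1 to 28;
      output;
    end;
  end;
run;
```

Had both loops used the same iterator, the inner loop would change the outer counter on every pass and the outer loop would end early.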
Try to limit the line length to 80 characters. This is not always possible, as when working with file-paths. Do the best you can4.
Each line of the editor should contain only one line of code. A line of code is defined by a semi-colon.
The only exception to this is for lines which exceed the 80 character limit and cannot be shortened (e.g. an %INCLUDE statement with a long file path). In situations requiring a break, use indentation to make the code more readable5.
Write descriptive names whenever possible.
/*Bad: Reader has to mentally substitute a and b*/
data plot_data;
  set
    confidence_intervals (in = a)
    observations (in = b)
  ;
  if a then symbol = 1;
  if b then symbol = 2;
run;

/*Good: No brain required*/
data plot_data;
  set
    confidence_intervals (in = in_confidence_intervals)
    observations (in = in_observations)
  ;
  if in_confidence_intervals then symbol = 1;
  if in_observations then symbol = 2;
run;
Name according to logical super/sub sets. For example,
/*Good*/
means_replicate_juvenile_bw_f

/*Bad*/
replicate_means_f_bw_juvenile
In the above example, female is a subset of all body weights, body weights are of juvenile animals, and means have been calculated per replicate.
Use all lowercase. Separate words using an underscore. Never overwrite a dataset. A new data set should be created for each change6.
Use all lowercase. Separate words using an underscore7. Variables should be nouns describing 'what' rather than 'how'.
Good: number_of_eggs
Bad: sum_eggs
Use camelCase8. The name should be a noun. Sometimes it may be helpful to use Hungarian notation, as with dirOut for 'output directory' or listEndpoints for a macro list/array style object.
Names should describe 'what' rather than 'how'10
Bad: %DeleteIfAllVariablesAreBlank();
Good: %RemoveMissingObservations();

Bad: %FISHER_VS_CONTROL();
Good: %CheckForIndependence();
For macro functions, name the macro so that it grammatically reflects the purpose.
/*Bad: awkward to read*/
data clean;
  set raw;
  if %CheckIfVariablesAreMissing() then delete;
run;

/*Good: reads naturally*/
data clean;
  set raw;
  if %IsMissingAllVariables() then delete;
run;
Do not put the macro name after the %mend keyword.
/*Bad*/
%macro PrintHelloWorld();
  %put NOTE: [MACRO] Executing: PrintHelloWorld();

  %put Hello, world!;
%mend PrintHelloWorld;

/*Good*/
%macro PrintHelloWorld();
  %put NOTE: [MACRO] Executing: PrintHelloWorld();

  %put Hello, world!;
%mend;
Save files by capitalizing each word, as in a title, separating words with spaces. Begin the file name with the study, followed by the task the program completes (plot, figure, import, etc). Indicate version number with a 'v'12. Reference corresponding table/figure numbers in parentheses at the end of the file name.
General Form: STUDY ID - Type - Specific Name Referring to Purpose v# (Report Ref).sas
Example: TO14 AMA388 - Plot - Arithmetic Mean Comparison v1 (Figure A).sas
If the program follows a sequence, name files according to the sequence.
STUDYID - 1 Analysis - Check Summary Statistics v1.sas
STUDYID - 2 Plot - Arithmetic Mean Comparison v3 (Figure A).sas
STUDYID - 3 Analysis - Perform T-test v2.sas
STUDYID - 4 Analysis - Whatever depends on t-test v1.sas
Use PL/I style comments (/**/). Use only a single pair of braces (/* and */) per block comment. Restrict comment lines to about 60 characters in length13.
%macro DetermineRemainingAnimals();
  %put NOTE: [MACRO] Executing: DetermineRemainingAnimals();

  /*Animal is considered to have survived only if the value of
    OUTCOME is "End of Study". Animal did not survive if
    OUTCOME is "Found Dead" or "Euthanized". Since the SURVIVE
    field is already defined using a different interpretation
    of survival, the field REMAIN is created to code the above
    interpretation.*/
  data mortality;
    set _raw_mortality;
    if outcome = 'End of Study' then remain = 1;
    else remain = 0;
  run;
%mend;
Datasets should contain all variables necessary to check calculations. Unnecessary columns should be removed. Do not overwrite data sets14.
A distinction needs to be made. There exist two forms of the KEEP and DROP keywords. One form is a data set option, the other a statement within a data step.
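The two forms can be sketched side by side as follows. The output data set names are hypothetical; sashelp.class is used only because it ships with SAS.

```sas
/*Data set option: limits the variables read from the input
  data set.*/
data class_keep_option;
  set sashelp.class (keep = name age);
run;

/*Statement: limits the variables written to the output data
  set.*/
data class_keep_statement;
  set sashelp.class;
  keep
    name
    age
  ;
run;
```

Both steps produce a data set containing only NAME and AGE. In the statement form the other variables remain available to calculations inside the step.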
Macro variable resolution
Avoid macro variable resolutions requiring double and triple ampersands.
Arrange variables within each data set in a logical fashion. For instance, if performing a calculation on two variables, have the variable containing the result appear to the right of the variables used to compute the result. If several variables constitute a key, group those variables together as the leftmost variables of the data set.
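A minimal sketch of this arrangement, assuming sashelp.class with HEIGHT in inches and WEIGHT in pounds. The body mass index calculation and the data set name are illustrative choices, not part of the guide.

```sas
/*NAME acts as the key and stays leftmost. The result is
  created after its inputs, so it appears to the right of
  HEIGHT and WEIGHT in the output data set.*/
data class_body_mass_index;
  set sashelp.class;
  body_mass_index = 703 * weight / height ** 2;
  keep
    name
    height
    weight
    body_mass_index
  ;
run;
```

A reviewer can then check any row by hand, reading the inputs from left to right and comparing against the final column.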
Any data set which is written to the hard disk should have an identical copy which may be viewed in the session16. There should always exist for an exported data set a data set of the same name which is available within session memory.
The following note should be considered an error.
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column)
data char;
  character = '1';
run;

data bad_uses_side_effect;
  set char;
  numeric = character * 1;
run;

data good_uses_conversion_function;
  set char;
  numeric = input(character, 8.);
run;
All variables must be initialized. Do not create variables to be 'filled in' later19.
Each step should be delimited by the requisite RUN or QUIT statements. Each step should have the minimum number of delimiters.
Macro variables should be delimited with periods whenever syntax highlighting works.
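A short sketch of why the period matters. The macro variable names and values are hypothetical.

```sas
%let endpoint = bw;
%let sex      = f;

/*The period after the first macro variable ends its name, so
  the underscore that follows is treated as plain text.*/
%put NOTE: Data set name is means_&endpoint._&sex.;
```

The text resolves to means_bw_f. Without the period, SAS would look for a macro variable named ENDPOINT_ and fail to resolve the reference.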
Options should be defined at the beginning of the program. Any step which requires options to be changed must restore all options to their original settings20.
Only use double-quotes when necessary, such as when the string contains a macro variable. Otherwise, use only single quotes21.
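The difference can be sketched with a single macro variable. The name studyId and its value are assumptions for illustration.

```sas
%let studyId = TO14;

data _null_;
  /*Single quotes: the text is written exactly as typed.*/
  put 'Study ID: &studyId.';
  /*Double quotes: the macro variable is resolved first.*/
  put "Study ID: &studyId.";
run;
```

The first PUT statement writes the literal text &studyId. to the log, while the second writes TO14.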
The application of macros advocated by this style guide deviates rather dramatically from the conventional SAS usage of them. It is not an attempt to artificially impose OOP practices. Nor is it done willy-nilly. Rather it is to address specific and pandemic problems. Although unconventional, it is surprising how well this approach addresses each of these problems:
- Maintaining accurate documentation
- Ease of understanding program logic
- Code navigation
- Code flexibility
- Ease of debugging
- See naming conventions.
- Include a %put statement which declares the macro name and what variables were used when called.
%macro CalculateReplicateMeans(sex, endpoint);
  %put NOTE: [MACRO] Executing: CalculateReplicateMeans(sex=&sex, endpoint=&endpoint);

  ...BODY...
%mend;
- Include a blank line after the %put statement.
- Include a blank line after any macro calls.
%macro ImportData();
  %put NOTE: [MACRO] Executing: ImportData();

  %ImportMortalityData();
  %ImportReproductionData();
  %ImportHistopathologyData();

%mend;
- Do not include a line after any general SAS statements.
%macro PrintHelloWorld();
  %put NOTE: [MACRO] Executing: PrintHelloWorld();

  data _null_;
    put 'Hello, world!';
  run;
%mend;
- Do not repeat the macro name in the %mend statement.
Follow the SAS guidelines for macros, such as do not define macros inside of other macros23.
Macros can be roughly divided into three roles:
Within a direction control macro, program direction is coordinated. Smaller tasks may also be grouped. For instance, %Main() directs the control of the program and provides high level groupings of things like importing, data cleaning, and output generation.
Direction controllers should appear above the macros which they call. The definitions of the macros contained in a controller should immediately follow the controller definition.
%macro ImportData();
  %put NOTE: [MACRO] Executing: ImportData();

  %ImportMortalityData();
  %ImportReproductionData();
  %ImportHistopathologyData();

%mend;

%macro ImportMortalityData();
  %put NOTE: [MACRO] Executing: ImportMortalityData();

  data mortality;
    set inData.mortality;
  run;
%mend;

%macro ImportReproductionData();
  %put NOTE: [MACRO] Executing: ImportReproductionData();

  data reproduction;
    set inData.reproduction_and_survival;
  run;
%mend;

%macro ImportHistopathologyData();
  %put NOTE: [MACRO] Executing: ImportHistopathologyData();

  data histopathology;
    set inData.histopathology;
  run;
%mend;
All procedure/data steps should be enclosed in a macro. This provides self-documentation and aids in debugging24.
Some tasks are so ubiquitous and repetitive as to warrant being separated from specific programs as stand-alone macros. These include tasks such as opening or closing Excel, establishing DDE links, and removing duplicate rows from data sets. For direction on how to best implement utilities, see autocall25.
%macro Hello(greeting) / minoperator mindelimiter = ',';
  %if &greeting. in (Hi, Hello, Hey, Yo) %then %put Hello, world!;
%mend;
To use the minoperator with not, use the following form.
%if not ( &thingToCheck. in (item1, item2, item3) ) %then %do;
Notes, Warnings, and Errors
A production program should never produce WARNING or ERROR statements. Similarly, NOTE statements regarding possible logical errors, such as truncation, should not occur. Allowing such entries in the log of a completed program negates the usefulness of such messages27.
Make an effort to limit output to only essential information. Only include messages which facilitate validation28.
In numeric comparisons, use ^=, >=, <=. Avoid using the &, |, ~=, ne, ge, le, geq, leq operators29.
not in instead of
Titles and footnotes
Title and footnote statements should be cleared immediately after they are used30. Titles and footnotes should be cleared using a title; and footnote; statement, respectively.
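A minimal sketch of the pattern. The title and footnote text are placeholders.

```sas
title 'Class Roster';
footnote 'Source: sashelp.class';

proc print data = sashelp.class;
run;

/*Null statements clear every title and footnote line.*/
title;
footnote;
```

Clearing immediately after the step keeps a title from leaking into the output of a later, unrelated step.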
Do not use aliases. Write out the option name in full31.
Give preference to the MISSING() function over logical comparisons.
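As a brief sketch, MISSING() can be applied in the same way to character and numeric variables. The flag variable names and the output data set name are hypothetical.

```sas
data class_missing_flags;
  set sashelp.class;
  /*MISSING() returns 1 when the value is missing and 0
    otherwise, for character and numeric arguments alike.*/
  is_missing_name = missing(name);
  is_missing_age  = missing(age);
run;
```

One function call covers both data types, so the reader does not have to check whether a comparison against a period or a blank string is the right one.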
Avoid defining escape characters whenever possible. Instead use (*ESC*).
proc report data = sashelp.class
  style(header)=[
    background    = white
    rules         = none
    verticalalign = bottom
  ]
  spanrows
  SPLIT='00'x
  ;
  columns
    name
    sex
    age
    height
    weight
  ;
  define name / 'First (*ESC*)n Name';
  define sex / 'Sex';
  define age / 'Age';
  define height / 'Height (*ESC*)n (Inches)';
  define weight / 'Weight (*ESC*)n (Lbs)';
run;
Use one of the CAT, CATS, CATX functions. Do not use COMPRESS or other functions which have a side effect providing the desired functionality. SAS Sample 24589 lists the concatenation functions and describes their behavior.
Sample 24589: Concatenation functions in SAS 9.0 and above
Illustrate the new CAT functions for joining text strings.
CAT - concatenates character strings without removing leading or trailing blanks
CATS - concatenates character strings and removes leading and trailing blanks
CATT - concatenates character strings and removes trailing blanks
CATX - concatenates character strings, removes leading and trailing blanks, and inserts separators
In previous versions of SAS you would have to use a combination of the LEFT, and/or the TRIM functions along with the double concatenation bars (||). If you wanted a separator, you would have to include that inside quotes.
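A small sketch of CATX in use. The data sets, variables, and values are made up for illustration.

```sas
data name_parts;
  length first_name last_name $ 12;
  first_name = '  Ada';
  last_name  = 'Lovelace  ';
run;

data full_names;
  set name_parts;
  /*CATX strips leading and trailing blanks from each
    argument and inserts the separator between them.*/
  full_name = catx(' ', first_name, last_name);
run;
```

FULL_NAME holds Ada Lovelace, with no need for LEFT, TRIM, or the || operator.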
Align SQL statements by the comma33. Use one keyword per line.
proc sql noprint;
  create table example as
    select
        F.id
      , F.item1
      , S.item2
    from
        first_data_set F
      , second_data_set S
    where
      F.id = S.id
      AND S.item > 7
    order by
      F.id
  ;
quit;
Data sets should be written in lowercase and use underscores to separate words34. Avoid using abbreviations unless absolutely necessary. Names should reflect the contents of the data set. It may be helpful to use past-tense.
Bad: MEAN2
Bad: CalcTreatMean
Good: treatment_means
File paths should not end in a slash35.
/*Good: No slash at the end.*/
%include 'C:\this\is\a\good\file\path';

/*Bad: Ends in a slash.*/
%include 'C:\this\is\a\bad\file\path\';
Suppress all unnecessary outputs, such as tables, plots, and irrelevant notes and warnings.
Each program should use the minimum number of output destinations36. If a program generates plots, the program should generate no plots within SAS and output figures directly to file when debug is FALSE. When debug is TRUE, all outputs should be restricted to temporary memory and nothing written permanently to file.
Don't use GOTO37.
Use the Golden Ratio when creating rectangular plots.
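One way to derive the plot width from a chosen height, sketched under the assumption that the golden ratio is approximated as 1.618. The macro variable names plotHeight and plotWidth are hypothetical.

```sas
%let plotHeight = 800;

/*Width is the height times the golden ratio (about 1.618).
  The INTEGER conversion type drops the fractional part.*/
%let plotWidth = %sysevalf(&plotHeight. * 1.618, integer);

%put NOTE: xpixels = &plotWidth. ypixels = &plotHeight.;
```

For a height of 800 this gives a width of 1294, which matches the xpixels and ypixels values used in the goptions example earlier in this guide.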
Worthwhile SAS References
Honorable SAS Mentions
These resources aren't shining beacons, but provide value:
Very few, if any, of these references were actually used in designing this style guide. Much of the advice given in them was either too general to be of use, too obvious to be worth mentioning, or just plain bad advice.
In no particular order:
- Guidelines for Coding SAS Programs
- The Elements of SAS Programming Style
- Techniques for Creating Reviewer-Friendly SAS Programs
- An Animated Guide: Coding Standards for SAS Production Programs
- Google Group: SAS Style Guide
- Roland's Bad SAS Coding Style
- PhUse: Coding Style Conventions (Decent but general)
- SAS Programming Guidelines
- Good Programming Practices in Healthcare Creating Robust Programs
- Best Practice Programming Techniques for SAS Users
- Best Practice Programming Techniques for SAS Software
- Top 5 SAS Programming Best Practices
- Good Programming Practices in SAS
- Do SAS coding "best practices" exist?
- Good SAS Programming Practices
- Top 10 (or More) Ways to Optimize Your SAS Code
- Good Programming Practices when Working Across PC SAS and UNIX SAS
- The Best of Cheesy, Sleazy SAS Tricks
- Writing Efficient SAS Codes
- Good Programming Practice for Clinical Trials
- Include Debugging Code in Your Programs
- SAS Programming Conventions
- Guidelines for Coding of SAS Programs
General Coding References
The Single Responsibility Principle is an object-oriented concept regarding the functionality of classes. In essence, it states,
"A class should have only one reason to change."
SAS does not have classes. It can, however, be partitioned into separate units, each of which manages a single responsibility. Enclose each responsibility within a macro. As best as possible,
"A macro should have only one reason to change."
Doing so has the additional benefit of providing self-documenting code, meeting the requirements of clarity and ease of verification.
Although SAS used to recommend using all caps, this is no longer the standard. It appears that caps were used in the past because SAS is old and ALGOL, COBOL, BASIC, FORTRAN, etc all required caps. SAS does not require it. Using all lower-case allows for differentiation using Pascal or Camel-Case. It is also easier to read.
This mimics what is advised for PROC SQL. It allows for easy rearrangement and modification.
Defining a line limit has several advantages. It helps ensure that the user will not have to scroll horizontally, making the code easier to navigate and read. It is not guaranteed that all users use the same font size. A line limit also facilitates printing, a common task in validation. Lines exceeding 80 characters have a tendency to wrap, breaking the formatting/arrangement of the code. The choice of 80 is somewhat arbitrary, but as a decision must be made, 80 seems a good compromise: long enough to handle most coding situations, yet short enough to accommodate different font sizes and avoid wrapping when printing.
Writing a single semi-colon per line gives uniformity to the code, making it easier to debug. Other exceptions include such obvious situations as when using the %Skip utility or
Data set names are represented in the Explorer window in 'Propcase'. That is, a capitalized first letter with all subsequent letters in lowercase. Since data sets are most often accessed through the Explorer window, there is no advantage to using anything other than all lowercase. If one were to use camelCase or PascalCase, this would not be reflected in the Explorer window. Using Propcase would be an unnecessary burden on the programmer.
The only way to separate words in a data set name is to use underscores. While this eats up a significant portion of the 32 character limit, using underscores to separate words allows data set names to be split. This may be of utility, as with variables.
Never overwrite a data set. Never. Quite often developers will continue to manipulate the same data set throughout their program. This makes verifying changes extremely difficult, if not impossible. It requires the person verifying to step through the program one line at a time, reading the code closely for syntax errors or mistakes in coding. The process becomes opaque and infernally vexing. Creating new data sets for each step helps make the process transparent. When this practice is adopted, the developer can design the data sets to be easily read and to correspond to one another. For example, if PERIMETER were to be calculated, a new dataset would be created containing the results. The new data set could also contain LENGTH and WIDTH to the left of the PERIMETER variable. In the case of aggregation/summary, such as calculating a mean or median, the data set being operated on should have the data clearly presented. For instance, ordered by group so that the median may be simply calculated by hand. A person verifying the code could then check that the calculation was correctly performed, even if that person has no knowledge of how SAS works.
In practice, creating a new data set at each step presents some challenges to be aware of. For example, in calculating a mean using PROC MEANS, the output data set will not be in proper order. It must be sorted by PROC SORT afterward. Finding descriptive names for each data set in this sequence is a challenge. Oftentimes these awkward circumstances can be avoided altogether using PROC SQL and the 'order by' clause. While this may technically violate the Single Responsibility Principle, this practice can be justified in recognizing that leaving data in order should be a standard functionality to begin with.
This is motivated primarily through ease of output. Separating words by underscores allows for easy splitting. For example,
proc print
  heading=horizontal
  split = '_'
  data = &dataset.
  ;
run;
An argument could be made for using camelCase or PascalCase and assigning labels. However, since it is not always clear which data sets will need to be output, one would be required to include labels for all variables. Labels are not always preserved when a dataset is exported and imported. Labels are not forward facing on all interfaces. The use of underscores avoids fiddling with labels.
Using camelCase helps distinguish between other types of objects. In the author's opinion, it also makes the variable easier to read on account of the ampersand prefix.
/*Bad: The underscore delimits the word 'number' and gives
  the impression that the variable is just 'number'.*/
%put There are &number_of_observations observations.;

/*Good: The ampersand appears more at home in the humps and
  bumps of the camel.*/
%put There are &numberOfObservations observations.;
This is chosen somewhat arbitrarily. Practical considerations include:
- Given the 32 character limit, the use of underscores would require a significant motivation. The author could think of none.
- SAS stores macros using ALLCAPS. However, this is difficult to read. If ALLCAPS were used, it would make sense to use underscores, but this would eat into the 32 character limit.
- The above imply that camelCase or PascalCase should be used.
- Macros do not have a clear analogy in other languages. They are not exactly functions. They are not classes. Perhaps their closest counterpart is a C macro. However, the convention used for naming C macros is ALLCAPS with underscores.
Describing the 'what' allows the 'how' to change in the future. It may not always be that the macro will perform its function the same way. A better approach may be discovered later or other parts of the program may change that affect how that particular step must be performed.
There are several reasons for this:
- It is not required.
- It clutters the screen with unnecessary code.
- It makes operation of keyboard macros either more difficult or impossible to code for.
If following the Single Responsibility Principle, seeing the entire macro definition is often possible. A by-product of the Single Responsibility Principle is that each macro is a simple logical chunk. It is easy to keep this in mind when working on a macro.
Or better yet, use a versioning control system.
SAS has 4 kinds of comments, each with their own quirks. Comparing SAS to contemporary languages, the PL/I style comments are the least worst choice.
Using a single pair of comment braces per comment makes reading and editing comments easier than commenting each line individually. If each line is commented, then the end bracket must be realigned any time an edit is made. It may be possible to create a keyboard macro to perform such realignments. However, using only two braces proves sufficient in practice.
Limiting comment line length makes comments more legible. Since the line character limit is 80 characters, restricting comments to roughly 60 characters gives the appearance of fitting 'nicely' inside the program.
The 4 SAS Comment Types:
PL/I Style comments
/*This is a PL/I style comment.*/
PL/I style comments have symmetry which no other style has. SAS provides a native macro keyboard command to insert PL/I style comments. They can be used to comment out semi-colons and quotes. They also have the unique ability to comment out code mid-line. However, nested PL/I comments are not supported; mid-line comments cause errors when they themselves are commented out.
/*Notice the last bracket is not highlighted*/
data test;
  /*
  x = 1; /* test */
  */
run;
Whether commenting supports nesting or not has historical origins. C, for instance, does not support nested comments. Such behavior simplifies the parser. Since SAS is written in C, it seems natural that comments would be treated similarly.
* This is an inline comment;
Inline comments are asymmetric and cannot be used mid-line. They cannot comment out semi-colons.
data _null_;
  length
    first $ 3. * won't run b/c of semi-colon ;
    two   $ 3.
  ;
run;
%*This is a macro comment;
Macro comments are parsed/tokenized differently than the other styles within the macro processor. This appears to be their only reason for existing.
comment this is a comment;
The word 'comment' is a token to initiate a comment. It's not clear why this exists or what utility it has. It appears to be a left over piece of history from when ALGOL roamed the earth.
For completeness, it should be noted that there are differences in the way various comments are parsed. Usage Note 32684 gives details and recommends using PL/I style comments.
SAS also recommends starting PL/I style comments in column 3 to avoid conflicts with 'some operating environments'. Apparently SAS might interpret /* as a request to end the SAS program or session. However, as Windows is not such an operating environment, this recommendation is ignored.
For example, if computing a PERIMETER, the resulting dataset should contain fields for LENGTH and WIDTH, preferably arranged in the order: LENGTH, WIDTH, PERIMETER. This allows for easy manual checks of data. It is not guaranteed that the person doing quality checks on the program knows the language well enough to confirm that the code functions as intended. Including data relevant to the calculation provides a means for validation which would be otherwise unavailable.
Limiting a dataset to only the necessary variables avoids clutter.
The KEEP data set option, applied to an input data set, manages incoming data whereas the KEEP statement manages outgoing data. The KEEP data set option requires parentheses. This syntax is difficult to align and organize. It makes indentation awkward and compromises readability. For this reason, its use should be avoided. Since the KEEP option manages data in a fundamentally different way than the KEEP statement, avoiding the KEEP option may not always be possible. However, if the data flow is managed carefully, it is often possible to use only the KEEP statement.
This allows for an exported data set to be verified without danger of overwriting previous versions, as well as avoiding the need to create separate code expressly for the purpose of validating. Upon execution of a program, all aspects should be available to the user.
Implicit, or side-effect, conversions convert the data type as a side-effect of the primary function. This often leaves a note in the log indicating a conversion was performed. If all conversions are handled in the code explicitly then any implicit conversions are unintended and are errors. Otherwise, the notes need to be investigated.
Truncation introduces ambiguity which casts doubt on the trustworthiness of the log making validation difficult.
Using uninitialized variables introduces ambiguity. If all variables are initialized, then an uninitialized variable indicates an error. When uninitialized variables are permitted, then each occurrence, intended or otherwise, must be investigated during validation.
Typically, this will be something like turning off an ODS destination and turning it back on. In the case of developing utilities, it may not be known ahead of time what the various options are set to. Save the current settings for the options that will be changed and restore them after the new settings are no longer needed.
It is advised to avoid PROC OPTSAVE and PROC OPTLOAD whenever possible as they do not execute quickly.
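Saving and restoring a single option can be sketched with the GETOPTION function instead. The macro name RunWithoutNotes, the macro variable originalNotes, and the data set class_copy are hypothetical.

```sas
%macro RunWithoutNotes();
  %put NOTE: [MACRO] Executing: RunWithoutNotes();

  /*GETOPTION returns NOTES or NONOTES, which can be passed
    back to an OPTIONS statement unchanged.*/
  %local originalNotes;
  %let originalNotes = %sysfunc(getoption(notes));

  options nonotes;

  data class_copy;
    set sashelp.class;
  run;

  options &originalNotes.;
%mend;
```

Because the original value is captured rather than assumed, the macro leaves the session as it found it whether NOTES was on or off when it was called.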
Syntax is not highlighted within a string. Using double quotes indicates that the string contains variables or functions which may be a source of errors. This convention indicates that a double-quote string contains atypical string data.
Including the macro name in the %mend; statement introduces clutter. It also complicates the keyboard macros used for navigation. When adhering to the Single Responsibility Principle, most macros will be short. Therefore, the context of the macro should be clear.
The SAS 9.2 Documentation states,
"If you use the same code repeatedly, it might be more efficient to use a macro because a macro is compiled only once during a SAS job, no matter how many times it is called."
It also states,
"So, be sure to use a macro only when necessary."
Most people appear to interpret these sorts of statements as 'use macros sparingly.' When viewed from the perspective of readability, ease in debugging, and single responsibility, macros become a necessity.
When wrapping each step of a program in a macro, only the parts of the program which have changed need to be recompiled. It can therefore be faster to wrap each step than not to. Setting keyboard macros to automate the macro definitions facilitates the development process. The penalty appears in the compilation of all the macros in a program upon submit. If this penalty proves too great, either the wrapper approach could be abandoned or a solution developed in which a stored compiled macro is used (with the source code saved within).
A common misconception is that macros are difficult to debug. When the guidelines described here are followed, macros are actually easier to debug than plain code. Each macro contains an Executing... statement, each macro performs a single task, and all macros must have unique names. This means the location of most errors can be quickly isolated. Any particular step can be reached through the various hotkeys or directly by a "Find" prompt. Since each step is named, there is no ambiguity should the program contain several instances of the same proc. Highlighting can be easily toggled using %if 0 %then %nrstr(%mend);, especially when bound to an abbreviation. When a macro is finalized, removing the highlight statement serves to indicate that it is complete.
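A sketch of a single-step wrapper macro under these guidelines. The macro name sort_class, the output data set, and the exact wording of the Executing... note are assumptions made for illustration; sashelp.class is the sample data set shipped with SAS.

```sas
%macro sort_class;
  %put NOTE: [MACRO] Executing sort_class;
  %if 0 %then %nrstr(%mend);  /* highlight toggle: the condition is false, so it never executes */

  proc sort
    data = sashelp.class
    out = work.class_sorted
    ;
    by age;
  run;
%mend;

%sort_class
```

Searching the log for "Executing sort_class" locates the step, and deleting the toggle line marks the macro as finalized.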
Do not store all your utilities in a single file called 'functions.sas'. Aside from being inefficient and difficult to maintain, it is not transparent whether or not conflicts may arise.
To use the IN operator within a macro, the MINOPERATOR option must be set. This can be done globally via an OPTIONS statement or within the macro definition statement. The latter is preferable. First, prudence dictates setting it specifically on the macros which require it. This avoids any unforeseen conflicts in other parts of the program, whereas setting it globally detaches the option from the context in which it is relevant. Second, as a practical matter, it appears that using the OPTIONS statement does not allow for the delimiter to be defined in the same statement. The delimiter must instead be defined in the macro definition statement. This causes the MINOPERATOR relevant code to be dispersed. Such unnecessary splitting lessens the clarity of the code, hampers verification and modification, and adds clutter (two statements instead of one).
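A minimal sketch of enabling the IN operator and its delimiter on the macro definition statement itself. The macro name is_weekend and its parameter are hypothetical.

```sas
%macro is_weekend(day) / minoperator mindelimiter=',';
  %put NOTE: [MACRO] Executing is_weekend;
  /* the list after IN is split on the delimiter declared above */
  %if %upcase(&day.) in SAT,SUN %then %put NOTE: [MACRO] &day. is a weekend day.;
  %else %put NOTE: [MACRO] &day. is a weekday.;
%mend;

%is_weekend(sat)
```

Everything the IN operator depends on is visible in one statement, so the macro can be verified or moved without hunting for a global setting.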
It may be necessary in some situations to suppress all notes. An example is when SAS performs convergence tests. If the data is static and convergence has been established during development, suppress the notes for that section only. Leave a developer note indicating that the default notes have been disabled. This way, the log does not get cluttered, but a record still exists. Someone checking specifically for convergence would know to enable notes for that segment.
Considering notes not to be errors is endemic in the SAS community. This is a naive and dangerous oversight. As Statisticians and Data Analysts, the whole of our conclusions rests on the integrity of the underlying data. Confidence in our results requires complete confidence in our data. SAS notes are dangerous because they are silent. An error may terminate program execution. A note certainly does not. Yet SAS delegates potential menaces like truncation and type conversion to passing mentions. Saying, "It works most of the time" is another way of saying, "It happens." Just because SAS doesn't buckle our thought-child into a car seat for us doesn't mean we shouldn't exercise such basic precautions ourselves.
SAS outputs are notoriously verbose. Try to avoid writing more than a handful of lines to the log as a matter of courtesy. No one wants to read through 5,000 lines of output when doing QA. It also defeats the effectiveness of the log to indicate how the code is functioning. Overwhelming the user is as counter-productive as providing them with too little information.
Many procedures have a NOPRINT option. It may sometimes make sense to restrict output using NOPRINT but to then include a manual note with a put statement. It is suggested that such notes include an indicator to separate them from notes generated by SAS.
%put NOTE: [DEV] This is a note from the developer.
%put NOTE: [MACRO] This note originated from within a macro.
Ampersands are avoided as logical operators to prevent any readability issues with macro variables, since a macro variable reference uses at least one ampersand. Within PROC SQL, the and keyword is highlighted, which also improves readability. Pipe is avoided to be consistent with writing out 'and'.
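A short sketch of the readability argument, assuming a hypothetical macro variable min_age and using the sashelp.class sample data set. With the keyword written out, the only ampersand on the line is the macro variable reference.

```sas
%let min_age = 12;

data work.older_girls;
  set sashelp.class;
  /* 'and' cannot be confused with the & that starts a macro variable reference */
  if age > &min_age. and sex = 'F';
run;
```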
The numeric comparison operators chosen most closely resemble those of other contemporary programming languages.
A title statement numbered n erases all title definitions numbered greater than n. Clearing immediately ensures that titles are as intended. Using a title '' statement may set the title to a single quote. SAS advises using what is recommended here (source needed).
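A sketch of clearing titles right after the step that uses them, assuming the null form of the statement is the one intended; a title or footnote statement with no number and no text cancels every line of that kind. The title text is hypothetical.

```sas
title1 'Class listing';
title2 'Sorted by age';

proc print data = sashelp.class;
run;

/* cancel all titles and footnotes so later steps start clean */
title;
footnote;
```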
An alias is an abbreviated call for an option, for instance C= instead of COLOR=. Most of the time, aliases are consistent across statements, but not always. For instance, within a SYMBOL statement, R= is an alias for REPEAT, whereas within an AXIS statement it is an alias for ROTATE. Searching for an alias within documentation can be extremely difficult. Furthermore, aliases obscure the meaning of the code. It is better to simply write everything out.
The goal is to make the code more readable. This should be exercised with caution, as the MISSING function is more general than a specific logical comparison. For instance, a blank character value passed into the MISSING function would register true. However, if the value were supposed to be numeric, this would pass unnoticed. An explicit comparison to the missing numeric value . would instead be flagged in the log: a type-conversion note in a DATA step, or an error in PROC SQL. The author is still undecided on this matter.
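The trade-off can be sketched as follows, using hypothetical variables name and age. The MISSING function accepts either type, while the explicit comparison documents that a numeric value is expected.

```sas
data _null_;
  length name $ 8 age 8;
  name = ' ';
  age = .;

  /* missing() is true for a blank character value and for a numeric missing */
  if missing(name) then put 'NOTE: [DEV] name is missing';
  if missing(age) then put 'NOTE: [DEV] age is missing';

  /* the explicit form states the type the author expects */
  if age = . then put 'NOTE: [DEV] age equals the numeric missing value';
run;
```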
This allows for easy rearrangement, removal, and adding of lines.
Within the explorer window, datasets have a capitalized first letter with all remaining letters lowercase. Therefore, camelCase or PascalCase cannot be used to improve readability within the explorer. To distinguish words, an underscore must be used.
File paths need a slash before referring to a specific file. That slash can be included in the file-path or in the file reference. Ultimately, the decision is arbitrary. However, a convention should be adopted to avoid conflicts.
When file paths are copied using Ctrl+RMB in Windows, the path does not end in a slash. Nor does a file path copied from Windows Explorer contain an ending slash. This seems justification enough for requiring that the slash be hard-coded into file references and not into file paths.
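A sketch of the convention, assuming a hypothetical Windows directory and file name. The path is stored exactly as it would be pasted, and the slash is written where the file is referenced.

```sas
/* hypothetical directory, pasted without a trailing slash */
%let data_dir = C:\studies\example;

/* the slash belongs to the reference, not to the stored path */
filename ages "&data_dir.\ages.txt";
```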
SAS is notoriously verbose by default. Restrict all output to the essentials. This can be done through use of NOPRINT and closing ODS destinations. Restricting output not only makes it easier on the user, but it often dramatically improves performance.
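A minimal sketch of suppressing printed output while still leaving a record in the log. The output data set and variable names are hypothetical; sashelp.class is the sample data set shipped with SAS.

```sas
proc means
  data = sashelp.class
  noprint
  ;
  var height;
  output
    out = work.height_stats
    mean = mean_height
    ;
run;

/* manual note standing in for the suppressed printed output */
%put NOTE: [DEV] Mean height written to work.height_stats.;
```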