SAS Style Guide

Considerations

The style/practice given here does not claim to generalize across all SAS applications and problem domains. It has been designed and implemented within the following considerations:

  • Data sets are not large (measured in MB, not GB).
  • Programs are short (less than 2000 lines).
  • Processing occurs on a single computer.
  • The problem domain is producing standardized scientific studies.
  • The code/programs must undergo a QA process.
  • Programs are run primarily in Interactive Mode.
  • All users are running SAS version 9.2 or newer.
  • Users are editing in the Enhanced Editor.

Guiding Principles

The following list gives the criteria by which each decision was made in this guide.

  1. Fits within version constraints
  2. Clarifies presentation
  3. Adheres to the 'Single Responsibility Principle'1
  4. Assists verification of code, data, and analysis
  5. Is required for keyboard macros or KEYS shortcuts
  6. Aids code maintenance
  7. Allows for standardization
  8. Executes quickly
  9. Follows SAS recommendations or community traditions

Case

Use lower case2.

Space

Indentation

Indent by two spaces. The default tab behavior can be changed in the Enhanced Editor to insert two spaces instead of a tab character.

Tools > Options... > Enhanced Editor Options

List items vertically. Indent items on the list by one level relative to the keyword. Place the closing semi-colon at the same level as the keyword3.

data good;
  infile ages dlm = '09'x notab missover;
  length
    first_name $ &defLength.
    last_name  $ &defLength.
    age        8
  ;

  input (_all_) (:);
run;

Data Steps and Macros

The body of a PROC or DATA step should be indented by two spaces. The body of a macro should be indented by two spaces. If an option contains more than one item, form those items as a list.

%macro ImportBaseballData();
%put NOTE: [MACRO] Executing: ImportBaseballData();

  data example;
    set sashelp.baseball;
  run;
%mend;

%macro CalculateMeanNHits();
%put NOTE: [MACRO] Executing: CalculateMeanNHits();

  proc means noprint data = example;
    class
      league
      division
    ;
    var nHits;
    types league * division;
    output
      out  = example_means
      mean = mean_nHits
    ;
  run;
%mend;

Logical Structures (IF and CASE)

For IF and CASE statements, indent the END statement.

%if   &SomeCondition1. %then %do;
  %let Variable1 = value;
  %let Var2      = value2;
  %end;
%else %if &SomeCondition2. %then %do;
  %let Variable1 = value;
  %let Var2      = value2;
  %end;

proc sql noprint;
  create table example as
  select
      name
    , age
    , case
        when age < 13               then 'Child'
        when age between 13 and 19  then 'Teenager'
        else 'Adult'
        end as age_group
    , case sex
        when 'M' then 'Boy'
        when 'F' then 'Girl'
        else ''
        end as boy_girl
  from sashelp.class
  ;
quit;

Loop Structures (DO)

Indent the END statement two spaces relative to the corresponding DO.

%do %until (&linkConnection. > 0);
  %if (%sysfunc(datetime()) >= &stopTime.) %then %do;
    %put ERROR: [&SYSMACRONAME] Operation timed out.;
    %sysexec(taskkill /F /IM EXCEL.EXE);
    %SetSystemOptions(&originalNOTES., &originalXWAIT., &originalXSYNC.);
    %abort cancel;
    %end;

  %let linkConnection = %sysfunc(fopen(xlDDE, S));
  %end;

Code spacing

Steps

Include a single blank line between each step.

data example;
  set sashelp.baseball;
run;

proc means noprint data = example;
  class
    league
    division
  ;
  var nHits;
  types league * division;
  output
    out = example_means
    mean = mean_nHits
  ;
run;

Macros

For macros, place a blank line between the execution indicator (the %put statement) and the body of the macro. If the last line is a macro call, include a blank line between the last line and the %mend; statement.

%macro Main();
%put NOTE: [MACRO] Executing: Main();

  %ImportBaseballData();
  %CalculateMeanNHits();

%mend;

If the last line of a macro is not a macro call, do not insert a blank line between the last line and the %mend; statement.

%macro ImportBaseballData();
%put NOTE: [MACRO] Executing: ImportBaseballData();

  data example;
    set sashelp.baseball;
  run;
%mend;

Equal signs (Assignment)

Align multiple statements by the equal sign. It often makes the code more readable to surround equal signs by a space on either side for single line assignments. However, this consideration is not always possible within the 80 character line limit. It may be necessary to group assignment statements according to their equal sign alignments.

%let treatment1          = 0.0;
%let treatment2          = 1.2;
%let treatment3          = 2.9;
%let treatment4          = 7.2;
%let treatment5          = 18;

%let figureNumber_bw_f   = 1;
%let figureNumber_bw_m   = 2;
%let figureNumber_bl_f   = 3;
%let figureNumber_bl_m   = 4;
%let figureNumber_vtg_f  = 5;
%let figureNumber_vtg_m  = 6;
%let figureNumber_gsi_f  = 7;
%let figureNumber_gsi_m  = 8;
%let figureNumber_nts_m  = 9; /*No F data exists for NTS*/

%let goptions =
      csymbol = black
      xpixels = 1294
      ypixels = 800
      rotate  = landscape
;

%let device   = png;

Tabs

Do not insert tab characters. Instead, use two spaces. The Enhanced Editor has an option to replace tab characters with spaces and the ability to change the default from four to two spaces.

Loop structures

Avoid reusing identical iterator variable names. Whenever possible, use a descriptive name for the iterator. For example, if the loop is iterating through days in a month, use day rather than d. See How to avoid iteration errors.
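
As a minimal sketch of this advice (the data set name calendar_days is hypothetical), each nested loop below has its own descriptive iterator, so the inner loop cannot silently overwrite the outer counter:

data calendar_days;
  do month = 1 to 12;
    do day = 1 to 28;
      output;
      end;
    end;
run;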

Line length

Try to limit the line length to 80 characters. This is not always possible, as when working with file-paths. Do the best you can4.

Grammar

Each line of the editor should contain only one line of code. A line of code is defined by a semi-colon.

The only exception to this is for lines which exceed the 80 character limit and cannot be shortened (e.g. an %INCLUDE statement with a long file path). In situations requiring a break, use indentation to make the code more readable5.
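
One possible layout for such a break is sketched below. The file path is hypothetical. The statement borrows the vertical-list convention from the Indentation section, with the closing semi-colon at the level of the keyword.

%include
  'C:\hypothetical\study\directory\with\a\long\name\Utility Macros.sas'
;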

Naming

General principle

Write descriptive names whenever possible.

/*Bad: Reader has to mentally substitute a and b*/
  data plot_data;
    set
      confidence_intervals (in = a)
      observations         (in = b)
    ;
    if a then symbol = 1;
    if b then symbol = 2;
  run;

/*Good: No brain required*/
  data plot_data;
    set
      confidence_intervals (in = in_confidence_intervals)
      observations         (in = in_observations)
    ;
    if in_confidence_intervals then symbol = 1;
    if in_observations         then symbol = 2;
  run;

Name according to logical super/sub sets. For example,

/*Good*/
means_replicate_juvenile_bw_f

/*Bad*/
replicate_means_f_bw_juvenile

In the above example, female is a subset of all body weights, body weights are of juvenile animals, and means have been calculated per replicate.

Datasets

Use all lowercase. Separate words using an underscore. Never overwrite a dataset. A new data set should be created for each change6.
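
A small illustration of the rule against overwriting, using sashelp.class (the output data set names are hypothetical):

/*Bad: class is overwritten, so the unfiltered rows can no
  longer be inspected*/
  data class;
    set sashelp.class;
  run;

  data class;
    set class;
    where age >= 13;
  run;

/*Good: each change produces a new data set*/
  data class_all;
    set sashelp.class;
  run;

  data class_teenagers;
    set class_all;
    where age >= 13;
  run;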

Variables

Use all lowercase. Separate words using an underscore7. Variables should be nouns describing 'what' rather than 'how'.

Good: number_of_eggs

Bad:  sum_eggs

Macro Variables

Use camelCase8. The name should be a noun. Sometimes it may be helpful to use Hungarian notation as with dirOut for 'output directory' or listEndpoints for a macro list/array style object.

Macros

Use PascalCase9.

Names should describe 'what' rather than 'how'10

Bad:  %DeleteIfAllVariablesAreBlank();

Good: %RemoveMissingObservations();

Bad:  %FISHER_VS_CONTROL();

Good: %CheckForIndependence();

For macro functions, name the macro so that it grammatically reflects the purpose.

/*Bad: awkward to read*/
  data clean;
    set raw;
    if %CheckIfVariablesAreMissing() then delete;
  run;

/*Good: reads naturally*/
  data clean;
    set raw;
    if %IsMissingAllVariables() then delete;
  run;

Do not put the macro name after the %mend keyword11.

/*Bad*/
  %macro PrintHelloWorld();
  %put NOTE: [MACRO] Executing: PrintHelloWorld();

    %put Hello, world!;
  %mend PrintHelloWorld;

/*Good*/
  %macro PrintHelloWorld();
  %put NOTE: [MACRO] Executing: PrintHelloWorld();

    %put Hello, world!;
  %mend;

Program names

Save files by capitalizing each word, as in a title, separating words with spaces. Begin the file name with the study, followed by the task the program completes (plot, figure, import, etc). Indicate version number with a 'v'12. Reference corresponding table/figure numbers in parentheses at the end of the file name.

General Form:

STUDY ID - Type - Specific Name Referring to Purpose v# (Report Ref).sas

Example:

TO14 AMA388 - Plot - Arithmetic Mean Comparison v1 (Figure A).sas

If the program follows a sequence, name files according to the sequence.

STUDYID - 1 Analysis - Check Summary Statistics v1.sas
STUDYID - 2 Plot - Arithmetic Mean Comparison v3 (Figure A).sas
STUDYID - 3 Analysis - Perform T-test v2.sas
STUDYID - 4 Analysis - Whatever depends on t-test v1.sas

Comments

Use PL/I style comments (/**/). Use only a single pair of braces (/* and */) per block comment. Restrict comment lines to about 60 characters in length13.

%macro DetermineRemainingAnimals();
%put NOTE: [MACRO] Executing: DetermineRemainingAnimals();

/*Animal is considered to have survived only if the value of OUTCOME
  is "End of Study".  Animal did not survive if OUTCOME is "Found Dead"
  or "Euthanized".  Since the SURVIVE field is already defined using a
  different interpretation of survival, the field REMAIN is created
  to code the above interpretation.*/
  data mortality;
    set _raw_mortality;
    if outcome = 'End of Study' then  remain = 1;
    else                              remain = 0;
  run;
%mend;

Datasets

Variables

Datasets should contain all variables necessary to check calculations. Unnecessary columns should be removed. Do not overwrite data sets14.

DROP/KEEP Statements

A distinction needs to be made. There exist two forms of the KEEP and DROP keywords. One form is a data step option, the other a statement within a data step.

Use KEEP and DROP statements within a data step when given a choice. Give preference to KEEP statements over DROP statements15. See section "Things to Avoid: Negatives" for an explanation.
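
The two forms can be compared side by side. This is only a sketch using sashelp.class; the output data set names are hypothetical.

/*Avoid when possible: KEEP as a data step option*/
  data bad_uses_keep_option (keep = name age);
    set sashelp.class;
  run;

/*Preferred: KEEP as a statement, listed vertically*/
  data good_uses_keep_statement;
    set sashelp.class;
    keep
      name
      age
    ;
  run;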

Variables

Macro variable resolution

Avoid macro variable resolutions requiring double and triple ampersands (&&macro_var or &&&macro_var).
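
One way to sidestep indirect references such as &&endpoint&i is to keep the values in a single list and pull out each item with %scan. The sketch below uses hypothetical names (PrintEndpoints, listEndpoints, endpointNumber) and is not the only possible approach.

%macro PrintEndpoints();
%put NOTE: [MACRO] Executing: PrintEndpoints();

  %let listEndpoints = bw bl vtg gsi;

  %do endpointNumber = 1 %to %sysfunc(countw(&listEndpoints.));
    %let endpoint = %scan(&listEndpoints., &endpointNumber.);
    %put NOTE: Endpoint &endpointNumber. is &endpoint.;
    %end;
%mend;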

Logical arrangement

Arrange variables within each data set in a logical fashion. For instance, if performing a calculation on two variables, have the variable containing the result appear to the right of the variables used to compute the result. If several variables constitute a key, group those variables together as the leftmost variables of the data set.
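
For instance, a PROC SQL select list gives direct control over column order. In this sketch (the output data set and the body_mass_index column are illustrative), the key appears first, followed by the inputs, with the computed result on the right:

proc sql noprint;
  create table class_body_mass_index as
  select
      name
    , height
    , weight
    , 703 * weight / (height * height) as body_mass_index
  from sashelp.class
  ;
quit;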

Exporting

Any data set which is written to the hard disk should have an identical copy which may be viewed in the session16. There should always exist for an exported data set a data set of the same name which is available within session memory.

Type conversion

All type conversion should be performed explicitly using either the PUT or INPUT functions17. Do not use side-effect type approaches.

The following note should be considered an error.

NOTE: Character values have been converted to numeric values at
the places given by: (Line):(Column)

data char;
  character = '1';
run;

data bad_uses_side_effect;
  set char;
  numeric = character * 1;
run;

data good_uses_conversion_function;
  set char;
  numeric = input(character, 8.);
run;

Truncation

No variables should be truncated18. See default length.

The following warning should be considered an error.

WARNING: Multiple lengths were specified for the variable x by input data set(s). This can cause
truncation of data.

Uninitialized

All variables must be initialized. Do not create variables to be 'filled in' later19.

Delimiters

PROC/DATA Step

Each step should be delimited by the requisite RUN; or QUIT; statements. Each step should have the minimum number of delimiters.

Macro variables

Macro variables should be delimited with periods whenever syntax highlighting works.
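
A short illustration of why the period matters (figureNumber is a hypothetical macro variable): without the delimiter, SAS reads the trailing text as part of the variable name.

%let figureNumber = 3;

/*Bad: SAS looks for a macro variable named figureNumber_final*/
%put NOTE: Writing figure_&figureNumber_final.png;

/*Good: the period marks where the name ends*/
%put NOTE: Writing figure_&figureNumber._final.png;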

Options

Options should be defined at the beginning of the program. Any step which requires options to be changed must restore all options to their original settings20.
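
A minimal sketch of saving and restoring a single option with the GETOPTION function. The macro name and the data set class_copy are hypothetical; the same pattern would need to be repeated for each option a step changes.

%macro ImportClassQuietly();
%put NOTE: [MACRO] Executing: ImportClassQuietly();

  %let originalNOTES = %sysfunc(getoption(notes));
  options nonotes;

  data class_copy;
    set sashelp.class;
  run;

  options &originalNOTES.;
%mend;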

Quotation Marks

Only use double-quotes when necessary, such as when the string contains a macro variable. Otherwise, use only single quotes21.
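
For example (animalSex and the output data set names are hypothetical), double quotes appear only where a macro variable must resolve inside the string:

%let animalSex = F;

data females;
  set sashelp.class;
  where sex = "&animalSex.";
run;

data males;
  set sashelp.class;
  where sex = 'M';
run;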

Macros

Disclaimer

The application of macros advocated by this style guide deviates rather dramatically from the conventional SAS usage of them. It is not an attempt to artificially impose OOP practices. Nor is it done willy-nilly. Rather it is to address specific and pandemic problems. Although unconventional, it is surprising how well this approach addresses each of these problems:

  • Maintaining accurate documentation
  • Ease of understanding program logic
  • Code navigation
  • Code flexibility
  • Ease of debugging

General form

  1. See naming conventions.
  2. Include a %put statement which declares the macro name and what variables were used when called.

%macro CalculateReplicateMeans(sex, endpoint);
%put NOTE: [MACRO] Executing: CalculateReplicateMeans(sex=&sex., endpoint=&endpoint.);

  ...BODY...

%mend;

  3. Include a blank line after the %put statement.
  4. Include a blank line after any macro calls.

%macro ImportData();
%put NOTE: [MACRO] Executing: ImportData();

  %ImportMortalityData();
  %ImportReproductionData();
  %ImportHistopathologyData();

%mend;

  5. Do not include a blank line after any general SAS statements.

%macro PrintHelloWorld();
%put NOTE: [MACRO] Executing: PrintHelloWorld();

  data _null_;
    put 'Hello, world!';
  run;
%mend;

  6. Do not repeat the macro name in the %mend; statement22.

Follow the SAS guidelines for macros, such as not defining macros inside of other macros23.

Macro types

Macros can be roughly divided into three roles:

  1. Direction control

    Within a direction control macro, program direction is coordinated. Smaller tasks may also be grouped. For instance, %Main() directs the control of the program and provides high level groupings of things like importing, data cleaning, and output generation.

    Direction controllers should appear above the macros which they call. The definitions of the macros contained in a controller should immediately follow the controller definition.

%macro ImportData();
%put NOTE: [MACRO] Executing: ImportData();

  %ImportMortalityData();
  %ImportReproductionData();
  %ImportHistopathologyData();

%mend;

%macro ImportMortalityData();
%put NOTE: [MACRO] Executing: ImportMortalityData();

  data mortality;
    set inData.mortality;
  run;
%mend;

%macro ImportReproductionData();
%put NOTE: [MACRO] Executing: ImportReproductionData();

  data reproduction;
    set inData.reproduction_and_survival;
  run;
%mend;

%macro ImportHistopathologyData();
%put NOTE: [MACRO] Executing: ImportHistopathologyData();

  data histopathology;
    set inData.histopathology;
  run;
%mend;

  2. Descriptive wrapper

    All procedure/data steps should be enclosed in a macro. This provides self-documentation and aids in debugging24.

  3. Utilities

    Some tasks are so ubiquitous and repetitive as to warrant being separated from specific programs as stand-alone macros. These include tasks such as opening or closing Excel, establishing DDE links, and removing duplicate rows from data sets. For direction on how to best implement utilities, see autocall25.

MINOPERATOR

Toggle use of the MINOPERATOR in the macro declaration statement. Do not declare it using an OPTIONS statement26.

%macro Hello(greeting) / minoperator mindelimiter = ',';
  %if &greeting. in (Hi, Hello, Hey, Yo) %then %put Hello, world!;
%mend;

To use the minoperator with not, use the following form.

%if not ( &thingToCheck. in (item1, item2, item3) ) %then %do;

Log

Notes, Warnings, and Errors

A production program should never produce WARNING or ERROR statements. Similarly, NOTE statements regarding possible logical errors, such as truncation, should not occur. Allowing such entries in the log of a completed program negates the usefulness of such messages27.

Verbosity

Make an effort to limit output to only essential information. Only include messages which facilitate validation28.

Logical Operators

In numeric comparisons, use ^=, >=, <=. Avoid using the &, |, ~=, ne, ge, le, geq, and leq operators29.

Use not in instead of notin.
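
A brief sketch of these operator choices in a data step, using sashelp.class (the output data set name and the particular filter values are arbitrary):

data example;
  set sashelp.class;
  if age >= 13 and age ^= 15;
  if name not in ('Alfred', 'Alice');
run;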

Titles and footnotes

Title and footnote statements should be cleared immediately after they are used30. Titles and footnotes should be cleared using a title; or footnote; statement, respectively.
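
As an illustration (the title and footnote text is made up), the clearing statements follow the step that uses them:

title 'Heights and weights';
footnote 'Source: sashelp.class';

proc print data = sashelp.class;
  var
    name
    height
    weight
  ;
run;

title;
footnote;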

Aliases

Do not use aliases. Write out the option name in full31.

Missing values

Give preference to the MISSING() function over logical operators32.
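
A sketch of the difference, using sashelp.class (the output data set names are hypothetical). One practical reason for the preference is that MISSING() works for both numeric and character variables and also catches special missing values such as .A, which a comparison against a single period does not.

data bad_uses_logical_operator;
  set sashelp.class;
  if height = . then delete;
run;

data good_uses_missing_function;
  set sashelp.class;
  if missing(height) then delete;
run;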

Escape Characters

Avoid defining escape characters whenever possible. Instead use (*ESC*) explicitly.

proc report data = sashelp.class
            style(header)=[
              background    = white
              rules         = none
              verticalalign = bottom
            ]
            spanrows
            SPLIT='00'x
            ;

  columns
    name
    sex
    age
    height
    weight
  ;

  define name   / 'First (*ESC*)n Name';
  define sex    / 'Sex';
  define age    / 'Age';
  define height / 'Height (*ESC*)n (Inches)';
  define weight / 'Weight (*ESC*)n (Lbs)';
run;

Concatenation

Use one of the CAT, CATS, CATX functions. Do not use COMPRESS or other functions which have a side effect providing the desired functionality. SAS Sample 24589 lists the concatenation functions and describes their behavior.

Sample 24589: Concatenation functions in SAS 9.0 and above

Illustrate the new CAT functions for joining text strings.

CAT  - concatenates character strings without removing leading or
      trailing blanks

CATS - concatenates character strings and removes leading
      and trailing blanks

CATT - concatenates character strings and removes trailing blanks

CATX - concatenates character strings, removes leading and
      trailing blanks, and inserts separators

In previous versions of SAS you would have to use a combination of the LEFT,
and/or the TRIM functions along with the double concatenation bars (||). If you
wanted a separator, you would have to include that inside quotes.
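
As a small sketch of the recommended approach (the data set, variable names, and values are made up), CATX joins the two names with a single space and strips the padding that fixed-length character variables carry:

data full_names;
  length
    first_name $ 10
    last_name  $ 10
    full_name  $ 21
  ;
  first_name = 'Ada';
  last_name  = 'Lovelace';
  full_name  = catx(' ', first_name, last_name);
run;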

SQL Syntax

Align SQL statements by the comma33. Use one keyword per line (i.e. create, select, from, where etc.).

proc sql noprint;
  create table example as
  select
      F.id
    , F.item1
    , S.item2
  from
      first_data_set  F
    , second_data_set S
  where
        F.id    = S.id
    and S.item2 > 7
  order by
    F.id
  ;
quit;

Datasets

Data sets should be written in lowercase and use underscores to separate words34. Avoid using abbreviations unless absolutely necessary. Names should reflect the contents of the data set. It may be helpful to use past-tense.

Bad: MEAN2
Bad: CalcTreatMean
Good: treatment_means

File paths

File paths should not end in a slash35.

/*Good: No slash at the end.*/
%include 'C:\this\is\a\good\file\path';

/*Bad: Ends in a slash.*/
%include 'C:\this\is\a\bad\file\path\';

Output destinations

Suppress all unnecessary outputs, such as tables, plots, and irrelevant notes and warnings.

Each program should use the minimum number of output destinations36. If a program generates plots, the program should generate no plots within SAS and output figures directly to file when debug is FALSE. When debug is TRUE, all outputs should be restricted to temporary memory and nothing written permanently to file.

GOTO Statement

Don't use GOTO37.

Plot Size

Use the Golden Ratio when creating rectangular plots.
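
One way this might be applied is to fix the plot height and derive the width from it. The sketch below assumes the ratio is approximated as 1.618 and uses hypothetical macro variable names. The resulting 1294 by 800 pixels matches the xpixels and ypixels values used in the goptions example under Equal signs.

%let plotHeight = 800;
%let plotWidth  = %sysevalf(&plotHeight. * 1.618, integer);

goptions
  xpixels = &plotWidth.
  ypixels = &plotHeight.
;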

Footnotes:

1

The Single Responsibility Principle is an object-oriented concept regarding the functionality of classes. In essence, it states,

"A class should have only one reason to change."

SAS does not have classes. It can, however, be partitioned into separate units, each of which manages a single responsibility. Enclose each responsibility within a macro. As best as possible,

"A macro should have only one reason to change."

Doing so has the additional benefit of providing self-documenting code, meeting the requirements of clarity and ease of verification.

2

Although SAS used to recommend using all caps, this is no longer the standard. It appears that caps were used in the past because SAS is old and ALGOL, COBOL, BASIC, FORTRAN, etc all required caps. SAS does not require it. Using all lower-case allows for differentiation using Pascal or Camel-Case. It is also easier to read.

3

This mimics what is advised for PROC SQL. It allows for easy rearrangement and modification.

4

Defining a line limit has several advantages. It helps ensure that the user will not have to scroll horizontally, making the code easier to navigate and read. It is not guaranteed that all users use the same font size. A line limit also facilitates printing, a common task in validation. Lines exceeding 80 characters have a tendency to wrap, breaking the formatting/arrangement of the code. The choice of 80 is somewhat arbitrary, but as a decision must be made, 80 seems a good compromise between enough length to handle most coding situations yet short enough to accommodate different font sizes and avoid wrapping when printing.

5

Writing a single semi-colon per line gives uniformity to the code, making it easier to debug. Other exceptions include such obvious situations as when using the %Skip utility or CALL EXECUTE.

6

Data set names are represented in the Explorer window in 'Propcase'. That is, a capitalized first letter with all subsequent letters in lowercase. Since data sets are most often accessed through the Explorer window, there is no advantage to using anything other than all lowercase. If one were to use camelCase or PascalCase, this would not be reflected in the Explorer window. Using Propcase would be an unnecessary burden on the programmer.

The only way to separate words in a data set name is to use underscores. While this eats up a significant portion of the 32 character limit, using underscores to separate words allows data set names to be split. This may be of utility, as with variables.

Never overwrite a data set. Never. Quite often developers will continue to manipulate the same data set throughout their program. This makes verifying changes extremely difficult, if not impossible. It requires the person verifying to step through the program one line at a time, reading the code closely for syntax errors or mistakes in coding. The process becomes opaque and infernally vexing. Creating new data sets for each step helps make the process transparent. When this practice is adopted, the developer can design the data sets to be easily read and to correspond to one another. For example, if PERIMETER were to be calculated, a new dataset would be created containing the results. The new data set could also contain LENGTH and WIDTH to the left of the PERIMETER variable. In the case of aggregation/summary, such as calculating a mean or median, the data set being operated on should have the data clearly presented. For instance, ordered by group so that the median may be simply calculated by hand. A person verifying the code could then check that the calculation was correctly performed, even if that person has no knowledge of how SAS works.

In practice, creating a new data set at each step presents some challenges to be aware of. For example, in calculating a mean using PROC MEANS, the output data set will not be in proper order. It must be sorted by PROC SORT afterward. Finding descriptive names for each data set in this sequence is a challenge. Often times these awkward circumstances can be avoided altogether using PROC SQL and the 'order by' command. While this may technically violate the Single Responsibility Principle, this practice can be justified in recognizing that leaving data in order should be a standard functionality to begin with.

7

This is motivated primarily through ease of output. Separating words by underscores allows for easy splitting. For example,

proc print
  heading=horizontal
  split = '_'
  data = &dataset.
  ;
run;

An argument could be made for using camelCase or PascalCase and assigning labels. However, since it is not always clear which data sets will need to be output, one would be required to include labels for all variables. Labels are not always preserved when a dataset is exported and imported. Labels are not forward facing on all interfaces. The use of underscores avoids fiddling with labels.

8

Using camelCase helps distinguish between other types of objects. In the author's opinion, it also makes the variable easier to read on account of the ampersand prefix.

/*Bad: The underscore delimits the word 'number' and gives the
        impression that the variable is just 'number'.*/
%put There are &number_of_observations observations.;

/*Good: The ampersand appears more at home in the humps and
        bumps of the camel.*/
%put There are &numberOfObservations observations.;

9

This is chosen somewhat arbitrarily. Practical considerations include:

  1. Given the 32 character limit, the use of underscores would require a significant motivation. The author could think of none.
  2. SAS stores macros using ALLCAPS. However, this is difficult to read. If ALLCAPS were used, it would make sense to use underscores, but this would eat in to the 32 character limit.
  3. The above imply that camelCase or PascalCase should be used.
  4. Macros do not have a clear analogy in other languages. They are not exactly functions. They are not classes. Perhaps their closest counterpart is a C macro. However, the convention used for naming C macros is ALLCAPS with underscores.

10

Describing the 'what' allows the 'how' to change in the future. It may not always be that the macro will perform its function the same way. A better approach may be discovered later or other parts of the program may change that affect how that particular step must be performed.

11

There are several reasons for this:

  1. It is not required.
  2. It clutters the screen with unnecessary code.
  3. It makes operation of keyboard macros either more difficult or impossible to code for.

If following the Single Responsibility Principle, seeing the entire macro definition is often possible. A by-product of the Single Responsibility Principle is that each macro is a simple logical chunk. It is easy to keep this in mind when working on a macro.

12

Or better yet, use a versioning control system.

13

SAS has 4 kinds of comments, each with its own quirks. Comparing SAS to contemporary languages, the PL/I style comments are the least worst choice.

Using a single pair of comment braces per comment makes reading and editing comments easier than commenting each line individually. If each line is commented, then the end bracket must be realigned any time an edit is made. It may be possible to create a keyboard macro to perform such realignments. However, using only two braces proves sufficient in practice.

Limiting comment line length makes comments more legible. Since the line character limit is 80 characters, restricting comments to roughly 60 characters gives the appearance of fitting 'nicely' inside the program.

The 4 SAS Comment Types:

  1. PL/I Style comments

    /*This is a PL/I style comment.*/
    

    PL/I style comments have symmetry which no other style has. SAS provides a native macro keyboard command to insert PL/I style comments. They can be used to comment out semi-colons and quotes. They also have the unique ability to comment out code mid-line. However, nested PL/I comments are not supported; mid-line comments cause errors when they themselves are commented out.

    /*Notice the last bracket is not highlighted*/
    data test;
      /*      x = 1; /* test */ */
    run;
    

    Whether commenting supports nesting or not has historical origins. C, for instance, does not support nested comments. Such behavior simplifies the parser. Since SAS is written in C, it seems natural that comments would be treated similarly.

  2. Inline comments

    * This is an inline comment;
    

    Inline comments are asymmetric and cannot be used mid-line. They cannot comment out semi-colons.

    data _null_;
      length first  $ 3. * won't run b/c of semi-colon ;
      two    $ 3.
      ;
    run;
    
  3. Macro comments

    %*This is a macro comment;
    

    Macro comments are parsed/tokenized differently than the other styles within the macro processor. This appears to be their only reason for existing.

  4. comment comments

    comment this is a comment;
    

    The word 'comment' is a token to initiate a comment. It's not clear why this exists or what utility it has. It appears to be a left over piece of history from when ALGOL roamed the earth.

For completeness, it should be noted that there are differences in the way various comments are parsed. Usage Note 32684 gives details and recommends using PL/I style comments.

SAS also recommends starting PL/I style comments in column 3 to avoid conflicts with 'some operating environments'. Apparently SAS might interpret a /* as a request to end the SAS program or session. However, as Windows is not such an operating environment, this recommendation is ignored.

14

For example, if computing a PERIMETER, the resulting dataset should contain fields for LENGTH and WIDTH, preferably arranged in the order: LENGTH, WIDTH, PERIMETER. This allows for easy manual checks of data. It is not guaranteed that the person doing quality checks on the program knows the language well enough to confirm that the code functions as intended. Including data relevant to the calculation provides a means for validation which would be impossible otherwise.

Limiting a dataset to only the necessary variables avoids clutter.

15

The KEEP data step option manages incoming data whereas the KEEP statement manages outgoing data. The KEEP data step option requires parentheses. This syntax is difficult to align and organize. It makes indentation awkward and compromises readability. For this reason, its use should be avoided. Since the KEEP option manages data in a fundamentally different way than the KEEP statement, avoiding the KEEP option may not always be possible. However, if the data flow is managed carefully, it is often possible to use only KEEP statements.

16

This allows for an exported data set to be verified without danger of overwriting previous versions, as well as avoiding the need to create separate code expressly for the purpose of validating. Upon execution of a program, all aspects should be available to the user.

17

Implicit, or side-effect, conversions convert the data type as a side-effect of the primary function. This often leaves a note in the log indicating a conversion was performed. If all conversions are handled in the code explicitly then any implicit conversions are unintended and are errors. Otherwise, the notes need to be investigated.

18

Truncation introduces ambiguity which casts doubt on the trustworthiness of the log making validation difficult.

19

Using uninitialized variables introduces ambiguity. If all variables are initialized, then an uninitialized variable indicates an error. When uninitialized variables are permitted, then each occurrence, intended or otherwise, must be investigated during validation.

20

Typically, this will be something like turning off an ODS destination and turning it back on. In the case of developing utilities, it may not be known ahead of time what the various options are set to. Save the current settings for the options that will be changed and restore them after the new settings are no longer needed.

  1. Using GETOPTION to Save and Restore Options
  2. SAS System Options

It is advised to avoid PROC OPTSAVE and PROC OPTLOAD whenever possible as they do not execute quickly.
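One way to save and restore a single option is sketched below with the GETOPTION function. The macro name, the macro variable savedLinesize, and the choice of the LINESIZE option are assumptions made for illustration.

```sas
%macro PrintNarrow();
%put NOTE: [MACRO] Executing: PrintNarrow();

  /* save the current setting in KEYWORD=value form */
  %local savedLinesize;
  %let savedLinesize = %sysfunc(getoption(linesize, keyword));

  options linesize = 80;

  proc print data = sashelp.class;
  run;

  /* restore whatever the caller had set */
  options &savedLinesize.;
%mend;

%PrintNarrow();
```

Requesting the KEYWORD form means the saved text can be passed straight back to an OPTIONS statement.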

21

Syntax is not highlighted within a string. Using double quotes indicates that the string contains variables or functions which may be a source of errors. This convention indicates that a double-quote string contains atypical string data.
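The difference can be seen in a short sketch. The macro variable studyId and its value are hypothetical.

```sas
%let studyId = A-101;

data _null_;
  /* single quotes: the text stays literal */
  plain = 'Study &studyId.';

  /* double quotes: the macro variable resolves to Study A-101 */
  resolved = "Study &studyId.";

  put plain=;
  put resolved=;
run;
```

Reserving double quotes for strings like the second one lets a reader spot at a glance where resolution takes place.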

22

Including the macro name in the %mend; statement introduces clutter. It also complicates the keyboard macros used for navigation. When adhering to the Single Responsibility Principle, most macros will be short. Therefore, the context of the macro should be evident.

23

The SAS 9.2 Documentation states,

"If you use the same code repeatedly, it might be more efficient to use a macro because a macro is compiled only once during a SAS job, no matter how many times it is called."

It also states,

"So, be sure to use a macro only when necessary."

Most people appear to interpret these sorts of statements as 'use macros sparingly.' When viewed from the perspective of readability, ease in debugging, and single responsibility, macros become a necessity.

When wrapping each step of a program in a macro, only the parts of the program which have changed need to be recompiled. It can therefore be faster to wrap each step than not to. Setting keyboard macros to automate the macro definitions facilitates the development process. The penalty appears in the compilation of all the macros in a program upon submit. If this penalty proves too great, either the wrapper approach could be abandoned or a solution developed in which a stored compiled macro is used (with the source code saved within).

24

A common misconception is that macros are difficult to debug. When the guidelines described here are followed, macros are actually easier to debug than plain code. Each macro contains a %put NOTE: Executing... statement, each macro performs a single task, and all macros must have unique names. This means the location of most errors can be quickly isolated. Any particular step can be reached through the various hotkeys or directly through a "Find" prompt. Since each step is named, there is no ambiguity should the program contain several instances of the same proc. Highlighting can be easily toggled using %if 0 %then %nrstr(%mend);, especially when bound to an abbreviation. When a macro is finalized, removing the highlight statement serves to indicate that it is complete.

25

Do not store all your utilities in a single file called 'functions.sas'. Aside from being inefficient and difficult to maintain, it is not transparent whether or not conflicts may arise.

26

To use the IN operator within a macro, the MINOPERATOR option must be set. This can be done globally via an OPTIONS statement or within the macro definition statement. The latter is preferable. First, prudence dictates setting it specifically on the macros which require it. This avoids any unforeseen conflicts in other parts of the program, whereas setting it globally detaches the option from the context in which it is relevant. Second, as a practical matter, it appears that using the OPTIONS statement does not allow for the delimiter to be defined in the same statement. The delimiter must instead be defined in the macro definition statement. This causes the MINOPERATOR-relevant code to be dispersed. Such unnecessary splitting lessens the clarity of the code, hampers verification and modification, and adds clutter (two statements instead of one).
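A minimal sketch of setting both options on the macro definition statement follows. The macro name, its parameter, and the list of values are hypothetical.

```sas
%macro IsWeekend(day) / minoperator mindelimiter=',';
%put NOTE: [MACRO] Executing: IsWeekend();

  /* the IN operator is available only because MINOPERATOR is set above */
  %if %upcase(&day.) in SAT,SUN %then %do;
    %put NOTE: [MACRO] &day. falls on the weekend.;
  %end;
  %else %do;
    %put NOTE: [MACRO] &day. is a weekday.;
  %end;
%mend;

%IsWeekend(sat);
```

Everything the IN comparison depends on is visible in the first line of the macro.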

27

It may be necessary in some situations to suppress all notes. An example is when SAS performs convergence tests. If the data is static and convergence has been established during development, suppress the notes for that section only. Leave a developer note indicating that the default notes have been disabled. This way, the log does not get cluttered, but a record still exists. Someone checking specifically for convergence would know to enable notes for that segment.
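The pattern might look like the sketch below. It assumes SAS/STAT is licensed and uses SASHELP.HEART purely as a stand-in for a static study data set; the model itself is hypothetical.

```sas
%put NOTE: [DEV] Default notes are disabled for the model fit below.;
%put NOTE: [DEV] Convergence was confirmed during development.;

options nonotes;

proc logistic data = sashelp.heart;
  model status(event = 'Dead') = ageatstart;
run;

options notes;
```

The two developer notes are written before NONOTES takes effect, so the log still records that notes were suppressed and why.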

Considering notes not to be errors is endemic in the SAS community. This is a naive and dangerous oversight. As Statisticians and Data Analysts, the whole of our conclusions rests on the integrity of the underlying data. Confidence in our results requires complete confidence in our data. SAS notes are dangerous because they are silent. An error may terminate program execution. A note certainly does not. Yet SAS relegates potential menaces like truncation and type conversion to passing mentions. Saying, "It works most of the time" is another way of saying, "It happens." Just because SAS doesn't buckle our thought-child into a car seat for us doesn't mean we shouldn't exercise such basic precautions ourselves.

28

SAS outputs are notoriously verbose. Try to avoid writing more than a handful of lines to the log as a matter of courtesy. No one wants to read through 5,000 lines of output when doing QA. It also defeats the effectiveness of the log to indicate how the code is functioning. Overwhelming the user is as counter-productive as providing them with too little information.

Many procedures have a NOPRINT option. It may sometimes make sense to restrict output using NOPRINT but to then include a manual note using a %put or put statement. It is suggested that such notes include an indicator to separate them from notes generated by SAS.

%put NOTE: [DEV] This is a note from the developer.;
%put NOTE: [MACRO] This note originated from within a macro.;

29

Ampersands as logical operators are avoided to prevent any readability issues with macro variables. A macro variable resolution uses at least one ampersand (e.g. &macroVar. or &&&macroVar.). Within PROC SQL the and keyword is highlighted, which also improves readability. The pipe is avoided in favor of 'or' to be consistent with writing out 'and'.
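A short sketch shows how the written-out keywords read next to macro variable references. The macro variables and the output data set name are hypothetical.

```sas
%let targetAge = 13;
%let targetSex = F;

data selected;
  set sashelp.class;

  /* the only ampersands on this line belong to macro variables */
  where (age = &targetAge. and sex = "&targetSex.") or name = 'Alfred';
run;
```

With & and | as operators, the same condition would mix three different meanings of the ampersand and pipe characters on one line.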

The numeric comparison operators chosen most closely resemble those of other contemporary programming languages.

30

Title definition n erases all title definitions greater than n. Clearing immediately ensures that titles are as intended. Using a title '' statement may set the title to a single quote. SAS advises using what is recommended here (source needed).
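The behavior can be sketched as follows, assuming the recommendation refers to a TITLE statement with no text. The title text is hypothetical.

```sas
title1 'Class listing';
title2 'Females only';

proc print data = sashelp.class noobs;
  where sex = 'F';
run;

/* redefining title 1 with no text also erases title 2 */
title1;
```

Clearing right after the step that needed the titles keeps them from leaking into later output.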

31

An alias is an abbreviated call for an option. For instance, using C= instead of COLOR=. Most of the time, these are consistent across statements. However, they are not always. For instance, within a SYMBOL statement, R= is an alias for REPEAT whereas for an AXIS statement, it is an alias for ROTATE. Searching for an alias within documentation can be extremely difficult. Furthermore, using aliases also obscures the meaning of the code. It is better to simply write everything out.
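Written out in full, the two statements mentioned above might look like this sketch. It assumes SAS/GRAPH is available; the option values and label text are hypothetical.

```sas
/* in a SYMBOL statement, r= would mean repeat= */
symbol1
  color  = blue
  value  = dot
  repeat = 2
;

/* inside an AXIS label, r= would mean rotate= */
axis1
  label = (angle = 90 rotate = 0 'Height')
;
```

With the full names there is no need to remember which statement is in play to know what the option does.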

32

The goal is to make the code more readable. This should be exercised with caution as the MISSING function is more general than a specific logical comparison. For instance, a blank character value passed into the MISSING function would register true. However, if the value were supposed to be numeric, this would pass unnoticed. An explicit comparison to the missing numeric value . would instead be flagged in the log as a type conversion. The author is still undecided on this matter.
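The generality of the MISSING function can be seen in a small sketch. The data set and variable names are hypothetical.

```sas
data missing_check;
  length
    name   $ 8
    weight 8
  ;

  name   = ' ';
  weight = .;

  /* missing() accepts either type without complaint */
  name_is_missing   = missing(name);
  weight_is_missing = missing(weight);

  /* the explicit comparison is only appropriate for a numeric variable */
  weight_is_dot = (weight = .);

  put name_is_missing= weight_is_missing= weight_is_dot=;
run;
```

All three flags evaluate to 1 here, which is the point: MISSING gives no hint that name and weight are of different types.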

33

This allows for easy rearrangement, removal, and adding of lines.

34

Within the explorer window, datasets have a capitalized first letter with all remaining letters lowercase. Therefore, camelCase or PascalCase cannot be used to improve readability within the explorer. To distinguish words, an underscore must be used.

35

File paths need a slash before referring to a specific file. That slash can be included in the file-path or in the file reference. Ultimately, the decision is arbitrary. However, a convention should be adopted to avoid conflicts.

When file paths are copied using Ctrl+RMB in Windows, the path does not end in a slash. Nor does a file path copied from Windows Explorer contain an ending slash. This seems justification enough for requiring ending slashes be hard-coded into references and not file-paths.
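Under this convention a path and a file reference might be written as in the sketch below. The directory, file name, and fileref are hypothetical.

```sas
/* path as copied from Windows: no ending slash */
%let outDir = C:\studies\example;

/* the slash is hard-coded in the reference, not in the path */
filename results "&outDir.\results.csv";

%put NOTE: [DEV] Results will be written to &outDir.\results.csv;
```

A path pasted into the %let statement can then be used as is, with no editing after the paste.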

36

SAS is notoriously verbose by default. Restrict all output to the essentials. This can be done through use of NOPRINT and closing various ODS destinations. Restricting output not only makes it easier on the user, but it often dramatically improves performance.
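As a sketch, a frequency count can be routed to a data set instead of the output window. The output data set name is hypothetical.

```sas
proc freq data = sashelp.class noprint;
  tables sex / out = sex_counts;
run;

%put NOTE: [DEV] Frequencies of sex were written to sex_counts.;
```

The counts remain available for verification in sex_counts while nothing is printed.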

37

Go to statement considered harmful. Because SAS does not have a try-catch construct, it may be necessary to use GOTO in this capacity. However, avoid its usage if possible.
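One possible shape for the error-handling case is sketched below using the macro %GOTO statement. The macro name, its parameter, and the label handleMissing are hypothetical.

```sas
%macro PrintIfExists(dataset);
%put NOTE: [MACRO] Executing: PrintIfExists();

  /* jump to the handler only when the precondition fails */
  %if not %sysfunc(exist(&dataset.)) %then %goto handleMissing;

  proc print data = &dataset.;
  run;

  %return;

  %handleMissing:
    %put ERROR: [MACRO] Data set &dataset. does not exist.;
%mend;

%PrintIfExists(sashelp.class);
```

The jump is confined to a single forward branch at the end of the macro, which keeps the control flow easy to follow.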

2018-10-17

©2020 Matt Trzcinski