Missing Values

The statistical packages each handle missing values differently, and Sanda is beginning to find differences in the results of some data transformation functions.
 
One of the most basic aspects of the problem is that SPSS, SAS, and Stata treat missing values in logical expressions differently. SPSS evaluates a logical expression containing a missing value as missing (i.e., neither TRUE nor FALSE). Stata, however, treats a missing value as positive infinity, and SAS treats it as negative infinity. Consequently, a logical expression containing a missing value can be TRUE in Stata and SAS.
 
Here is an example:
1) Enter a simple data set with one variable X taking the values 2, 3, 4, and -1
2) Set -1 to missing.
3) Compute two new variables
if X>3 then Y=9
if X<3 then Z=8
 
I ran this little program in SPSS, Stata, and SAS, and they produced different results for Y and Z.
The problem is that Stata and SAS both return TRUE for some comparisons involving missing values, but in opposite directions: Stata evaluates X>3 as TRUE when X is missing, while SAS evaluates X<3 as TRUE.
  • SPSS
    • Logical expressions including a missing value are considered “Missing.”  Usually, “Missing” is equivalent to “False.”
  • Stata
    • Missing values are treated as numbers equal to infinity.  So, any number is less than a missing value.
  • SAS
    • Missing values are treated as numbers equal to minus infinity.  So, any number is greater than a missing value.
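
The three rules above can be mimicked in a short Python sketch. This is only an illustration of the comparison behavior described in this post, not code from any of the packages: `None` stands in for a missing value, and the names `compare` and `run_example` are made up for the example.

```python
import math

def compare(op, x, threshold, package):
    """Evaluate `x op threshold`, treating a missing x (None) per package rule."""
    if package not in ("spss", "stata", "sas"):
        raise ValueError(f"unknown package: {package}")
    if x is None:
        if package == "spss":
            return None  # the result is itself missing, so an IF does not fire
        # Stata: missing acts like +infinity; SAS: missing acts like -infinity
        x = math.inf if package == "stata" else -math.inf
    return x > threshold if op == ">" else x < threshold

def run_example(values, package):
    """Apply 'if X>3 then Y=9' and 'if X<3 then Z=8' to every value of X."""
    rows = []
    for x in values:
        y = 9 if compare(">", x, 3, package) else None
        z = 8 if compare("<", x, 3, package) else None
        rows.append((x, y, z))
    return rows

data = [2, 3, 4, None]  # the -1 has already been set to missing
for package in ("spss", "stata", "sas"):
    print(package, run_example(data, package))
```

Only the row with the missing X differs between the three runs: Y is set under the Stata rule, Z is set under the SAS rule, and neither is set under the SPSS rule.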
 

SPSS

MISSING VALUES X(-1).
IF  (X > 3) Y=9.
IF  (X < 3) Z=8.

Input X | Output X | Y | Z
2       | 2        |   | 8
3       | 3        |   |
4       | 4        | 9 |
-1      | -1       |   |

Stata

replace X=. if X==-1
generate Y=9 if X>3
generate Z=8 if X<3

Input X | Output X | Y | Z
2       | 2        |   | 8
3       | 3        |   |
4       | 4        | 9 |
-1      |          | 9 |

SAS

if X=-1 then X=.;
if X>3 then Y=9;
if X<3 then Z=8;

Input X | Output X | Y | Z
2       | 2        | . | 8
3       | 3        | . | .
4       | 4        | 9 | .
-1      | .        | . | 8