14

I am just picking up Excel, but have experience with R and Stata. Does Excel have a missing symbol convention? That is, does Excel have something like R's NA or Stata's .?

For example, I evaluate an IF() statement and want to return a value that will be omitted from later calculations I use "NA". Is this the correct approach in Excel? It seems like I am missing a basic concept, but I can't get Google to give me a better answer.

To make this more concrete, I have a conditional like =IF([@[Div Dummy]]=1,"NA",EOMONTH(A8,0)). Is there a way to get a numeric "missing value" placeholder so that I don't get warnings about conflicted data types?

Richard Herron
  • 1,137
  • 2
  • 19
  • 44
  • 1
    I think the closest thing is using an empty string... but perhaps someone will surprise me. – Daniel Oct 19 '12 at 16:08
  • To expound on the OP's question: consider a column of data with several 0's, where the zeros are actually missing. You want an average of the non-zero data. How to go about this? So far all my efforts have resulted in generating #NAME? and similar errors. – ckg Apr 28 '16 at 14:55

3 Answers3

8

You're almost there. You can use NA() in your formulas to return that "error" value. It's very useful when charting because it's not charted (unlike 0).

dav
  • 10,293
3

You can leave cells empty and use (if I remember correctly) =isblank() to do the conditional. And it's worth noting that many numerical functions like =sum() and =product() will ignore empty cells, ala sum(x, na.rm=TRUE) in R.

But no, this is one of dozens of reasons why Excel is not a serious tool for statistical data. (Although it's not terrible for simple simulation and optimization problems.)

Harlan
  • 131
  • 3
  • Let me see if I understand you correctly. You recognize that Excel is absolutely the wrong tool for teaching statistics, or at least that you must be very careful trying to use Excel to do statistics. And you are teaching students who perhaps are less sophisticated than some. So how is it exactly that you expect your less sophisticated students to be able to discern the subtleties of Excel's suitability? Use the right tool with these students or don't bother; you can only do harm. – ckg Apr 28 '16 at 14:52
0

I find array formulas in Excel very useful for solving a lot of these kinds of problems. If you don't know array formulas yet, here's a example:

Data
       Col A            Col B            Col C
Row 1: Fruit            Number           Color 
Row 2: Apple            5                Green
Row 3: (leave A3 blank) 10               Yellow
Row 4: Peach            6                (leave C4 blank)    
Row 5: Grape            6                Purple

Then, enter this in another cell, somewhere:

=AVERAGE(IF((A2:A5<>"")*(C2:C5<>""),B2:B5))

To make it an array, you MUST use Ctrl+Shift+Enter.  Your formula will look like this:

{=AVERAGE(IF((A2:A5<>"")*(C2:C5<>""),B2:B5))}

but do not type the curly braces; they will appear automatically when you use Ctrl+Shift+Enter.

In that formula "" stands for missing, <> means not equal to, and the * means AND. So, that formula says, If A2:A5 is not missing and C2:C5 is not missing, then return the average of B2:B5. In this case, the result is 5 (average of 4 and 6). 5 and 10 are excluded because values are missing in A3 and C4.

Now try this:

=AVERAGE(IF((A2:A5<>"")+(C2:C5<>""),B2:B5))

Enter as an array.

The only difference is + instead of *. + means OR. So, in this case the answer is 6.25, because all of the values in B have a non-missing value in A or C.

AVERAGE can be replaced by a number of other functions such as MAX, MIN, SUM, COUNT, and you can put conditionals in front of the array: e.g.,

=IF(B2>3,AVERAGE(IF((A2:A5<>"")+(C2:C5<>""),B2:B5)),"N/A")

Enter as an array formula. So, if some value gives permission, then run the array, otherwise return "N/A". The permission value can be anywhere else on your spreadsheet and does not have to be contained in an array.

You can also filter by values within the target column: for example:

=AVERAGE(IF(((A2:A5<>"")+(C2:C5<>""))*(B2:B5>4),B2:B5))

so, if (A2:A5 is not missing OR C2:C5 is not missing) AND B2:B5 is greater than 4, the answer is 7 because it will take the average of 5, 6, and 10.

You can also do this...

=AVERAGE(IF((A2:A5="Apple")+(A2:A5="Pear"),B2:B5))

(note that "Apple" and "Pear" can be replaced with cell references).  The answer is 4.5 because it will average 4 and 5.

or this: First enter =10/0 in B3. You will get a divide by zero error that will replace the value 10.

=SUM(IF((ISNUMBER(B2:B5)),B2:B5))

Enter as array.

So, this looks across the array B2:B5 and will only include numbers. Since #DIV/0 is not a number, it will sum 4, 5, 6 for an answer of 15.

I'm sure there are limits somewhere on the number of conditionals within an array but I've never run into a situation where I ran out. As well, this is such a flexible tool, only really limited by your creativity. Lot's of interesting things you can do with embedded MAX and MIN functions within an array formula :)

Be aware though that like other kinds of formulas in Excel, array formulas use resources and can slow your spreadsheet.

I find these particularly useful for creating summary reporting in tables that can be highly customized. Because these formulas can be dragged, if you are careful with your absolute references ($), you can very quickly create a large number of these.

Daniel
  • 1