34

I would like to get the average for certain columns for each row.

I have this data:

w=c(5,6,7,8)
x=c(1,2,3,4)
y=c(1,2,3)
length(y)=4
z=data.frame(w,x,y)

Which returns:

  w x  y
1 5 1  1
2 6 2  2
3 7 3  3
4 8 4 NA

I would like to get the mean for certain columns, not all of them. My problem is that there are a lot of NAs in my data. So if I wanted the mean of x and y, this is what I would like to get back:

  w x  y mean
1 5 1  1    1
2 6 2  2    2
3 7 3  3    3
4 8 4 NA    4

I guess I could do something like z$mean=(z$x+z$y)/2 but the last row for y is NA so obviously I do not want the NA to be calculated and I should not be dividing by two. I tried cumsum but that returns NAs when there is a single NA in that row. I guess I am looking for something that will add the selected columns, ignore the NAs, get the number of selected columns that do not have NAs and divide by that number. I tried ??mean and ??average and am completely stumped.

ETA: Is there also a way I can add a weight to a specific column?

eli-k
  • 9,991
  • 11
  • 41
  • 43
thequerist
  • 1,674
  • 3
  • 16
  • 27

2 Answers2

53

Here are some examples:

> z$mean <- rowMeans(subset(z, select = c(x, y)), na.rm = TRUE)
> z
  w x  y mean
1 5 1  1    1
2 6 2  2    2
3 7 3  3    3
4 8 4 NA    4

weighted mean

> z$y <- rev(z$y)
> z
  w x  y mean
1 5 1 NA    1
2 6 2  3    2
3 7 3  2    3
4 8 4  1    4
> 
> weight <- c(1, 2) # x * 1/3 + y * 2/3
> z$wmean <- apply(subset(z, select = c(x, y)), 1, function(d) weighted.mean(d, weight, na.rm = TRUE))
> z
  w x  y mean    wmean
1 5 1 NA    1 1.000000
2 6 2  3    2 2.666667
3 7 3  2    3 2.333333
4 8 4  1    4 2.000000
eli-k
  • 9,991
  • 11
  • 41
  • 43
kohske
  • 63,256
  • 8
  • 159
  • 152
  • 1
    Thanks, this does exactly what I am looking for. I really need to study up on this magical apply command, it seems like it is a solution to everything. – thequerist Feb 28 '12 at 22:30
26

Try using rowMeans:

z$mean=rowMeans(z[,c("x", "y")], na.rm=TRUE)

  w x  y mean
1 5 1  1    1
2 6 2  2    2
3 7 3  3    3
4 8 4 NA    4
Andrew
  • 33,526
  • 12
  • 62
  • 92
  • +1 Thanks, I normally use Extract, cannot believe I did not think of this. Gave the check to kohske for including solution to weighted also. – thequerist Feb 28 '12 at 22:33
  • @andrew How can I get rowMeans of a range of column with some particular name (e.g. MGW.1, MGW.2, MGW.3 ... MGW.198) so all of these columns have name starting with MGW but then number is different and I am not sure about exact number of such columns they could be 196 in one case and 198 in other. What I want is something like: `data.frame(ID=DF[,1:4], MGW=rowMeans(DF[,MGW.*]), HEL=rowMeans(DF[,HEL.*]))` It means the first 4 columns should not be touched while for rest of the columns take average of all columns in category MGW.* and HEL.* – Newbie Jul 27 '16 at 14:42
  • For that, I'd recommend using dplyr and tidyr, since dplyr lets you select columns based on regular expressions. See [this question](https://stackoverflow.com/questions/33401788/dplyr-using-mutate-like-rowmeans) for an example, or [this gist](https://gist.github.com/andrewheiss/dcd38268c65396e1f18de626e70cae47) for a worked example of this question. – Andrew Jul 28 '16 at 18:24
  • @Andrew Hi, I am going to find the mean of every 10 columns of my data (which has 1000 columns) how should I modify it?Can you please guide me?Thanks :) – Shalen May 07 '20 at 16:15