How can I get the average (mean) of selected columns

Question

I would like to get the average for certain columns for each row.

I have this data:

w=c(5,6,7,8)
x=c(1,2,3,4)
y=c(1,2,3)
length(y)=4
z=data.frame(w,x,y)

Which returns:

I would like to get the mean for certain columns, not all of them. My problem is that there are a lot of NAs in my data. So if I wanted the mean of x and y, this is what I would like to get back:

  w x  y mean
1 5 1  1    1
2 6 2  2    2
3 7 3  3    3
4 8 4 NA    4

I guess I could do something like z$mean=(z$x+z$y)/2 but the last row for y is NA so obviously I do not want the NA to be calculated and I should not be dividing by two. I tried cumsum but that returns NAs when there is a single NA in that row. I guess I am looking for something that will add the selected columns, ignore the NAs, get the number of selected columns that do not have NAs and divide by that number. I tried ??mean and ??average and am completely stumped.

ETA: Is there also a way I can add a weight to a specific column?

score 53 · Accepted Answer · edited Feb 26 '18 at 18:36

53

Here are some examples:

> z$mean <- rowMeans(subset(z, select = c(x, y)), na.rm = TRUE)
> z
  w x  y mean
1 5 1  1    1
2 6 2  2    2
3 7 3  3    3
4 8 4 NA    4

weighted mean

> z$y <- rev(z$y)
> z
  w x  y mean
1 5 1 NA    1
2 6 2  3    2
3 7 3  2    3
4 8 4  1    4
> 
> weight <- c(1, 2) # x * 1/3 + y * 2/3
> z$wmean <- apply(subset(z, select = c(x, y)), 1, function(d) weighted.mean(d, weight, na.rm = TRUE))
> z
  w x  y mean    wmean
1 5 1 NA    1 1.000000
2 6 2  3    2 2.666667
3 7 3  2    3 2.333333
4 8 4  1    4 2.000000

edited Feb 26 '18 at 18:36

eli-k

9,991
11
41
43

answered Feb 28 '12 at 22:20

kohske

63,256
8
159
152

1

Thanks, this does exactly what I am looking for. I really need to study up on this magical apply command, it seems like it is a solution to everything. – thequerist Feb 28 '12 at 22:30

score 26 · Answer 2 · answered Feb 28 '12 at 22:20

26

Try using rowMeans:

z$mean=rowMeans(z[,c("x", "y")], na.rm=TRUE)

  w x  y mean
1 5 1  1    1
2 6 2  2    2
3 7 3  3    3
4 8 4 NA    4

answered Feb 28 '12 at 22:20

Andrew

33,526
12
62
92

+1 Thanks, I normally use Extract, cannot believe I did not think of this. Gave the check to kohske for including solution to weighted also. – thequerist Feb 28 '12 at 22:33
@andrew How can I get rowMeans of a range of column with some particular name (e.g. MGW.1, MGW.2, MGW.3 ... MGW.198) so all of these columns have name starting with MGW but then number is different and I am not sure about exact number of such columns they could be 196 in one case and 198 in other. What I want is something like: `data.frame(ID=DF[,1:4], MGW=rowMeans(DF[,MGW.*]), HEL=rowMeans(DF[,HEL.*]))` It means the first 4 columns should not be touched while for rest of the columns take average of all columns in category MGW.* and HEL.* – Newbie Jul 27 '16 at 14:42
For that, I'd recommend using dplyr and tidyr, since dplyr lets you select columns based on regular expressions. See [this question](https://stackoverflow.com/questions/33401788/dplyr-using-mutate-like-rowmeans) for an example, or [this gist](https://gist.github.com/andrewheiss/dcd38268c65396e1f18de626e70cae47) for a worked example of this question. – Andrew Jul 28 '16 at 18:24
@Andrew Hi, I am going to find the mean of every 10 columns of my data (which has 1000 columns) how should I modify it?Can you please guide me?Thanks :) – Shalen May 07 '20 at 16:15

How can I get the average (mean) of selected columns

2 Answers2

Linked

Related