I have a data frame and I want to remove all columns with fewer than 1000 non-missing observations. The approach below works fine, but is there a more elegant solution?

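vec <- integer()

# Collect the indices of columns with at least 1000 non-NA values
for (i in seq_len(ncol(dat)))
{
    if (sum(!is.na(dat[, i])) >= 1000)
        vec <- c(vec, i)
}

# drop = FALSE keeps a data frame even if only one column remains
dat <- dat[, vec, drop = FALSE]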
Roman Luštrik
vitor
  • Please add a reproducible example so that people here can help you. See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example What is the structure of `dat`? Can you paste the output of `dput(dat)` here? – CHP Mar 18 '13 at 17:33
  • This is a pretty broad question that applies to all `data.frame`s so I don't know what you'd expect to learn by seeing the object. `dput(head(dat))` would be a much better idea btw, since he's talking about 1000s of observations. – Señor O Mar 18 '13 at 18:56

1 Answer

This should work:

dat[, colSums(!is.na(dat)) >= 1000]

Here we first check which elements of dat are not NA, and then compute the column sums of the resulting logical matrix, which gives the number of non-missing values in each column. Columns that contain at least 1000 observations yield TRUE and the others FALSE, so the result can be used as a logical index to subset the original dat data frame.
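To see the intermediate steps, here is a minimal sketch on a toy data frame. The column names a, b, c and the min_obs variable are made up for illustration, and the threshold is lowered from 1000 to 3 so the example stays small. It also passes drop = FALSE, which is not part of the one-liner above but keeps the result a data frame if only one column survives.

```r
# Hypothetical toy data; the real `dat` would have >= 1000 rows
dat <- data.frame(
  a = c(1, 2, 3, 4, 5),      # 5 non-missing values
  b = c(1, NA, NA, NA, 5),   # 2 non-missing values
  c = c(NA, 2, 3, 4, NA)     # 3 non-missing values
)
min_obs <- 3                 # stand-in for the 1000 used in the question

# !is.na(dat) is a logical matrix; colSums counts the TRUEs per column
n_obs <- colSums(!is.na(dat))

# Logical index: TRUE for columns with enough observations
keep <- n_obs >= min_obs

# drop = FALSE guards against collapsing to a vector when one column is kept
res <- dat[, keep, drop = FALSE]
print(names(res))
```

With these toy values, column b has only two non-missing entries and is dropped, while a and c are kept.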

Jouni Helske