Deleting columns of a data frame with less than 1000 observations

Question

I have a data frame and I want to remove all columns with less than 1000 observations. The approach below works fine, but is there any more elegant solution?

vec <- numeric()

for(i in 1:ncol(dat))
{
    if(length(dat[,i][!is.na(dat[,i])]) >= 1000) 
        vec <- c(vec, i)
}

dat <- dat[,vec]

Please add a reproducible sample so the good people here can help you. See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example What is the structure of `dat`? Can you paste the output of `dput(dat)` here? – CHP, Mar 18 '13 at 17:33
This is a pretty broad question that applies to all `data.frame`s, so I don't know what you'd expect to learn by seeing the object. `dput(head(dat))` would be a much better idea btw, since he's talking about 1000s of observations. – Señor O, Mar 18 '13 at 18:56

score 7 · Accepted Answer · answered Mar 18 '13 at 17:34

This should work:

dat[,colSums(!is.na(dat))>=1000]

Here we first check which elements of dat are not NA, and then compute the column sums of the resulting logical matrix, which gives the number of non-missing values in each column. For columns that contain at least 1000 observations we get TRUE, and for the others FALSE. The result can therefore be used as a logical index that subsets the columns of the original dat data frame.
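To make the intermediate steps visible, here is a minimal sketch on a toy data frame. The column names, the values, and the threshold of 3 (standing in for 1000) are assumptions chosen only for illustration. The `drop = FALSE` argument is an addition not present in the answer above; it keeps the result a data frame when only one column passes the threshold.

```r
# Toy data; names, values and the threshold are illustrative assumptions
dat <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(NA, NA, 5, 6),
  c = c(7, NA, 8, 9)
)
min_obs <- 3  # stands in for the 1000 used in the question

non_missing <- !is.na(dat)      # logical matrix, TRUE where a value is present
counts <- colSums(non_missing)  # number of non-NA values per column
keep <- counts >= min_obs       # logical index, one entry per column

# drop = FALSE (an addition to the original one-liner) prevents the result
# from collapsing to a plain vector if a single column is kept
result <- dat[, keep, drop = FALSE]

print(counts)
print(names(result))
```

In this toy case column b has only two non-missing values, so it is the one that gets dropped.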

answered Mar 18 '13 at 17:34

Jouni Helske


While this is the most logical answer in the absence of data, we don't yet know the structure of `dat`. Ah well, +1 anyway – CHP Mar 18 '13 at 17:35
Yes, that's it! Thank you very much! – vitor Mar 18 '13 at 17:36
You can change `>` to `>=` to get exactly what the OP wants. – Ferdinand.kraft Mar 18 '13 at 17:36
@geektrader He did say data frame and that his code works, so I thought this should always work within those limits. But yeah, it would be nice to have some kind of note in the Ask Question form about "dputting" your data. – Jouni Helske Mar 18 '13 at 17:43
@aguiar, why don't you accept the answer (to this one and your other questions)? – Arun Mar 18 '13 at 17:45
@Arun I can only accept it after 10 minutes. It's accepted now. – vitor Mar 18 '13 at 18:14
@aguiar, yes, forgot about that. Also consider accepting answers to your other questions. – Arun Mar 18 '13 at 18:18
