Dataframe selection by numeric columns in R

Question

This is very easy to do in Python but is tripping me up in R.

numeric_cols<-data_all %>% select_if(is.numeric)
columns <-colnames(numeric_cols)
data_all[colnames] # returns dataframe selection

data_all[which(rowSums(data_all[colnames]) > 300),]

Giving the error:

Warning message in cbind(parts$left, ellip_h, parts$right, deparse.level = 0L):
“number of rows of result is not a multiple of vector length (arg 2)”


rowSums(data_wideALL[colnames] > 300)

Returns

<NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA>

How do I approach this in R?

In the example, you use a dataframe called `data_all`. The warning suggests that you used a dataframe named `data_wideALL` instead. Do you need `data_wideALL[rowSums(data_wideALL[colnames]) > 300, ]`? (no need for `which` here) — markus, Nov 16 '20 at 16:14
In order for us to help you, please provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). For example, to produce a minimal data set, you can use `head()`, `subset()`, or the indices. Then use `dput()` to give us something that can be put in R immediately. Also, please make sure you know what to do [when someone answers your question](https://stackoverflow.com/help/someone-answers). More info can be found at Stack Overflow's [help center](https://stackoverflow.com/help). Thank you! — iamericfletcher, Nov 16 '20 at 16:23
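The snippets in the question differ from the form suggested in the first comment only in where the `> 300` comparison sits relative to `rowSums()`, and that changes the result completely. A minimal base-R sketch on a toy data frame (the names `x`, `cells_over`, `total_over` and the values are hypothetical, standing in for the numeric columns of `data_all`):

```r
# Toy stand-in for the numeric columns of data_all (hypothetical values)
x <- data.frame(a = c(100, 400, 10), b = c(250, 5, 20))

# Comparison inside rowSums(): counts, per row, how many cells exceed 300
cells_over <- rowSums(x > 300)   # values 0, 1, 0

# Comparison outside rowSums(): TRUE where the row total exceeds 300
total_over <- rowSums(x) > 300   # values TRUE, TRUE, FALSE

# The logical vector works as a row filter: keeps rows 1 and 2
x[total_over, ]

# The counts are treated as row positions instead: 0 is dropped, 1 selects row 1
x[cells_over, ]
```

Only the logical form filters by condition; using the counts as an index selects rows by position, which is rarely what is intended.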

score 0 · Accepted Answer · answered Nov 16 '20 at 16:31


Try this:

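library(dplyr)

numeric_cols <- data_all %>%
    select_if(is.numeric)

num_cols <- names(numeric_cols)

# all_of() makes it explicit that num_cols is an external character vector
# (note: this keeps only the numeric columns)
data_all <- data_all %>%
    select(all_of(num_cols))

# add the row total, then keep the rows whose total exceeds 300
data_all$row_sum <- rowSums(data_all)

data_all <- data_all %>%
    filter(row_sum > 300)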

Randall Helms
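If the non-numeric columns should be kept and no helper column is wanted, one possible variant computes the row totals from a numeric-only selection and filters the full data frame with them. This is a sketch assuming dplyr 1.0.0 or later, where `select(where(is.numeric))` supersedes `select_if(is.numeric)`; the example data and the name `row_totals` are hypothetical:

```r
library(dplyr)

# Hypothetical example data: one character column and two numeric columns
data_all <- data.frame(id = c("a", "b", "c"),
                       x  = c(100, 200, 5),
                       y  = c(250, 50, 10))

# Row totals over the numeric columns only (350, 250, 15 here)
row_totals <- rowSums(select(data_all, where(is.numeric)))

# Filter the full data frame, so `id` is kept; only the first row passes
result <- data_all %>%
    filter(row_totals > 300)

print(result)
```

Because the totals are computed separately, the original columns are left untouched and no `row_sum` column has to be removed afterwards.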

score 0 · Answer 2 · answered Nov 17 '20 at 07:20

It is a bit difficult to answer your question without knowing the exact requirements and without reproducible code. Is this what you are after?

library(dplyr)

numeric_cols <- data_all %>% select_if(is.numeric)
columns <- colnames(numeric_cols)

data_all <- data_all[columns] # keep only the numeric columns

# close rowSums() before comparing, so each row total is tested against 300
data_all[rowSums(data_all[columns]) > 300, ]

score 0 · Answer 3 · answered Nov 17 '20 at 07:52

You can use sapply with is.numeric like this in base R:

# assign a data set
dat <- data.frame(A = c(1L, 2L, 3L), B = c(TRUE, TRUE, FALSE), 
                  C = c(1, 2, 3), D = c(50, 350, 700))

# use sapply + is.numeric
dat[sapply(dat, is.numeric)]
#R>   A C   D
#R> 1 1 1  50
#R> 2 2 2 350
#R> 3 3 3 700

If you only want the rows whose sum is greater than 300, you can then do something like this:

dat[rowSums(dat[sapply(dat, is.numeric)]) > 300, ]
#R>   A     B C   D
#R> 2 2  TRUE 2 350
#R> 3 3 FALSE 3 700

If you do not need the non-numeric columns, a simpler solution is:

dat <- dat[sapply(dat, is.numeric)]
dat[rowSums(dat) > 300, ]
#R>   A C   D
#R> 2 2 2 350
#R> 3 3 3 700
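One caveat with the `rowSums()` approach: if any numeric cell is `NA`, that row's total is `NA`, and indexing a data frame with `NA` returns a row filled with `NA`s. A minimal base-R sketch, assuming it is acceptable either to ignore missing cells when summing or to drop such rows (the data and the names `num`, `keep_na`, `keep`, `with_na`, `clean`, `dropped` are hypothetical); it also swaps `sapply()` for `vapply()`, which always returns a logical vector:

```r
# Hypothetical data with a missing value in the second row
dat <- data.frame(A = c(1L, 2L, 3L), B = c(TRUE, TRUE, FALSE),
                  C = c(1, NA, 3), D = c(50, 350, 700))

# vapply() guarantees a logical vector, even for a zero-column data frame
num <- vapply(dat, is.numeric, logical(1))

# Without na.rm, row 2's total is NA, and indexing with NA gives an all-NA row
keep_na <- rowSums(dat[num]) > 300
with_na <- dat[keep_na, ]

# which() discards the NA, so row 2 is dropped entirely
dropped <- dat[which(keep_na), ]

# na.rm = TRUE leaves missing cells out of each sum, so no total is NA
keep <- rowSums(dat[num], na.rm = TRUE) > 300
clean <- dat[keep, ]
print(clean)
```

Which of `which()` or `na.rm = TRUE` is appropriate depends on whether a row with missing values should count as failing the threshold or be summed over its available cells.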
