385

I have a data frame. Let's call him bob:

> head(bob)
                 phenotype                         exclusion
GSM399350 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399351 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399352 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399353 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399354 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399355 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-

I'd like to concatenate the rows of this data frame (this will be another question). But look:

> class(bob$phenotype)
[1] "factor"

Bob's columns are factors. So, for example:

> as.character(head(bob))
[1] "c(3, 3, 3, 6, 6, 6)"       "c(3, 3, 3, 3, 3, 3)"      
[3] "c(29, 29, 29, 30, 30, 30)"

I don't begin to understand this, but I guess these are indices into the levels of the factors of the columns (of the court of King Caractacus) of bob? Not what I need.

Strangely, I can go through the columns of bob by hand and do

bob$phenotype <- as.character(bob$phenotype)

which works fine. And, after some typing, I can get a data.frame whose columns are characters rather than factors. So my question is: how can I do this automatically? How do I convert a data.frame with factor columns into a data.frame with character columns without having to manually go through each column?

Bonus question: why does the manual approach work?

GSee
Mike Dewar

18 Answers

384

Just following on from Matt and Dirk: if you want to recreate your existing data frame without changing the global option, you can recreate it with an lapply statement:

bob <- data.frame(lapply(bob, as.character), stringsAsFactors=FALSE)

This will convert all variables to class "character"; if you want to convert only factors, see Marek's solution below.

As @hadley points out, the following is more concise.

bob[] <- lapply(bob, as.character)

In both cases, lapply outputs a list. However, in the second case, assigning into bob[] replaces the contents of the existing columns while keeping bob's attributes, so bob keeps its data.frame class (and its row names). That eliminates the need to convert back to a data.frame using as.data.frame with the argument stringsAsFactors = FALSE.
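A minimal sketch of the difference, assuming a small made-up stand-in for bob (the values below are hypothetical, not the question's data): plain lapply() hands back a list, while assigning into bob[] keeps the data frame.

```r
# Hypothetical stand-in for bob: one factor column and one integer column
bob <- data.frame(
  phenotype = factor(c("3- 4- 8- 25- 44+", "3- 4- 8- 25+ 44+")),
  n = c(1L, 2L)
)

as_list <- lapply(bob, as.character)  # plain lapply returns a list
class(as_list)                        # "list"

bob[] <- lapply(bob, as.character)    # assigning into bob[] keeps the data.frame
class(bob)                            # "data.frame"
sapply(bob, class)                    # both columns are now "character"
```

Note that the integer column n is converted too, which is why Marek's version below restricts the conversion to factor columns.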

Shane
335

To replace only factors:

i <- sapply(bob, is.factor)
bob[i] <- lapply(bob[i], as.character)
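A small sketch with a hypothetical data frame (two factor columns around a numeric one) showing that only the columns flagged by is.factor() are touched:

```r
# Hypothetical example: two factor columns and one numeric column
bob <- data.frame(
  phenotype = factor(c("a", "b", "a")),
  score     = c(1.5, 2.0, 3.5),
  label     = factor(c("x", "y", "z"))
)

i <- sapply(bob, is.factor)             # TRUE for phenotype and label, FALSE for score
bob[i] <- lapply(bob[i], as.character)  # only the factor columns are replaced
sapply(bob, class)                      # "character" "numeric" "character"
```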

In dplyr version 0.5.0, a new function, mutate_if, was introduced:

library(dplyr)
bob %>% mutate_if(is.factor, as.character) -> bob

...and in version 1.0.0 it was superseded by across():

library(dplyr)
bob %>% mutate(across(where(is.factor), as.character)) -> bob

The purrr package from RStudio gives another alternative:

library(purrr)
bob %>% modify_if(is.factor, as.character) -> bob
Marek
41

The global option

stringsAsFactors: The default setting for arguments of data.frame and read.table.

may be something you want to set to FALSE in your startup files (e.g. ~/.Rprofile). Please see help(options).

micstr
Dirk Eddelbuettel
25

If you understand how factors are stored, you can avoid using apply-based functions to accomplish this, which isn't at all to imply that the apply solutions don't work well.

Factors are stored as integer indices tied to a character vector of 'levels'. You can see this if you convert a factor to numeric:

> fact <- as.factor(c("a","b","a","d"))
> fact
[1] a b a d
Levels: a b d

> as.numeric(fact)
[1] 1 2 1 3

The numbers returned in the last line correspond to the levels of the factor.

> levels(fact)
[1] "a" "b" "d"

Notice that levels() returns a character vector. You can use this fact to easily and compactly convert factors to strings or numerics like this:

> fact_character <- levels(fact)[as.numeric(fact)]
> fact_character
[1] "a" "b" "a" "d"

This also works for numeric values, provided you wrap your expression in as.numeric().

> num_fact <- factor(c(1,2,3,6,5,4))
> num_fact
[1] 1 2 3 6 5 4
Levels: 1 2 3 4 5 6
> num_num <- as.numeric(levels(num_fact)[as.numeric(num_fact)])
> num_num
[1] 1 2 3 6 5 4
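De Novo's comment on this answer suggests the shorter levels(f)[f], which works because indexing by a factor uses its integer codes. A small sketch confirming that, plus the numeric variant, and one way to apply the idea to every factor column of a hypothetical data frame (the column loop itself still uses lapply):

```r
f <- factor(c("a", "b", "a", "d"))
# Indexing by a factor uses its integer codes, same as levels(f)[as.numeric(f)]
levels(f)[f]                               # "a" "b" "a" "d"
identical(levels(f)[f], as.character(f))   # TRUE

nf <- factor(c(1, 2, 3, 6, 5, 4))
# Convert the levels to numeric once, then index them by the codes
as.numeric(levels(nf))[nf]                 # 1 2 3 6 5 4

# The same idea applied to every factor column of a hypothetical data frame
bob <- data.frame(phenotype = factor(c("a", "b")), n = 1:2)
i <- vapply(bob, is.factor, logical(1))
bob[i] <- lapply(bob[i], function(col) levels(col)[col])
```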
De Novo
Kikapp
    This answer does not address the problem, which is how to convert *all* of the factor columns in my data frame to character. `as.character(f)` is better than `levels(f)[as.numeric(f)]` in both readability and efficiency. If you wanted to be clever, you could use `levels(f)[f]` instead. Note that when converting a factor with numeric values, you do get some benefit from `as.numeric(levels(f))[f]` over, e.g., `as.numeric(as.character(f))`, but this is because you only have to convert the levels to numeric and then subset. `as.character(f)` is just fine as it is. – De Novo Mar 19 '19 at 04:16
21

If you want a new data frame bobc where every factor vector in bobf is converted to a character vector, try this:

bobc <- rapply(bobf, as.character, classes="factor", how="replace")

If you then want to convert it back, you can create a logical vector of which columns are factors, and use that to selectively apply factor:

f <- sapply(bobf, is.factor)  # is.factor() also matches ordered factors
bobc[f] <- lapply(bobc[f], factor)
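Selecting columns with bobc[, f] can bite when f picks out exactly one column: the data frame [ method drops the result to a plain vector, and lapply() then iterates over its elements rather than over columns. A small sketch with a hypothetical data frame; list-style indexing, bobc[f], always returns a data frame:

```r
# Hypothetical data frame with exactly one factor column
bobc <- data.frame(g = factor(c("a", "b")), n = 1:2)
f <- vapply(bobc, is.factor, logical(1))  # TRUE FALSE

class(bobc[, f])  # "factor": a single selected column is dropped to a vector
class(bobc[f])    # "data.frame": list-style indexing keeps the data frame
```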
scentoni
    +1 for doing only what was necessary (i.e. not converting the entire data.frame to character). This solution is robust to a data.frame that contains mixed types. – Joshua Ulrich Aug 01 '13 at 21:42
    This example should be in the `Examples' section for rapply, like at: http://stat.ethz.ch/R-manual/R-devel/library/base/html/rapply.html . Anyone know how to request that that be so? – mpettis Aug 02 '13 at 03:13
    If you want to end up with a data frame, simply wrap the rapply in a data.frame call (with the argument stringsAsFactors = FALSE). – Taylored Web Sites Apr 04 '16 at 19:44
14

I typically make this function a part of all my projects. Quick and easy.

unfactorize <- function(df){
  # is.factor() also catches ordered factors, whose class() has length 2
  for(i in which(vapply(df, is.factor, logical(1)))) df[[i]] <- as.character(df[[i]])
  return(df)
}
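Why is.factor() is safer than comparing class() with "factor": an ordered factor carries two classes, so the comparison returns two values (and since R 4.2.0 such a length-2 condition inside if() is an error), and sapply(df, class) can no longer simplify to a character vector. A quick check with hypothetical data:

```r
# An ordered factor carries two classes
o <- factor(c("low", "high", "low"), levels = c("low", "high"), ordered = TRUE)

class(o)               # "ordered" "factor"
class(o) == "factor"   # FALSE TRUE: a length-2 result, not a single TRUE
is.factor(o)           # TRUE

df <- data.frame(o = o, n = 1:3)
is.list(sapply(df, class))  # TRUE: class lengths differ, so sapply returns a list
```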
Omar Wagih
10

Another way is to convert it using apply:

bob2 <- apply(bob, 2, as.character)

And a better one (the previous returns an object of class 'matrix'):

bob2 <- as.data.frame(as.matrix(bob), stringsAsFactors = FALSE)
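Both variants convert every column, not just the factors. A sketch with a hypothetical data frame that also has a numeric column: as.matrix() passes non-character, non-factor columns through format(), so the numbers come back as strings padded to a common width.

```r
# Hypothetical data frame mixing a factor and a numeric column
bob <- data.frame(phenotype = factor(c("a", "b")), n = c(1, 10))

res <- as.data.frame(as.matrix(bob), stringsAsFactors = FALSE)
sapply(res, class)   # both "character": the numeric column was converted too
res$n                # " 1" "10": formatted, hence padded to a common width
```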
George Dontas
8

Update: Here's an example of something that doesn't work. I thought it would, but I think that the stringsAsFactors option only works on character strings - it leaves the factors alone.

Try this:

bob2 <- data.frame(bob, stringsAsFactors = FALSE)

Generally speaking, whenever you're having problems with factors that should be characters, there's a stringsAsFactors setting somewhere to help you (including a global setting).
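A quick check of what is described above, using a hypothetical one-column bob: re-wrapping an existing data frame with stringsAsFactors = FALSE leaves its factor columns as factors, because the argument only governs how character input is treated.

```r
bob <- data.frame(phenotype = factor(c("a", "b")))
bob2 <- data.frame(bob, stringsAsFactors = FALSE)
class(bob2$phenotype)   # "factor": existing factor columns are left alone
```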

Matt Parker
    This does work, if he sets it when creating `bob` to begin with (but not after the fact). – Shane May 17 '10 at 17:18
    Right. Just wanted to be clear that this doesn't solve the problem, per se - but thanks for noting that it does prevent it. – Matt Parker May 17 '10 at 17:34
7

Or you can try transform:

newbob <- transform(bob, phenotype = as.character(phenotype))

Just be sure to list every factor column you'd like to convert to character.

Or you can do something like this and kill all the pests with one blow:

newbob_char <- as.data.frame(lapply(bob[sapply(bob, is.factor)], as.character), stringsAsFactors = FALSE)
newbob_rest <- bob[!(sapply(bob, is.factor))]
newbob <- cbind(newbob_char, newbob_rest)

It's not a good idea to cram everything into one expression like this; I could do the sapply part separately (actually, it's much easier to do it like that), but you get the point... I haven't checked the code, 'cause I'm not at home, so I hope it works! =)

This approach, however, has a downside: you must reorder the columns afterwards, while with transform you can do whatever you like, but at the cost of "pedestrian-style" code writing...

So there... =)
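A sketch of the column-order downside with a hypothetical three-column data frame: the cbind() approach moves the converted factor columns to the front, and indexing the result by names(bob) restores the original order.

```r
# Hypothetical data: a factor column on each side of an integer column
bob <- data.frame(a = factor(c("x", "y")), n = 1:2, b = factor(c("p", "q")))

is_fac <- sapply(bob, is.factor)
newbob_char <- as.data.frame(lapply(bob[is_fac], as.character), stringsAsFactors = FALSE)
newbob_rest <- bob[!is_fac]
newbob <- cbind(newbob_char, newbob_rest)
names(newbob)              # "a" "b" "n": the converted columns come first

newbob <- newbob[names(bob)]
names(newbob)              # "a" "n" "b": original order restored
```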

aL3xa
6

When you first create your data frame, include stringsAsFactors = FALSE to avoid all these misunderstandings.
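A minimal sketch with hypothetical data, setting the argument at creation time both for data.frame() and for read.csv() (which accepts the same argument as read.table()). Since R 4.0.0, FALSE is already the default for both, so spelling it out mainly matters on older versions.

```r
# Setting the argument when the data frame is created
bob <- data.frame(phenotype = c("a", "b"), stringsAsFactors = FALSE)
class(bob$phenotype)   # "character"

# read.csv() takes the same argument; text= reads from a string
csv <- read.csv(text = "phenotype,n\na,1\nb,2", stringsAsFactors = FALSE)
sapply(csv, class)     # phenotype "character", n "integer"
```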

5

If you use the data.table package for your data.frame operations, the problem does not arise.

library(data.table)
dt = data.table(col1 = c("a","b","c"), col2 = 1:3)
sapply(dt, class)
#       col1        col2 
#"character"   "integer" 

If you already have factor columns in your dataset and you want to convert them to character, you can do the following.

library(data.table)
dt = data.table(col1 = factor(c("a","b","c")), col2 = 1:3)
sapply(dt, class)
#     col1      col2 
# "factor" "integer" 
upd.cols = sapply(dt, is.factor)
dt[, names(dt)[upd.cols] := lapply(.SD, as.character), .SDcols = upd.cols]
sapply(dt, class)
#       col1        col2 
#"character"   "integer" 
jangorecki
2

This works for me. I finally figured out a one-liner:

df <- as.data.frame(lapply(df, function(y) if (is.factor(y)) as.character(y) else y), stringsAsFactors = FALSE)
user1617979
2

The new function across() was introduced in dplyr version 1.0.0. It supersedes the scoped variants (_if, _at, _all); see the official dplyr documentation for across().

library(dplyr)
bob <- bob %>% 
       mutate(across(where(is.factor), as.character))
radhikesh93
1

This function does the trick

df <- stacomirtools::killfactor(df)
Cedric
1

You should use convert() from hablar, which gives readable syntax compatible with tidyverse pipes:

library(dplyr)
library(hablar)

df <- tibble(a = factor(c(1, 2, 3, 4)),
             b = factor(c(5, 6, 7, 8)))

df %>% convert(chr(a:b))

which gives you:

  a     b    
  <chr> <chr>
1 1     5    
2 2     6    
3 3     7    
4 4     8   
davsjob
1

Maybe a newer option?

library("tidyverse")

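# group_by_if() also groups by the converted columns, so drop the grouping afterwards
bob <- bob %>% group_by_if(is.factor, as.character) %>% ungroup()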
rachelette
1

With the dplyr package loaded, use

bob <- bob %>% mutate_at("phenotype", as.character)

if you only want to change the phenotype column specifically.

nexonvantec
0

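This converts every column to character and then converts back to numeric the columns whose values all parse as numbers:

makenumcols <- function(df) {
  df <- as.data.frame(df)
  df[] <- lapply(df, as.character)
  # a column counts as numeric if every non-NA value parses as a number
  cond <- vapply(df, function(x) {
    x <- x[!is.na(x)]
    all(suppressWarnings(!is.na(as.numeric(x))))
  }, logical(1))
  numeric_cols <- names(df)[cond]
  # list-style indexing keeps a data frame even when only one column matches
  df[numeric_cols] <- lapply(df[numeric_cols], as.numeric)
  return(df)
}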

Adapted from: Get column types of excel sheet automatically

Ferroao