I would like to transform certain columns with a specific string code to a factor in the same data.frame. However, I am stymied by the initial task of passing the data.frame column reference to my function. Working from examples here and its linked pages, I believe the following should work:

#feed string to function

set.seed(42)
df <- data.frame(
chr1 = sample(letters[1:4], 10, T),
chr2 = sample(letters[4:7], 10, T), 
stringsAsFactors = F
)

tofactor <- function(dat,column) {
  dat[,column] <- as.factor(dat[,column])
}

tofactor(df, "chr1")
typeof(df$chr1)

However, after this call df$chr1 is still a character vector. I have also tried indexing with double square brackets, without success.

Thanks for your assistance.

rawr
Todd D
  • add `dat` as the final line in your function – rawr Jul 11 '17 at 21:34
  • Previous poster is incorrect; your function is fine. Your problem is that you don't assign the result of the `tofactor` function to anything. Use `df$chr1 – cmaher Jul 11 '17 at 21:37
  • previous poster's advice is silly; my way is better – rawr Jul 11 '17 at 21:41
  • Why doesn't the single line of the function accomplish the intended replacement? – Todd D Jul 11 '17 at 21:47
  • it _does_ do the replacement, but your function is returning the value that is returned by `\`[ – rawr Jul 11 '17 at 22:01
  • @Todd You should spend some time to study scoping in R. Changes inside a function (usually, with few special exceptions that you normally should avoid) don't affect objects outside a function. You should return the changed data.frame and assign it to the original data.frame. – Roland Jul 11 '17 at 22:05
  • My plan was to place the column names in a vector and then `apply()` for each value in the vector. Based on these comments, it would seem this strategy may not work. – Todd D Jul 11 '17 at 22:27

3 Answers

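The function itself works; the result just needs to be assigned. Note that it returns only the converted column (the value of the last assignment), not the whole data frame, so assign it back to that column:

df$chr1 <- tofactor(df, "chr1")

Running str(tofactor(df, "chr1")) shows what is returned: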

Factor w/ 4 levels "a","b","c","d": 4 4 2 4 3 3 3 1 3 3
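
If the function should hand back the whole data frame instead, as the first comment under the question suggests, one option is to end it with dat. The following is a minimal sketch of that idea; tofactor_df is a hypothetical name used only for illustration, and the sample data mirrors the question.

```r
# Hypothetical variant of tofactor() that returns the modified data frame
tofactor_df <- function(dat, column) {
  # The change is made on the function's local copy of dat
  dat[[column]] <- as.factor(dat[[column]])
  # Return the whole data frame so the caller can reassign it
  dat
}

set.seed(42)
df <- data.frame(
  chr1 = sample(letters[1:4], 10, TRUE),
  chr2 = sample(letters[4:7], 10, TRUE),
  stringsAsFactors = FALSE
)

# Reassign the result; without this step df is unchanged
df <- tofactor_df(df, "chr1")
class(df$chr1)  # "factor"
class(df$chr2)  # "character"
```

The reassignment is still needed because R passes arguments by value, so the function only ever modifies its own copy.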

Mako212

Another way is to use dplyr's mutate_at and specify the columns inside vars():

library(dplyr)

df <- data.frame(
  chr1 = sample(letters[1:4], 10, T),
  chr2 = sample(letters[4:7], 10, T), 
  stringsAsFactors = F
)

df2 <- df %>% 
 mutate_at(vars(chr1), as.factor)

class(df2$chr1) #[1] "factor"
roarkz

After gaining a better understanding of scoping, and with a pointer to assign() from a colleague, I arrived at:

set.seed(42)
df <- data.frame(
  chr1 = sample(letters[1:4], 10, T),
  chr2 = sample(letters[4:7], 10, T), 
  stringsAsFactors = F
)

tofactor <- function(dat,column) {
  dat[,column] <- as.factor(dat[,column])
  assign("df",dat, envir = .GlobalEnv)
}

tofactor(df, "chr1")
class(df$chr1)  # "factor"; typeof() would report "integer" for a factor

This solution performs the replacement inside the function, so it can be called repeatedly without assigning the output in a separate step.
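
One caveat: assign("df", ...) always writes to the global object named df, whatever data frame was passed in as dat. For the plan mentioned in the comments (a vector of column names), a hedged alternative is to convert all listed columns at once and return the data frame. This is only a sketch; cols_to_factor is a hypothetical helper, not part of base R.

```r
# Hypothetical helper: convert several named columns to factors
cols_to_factor <- function(dat, columns) {
  # lapply() applies as.factor to each selected column;
  # assigning the resulting list back replaces those columns
  dat[columns] <- lapply(dat[columns], as.factor)
  # Return the modified copy instead of writing to the global environment
  dat
}

set.seed(42)
df <- data.frame(
  chr1 = sample(letters[1:4], 10, TRUE),
  chr2 = sample(letters[4:7], 10, TRUE),
  stringsAsFactors = FALSE
)

# Works for any data frame and any set of column names
df <- cols_to_factor(df, c("chr1", "chr2"))
sapply(df, class)  # both columns report "factor"
```

The cost is one explicit reassignment per call, but the function no longer depends on the name of the object in the calling environment.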

Todd D