
Here's a data frame I'm working with:

c1 = c('a', 'b', 'c', 'd')
c2 = c('d', 'a', 'd', 'c')
c3 = c('a', 'c', 'd', 'b')
c4 = c('a', 'c', 'b', 'd')
df = data.frame(c1, c2, c3, c4)

c1    c2    c3    c4
a     d     a     a
b     a     c     c
c     d     d     b
d     c     b     d

I would like to convert the values using this scale: a=1, b=2, c=3, d=4, so that I get something like this:

c1 c2 c3 c4
 1  4  1  1
 2  1  3  3
 3  4  4  2
 4  3  2  4

This is what I have come up with:

for(i in colnames(df)){
    df$i = gsub("a", 1, df$i)
    df$i = gsub("b", 2, df$i)
    df$i = gsub("c", 3, df$i)
    df$i = gsub("d", 4, df$i)
}

But it doesn't work. Should I use gsub here, or is there a simpler way to do this?
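The loop above fails because `df$i` does not use the value stored in `i`: `$` takes its right-hand side literally, so it looks for a column named `i`, which does not exist. A minimal sketch of the same gsub approach that indexes with `df[[i]]` instead, and converts the result to numbers at the end because gsub always returns character strings:

```r
c1 <- c('a', 'b', 'c', 'd')
c2 <- c('d', 'a', 'd', 'c')
c3 <- c('a', 'c', 'd', 'b')
c4 <- c('a', 'c', 'b', 'd')
df <- data.frame(c1, c2, c3, c4)

for (i in colnames(df)) {
  # df[[i]] selects the column whose name is stored in i;
  # df$i would look for a column literally called "i"
  df[[i]] <- gsub("a", "1", df[[i]])
  df[[i]] <- gsub("b", "2", df[[i]])
  df[[i]] <- gsub("c", "3", df[[i]])
  df[[i]] <- gsub("d", "4", df[[i]])
  # gsub returns character strings, so convert them to numbers
  df[[i]] <- as.numeric(df[[i]])
}
df
```

This works, but it needs one gsub call per letter, which is why a lookup-based approach scales better.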

jason adams
Similar to the answer below: if your key wasn't sequential, you could make your own `key`. – rawr Dec 07 '14 at 04:13
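Following the comment's suggestion, a minimal sketch of a hand-made `key` is a named vector that is indexed by the letters themselves. The values below reuse the question's scale, but because the lookup is by name they could be any numbers, sequential or not:

```r
df <- data.frame(c1 = c('a', 'b', 'c', 'd'),
                 c2 = c('d', 'a', 'd', 'c'),
                 c3 = c('a', 'c', 'd', 'b'),
                 c4 = c('a', 'c', 'b', 'd'))

# Named lookup vector: names are the codes, values are the scores
key <- c(a = 1, b = 2, c = 3, d = 4)

df2 <- df
# as.character() matters if the columns are factors: indexing by a factor
# would use its integer codes, not its labels. unname() drops the names
# that key[...] carries over.
df2[] <- lapply(df, function(x) unname(key[as.character(x)]))
df2
```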


We can do this in a couple of ways. One way is to convert the data.frame to a matrix and then match its elements against the unique values in the dataset, i.e. letters[1:4] in this case. The result of match is a plain vector, so we reshape it to the dimensions of the original dataset by setting its dim to dim(df), i.e. `dim<-`(..., dim(df)). Assigning to df2[] rather than to df2 keeps the result a data.frame. Please also check here for more details about this kind of assignment.

df2 <- df
df2[] <- `dim<-`(match(as.matrix(df), letters[1:4]), dim(df))
df2
#  c1 c2 c3 c4
#1  1  4  1  1
#2  2  1  3  3
#3  3  4  4  2
#4  4  3  2  4
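To see what the one-liner does step by step, here is a small sketch on a made-up 2 x 3 matrix `x` (hypothetical input, not from the question). match reads the matrix column by column and returns a plain integer vector; `dim<-` then attaches the dimensions back, filling column by column as well, so each number lands where its letter was:

```r
# Hypothetical toy input: a 2 x 3 character matrix, filled column by column
x <- matrix(c('b', 'a', 'd', 'c', 'a', 'b'), nrow = 2)

# Position of each element in letters[1:4], as a plain vector (no dim)
v <- match(x, letters[1:4])
v
# [1] 2 1 4 3 1 2

# Same as: m <- v; dim(m) <- dim(x)
m <- `dim<-`(v, dim(x))
m
# row 1: 2 4 1
# row 2: 1 3 2
```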

The above code can be split into separate lines:

v1 <- match(as.matrix(df), letters[1:4])
df2[] <- `dim<-`(v1, dim(df))

or

df2[] <- matrix(v1, ncol = ncol(df), nrow = nrow(df))
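Both reshaping forms fill the values column by column, so they should give the same data frame. A quick sketch that checks this on the question's data (`df_dim` and `df_mat` are just illustrative names):

```r
df <- data.frame(c1 = c('a', 'b', 'c', 'd'),
                 c2 = c('d', 'a', 'd', 'c'),
                 c3 = c('a', 'c', 'd', 'b'),
                 c4 = c('a', 'c', 'b', 'd'))

v1 <- match(as.matrix(df), letters[1:4])

# Reshape by setting the dim attribute directly
df_dim <- df
df_dim[] <- `dim<-`(v1, dim(df))

# Reshape by building a matrix with explicit nrow/ncol
df_mat <- df
df_mat[] <- matrix(v1, ncol = ncol(df), nrow = nrow(df))

identical(df_dim, df_mat)
```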

Another option is to convert each column to a factor whose levels are the unique values of the dataset (letters[1:4]), and then convert it to numeric with as.numeric. This can be done for every column using lapply:

df2[] <- lapply(df, function(x) as.numeric(factor(x, levels = letters[1:4])))
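The numbers here are the factor's internal codes, so they follow the order given in levels, and any value not listed in levels becomes NA. A small sketch illustrating both points on a hypothetical vector `y` that contains an extra value 'e':

```r
# Hypothetical input with a value ('e') outside the scale
y <- c('b', 'd', 'e', 'a')

# Codes follow the order of levels; 'e' is not a level, so it becomes NA
codes <- as.numeric(factor(y, levels = letters[1:4]))
codes
# [1]  2  4 NA  1

# Reversing the levels reverses the codes (d = 1, c = 2, b = 3, a = 4)
codes_rev <- as.numeric(factor(y, levels = rev(letters[1:4])))
codes_rev
# [1]  3  1 NA  4
```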
akrun