
I would like to convert my data frame into a matrix that expands a single factor column into one column per level, each holding 1 or 0 depending on the factor's value in that row. For example,

C1 C2 C3
A  3  5
B  3  4
A  1  1

Should turn into something like

C1_A C1_B C2 C3
1    0    3  5
0    1    3  4
1    0    1  1

How can I do this in R? I tried data.matrix and as.matrix, but neither returned what I wanted: data.matrix replaces the factor column with a single column of integer codes, and as.matrix converts everything to character. Neither expands the factor.

Sven Hohenstein
BBSysDyn

3 Answers


Assuming dat is your data frame:

cbind(dat, model.matrix( ~ 0 + C1, dat))

  C1 C2 C3 C1A C1B
1  A  3  5   1   0
2  B  3  4   0   1
3  A  1  1   1   0

This solution works with any number of factor levels and does not require naming the new indicator columns manually.

If you want to exclude the column C1, you could use this command:

cbind(dat[-1], model.matrix( ~ 0 + C1, dat))
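To get exactly the layout asked for in the question (a numeric matrix with columns named C1_A, C1_B, C2, C3), the model.matrix() result can be renamed and combined with the numeric columns. This is a minimal sketch assuming the example data below, with C1 created explicitly as a factor; the names dummies and m are only illustrative.

```r
# Example data from the question; C1 is made a factor explicitly
dat <- data.frame(C1 = factor(c("A", "B", "A")),
                  C2 = c(3, 3, 1),
                  C3 = c(5, 4, 1))

# One 0/1 column per level of C1 (no intercept, so no level is dropped)
dummies <- model.matrix(~ 0 + C1, dat)
colnames(dummies) <- paste0("C1_", levels(dat$C1))

# Put the indicator columns in front of the numeric columns; the result is a matrix
m <- cbind(dummies, as.matrix(dat[-1]))
m
```

Because both arguments to cbind() are matrices, m is a plain numeric matrix rather than a data frame.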
Sven Hohenstein

Let's call your data.frame df:

library(reshape2)
dcast(df, C2 + C3 ~ C1, fun.aggregate = length, fill = 0)

  C2 C3 A B
1  1  1 1 0
2  3  4 0 1
3  3  5 1 0
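Since dcast() with length counts rows per C2/C3 combination, its output is sorted by C2 and C3, and rows that share the same C2 and C3 values would be collapsed into a single row with counts. If reshape2 is not available, or the original row order must be kept, a base-R cross-tabulation with table() is one possible substitute (a different technique, not part of the answer above). It assumes the example data below.

```r
# Example data from the question
df <- data.frame(C1 = c("A", "B", "A"),
                 C2 = c(3, 3, 1),
                 C3 = c(5, 4, 1))

# Cross-tabulate row index against C1: one row per original row,
# one column per distinct value of C1, cells are counts (here 0 or 1)
counts <- table(seq_len(nrow(df)), df$C1)

# Attach the indicator columns to the remaining columns, in the original row order
wide <- cbind(df[c("C2", "C3")], as.data.frame.matrix(counts))
wide
```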
Roland
    Thanks for both answers. Isn't there a way to do this conversion without specifying any column names, such as C1? Simply convert(df), and it would handle the factors. lm() and other regression methods do this internally, right? – BBSysDyn Dec 16 '12 at 13:39
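Regarding the comment: model.matrix() is the function that regression code builds on, and with the formula ~ 0 + . it expands every factor without naming any column. By default, though, only the first factor gets one column per level; later factors lose a reference level. The sketch below is a hypothetical helper (expand_factors is not a base R function) that requests full indicator coding for every factor via contrasts.arg. It assumes all non-numeric columns are factors or character vectors.

```r
# Hypothetical helper (not a base R function): expand every factor or
# character column into one 0/1 column per level, keep numeric columns as-is
expand_factors <- function(df) {
  # Treat character columns as factors (read.table() returns character
  # columns by default since R 4.0.0)
  df[] <- lapply(df, function(x) if (is.character(x)) factor(x) else x)
  is_fac <- vapply(df, is.factor, logical(1))
  # contrasts = FALSE gives the full indicator coding for each factor,
  # so no level is dropped as a reference category
  cons <- lapply(df[is_fac], contrasts, contrasts = FALSE)
  model.matrix(~ 0 + ., data = df, contrasts.arg = cons)
}

# Example with a second, made-up factor column C4
dat2 <- data.frame(C1 = c("A", "B", "A"),
                   C2 = c(3, 3, 1),
                   C3 = c(5, 4, 1),
                   C4 = c("x", "x", "y"))
res <- expand_factors(dat2)
res
```

The columns come out as C1A, C1B, C2, C3, C4x, C4y, in the order of the original columns.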
dat <- read.table(text = 'C1 C2 C3
A  3  5
B  3  4
A  1  1', header = TRUE)

Using transform:

transform(dat, C1_A = ifelse(C1 == 'A', 1, 0), C1_B = ifelse(C1 == 'B', 1, 0))[, -1]
  C2 C3 C1_A C1_B
1  3  5    1    0
2  3  4    0    1
3  1  1    1    0

Or, for more flexibility, use within:

within(dat, {
  C1_A <- ifelse(C1 == 'A', 1, 0)
  C1_B <- ifelse(C1 == 'B', 1, 0)
})

  C1 C2 C3  C1_B C1_A
1  A  3  5    0    1
2  B  3  4    1    0
3  A  1  1    0    1
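Both versions above write one ifelse() per level, so they need editing whenever the levels change. A minimal sketch that loops over the levels instead, assuming C1 is a factor (as created below); the loop variable lev is only illustrative.

```r
# Example data from the question, with C1 as a factor
dat <- data.frame(C1 = factor(c("A", "B", "A")),
                  C2 = c(3, 3, 1),
                  C3 = c(5, 4, 1))

# Add one 0/1 column per level of C1, named C1_<level>
for (lev in levels(dat$C1)) {
  dat[[paste0("C1_", lev)]] <- as.integer(dat$C1 == lev)
}

# Drop the original factor column
dat <- dat[setdiff(names(dat), "C1")]
dat
```

This gives the same columns as the transform version (C2, C3, C1_A, C1_B) for any number of levels.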
agstudy