I am starting to get the hang of R, but I can't work out how to create a data frame with columns X and rows Y. This is the Python code I have written; I need to replicate it in R.

import pandas as pd

df = pd.read_csv('file.csv')
Cols = pd.unique(df['desiredCol1'])
Rows = pd.unique(df['desiredCol2'])
df2 = pd.DataFrame(index=Rows, columns=Cols)

This is the R code I have written, but it isn't returning the desired results.

d= matrix(data = NA, nrow = length(Rows), ncol = length(Cols), byrow = FALSE,
      dimnames =   list(Rows, Cols))

EDIT: The as.data.frame function converts the matrix to a standard R data frame.

d = as.data.frame(d)

someone
  • But how do you read the file and declare Cols and Rows in R? pd.unique works in Python, not in R. What errors did you receive? – raquela Nov 20 '17 at 07:45
  • In both Python and R, matrices and data frames are different classes. Also, it's a bit unusual (again, in both Python and R) to instantiate an empty data frame. – David Marx Nov 20 '17 at 07:50

1 Answer

Welcome to R :-). Maybe you could tell us a bit more about how you want to use the data.frame. The usual reason to create an empty data.frame is to run a for loop that writes results into specific positions in it. Preallocating the object this way can speed up the loop, because the object does not have to grow inside it. However, as @David Marx pointed out, this is usually not the way to go. Try to avoid it if possible and aim for "vectorized code".
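To make the loop-versus-vectorized point concrete, here is a minimal sketch. It assumes the goal is to count how often each row/column combination occurs, which the question does not actually state; the small `df` is a hypothetical stand-in for the contents of file.csv.

```r
# Hypothetical stand-in for the data read from file.csv
df <- data.frame(
  desiredCol1 = c("X", "Y", "X", "Z", "Y", "X"),
  desiredCol2 = c("A", "A", "B", "C", "C", "A"),
  stringsAsFactors = FALSE
)

Cols <- unique(df$desiredCol1)
Rows <- unique(df$desiredCol2)

# Loop version: preallocate once, then write into specific positions
counts_loop <- matrix(0L, nrow = length(Rows), ncol = length(Cols),
                      dimnames = list(Rows, Cols))
for (i in seq_len(nrow(df))) {
  r <- df$desiredCol2[i]
  k <- df$desiredCol1[i]
  counts_loop[r, k] <- counts_loop[r, k] + 1L
}

# Vectorized version: table() builds the same cross-tabulation in one call
counts_vec <- table(df$desiredCol2, df$desiredCol1)
```

Both objects hold the same counts. The vectorized call needs no preallocated container at all, which is why an empty data.frame is often unnecessary.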

As an answer to your question, I think you simply forgot to repeat your data points (although matrix() also recycles a single NA to fill all cells, so writing the repetition out mainly makes the intent explicit). Furthermore, as long as all your data points have the same type, you can use a matrix (as you did); you only need a data.frame for mixed types. Check the documentation via ?matrix and ?data.frame. You will also find examples there.
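The difference between the two containers can be seen in a short sketch. The values and the names `num` and `chr` are made up for illustration.

```r
# A matrix holds a single type: mixing numbers and strings
# coerces every cell to character
m <- matrix(c(1, "a", 2, "b"), nrow = 2, byrow = TRUE,
            dimnames = list(c("A", "B"), c("num", "chr")))
typeof(m)
# "character"

# A data.frame stores each column separately, so types can differ by column
d_mixed <- data.frame(num = c(1, 2), chr = c("a", "b"),
                      row.names = c("A", "B"), stringsAsFactors = FALSE)
sapply(d_mixed, class)
```

If every cell will eventually hold the same type (for example, all numeric), the matrix is the lighter choice. Otherwise, convert with as.data.frame or build a data.frame directly.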

As a side note, have a look at the answers to "How to make a great R reproducible example"; following them will make it easier for others to help you and provide precise answers.

Rows <- c("A","B","C")
Cols <- c("X","Y","Z")

d <- matrix(data = rep(NA, times = length(Rows) * length(Cols)),
            nrow = length(Rows),
            ncol = length(Cols),
            byrow = FALSE,
            dimnames = list(Rows, Cols))

d
#   X  Y  Z
# A NA NA NA
# B NA NA NA
# C NA NA NA
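
For completeness, a rough R counterpart of the whole pandas snippet from the question might look like the sketch below. The inline `csv_text` is a hypothetical stand-in for file.csv so that the example runs on its own; with a real file you would call read.csv("file.csv") instead.

```r
# Hypothetical CSV content standing in for file.csv
csv_text <- "desiredCol1,desiredCol2
X,A
Y,A
X,B
Z,C"
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

Cols <- unique(df$desiredCol1)   # counterpart of pd.unique(df['desiredCol1'])
Rows <- unique(df$desiredCol2)   # counterpart of pd.unique(df['desiredCol2'])

# matrix() recycles the single NA to fill every cell
d <- matrix(NA, nrow = length(Rows), ncol = length(Cols),
            dimnames = list(Rows, Cols))

# Convert to a data.frame, mirroring pd.DataFrame(index = Rows, columns = Cols)
df2 <- as.data.frame(d)
dim(df2)
```

Here unique() plays the role of pd.unique, and the dimnames argument carries the row and column labels through to the data.frame.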
Manuel Bickel