
I have a data frame like `df` below, but with about 90k rows.

library(dplyr)
library(tidyr)

id <- c("1", "1", "1", "2", "2", "3")
type <- c("A", "B", "C", "A", "D", "B")
df <- tibble(id, type)  # data_frame() is deprecated; tibble() replaces it

I want to reshape the data from long to wide and then calculate the covariance, keeping everything as a sparse matrix from the start (otherwise the R session aborts).

With ordinary dense objects, the expected result is `cov_id_type`:

df$flag <- 1
id_type <- df %>%
  pivot_wider(names_from = "type",
              values_from = "flag",
              values_fill = 0) %>%
  as.data.frame()                # plain data.frame, so row names are allowed
rownames(id_type) <- id_type$id  # row names from the id variable
id_type$id <- NULL               # drop the identifier column

cov_id_type <- cov(id_type)
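One way to avoid the dense wide table entirely is to build the id-by-type incidence matrix with `Matrix::sparseMatrix()`, using the factor codes of `id` and `type` as row and column indices. This is a sketch assuming the `Matrix` package is installed; `id_f`, `type_f` and `id_type_sp` are illustrative names, not part of the original code.

```r
library(Matrix)

id <- c("1", "1", "1", "2", "2", "3")
type <- c("A", "B", "C", "A", "D", "B")
df <- data.frame(id, type)

# Factor codes give the row (id) and column (type) position of each pair
id_f <- factor(df$id)
type_f <- factor(df$type)

# Sparse 0/1 incidence matrix with one row per id and one column per type;
# only the observed pairs are stored, so the dense wide table is never built
id_type_sp <- sparseMatrix(
  i = as.integer(id_f),
  j = as.integer(type_f),
  x = 1,
  dims = c(nlevels(id_f), nlevels(type_f)),
  dimnames = list(levels(id_f), levels(type_f))
)
id_type_sp
```

`sparseMatrix()` adds up the values of repeated (i, j) pairs, so a duplicated `id`/`type` row would show up as a 2 rather than a 1.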

I tried using `table()`, following the post "reshape two column data to sparse matrix in r long to wide", but it does not work.
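For the `table()` route, base R's `xtabs()` accepts `sparse = TRUE`, which returns a sparse count matrix directly. It only works for two-way tables and needs the `Matrix` package installed. A minimal sketch on the example data, where `id_type_sp` is an illustrative name:

```r
library(Matrix)  # xtabs(sparse = TRUE) returns a Matrix sparse matrix

id <- c("1", "1", "1", "2", "2", "3")
type <- c("A", "B", "C", "A", "D", "B")
df <- data.frame(id, type)

# Cross-tabulate id x type straight into a sparse count matrix
id_type_sp <- xtabs(~ id + type, data = df, sparse = TRUE)
id_type_sp
```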


Any clue?

vog
This may help you create the [sparse matrix](https://stackoverflow.com/questions/67183296/how-to-pivot-a-large-data-frame-or-matrix-without-hitting-memory-limits/67183913#67183913). I am not sure whether `Matrix` has a `cov()` method implemented; you may need to find a package that provides one. From a quick look, you can [implement it yourself](https://stackoverflow.com/questions/5888287/running-cor-or-any-variant-over-a-sparse-matrix-in-r) if needed. – Adam May 03 '22 at 13:13
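A minimal sketch of implementing it yourself, assuming the input is a sparse `Matrix` object `X` with one row per id and one column per type. It uses the identity cov(X) = (t(X) %*% X - n * mu %*% t(mu)) / (n - 1), where `mu` is the vector of column means, so only `crossprod()` touches the sparse data. `sparse_cov()` is a hypothetical helper, not a function from `Matrix`. Note that the mean correction generally makes the p x p result dense, so this saves memory on the n x p input rather than on the output.

```r
library(Matrix)

# Covariance of the columns of a sparse matrix X without converting X to dense:
#   cov(X) = (t(X) %*% X - n * mu %*% t(mu)) / (n - 1), with mu = colMeans(X)
sparse_cov <- function(X) {
  n <- nrow(X)
  mu <- colMeans(X)                # Matrix provides colMeans() for sparse input
  xtx <- as.matrix(crossprod(X))   # p x p cross-product, computed sparsely
  (xtx - n * tcrossprod(mu)) / (n - 1)
}

# Incidence matrix of the example data (rows = id, columns = type)
X <- sparseMatrix(i = c(1, 1, 1, 2, 2, 3), j = c(1, 2, 3, 1, 4, 2), x = 1,
                  dimnames = list(c("1", "2", "3"), c("A", "B", "C", "D")))

cov_id_type <- sparse_cov(X)
all.equal(cov_id_type, cov(as.matrix(X)))
```

The final line compares against base `cov()` on a dense copy, which is only practical for small test inputs like this one.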
