How to reshape data from long to wide format

Question

I'm having trouble rearranging the following data frame:

set.seed(45)
dat1 <- data.frame(
    name = rep(c("firstName", "secondName"), each=4),
    numbers = rep(1:4, 2),
    value = rnorm(8)
    )

dat1
       name  numbers      value
1  firstName       1  0.3407997
2  firstName       2 -0.7033403
3  firstName       3 -0.3795377
4  firstName       4 -0.7460474
5 secondName       1 -0.8981073
6 secondName       2 -0.3347941
7 secondName       3 -0.5013782
8 secondName       4 -0.1745357

I want to reshape it so that each unique "name" variable is a rowname, with the "values" as observations along that row and the "numbers" as colnames. Sort of like this:

     name          1          2          3         4
1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474
5 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

I've looked at melt and cast and a few other things, but none seem to do the job.

possible duplicate of [Reshape three column data frame to matrix](http://stackoverflow.com/questions/9617348/reshape-three-column-data-frame-to-matrix) — Frank, Oct 08 '13 at 20:53
@Frank: this is a much better title. [tag:long-form] and [tag:wide-form] are the standard terms used. The other answer cannot be found by searching on those terms. — smci, Apr 11 '14 at 05:21
A much more canonical answer can be found at the question linked above, now with the name [Reshape three column data frame to matrix ("long" to "wide" format)](https://stackoverflow.com/a/9617424/210673). In my opinion, it would have been better for this one to have been closed as a duplicate of that. – Aaron left Stack Overflow, Oct 14 '21 at 17:36
The fact that the other question has one answer with a lot of options doesn't necessarily make it better than this one, which also has a lot of options, spread over several answers. Furthermore, the definition of a duplicate is *"This question already has an answer here"* (with a link to another earlier asked question). – Jaap, Oct 15 '21 at 12:08

score 329 · Accepted Answer · answered May 04 '11 at 23:20

Using the reshape function:

reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
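
The call above returns columns named value.1 through value.4 and keeps the original row names (1 and 5). If the bare numbers are wanted as column names, as in the question, a small clean-up step can follow. This is a minimal sketch, assuming the value column is called value as in the example data; the object name wide is only illustrative.

```r
# Rebuild the example data from the question
set.seed(45)
dat1 <- data.frame(
    name = rep(c("firstName", "secondName"), each = 4),
    numbers = rep(1:4, 2),
    value = rnorm(8)
)

# Long to wide with base R; columns come out as value.1, value.2, ...
wide <- reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")

# Strip the "value." prefix so the columns are named 1, 2, 3, 4
names(wide) <- sub("^value\\.", "", names(wide))

# Reset the row names, which would otherwise be 1 and 5
rownames(wide) <- NULL

wide
```

Column names that are bare numbers are not syntactic in R, so they need backticks or string indexing later on, e.g. wide[["1"]].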

Chase

+1 and you don't need to rely on external packages, since `reshape` comes with `stats`. Not to mention that it's faster! =) – aL3xa May 05 '11 at 00:07
@indra_patil - I would likely use the reshape2 package as indicated in one of the other answers. You could create a new question that's specific to your use case and post it if you can't figure it out. – Chase Feb 10 '16 at 03:16
it seems it will create duplicated "name" columns, that may be wanted or not wanted by the author of this thread – cloudscomputes Oct 20 '17 at 04:39
`reshape` is an outstanding example for a horrible function API. It is very close to useless. – NoBackingDown Oct 26 '17 at 15:18
The `reshape` comments and similar argument names aren't all that helpful. However, I have found that for long to wide, you need to provide `data =` your data.frame, `idvar` = the variable that identifies your groups, `v.names` = the variables that will become multiple columns in wide format, `timevar` = the variable containing the values that will be appended to `v.names` in wide format, `direction = "wide"`, and `sep = "_"`. Clear enough? ;) – Brian D Nov 17 '17 at 17:11
I would say base R still wins vote-wise by a factor of about 2 to 1 – vonjd Nov 22 '18 at 15:14
Sometimes there are two `idvars=`, in this case we can do the following: `reshape(dat1, idvar=c("name1", "name2"), timevar="numbers", direction="wide")` – jay.sf Jul 12 '21 at 16:54
How can one use *reshape* if there are 2 or more entries for each `idvar`? For instance, from the example data, I would have 2 values for `name=firstname` and `numbers=1`. – Martin Sep 27 '21 at 12:44
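
On the last question: reshape() does not aggregate, and as far as I can tell it keeps only the first value (with a warning) when a name/numbers pair occurs more than once. One possible approach is to collapse the duplicates with aggregate() first. This is a sketch with made-up data; dat_dup, dat_agg and the choice of max are assumptions for illustration only.

```r
# Illustrative data with one duplicated name/numbers combination
dat_dup <- data.frame(
    name = c("firstName", "firstName", "firstName", "secondName", "secondName"),
    numbers = c(1, 1, 2, 1, 2),
    value = c(0.5, 1.5, -0.7, -0.9, -0.3)
)

# Collapse duplicates to one row per name/numbers pair (here: the maximum)
dat_agg <- aggregate(value ~ name + numbers, data = dat_dup, FUN = max)

# Each pair is now unique, so the wide reshape is unambiguous
wide <- reshape(dat_agg, idvar = "name", timevar = "numbers", direction = "wide")
wide
```

Any other summary function (mean, sum, min) can be swapped in for max, depending on what a duplicate means in the data.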

score 157 · Answer 2 · edited Jun 20 '20 at 09:12

The new (in 2014) tidyr package also does this simply, with gather()/spread() being the terms for melt/cast.

Edit: Now, in 2019, tidyr v 1.0 has launched and set spread and gather on a deprecation path, preferring instead pivot_wider and pivot_longer, which you can find described in this answer. Read on if you want a brief glimpse into the brief life of spread/gather.

library(tidyr)
spread(dat1, key = numbers, value = value)

From GitHub:

tidyr is a reframing of reshape2 designed to accompany the tidy data framework, and to work hand-in-hand with magrittr and dplyr to build a solid pipeline for data analysis.

Just as reshape2 did less than reshape, tidyr does less than reshape2. It's designed specifically for tidying data, not the general reshaping that reshape2 does, or the general aggregation that reshape did. In particular, built-in methods only work for data frames, and tidyr provides no margins or aggregation.

Just wanted to add a link to the [R Cookbook](http://www.cookbook-r.com/Manipulating_data/Converting_data_between_wide_and_long_format/) page that discusses the use of these functions from `tidyr` and `reshape2`. It provides good examples and explanations. — Jake, Apr 12 '17 at 13:01

score 84 · Answer 3 · edited May 26 '15 at 14:52

You can do this with the reshape() function, or with the melt() / cast() functions in the reshape package. For the second option, example code is

library(reshape)
cast(dat1, name ~ numbers)

Or using reshape2

library(reshape2)
dcast(dat1, name ~ numbers)

answered May 04 '11 at 22:42 · Ista · edited by David Arenburg

It might be worth noting that just using `cast` or `dcast` will not work nicely if you don't have a clear "value" column. Try `dat – thelatemail Jun 21 '17 at 22:37
Note that reshape2 is deprecated and you should be migrating your code away from using it. – dpel Jan 21 '21 at 09:54
@dpel A more optimistic spin is to say that reshape2 is finally done and you can now use it without fear that Hadley will change it again and break your code! – Ista Jan 22 '21 at 22:48

SymbolixAU · Answer 4 · 2016-03-27T22:51:39.443

59

Another option if performance is a concern is to use data.table's extension of reshape2's melt & dcast functions

(Reference: Efficient reshaping using data.tables)

library(data.table)

setDT(dat1)
dcast(dat1, name ~ numbers, value.var = "value")

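# Note: the output values here and below come from a different random draw
# than the question's set.seed(45) data; the shape of the result is the same.
#          name          1          2         3         4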
# 1:  firstName  0.1836433 -0.8356286 1.5952808 0.3295078
# 2: secondName -0.8204684  0.4874291 0.7383247 0.5757814

And, as of data.table v1.9.6 we can cast on multiple columns

## add an extra column
dat1[, value2 := value * 2]

## cast multiple value columns
dcast(dat1, name ~ numbers, value.var = c("value", "value2"))

#          name    value_1    value_2   value_3   value_4   value2_1   value2_2 value2_3  value2_4
# 1:  firstName  0.1836433 -0.8356286 1.5952808 0.3295078  0.3672866 -1.6712572 3.190562 0.6590155
# 2: secondName -0.8204684  0.4874291 0.7383247 0.5757814 -1.6409368  0.9748581 1.476649 1.1515627

answered Mar 27 '16 at 22:35

`data.table` approach is the best ! very efficient ... you will see the difference when `name` is a combination of 30-40 columns !! – joel.wilson Aug 31 '17 at 12:06
What if I wanted to take the max? – Sweepy Dodo Mar 19 '19 at 22:54
@T.Fung I don't understand what you're asking. Might be best to open a new question? – SymbolixAU Mar 19 '19 at 23:54
@SymbolixAU in op's question 'name' and 'numbers' are unique combinations. What if they were not and I wanted to fetch the max value for each combination after pivoting? Not a problem if too fiddly a question. Just food for thoughts. Thank you. – Sweepy Dodo Mar 20 '19 at 10:11
Great answer. Thank you. For multiple columns, I got "Error in .subset2(x, i, exact = exact)", and could fix this by forcing the use of data.table dcast: see https://stackoverflow.com/a/44271092/190791 – Timothée HENRY Jul 03 '19 at 07:07
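
On the question in the comments about taking the max when name/numbers combinations are not unique: dcast accepts a fun.aggregate argument that is applied to each cell before casting. This is a minimal sketch with made-up data; the table dt and its values are assumptions for illustration.

```r
library(data.table)

# Illustrative data with a repeated name/numbers combination
dt <- data.table(
    name = c("firstName", "firstName", "firstName", "secondName", "secondName"),
    numbers = c(1L, 1L, 2L, 1L, 2L),
    value = c(0.5, 1.5, -0.7, -0.9, -0.3)
)

# fun.aggregate collapses all values that fall into the same
# name/numbers cell; here the largest one is kept
wide <- dcast(dt, name ~ numbers, value.var = "value", fun.aggregate = max)
wide
```

Without fun.aggregate, dcast falls back to counting the rows per cell when duplicates are present, which is rarely what is wanted, so passing the function explicitly is the safer choice.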

score 38 · Answer 5 · answered Jul 12 '19 at 20:14

With the devel version of tidyr ‘0.8.3.9000’, there are pivot_wider and pivot_longer, which generalize the reshaping (long -> wide and wide -> long, respectively) from one to multiple columns. Using the OP's data:

-single column long -> wide

library(dplyr)
library(tidyr)
dat1 %>% 
    pivot_wider(names_from = numbers, values_from = value)
# A tibble: 2 x 5
#  name          `1`    `2`    `3`    `4`
#  <fct>       <dbl>  <dbl>  <dbl>  <dbl>
#1 firstName   0.341 -0.703 -0.380 -0.746
#2 secondName -0.898 -0.335 -0.501 -0.175

-> created another column for showing the functionality

dat1 %>% 
    mutate(value2 = value * 2) %>% 
    pivot_wider(names_from = numbers, values_from = c("value", "value2"))
# A tibble: 2 x 9
#  name       value_1 value_2 value_3 value_4 value2_1 value2_2 value2_3 value2_4
#  <fct>        <dbl>   <dbl>   <dbl>   <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
#1 firstName    0.341  -0.703  -0.380  -0.746    0.682   -1.41    -0.759   -1.49 
#2 secondName  -0.898  -0.335  -0.501  -0.175   -1.80    -0.670   -1.00    -0.349
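
For completeness, the reverse direction with pivot_longer can be sketched as a round trip on the same data. This assumes a tidyr version that has the names_transform argument (1.1.0 or later, to my knowledge); the object names wide and long are illustrative.

```r
library(tidyr)

# Rebuild the example data from the question
set.seed(45)
dat1 <- data.frame(
    name = rep(c("firstName", "secondName"), each = 4),
    numbers = rep(1:4, 2),
    value = rnorm(8)
)

# Long to wide, as in the answer above
wide <- pivot_wider(dat1, names_from = numbers, values_from = value)

# Wide back to long: the column names become the "numbers" column again
long <- pivot_longer(
    wide,
    cols = -name,
    names_to = "numbers",
    values_to = "value",
    names_transform = list(numbers = as.integer)  # "1".."4" back to integers
)
long
```

Without names_transform the numbers column comes back as character, because column names are always strings.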

score 29 · Answer 6 · edited Sep 02 '16 at 07:37

Using your example data frame, we could use xtabs:

xtabs(value ~ name + numbers, data = dat1)
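
The result of xtabs is a table object rather than a data frame. If a data frame is needed, one common idiom is as.data.frame.matrix(). A minimal sketch, with tab and wide as illustrative names:

```r
# Rebuild the example data from the question
set.seed(45)
dat1 <- data.frame(
    name = rep(c("firstName", "secondName"), each = 4),
    numbers = rep(1:4, 2),
    value = rnorm(8)
)

# Cross-tabulate: rows are names, columns are numbers, cells hold value
tab <- xtabs(value ~ name + numbers, data = dat1)

# Convert the two-way table to a data frame, one column per level of numbers
wide <- as.data.frame.matrix(tab)

# Move the row names into a regular "name" column and keep "1".."4" as names
wide <- data.frame(name = rownames(wide), wide,
                   check.names = FALSE, row.names = NULL)
wide
```

Keep in mind that xtabs sums value within each cell and fills combinations that do not occur with 0, so this only reproduces the original values when every name/numbers pair appears exactly once.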

answered May 04 '11 at 22:58 · edited by zx8754

This one is good, but the result is a table object, which may not be as easy to handle as a data.frame or data.table, both of which are supported by plenty of packages – cloudscomputes Oct 20 '17 at 04:44

score 24 · Answer 7 · answered Jul 14 '15 at 17:44

Two other options:

Base package:

df <- unstack(dat1, form = value ~ numbers)
rownames(df) <- unique(dat1$name)
df

sqldf package:

library(sqldf)
sqldf('SELECT name,
      MAX(CASE WHEN numbers = 1 THEN value ELSE NULL END) x1, 
      MAX(CASE WHEN numbers = 2 THEN value ELSE NULL END) x2,
      MAX(CASE WHEN numbers = 3 THEN value ELSE NULL END) x3,
      MAX(CASE WHEN numbers = 4 THEN value ELSE NULL END) x4
      FROM dat1
      GROUP BY name')

Instead of hardcoding numbers, the query can be set up like this: ```ValCol — M--, Apr 29 '19 at 17:58

score 16 · Answer 8 · edited Dec 25 '17 at 04:05

Using base R aggregate function:

aggregate(value ~ name, dat1, I)

# (values shown are from a different random draw than the question's data)
# name           value.1  value.2  value.3  value.4
#1 firstName      0.4145  -0.4747   0.0659   -0.5024
#2 secondName    -0.8259   0.1669  -0.8962    0.1681

answered Sep 02 '16 at 07:52 · Ronak Shah · edited by onyambu

score 14 · Answer 9 · edited Aug 29 '18 at 03:00

The base reshape function works perfectly fine:

df <- data.frame(
  year   = c(rep(2000, 12), rep(2001, 12)),
  month  = rep(1:12, 2),
  values = rnorm(24)
)
df_wide <- reshape(df, idvar="year", timevar="month", v.names="values", direction="wide", sep="_")
df_wide

Where

idvar is the column of classes that separates rows
timevar is the column of classes to cast wide
v.names is the column containing numeric values
direction specifies wide or long format
the optional sep argument is the separator used in between timevar class names and v.names in the output data.frame.

If no idvar exists, create one before using the reshape() function:

df$id   <- c(rep("year1", 12), rep("year2", 12))
df_wide <- reshape(df, idvar="id", timevar="month", v.names="values", direction="wide", sep="_")
df_wide

Just remember that idvar is required! The timevar and v.names part is easy. The output of this function is more predictable than some of the others, as everything is explicitly defined.
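
Because reshape() records the settings it used as an attribute on its result, the wide data frame can be turned back into long format without repeating them. A minimal sketch using the example above; the seed is added here only so the sketch is reproducible, and df_long is an illustrative name.

```r
set.seed(1)  # not in the original example; added for reproducibility
df <- data.frame(
  year   = c(rep(2000, 12), rep(2001, 12)),
  month  = rep(1:12, 2),
  values = rnorm(24)
)

# Long to wide, exactly as above
df_wide <- reshape(df, idvar = "year", timevar = "month", v.names = "values",
                   direction = "wide", sep = "_")

# Wide back to long: idvar, timevar and v.names are read from the
# "reshapeWide" attribute that the first call attached to df_wide
df_long <- reshape(df_wide, direction = "long")
head(df_long)
```

This shortcut only works on an object that still carries the attribute. If df_wide has been modified in a way that drops it (or was built by another tool), varying, v.names, timevar and idvar have to be given explicitly.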

score 10 · Answer 10 · answered Dec 23 '17 at 23:01

There's a very powerful new package from the genius data scientists at Win-Vector (the folks that made vtreat, seplyr and replyr) called cdata. It implements the "coordinated data" principles described in this document and also in this blog post. The idea is that regardless of how you organize your data, it should be possible to identify individual data points using a system of "data coordinates". Here's an excerpt from the recent blog post by John Mount:

The whole system is based on two primitives or operators cdata::moveValuesToRowsD() and cdata::moveValuesToColumnsD(). These operators have pivot, un-pivot, one-hot encode, transpose, moving multiple rows and columns, and many other transforms as simple special cases.

It is easy to write many different operations in terms of the cdata primitives. These operators can work in memory or at big data scale (with databases and Apache Spark; for big data use the cdata::moveValuesToRowsN() and cdata::moveValuesToColumnsN() variants). The transforms are controlled by a control table that itself is a diagram of (or picture of) the transform.

We will first build the control table (see blog post for details) and then perform the move of data from rows to columns.

library(cdata)
# first build the control table
pivotControlTable <- buildPivotControlTableD(table = dat1, # reference to dataset
                        columnToTakeKeysFrom = 'numbers', # this will become column headers
                        columnToTakeValuesFrom = 'value', # this contains data
                        sep="_")                          # optional for making column names

# perform the move of data to columns
dat_wide <- moveValuesToColumnsD(tallTable =  dat1, # reference to dataset
                    keyColumns = c('name'),         # this(these) column(s) should stay untouched 
                    controlTable = pivotControlTable# control table above
                    ) 
dat_wide

#>         name  numbers_1  numbers_2  numbers_3  numbers_4
#> 1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474
#> 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

Answer needs updating, since the package seems to be rewritten (and links are dead) — runr, Jan 06 '22 at 11:59

score 3 · Answer 11 · answered Jul 26 '19 at 05:47

A much easier way!

devtools::install_github("yikeshu0611/onetree") #install onetree package

library(onetree)
widedata=reshape_toWide(data = dat1,id = "name",j = "numbers",value.var.prefix = "value")
widedata

        name     value1     value2     value3     value4
   firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474
  secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

If you want to go back from wide to long, just change Wide to Long in the function name; the arguments stay the same.

reshape_toLong(data = widedata,id = "name",j = "numbers",value.var.prefix = "value")

        name numbers      value
   firstName       1  0.3407997
  secondName       1 -0.8981073
   firstName       2 -0.7033403
  secondName       2 -0.3347941
   firstName       3 -0.3795377
  secondName       3 -0.5013782
   firstName       4 -0.7460474
  secondName       4 -0.1745357

fmassica · Answer 12 · 2021-11-08T11:29:10.923

Using only dplyr and purrr's map functions.

library(dplyr)
library(purrr)
set.seed(45)
dat1 <- data.frame(
  name = rep(c("firstName", "secondName"), each=4),
  numbers = rep(1:4, 2), value = rnorm(8)
)
longer_to_wider <- function(data, name_from, value_from){
  group <- colnames(data)[!(colnames(data) %in% c(name_from,value_from))]
  data %>% group_by(.data[[group]]) %>%
    summarise( name = list(.data[[name_from]]), 
               value = list(.data[[value_from]])) %>%
    {
      d <- data.frame(
        name = .[[name_from]] %>% unlist() %>% unique()
      )
      e <- map_dfc(.[[group]],function(x){
          y <- tibble(  # tibble() replaces the deprecated data_frame()
            x = data %>% filter(.data[[group]] == x) %>% pull(value_from)
          )
          colnames(y) <- x
          y
      })
      cbind(d,e)
    }
}
longer_to_wider(dat1, "name", "value")
#    name          1          2          3          4
# 1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474
# 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357
