Keep duplicate row names from different data frames as they are when combining them into one data frame in R

Question

I have three dataframes df1, df2, df3

df1
     Freq
Yes    10
No      2

df2 
     Freq
Yes     5
No      7

df3
     Freq
Yes     9
No      3

I combined them into df:

df1 <- data.frame(Freq= c(10,2))
df2 <- data.frame(Freq= c(5,7))
df3 <- data.frame(Freq= c(9,3))
rownames(df1) <- c("Yes","No")
rownames(df2) <- c("Yes","No")
rownames(df3) <- c("Yes","No")

df <- data.frame(rbind(df1,df2,df3))
df
       Freq
Yes      10
No        2
Yes1      5
No1       7
Yes2      9
No2       3

As you can see in df, whenever rbind finds a duplicate row name, it appends an index to it.
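The suffixes match what base R's make.unique() produces with sep = "": the first occurrence of a name is left alone and each later repeat gets a running counter. A minimal sketch that rebuilds df1, df2 and df3 from the question and compares the two:

```r
# Data frames from the question
df1 <- data.frame(Freq = c(10, 2))
df2 <- data.frame(Freq = c(5, 7))
df3 <- data.frame(Freq = c(9, 3))
rownames(df1) <- c("Yes", "No")
rownames(df2) <- c("Yes", "No")
rownames(df3) <- c("Yes", "No")

# All row names, in the order rbind() stacks them
labels <- c(rownames(df1), rownames(df2), rownames(df3))

# Keep the first occurrence of each name, append 1, 2, ... to later repeats
unique_labels <- make.unique(labels, sep = "")
unique_labels
# "Yes" "No" "Yes1" "No1" "Yes2" "No2"

# Compare with the row names rbind() assigned to the combined data frame
identical(rownames(rbind(df1, df2, df3)), unique_labels)
```

This is why a regex that strips digits can also damage names that legitimately contain digits: after rbind, the suffix is indistinguishable from the rest of the name.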

How can I remove the index from the duplicate names and keep them as they were originally? I then turn rownames(df) into the first column of df, named Answer:

 library(tibble)
 df <- rownames_to_column(df,"Answer")

   Answer   Freq
      Yes     10
       No      2
     Yes1      5
      No1      7
     Yes2      9
      No2      3

Desired Result

   Answer   Freq
      Yes     10
       No      2
      Yes      5
       No      7
      Yes      9
       No      3

How were you able to have the data.frame `df` first of all!? It throws an error to have unique rownames — joel.wilson, Jan 13 '17 at 05:43
data.frames can't have repeated rownames. You can fix the second version easily with regex, though: `library(tidyverse); df %>% rownames_to_column('Answer') %>% mutate(Answer = sub('\\d', '', Answer))` — alistaire, Jan 13 '17 at 05:47
@alistaire Please put it as an answer. so that I can accept it. — Sowmya S. Manian, Jan 13 '17 at 05:57
@alistaire Hi, the answer works, but now I am facing another problem: the other tables have digits in their row names too, and those get removed as well. Is there an alternative that only changes the duplicate names and not other levels that have digits in them? — Sowmya S. Manian, Jan 13 '17 at 06:01
The better way to do it is to call `rownames_to_column` before you `rbind`. If you put your data.frames in a list (a good idea regardless), purrr makes this very simple: `library(tidyverse); list(df1, df2, df3) %>% map_df(rownames_to_column, 'Answer')` — alistaire, Jan 13 '17 at 06:02
@RonakShah Yes, you are right. I just want to remove digits from the duplicate names, not from other row names that have digits somewhere in them and appear only once. — Sowmya S. Manian, Jan 13 '17 at 06:02
@RonakShah No digits appear, when I combine df1, df2, df3. Run the code and see. It appears when it finds a duplicate rowname. — Sowmya S. Manian, Jan 13 '17 at 06:11
For thoroughness, the base R equivalent of the purrr approach above: `do.call(rbind, lapply(list(df1, df2, df3), function(x){x$Answer — alistaire, Jan 13 '17 at 06:11
Yes, I have used the `purrr` package before. Let me check what functions are available in this one. — Sowmya S. Manian, Jan 13 '17 at 06:14
Do you want to include a column that shows which data frame each row originally came from? I came up with this: `map_df(paste0("df",1:3), . %>% cbind(get(.), source=.) %>% rownames_to_column("Answer"))`. But that seems clunky and I'm wondering if @alistaire or anyone else has a better approach. — eipi10, Jan 13 '17 at 06:44
@eipi10 You can use the `.id` parameter, which gets passed to `bind_rows`: `list(df1, df2, df3) %>% map_df(rownames_to_column, 'Answer', .id = 'df')` — alistaire, Jan 13 '17 at 06:46
Thanks @alistaire, I hadn't really looked at the `.id` parameter before. — eipi10, Jan 13 '17 at 06:49
Somewhat simpler would be to change `rownames(dfX)` to `dfX$Answer` for all three data frames (where X = 1, 2, 3); then your original code would also work. — davidski, Jan 13 '17 at 07:29
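davidski's suggestion can be sketched in base R as follows: store the labels in an ordinary Answer column instead of in the row names, so rbind() has no duplicate row names to rename. This assumes the labels are known when the data frames are built.

```r
# Same data as in the question, but the labels live in a regular column
df1 <- data.frame(Freq = c(10, 2))
df2 <- data.frame(Freq = c(5, 7))
df3 <- data.frame(Freq = c(9, 3))
df1$Answer <- c("Yes", "No")
df2$Answer <- c("Yes", "No")
df3$Answer <- c("Yes", "No")

# Column values may repeat freely, so nothing gets a suffix;
# the combined frame gets default row names 1..6
df <- rbind(df1, df2, df3)

# Put Answer first, as in the desired result
df <- df[c("Answer", "Freq")]
df
```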

score 1 · Answer 1 · answered Jan 13 '17 at 16:15

While you can remove the indexing via regex (see the comments above), you can avoid the issue by calling rownames_to_column before you rbind. purrr::map_df makes doing so very easy, as it is a version of lapply that simplifies its result into a data.frame. Thus,

library(tidyverse)    # contains both tibble and purrr

list(df1, df2, df3) %>% map_df(rownames_to_column, 'Answer')

##   Answer Freq
## 1    Yes   10
## 2     No    2
## 3    Yes    5
## 4     No    7
## 5    Yes    9
## 6     No    3
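To make the "lapply that simplifies into a data.frame" point concrete, here is a sketch of roughly the same two steps done by hand: lapply() builds a list of data frames, each with its own Answer column, and dplyr::bind_rows() stacks them. It assumes tibble and dplyr are installed, and it illustrates the idea rather than reproducing map_df's internals exactly.

```r
library(tibble)  # rownames_to_column()
library(dplyr)   # bind_rows()

df1 <- data.frame(Freq = c(10, 2))
df2 <- data.frame(Freq = c(5, 7))
df3 <- data.frame(Freq = c(9, 3))
rownames(df1) <- c("Yes", "No")
rownames(df2) <- c("Yes", "No")
rownames(df3) <- c("Yes", "No")

# Step 1: move the row names into a column separately for each data frame,
# so no two row names ever meet inside one data frame
pieces <- lapply(list(df1, df2, df3), rownames_to_column, "Answer")

# Step 2: stack the pieces; the duplicated labels are now plain column values
df <- bind_rows(pieces)
df
```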

You could also use purrr's abbreviated anonymous function syntax, ~rownames_to_column(.x, 'Answer'), or the more traditional syntax, map_df(list(df1, df2, df3), function(x){rownames_to_column(x, 'Answer')}); all of these do the same thing.
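A quick sketch checking that the three spellings give identical results. It assumes tibble, purrr and dplyr are installed (map_df() hands its pieces to dplyr::bind_rows()).

```r
library(tibble)
library(purrr)
library(dplyr)

df1 <- data.frame(Freq = c(10, 2))
df2 <- data.frame(Freq = c(5, 7))
df3 <- data.frame(Freq = c(9, 3))
rownames(df1) <- c("Yes", "No")
rownames(df2) <- c("Yes", "No")
rownames(df3) <- c("Yes", "No")
dfs <- list(df1, df2, df3)

# Extra arguments after the function are passed through to it
by_args <- map_df(dfs, rownames_to_column, "Answer")

# purrr formula shorthand: .x is the current list element
by_formula <- map_df(dfs, ~rownames_to_column(.x, "Answer"))

# Ordinary anonymous function
by_function <- map_df(dfs, function(x) rownames_to_column(x, "Answer"))

identical(by_args, by_formula)
identical(by_args, by_function)
```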

If you'd like to accomplish the same thing with just base R,

do.call(rbind, lapply(list(df1, df2, df3), function(x) {
    x$Answer <- rownames(x)  # copy the row names into a regular column
    rownames(x) <- NULL      # reset to the default 1, 2, ... row names
    x
}))

returns the same rows and values, though with the Answer column after Freq rather than before it.
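The base R version leaves Answer as the last column, because assigning with $<- appends a new column at the end. If you want it first, as in the desired result, one option is to reorder the columns afterwards. A minimal sketch:

```r
df1 <- data.frame(Freq = c(10, 2))
df2 <- data.frame(Freq = c(5, 7))
df3 <- data.frame(Freq = c(9, 3))
rownames(df1) <- c("Yes", "No")
rownames(df2) <- c("Yes", "No")
rownames(df3) <- c("Yes", "No")

res <- do.call(rbind, lapply(list(df1, df2, df3), function(x) {
  x$Answer <- rownames(x)  # copy the row names into a regular column
  rownames(x) <- NULL      # reset to the default 1, 2, ... row names
  x
}))

# Move Answer to the front without hard-coding the other column names
res <- res[c("Answer", setdiff(names(res), "Answer"))]
res
```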

If you'd like to add a column identifying which data.frame each row came from, pass a name for that column to map_df's .id parameter (which works like the one in dplyr::bind_rows), just as you pass one to rownames_to_column:

list(one = df1, two = df2, three = df3) %>% map_df(rownames_to_column, "Answer", .id = "df")

##      df Answer Freq
## 1   one    Yes   10
## 2   one     No    2
## 3   two    Yes    5
## 4   two     No    7
## 5 three    Yes    9
## 6 three     No    3

If you don't supply names in the list, .id uses each element's position instead, stored as strings ("1", "2", "3"). Convert them to numbers if you like.
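A minimal sketch of that conversion, assuming tibble, purrr and dplyr are installed: with an unnamed list the df column holds "1", "2", "3", which as.integer() turns into numbers.

```r
library(tibble)
library(purrr)
library(dplyr)  # map_df() stacks its pieces with dplyr::bind_rows(); also provides %>%

df1 <- data.frame(Freq = c(10, 2))
df2 <- data.frame(Freq = c(5, 7))
df3 <- data.frame(Freq = c(9, 3))
rownames(df1) <- c("Yes", "No")
rownames(df2) <- c("Yes", "No")
rownames(df3) <- c("Yes", "No")

# Unnamed list: .id records each element's position as a string
out <- list(df1, df2, df3) %>% map_df(rownames_to_column, "Answer", .id = "df")

# Turn "1", "2", "3" into integers
out$df <- as.integer(out$df)
out
```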

score 0 · Answer 2 · answered Jan 16 '17 at 04:29

We can do this with data.table

library(data.table)
rbindlist(lapply(mget(paste0("df", 1:3)),
        function(x) transform(x, Answer = row.names(x))[2:1]))
#   Answer Freq
#1:    Yes   10
#2:     No    2
#3:    Yes    5
#4:     No    7
#5:    Yes    9
#6:     No    3
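rbindlist() can also record where each row came from through its idcol argument, the data.table counterpart of map_df's .id. A sketch, assuming data.table is installed; the explicit named list stands in for the mget(paste0("df", 1:3)) call above, which returns a list named df1, df2, df3. transform() adds Answer after Freq, and [2:1] swaps the two columns so Answer comes first.

```r
library(data.table)

df1 <- data.frame(Freq = c(10, 2))
df2 <- data.frame(Freq = c(5, 7))
df3 <- data.frame(Freq = c(9, 3))
rownames(df1) <- c("Yes", "No")
rownames(df2) <- c("Yes", "No")
rownames(df3) <- c("Yes", "No")

# Named list, equivalent to what mget(paste0("df", 1:3)) returns
dfs <- list(df1 = df1, df2 = df2, df3 = df3)

res <- rbindlist(
  lapply(dfs, function(x) transform(x, Answer = row.names(x))[2:1]),
  idcol = "df"  # new first column holding each element's list name
)
res
```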

