
I have three data frames, df1, df2 and df3:

df1
     Freq
Yes    10
No      2

df2 
     Freq
Yes     5
No      7

df3
     Freq
Yes     9
No      3

I combined them into df:

df1 <- data.frame(Freq= c(10,2))
df2 <- data.frame(Freq= c(5,7))
df3 <- data.frame(Freq= c(9,3))
rownames(df1) <- c("Yes","No")
rownames(df2) <- c("Yes","No")
rownames(df3) <- c("Yes","No")

df <- data.frame(rbind(df1,df2,df3))
df
       Freq
Yes      10
No        2
Yes1      5
No1       7
Yes2      9
No2       3

As df shows, whenever rbind finds a duplicate row name, it appends an index to it.

How can I remove that index from the duplicated names and keep them as they were originally? After combining, I turn rownames(df) into the first column of df, with Answer as the column name:

 library(tibble)
 df <- rownames_to_column(df,"Answer")

   Answer   Freq
      Yes     10
       No      2
     Yes1      5
      No1      7
     Yes2      9
      No2      3

Desired Result

   Answer   Freq
      Yes     10
       No      2
      Yes      5
       No      7
      Yes      9
       No      3
divibisan
Sowmya S. Manian
  • How were you able to have the data.frame `df` first of all!? It throws an error to have unique rownames – joel.wilson Jan 13 '17 at 05:43
  • Yes, sorry I saw that I have edited the question – Sowmya S. Manian Jan 13 '17 at 05:46
  • data.frames can't have repeated rownames. You can fix the second version easily with regex, though: `library(tidyverse); df %>% rownames_to_column('Answer') %>% mutate(Answer = sub('\\d', '', Answer))` – alistaire Jan 13 '17 at 05:47
  • @alistaire Thanks it worked!!!! – Sowmya S. Manian Jan 13 '17 at 05:53
  • @alistaire Please put it as an answer. so that I can accept it. – Sowmya S. Manian Jan 13 '17 at 05:57
  • @alistaire Hi, the answer works, but now I am facing another problem: the other tables have digits in their names too, and those get removed as well. Do you have an alternative that changes only the duplicated names and not other levels that contain digits? – Sowmya S. Manian Jan 13 '17 at 06:01
  • The better way to do it is to call `rownames_to_column` before you `rbind`. If you put your data.frames in a list (a good idea regardless), purrr makes this very simple: `library(tidyverse); list(df1, df2, df3) %>% map_df(rownames_to_column, 'Answer')` – alistaire Jan 13 '17 at 06:02
  • @RonakShah Yes you are right. I just want to remove digits from the duplicate ones and not the other rownames having digits somewhere else in them and which appears only once. – Sowmya S. Manian Jan 13 '17 at 06:02
  • @alistaire Hey, it's working! – Sowmya S. Manian Jan 13 '17 at 06:10
  • I understand what's happening now. – Sowmya S. Manian Jan 13 '17 at 06:10
  • @RonakShah No digits appear, when I combine df1, df2, df3. Run the code and see. It appears when it finds a duplicate rowname. – Sowmya S. Manian Jan 13 '17 at 06:11
  • For thoroughness, the base R equivalent of the purrr approach above: `do.call(rbind, lapply(list(df1, df2, df3), function(x){x$Answer – alistaire Jan 13 '17 at 06:11
  • Yes, I have used `purrr` package before. Let me check what functions are available in this one. – Sowmya S. Manian Jan 13 '17 at 06:14
  • Do you want to include a column that shows which data frame each row originally came from? I came up with this: `map_df(paste0("df",1:3), . %>% cbind(get(.), source=.) %>% rownames_to_column("Answer"))`. But that seems clunky and I'm wondering if @alistaire or anyone else has a better approach. – eipi10 Jan 13 '17 at 06:44
  • @eipi10 You can use the `.id` parameter, which will get passed to `bind_rows`: `list(df1, df2, df3) %>% map_df(rownames_to_column, 'Answer', .id = 'df')` – alistaire Jan 13 '17 at 06:46
  • Thanks @alistaire, I hadn't really looked at the `.id` parameter before. – eipi10 Jan 13 '17 at 06:49
  • somewhat simpler would be just to change `rownames(dfX)` to `dfX$Answer` for all 3 `df`s , then your original code would work also. (where X = 1,2,3) – davidski Jan 13 '17 at 07:29
  • Anybody going to answer it? – Sotos Jan 13 '17 at 07:39

2 Answers


While you can remove the indexing via regex (see the comments above), you can avoid the issue altogether by calling `rownames_to_column` before you `rbind`. `purrr::map_df` makes this easy: it is a version of `lapply` that row-binds its results into a single data.frame. Thus,

library(tidyverse)    # contains both tibble and purrr

list(df1, df2, df3) %>% map_df(rownames_to_column, 'Answer')

##   Answer Freq
## 1    Yes   10
## 2     No    2
## 3    Yes    5
## 4     No    7
## 5    Yes    9
## 6     No    3

You could also use purrr's abbreviated anonymous function syntax, `~rownames_to_column(.x, 'Answer')`, or the more traditional syntax, `map_df(list(df1, df2, df3), function(x){rownames_to_column(x, 'Answer')})`. All three forms do the same thing.
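To make the equivalence concrete, here is a minimal sketch that builds the question's three data frames and runs all three spellings side by side. The names `dfs`, `res_dots`, `res_formula` and `res_function` are only illustrative, and the sketch assumes dplyr is installed, since `map_df` relies on it for the row-binding.

```r
library(tibble)  # rownames_to_column()
library(purrr)   # map_df(); needs dplyr installed for the row-binding step

# The three data frames from the question
df1 <- data.frame(Freq = c(10, 2), row.names = c("Yes", "No"))
df2 <- data.frame(Freq = c(5, 7),  row.names = c("Yes", "No"))
df3 <- data.frame(Freq = c(9, 3),  row.names = c("Yes", "No"))
dfs <- list(df1, df2, df3)

# 1. Extra arguments are forwarded to rownames_to_column through `...`
res_dots <- map_df(dfs, rownames_to_column, "Answer")

# 2. purrr's formula shorthand, where .x is the current list element
res_formula <- map_df(dfs, ~ rownames_to_column(.x, "Answer"))

# 3. An ordinary anonymous function
res_function <- map_df(dfs, function(x) rownames_to_column(x, "Answer"))

# Compare the three results
identical(res_dots, res_formula)
identical(res_dots, res_function)
```

As far as I know, recent purrr releases mark `map_df` as superseded in favour of `map()` followed by `list_rbind()`; the code above still runs there, but check the documentation for the version you have installed.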

If you'd like to accomplish the same thing with just base R,

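do.call(rbind, lapply(list(df1, df2, df3), function(x){
    x$Answer <- rownames(x)   # copy the row names into a regular column
    rownames(x) <- NULL       # reset the row names so rbind has nothing to de-duplicate
    x
}))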

returns the same thing.
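Either version also sidesteps the weakness of the regex route mentioned at the top of this answer, which was raised in the comments: a pattern cannot tell an appended index from a digit that belongs to the name. The sketch below uses hypothetical row names (`"Q1"` is not from the question) and `make.unique(..., sep = "")` to mimic the suffixes shown in the question; that `rbind` builds its suffixes this way internally is an assumption on my part.

```r
# Hypothetical row names: "Q1" legitimately ends in a digit
original <- c("Yes", "Q1", "Yes", "Q1")

# Mimic the de-duplication seen in the question (Yes, No, Yes1, No1, ...)
suffixed <- make.unique(original, sep = "")
suffixed
# "Yes" "Q1" "Yes1" "Q11"

# The regex from the comments drops the first digit found anywhere
first_digit <- sub("\\d", "", suffixed)
first_digit
# "Yes" "Q" "Yes" "Q1"

# Anchoring to the end of the string still mangles "Q1"
trailing_digits <- sub("\\d+$", "", suffixed)
trailing_digits
# "Yes" "Q" "Yes" "Q"
```

Neither pattern recovers `original`, which is why moving the row names into a column before binding is the safer choice.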

If you'd like an identifier column recording which data.frame each row came from, pass a column name to `map_df`'s `.id` parameter (it works like the `.id` of `dplyr::bind_rows`), just as you pass one to `rownames_to_column`:

list(one = df1, two = df2, three = df3) %>% map_df(rownames_to_column, "Answer", .id = "df")

##      df Answer Freq
## 1   one    Yes   10
## 2   one     No    2
## 3   two    Yes    5
## 4   two     No    7
## 5 three    Yes    9
## 6 three     No    3

If you don't supply names in the list, `.id` falls back to the defaults, which are the list positions as strings of numbers. Convert them if you like.
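A minimal sketch of that conversion, assuming the same three data frames as in the question and that dplyr is installed for `map_df`. The name `res` is only illustrative; `as.integer` is used because it works on the string form of the identifiers.

```r
library(tibble)  # rownames_to_column()
library(purrr)   # map_df(); needs dplyr installed for the row-binding step

df1 <- data.frame(Freq = c(10, 2), row.names = c("Yes", "No"))
df2 <- data.frame(Freq = c(5, 7),  row.names = c("Yes", "No"))
df3 <- data.frame(Freq = c(9, 3),  row.names = c("Yes", "No"))

# Unnamed list: the identifier column is filled from the list positions
res <- map_df(list(df1, df2, df3), rownames_to_column, "Answer", .id = "df")

# Convert the identifiers to integers
res$df <- as.integer(res$df)
str(res)
```

Naming the list elements, as in the example above, is usually the more readable option when the identifiers are meant for people rather than for joins.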

alistaire

We can do this with data.table:

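library(data.table)
# mget() fetches df1, df2, df3 by name; transform() adds the row names as a column,
# and [2:1] reorders the columns to Answer, Freq before rbindlist() stacks them.
rbindlist(lapply(mget(paste0("df", 1:3)),
        function(x) transform(x, Answer = row.names(x))[2:1]))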
#   Answer Freq
#1:    Yes   10
#2:     No    2
#3:    Yes    5
#4:     No    7
#5:    Yes    9
#6:     No    3
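A possible variation, sketched under the assumption that your data.table version lets `keep.rownames` take a column name (older releases only accept `TRUE`, which names the column `rn`). It converts each data frame with `as.data.table` and uses `rbindlist`'s `idcol` argument to record the source; the list names `one`, `two`, `three` and the column name `df` are only illustrative.

```r
library(data.table)

df1 <- data.frame(Freq = c(10, 2), row.names = c("Yes", "No"))
df2 <- data.frame(Freq = c(5, 7),  row.names = c("Yes", "No"))
df3 <- data.frame(Freq = c(9, 3),  row.names = c("Yes", "No"))

res <- rbindlist(
  # keep.rownames = "Answer" moves the row names into a column called Answer
  lapply(list(one = df1, two = df2, three = df3),
         as.data.table, keep.rownames = "Answer"),
  # idcol adds a column holding the list name each row came from
  idcol = "df"
)
res
```

This avoids `mget()`, so it does not depend on the data frames following a `df1`, `df2`, ... naming pattern in the calling environment.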
akrun