0

I have a data frame similar to this:

data <- data.frame(
  Location = rep(letters[1:10], each = 20),
  ID = rep(1:40, each = 5)
)

I want to return a table that contains each unique Location in one column and a count of the number of unique IDs in each Location in another column, so it will look like this:

Location   Count
   a         4
   b         4
   ...      ...

Note: in my actual data set there are different numbers of IDs in each Location, and there are other variables in other columns.

What is the best way to do this?

Jaap
  • 77,147
  • 31
  • 174
  • 185
Ryan
  • 984
  • 4
  • 10

2 Answers2

1

The table class of objects has a as.data.frame method:

as.data.frame(table(data$Location))
   Var1 Freq
1     a   20
2     b   20
3     c   20
4     d   20
5     e   20
6     f   20
7     g   20
8     h   20
9     i   20
10    j   20
IRTFM
  • 251,731
  • 20
  • 347
  • 472
0

We can use n_distinct on the 'ID' column after grouping by 'Location'. In the example, it is all 4

library(dplyr)
data %>% 
    group_by(Location) %>%
    summarise(Count = n_distinct(ID))

If we need to add a new column, use mutate instead of summarise


With data.table, this can be done with uniqueN

library(data.table)
setDT(data)[, .(Count = uniqueN(ID)), Location]
akrun
  • 789,025
  • 32
  • 460
  • 575