I have the following dataset:

Names   Category
Jack    1
Jack    1
Jack    1
Tom     0
Tom     0
Sara    0
Sara    0

What I am looking for is the following:

Category Number
0        2
1        1

That is, the number of unique values in column Names for each category.

I can get the number of unique values in the first column:

length(unique(df$Names))

and the number of rows that have a given category in the second column:

length(which(df$Category== 1))

But this is not the result I am looking for.
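
To make this reproducible, here is a minimal sketch that rebuilds the data and runs the two attempts. The `data.frame()` construction is an assumption, since only the printed table is shown above, and `n_names` and `n_cat1` are illustrative names.

```r
# Rebuild the example data (assumed construction; only the printed table is given)
df <- data.frame(
  Names    = c(rep("Jack", 3), rep("Tom", 2), rep("Sara", 2)),
  Category = c(1, 1, 1, 0, 0, 0, 0)
)

# Attempt 1: distinct names over the whole column, ignoring Category
n_names <- length(unique(df$Names))

# Attempt 2: number of rows whose Category equals 1
n_cat1 <- length(which(df$Category == 1))
```

Each attempt returns a single number for the whole column (3 in both cases for this data), not one count per category.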

smci
cplus

3 Answers

Or aggregate in base R:

aggregate(Names ~ Category, data=df, FUN=function(x) length(unique(x)))
  Category Names
1        0     2
2        1     1
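
A possible alternative in base R, sketched here with `tapply()`. It assumes the same example data as in the question, and `counts` and `result` are illustrative names, not part of the original answer.

```r
# Example data (assumed construction, matching the table in the question)
df <- data.frame(
  Names    = c(rep("Jack", 3), rep("Tom", 2), rep("Sara", 2)),
  Category = c(1, 1, 1, 0, 0, 0, 0)
)

# Split Names by Category and count the distinct values in each group
counts <- tapply(df$Names, df$Category, function(x) length(unique(x)))

# Optional: reshape the named result into a two-column data frame
result <- data.frame(Category = names(counts), Number = as.vector(counts))
```

Note that `tapply()` returns a named array rather than a data frame, and `names(counts)` yields the categories as character strings, so `aggregate()` is the more direct choice when a data frame is wanted.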
lmo

Using data.table

library(data.table)
setDT(df)[, .(Number = uniqueN(Names)), by = Category]
#    Category Number
#1:        1      1
#2:        0      2
akrun

Using dplyr. You don't even need to manually get the unique Names first:

df <- data.frame(Names=c(rep('Jack',3),rep('Tom',2),rep('Sara',2)),
                 Category=c(1,1,1,0,0,0,0))
library(dplyr)

df %>% group_by(Category) %>% summarize(Number = n_distinct(Names))

  Category Number
     <dbl>  <int>
1        0      2
2        1      1

# and you can use as.data.frame(...) on that if you like

UPDATED: it was not clear from the OP's original wording that they wanted to first group by Category, then count the number of distinct Names within each group.

smci