Find most common word in a character string

Question

I have a character string and need to find the word in the string that occurs most frequently. I've tried every variation of max, which.max, sort, order, and rank that I can think of - but can't seem to get the syntax worked out correctly. I've also tried all of the methods found here: Calculate frequency of occurrence in an array using R

Example code:

zzz <- c("jan", "feb", "jan", "mar", "mar", "jan", "feb") #random example data
zzz <- paste(zzz, collapse=" ") #make data look like what I'm working with
zzz
# [1] "jan feb jan mar mar jan feb"

In this example, "jan" occurs most frequently.

Any suggestions are greatly appreciated!

score 3 · Accepted Answer · answered Oct 19 '14 at 20:55


How about this:

Freq <- table(unlist(strsplit(zzz, " ")))
Freq
# feb jan mar 
#   2   3   2 
Freq[which.max(Freq)]
# jan 
#   3

If you just want the word itself as output:

names(Freq)[which.max(Freq)]
# [1] "jan"
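A minimal sketch that wraps the steps above into a reusable function. The name `most_common_word` is hypothetical, and splitting on the regular expression `\\s+` (runs of whitespace) instead of a single space is an assumption about the input. It avoids empty-string "words" when the text contains repeated spaces.

```r
# Hypothetical helper: returns the most frequent whitespace-separated word.
# Assumes words are separated by one or more whitespace characters.
most_common_word <- function(text) {
  words <- unlist(strsplit(text, "\\s+"))  # split on runs of whitespace
  words <- words[words != ""]              # drop the empty string left by leading whitespace
  freq <- table(words)                     # count occurrences of each word
  names(freq)[which.max(freq)]             # first word with the highest count
}

zzz <- paste(c("jan", "feb", "jan", "mar", "mar", "jan", "feb"), collapse = " ")
most_common_word(zzz)
# [1] "jan"
```

Like the answer above, this returns only the first word with the highest count when there is a tie.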

answered Oct 19 '14 at 20:55

nrussell



Perfect! Thank you. :) – jdfinch3 Oct 19 '14 at 21:15
You might want to worry about ties too. In this example, the first element of the table (and it's alphabetical, if memory serves), will always be selected in case of a tie. – pdb Oct 20 '14 at 02:50
What's to worry about? They either exist or they don't. – nrussell Oct 20 '14 at 11:20
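If ties do matter for your data, one option is to return every word that reaches the maximum count rather than only the first one `which.max` finds. A minimal sketch; the string `tied` is made-up example data chosen so that two words share the top count:

```r
# Illustrative data with a tie: "feb" and "jan" both occur twice
tied <- "jan feb jan feb mar"
Freq <- table(unlist(strsplit(tied, " ")))

# which.max() returns only the first maximum, in table (alphabetical) order
names(Freq)[which.max(Freq)]
# [1] "feb"

# Comparing against max() keeps all tied words
names(Freq)[Freq == max(Freq)]
# [1] "feb" "jan"
```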

Rich Scriven · Answer 2 · 2014-10-19T21:26:12.820


You could also convert the split vector to a factor and then tabulate it.

f <- factor(strsplit(zzz, " ")[[1]])
levels(f)[which.max(tabulate(f))]
# [1] "jan"
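Why this works: `factor()` stores each word as an integer code that indexes into `levels(f)`, and `tabulate()` counts how often each integer code occurs. Position `i` of the counts therefore lines up with `levels(f)[i]`. The intermediate values for the question's example data:

```r
zzz <- "jan feb jan mar mar jan feb"
f <- factor(strsplit(zzz, " ")[[1]])

levels(f)        # [1] "feb" "jan" "mar"   (levels are sorted)
as.integer(f)    # [1] 2 1 2 3 3 2 1       (each word's index into levels(f))
tabulate(f)      # [1] 2 3 2               (count of each integer code)
levels(f)[which.max(tabulate(f))]
# [1] "jan"
```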

edited Oct 19 '14 at 21:26

answered Oct 19 '14 at 21:00

Rich Scriven

