
I'm sure most people are familiar with word grid games like Boggle and the newer digital versions Scramble with Friends and Ruzzle.

4x4 word grid

For anyone not familiar, the idea is to find words by using adjacent tiles. You start from any cell and try to spell a word by dragging up, down, left, right, or diagonally. The board doesn't wrap, and you can't reuse letters you've already selected.

I'm trying to figure out the likelihood of a word appearing on a board, given how often the individual letters appear. For example, if I know that the letter A appears 9.8% of the time, what is the probability of seeing the word AA?

I know this is fairly simple, but it's been too long since college stats. (I have two hokey models, but I'd like to hear from the experts.) I could run a simulation of a million boards and come up with an empirical answer--in fact, someone has--but I'd rather understand why that is. In order to make things simpler, I'd like to ignore two rules that add to the complexity:

  • We can ignore the constraints of how many times a letter can appear on a board. e.g., don't worry about whether there will be enough E's to make the word ELECTEE.
  • We can ignore the fact that letters have to be adjacent. This means we don't have to figure out the likelihood that a letter is a neighbor of another letter that we need for the word.

So with that being said, and with the following probabilities

  • A: 0.098
  • E: 0.146
  • R: 0.079
  • S: 0.102
  • T: 0.098
  • C: 0.021

What is the likelihood of a random board containing the word SEE? What about SET? TEAR? What about the rarer CREATES?
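To make the simplified version of the question concrete, here is a minimal sketch of one way it could be computed. It assumes each of the 16 tiles is drawn independently with the probabilities listed above, and that a word counts as "on the board" whenever the board holds enough copies of each of its letters, with adjacency ignored. The function name `prob_letters_on_board` and the capped-count approach are illustrative choices, not an established method from this thread.

```python
from collections import Counter

# Letter probabilities quoted in the question.
PROBS = {"A": 0.098, "E": 0.146, "R": 0.079, "S": 0.102, "T": 0.098, "C": 0.021}

def prob_letters_on_board(word, letter_probs, cells=16):
    """Probability that `cells` independent tiles contain every letter of
    `word` at least as many times as the word needs (adjacency ignored)."""
    need = Counter(word)
    letters = sorted(need)
    # Probability that a tile is a letter the word does not use at all.
    other = 1.0 - sum(letter_probs[ch] for ch in letters)
    # State: per-letter counts, each capped at the number the word requires.
    dist = {tuple(0 for _ in letters): 1.0}
    for _ in range(cells):
        nxt = {}
        for state, p in dist.items():
            nxt[state] = nxt.get(state, 0.0) + p * other
            for i, ch in enumerate(letters):
                bumped = list(state)
                bumped[i] = min(bumped[i] + 1, need[ch])
                key = tuple(bumped)
                nxt[key] = nxt.get(key, 0.0) + p * letter_probs[ch]
        dist = nxt
    goal = tuple(need[ch] for ch in letters)
    return dist.get(goal, 0.0)

for w in ("AA", "SEE", "SET", "TEAR", "CREATES"):
    print(w, prob_letters_on_board(w, PROBS))
```

For a single repeated letter this reduces to a binomial tail: the value for AA is 1 minus the probability of seeing zero or exactly one A among 16 tiles.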

NOTE: I did search here already, and I think this is a similar but distinct problem from this question: Probability of drawing a given word from a bag of letters in Scrabble

Dan
  • When you ignore the adjacency constraint, you are just re-asking the Scrabble questions: the only changes to make are to increase the rack from 7 to 16 letters, limit the dictionary of valid words appropriately (e.g., have it contain only the target word), and possibly to change the letter frequencies a little. Note, too, that the Scrabble answers already handle the limited number of copies of each letter. Would you like therefore to modify your question so it will not be closed as a duplicate? – whuber May 27 '14 at 18:25
  • I believe there are a few other key differences. For one, there are no wildcards (blanks) in this question. Also, a Scrabble bag has a finite amount of tiles and picking one affects the pool of remaining tiles. There is no such restriction here. Is that sufficient to be considered a unique question? – Dan May 27 '14 at 19:01
  • The lack of wildcards is a simple special case of the situation with wildcards. Your sampling of tiles with replacement is a slight difference and, I suppose, on top of all the other differences it makes the question unique (and relatively easy to answer). I would wonder, though, at how relevant any answer will be to the actual games you reference because the simplifications you are making--especially the one about adjacency not mattering--change the actual probabilities by extraordinary amounts. – whuber May 27 '14 at 19:06
  • Well, if anyone wants to take a stab at the solution both with and without adjacency, I certainly won't stop them. :) – Dan May 27 '14 at 21:21
  • The adjacency question is very complicated to answer, because it depends on (a) the degree to which letters are repeated within the target word and (b) the geometry of valid paths within the board (and the fashions in which those paths might overlap each other). As suggested in many of the answers to the Scrabble questions, as well as in your question itself, (well-crafted) simulations are likely the most practicable approaches if all that is needed is the numerical value of the probability. – whuber May 27 '14 at 21:28
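The simulation approach mentioned in the comments could be sketched roughly as follows. This is only an illustration under stated assumptions: tiles are drawn independently from the six probabilities given in the question, every other letter is lumped into a placeholder `*` tile, and the names `random_board`, `contains_word`, and `estimate` are hypothetical helpers, not code from any of the games.

```python
import random

PROBS = {"A": 0.098, "E": 0.146, "R": 0.079, "S": 0.102, "T": 0.098, "C": 0.021}

def random_board(rng, probs, size=4):
    """Draw a size x size board; '*' stands in for all unlisted letters."""
    letters = list(probs) + ["*"]
    weights = list(probs.values()) + [1.0 - sum(probs.values())]
    flat = rng.choices(letters, weights=weights, k=size * size)
    return [flat[r * size:(r + 1) * size] for r in range(size)]

def contains_word(board, word):
    """True if `word` can be traced through adjacent cells (including
    diagonals) without wrapping and without reusing a cell."""
    if not word:
        return False
    size = len(board)

    def dfs(r, c, i, used):
        if board[r][c] != word[i]:
            return False
        if i == len(word) - 1:
            return True
        used.add((r, c))
        found = False
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < size and 0 <= nc < size \
                        and (nr, nc) not in used and dfs(nr, nc, i + 1, used):
                    found = True
                    break
            if found:
                break
        used.discard((r, c))  # backtrack so other paths may use this cell
        return found

    return any(dfs(r, c, 0, set()) for r in range(size) for c in range(size))

def estimate(word, trials=100_000, seed=0):
    """Monte Carlo estimate of the chance a random board contains `word`."""
    rng = random.Random(seed)
    hits = sum(contains_word(random_board(rng, PROBS), word) for _ in range(trials))
    return hits / trials
```

Calling `estimate("SEE")` with a large number of trials gives an empirical frequency that respects adjacency, which can be compared against whatever the simplified analytical model predicts.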

0 Answers