3

I was inspired by this awesome puzzle. Here is an image of a word tree borrowed from there:

enter image description here

In a word tree every path from the root to the leaves must form a distinct word. The size of the tree is the number of distinct words it forms in this way. The size of this tree is 16 as it forms 16 distinct words: BLOOM, BLOOD, BLOWN, BLOWS, BLAND, BLANK, BLASÉ, BLAST, BROOK, BROOM, BROWN, BROWS, BRAID, BRAIN, BRASH, BRASS. Note that words that do not reach the leaves are not counted, so for example we do not count the words BLOW, BROW and BRA. You must use words from this dictionary. Every branch must split into exactly two (or zero) other branches above it. The two branches may use the same letter. The leaves can be at different distances from the root. So we can extend the BRAIN branch into BRAINY and BRAINS, thus increasing the size of this tree to 17. What is the size of the largest word tree possible? You may use a computer to find the answer.

RobPratt
  • 13,685
  • 1
  • 29
  • 56
Dmitry Kamenetsky
  • 35,897
  • 5
  • 66
  • 276
  • 3
    This feels extremely derivative, with little thought towards the actual solving process. Which words are allowed? Can the two branches contain the same letter? Is there some puzzle element to this programming exercise? – Bass Jul 27 '22 at 06:38
  • Thank you both. I have added a dictionary of words. The two branches can contain the same letter, provided that the final words are all distinct. – Dmitry Kamenetsky Jul 27 '22 at 07:10
  • The puzzle element is how to solve this programming exercise that has a vast search space. I have some ideas, but I am looking forward to seeing your thoughts. Incremental solutions are acceptable. So currently we know that size 17 is possible. Can we get size 18? – Dmitry Kamenetsky Jul 27 '22 at 07:17
  • I am also not ruling out manually constructed solutions, which would involve a puzzle element. – Dmitry Kamenetsky Jul 27 '22 at 08:58
  • It is forbidden to use the same full path (root to leaf) word twice. – Dmitry Kamenetsky Jul 27 '22 at 09:40
  • For example, suppose you have RO. You can extend it with two O, to get ROOT, ROOK, ROOM and ROOF. – Dmitry Kamenetsky Jul 27 '22 at 09:49
  • 1
    A trivial extension: blasts and blasty. Also, why is your dictionary slightly out of order? – A username Jul 27 '22 at 10:29
  • Its a dictionary I found online. Happy to replace it with a better one. – Dmitry Kamenetsky Jul 27 '22 at 10:53

1 Answers1

7

I wrote a computer progam for this problem. It found the following tree:

This tree with root S contains 67 words.

squabbed
squabber
squabbled
squabbles
squabbly
squabs
squalled
squallers
squallery
squally
squalm
squeaked
squeakers
squeakery
squeaks
squeald
squeals
squeel
squeezed
squeezes
squeezy
squadded
squadder
squaddy
squads
squawks
squawky
squaws
squirms
squirmy
squirts
squirty
squished
squishes
squishy
squiss
stagers
stagery
stagey
stagged
staggers
staggery
staggy
starken
starker
starkle
starkly
startled
startles
startly
starty
striped
stripes
stripy
strived
striven
strivy
strobed
strobes
strobic
strobilae
strobilar
strobils
stropped
stropper
stroppy
strops

Note that this is not quite in alphabetical order because SQ has two children of letter U, the first with grandchildren A&E and the second with A&I.

The program works relatively straightforwardly. First it puts all the words in a tree structure. Then it prunes all the branches that have a node with only one branch. Then it prunes away all the smallest surplus branches until there are no more than two branches at each node. There is the slight complication that a node with 4 branches can be split into two branches with the same letter, so that has to be done first before pruning away.

Here's my C# program:

  using System.Collections.Generic;
  using System.Linq;
  using System.Text;

namespace TempProg { static class PSEWordTree { public class LetterNode { public char Letter; public List<LetterNode> Children = new List<LetterNode>();

       internal void Add(string word)
       {
          if( word==null || word.Length == 0) return;
          char firstLetter = word.First();
          LetterNode child = Children.FirstOrDefault(c =&gt; c.Letter == firstLetter);
          if (child == null)
          {
             child = new LetterNode();
             child.Letter = firstLetter;
             Children.Add(child);
          }
          child.Add(word.Substring(1));
       }

       // Prune any nodes with only one child
       internal void Prune1()
       {
          for (int i = Children.Count - 1; i &gt;= 0; i--)
          {
             var c = Children[i];
             c.Prune1();
             if (c.Children.Count == 1 &amp;&amp; Children.Count &gt; 1) Children.RemoveAt(i);
          }
       }

       // Prune surplus branches
       internal void Prune2()
       {
          for( int i=Children.Count - 1; i&gt;=0; i--)
          {
             var c = Children[i];
             if (c.Children.Count == 0) continue;

             c.Prune2();
             c.Children = c.Children.OrderByDescending(c2 =&gt; c2.NumLeaves()).ToList();
             if (c.Children.Count &gt; 4) c.Children.RemoveRange(4, c.Children.Count - 4);
             if (c.Children.Count == 4)
             {
                var c2 = new LetterNode();
                c2.Letter = c.Letter;
                c2.Children = c.Children.GetRange(2, 2).ToList();
                c.Children.RemoveRange(2, 2);
                Children.Insert(i + 1, c2);
             }
             else if (c.Children.Count == 3)
             {
                c.Children.RemoveAt(2);
             }
          }
       }

       internal int NumLeaves()
       {
          if (Children.Count == 0) return 1;
          return Children.Select(c =&gt; c.NumLeaves()).Sum();
       }

       internal List&lt;string&gt; GetWords()
       {
          if(Children.Count == 0)
          {
             return new List&lt;string&gt; { Letter.ToString() };
          }
          var w = Children.OrderBy(c =&gt; c.Letter).SelectMany(c =&gt; c.GetWords()).Select(s =&gt; Letter + s);
          return w.ToList();
       }

       public override string ToString()
       {
          StringBuilder s = new StringBuilder();
          foreach(var w in GetWords())
          {
             s.AppendLine(w);
          }
          return s.ToString();
       }

    }

    public static void Main()
    {
       var root = new LetterNode();

       foreach (string line in System.IO.File.ReadLines(@&quot;E:\temp\words_alpha.txt&quot;))
       {
          root.Add(line);
       }
       root.Prune1();
       root.Prune2();

       foreach (var c in root.Children)
       {
          System.Console.WriteLine(&quot;For letter {0} the tree has {1} words.&quot;, c.Letter, c.NumLeaves());
       }
       System.Console.WriteLine();

       foreach (var c in root.Children)
       {
          System.Console.WriteLine(&quot;For letter {0} the tree has {1} words.&quot;, c.Letter, c.NumLeaves());
          System.Console.WriteLine(c);
       }
    }

 }

}

Jaap Scherphuis
  • 53,019
  • 7
  • 120
  • 208
  • 1
    This is excellent work! Quite a large tree too. I assume this is optimal? Does the pruning order matter? – Dmitry Kamenetsky Jul 27 '22 at 12:53
  • 1
    @DmitryKamenetsky This should be optimal, barring any errors or bugs. When a node has say 3 branches, you want to prune away the one with fewest words/leaves. The number of words can only be accurately determined if all the sub-branches have been pruned to form a valid word tree (i.e. a binary tree). So the pruning must be performed from the leaves towards the root. The order of pruning amongst branches at the same level does not matter. I did the pruning in two stages (single branches first, then surplus branches) but only because I was having trouble combining those in a single pass. – Jaap Scherphuis Jul 27 '22 at 13:08
  • hang on I am not understanding something. Every word should have a partner word that has the same substring except the last letter. But the last two words in your list (stroppy and strops) don't have partners. Perhaps you are allowing some branches to have a single child? – Dmitry Kamenetsky Jul 27 '22 at 13:57
  • @DmitryKamenetsky To take your example: "The leaves can be at different distances from the root. So we can extend the BRAIN branch into BRAINY and BRAINS". This causes BRAID to be unpartnered since BRAIN is replaced by longer words, i.e. BRAID is now partnered with BRAIN. Similarly STROPS is partnered with STROPP. – Jaap Scherphuis Jul 27 '22 at 14:00
  • ah yes, silly me. You are right. We probably need to put your words into a diagram to see where they are. – Dmitry Kamenetsky Jul 27 '22 at 14:03
  • Can the result really be an odd number? Quote: Every branch must split into exactly two (or zero) [...]. – lukas.j Jul 27 '22 at 16:38
  • @lukas.j When you replace BRAIN by BRAINS and BRAINY the number of words increases by 1. – Jaap Scherphuis Jul 27 '22 at 17:16
  • @JaapScherphuis That is correct, but I do not understand the connection to my question. The consequence of the OPs requests are that leafs are always a pair of leafs. And because the OP also requests: Note that words that do not reach the leaves are not counted this means that the end result must be an even number of leaves. – lukas.j Jul 27 '22 at 17:31
  • @lukas.j In the example in the OP it says that you can extend BRAIN with two branches to make BRAINY and BRAINS. That still gives you a valid tree where every letter is either a leaf or has two branches coming off it. This replaces one word by two words. BRAIN is no longer counted, but BRAINS and BRAINY are. So you have a valid tree with 17 words, 15 five-letter words and 2 six-letter words. Note that BRAI still has two branches - one that leads to BRAID and one that splits again into BRAINS and BRAINY. – Jaap Scherphuis Jul 27 '22 at 18:03
  • @JaapScherphuis I got it now. Thanks for the explanation. I appreciated it. – lukas.j Jul 27 '22 at 18:09
  • I don't see how this is a valid solution. Each node is supposed to have exactly 2 branches, but squ has 3 branches (a, e, and i), and sq has only 1 branch (u). – Ray Butterworth Jul 28 '22 at 17:51
  • @RayButterworth As I explained below the word list, SQ has two branches, both U, and the top SQU branch splits into SQUA and SQUE and the bottom SQU branch splits into SQUA and SQUI. – Jaap Scherphuis Jul 28 '22 at 18:39
  • 1
    Does the program build a tree or a trie? XP – melfnt Jul 31 '22 at 16:21