I have a few board games where in each round one can perform one action out of a set of possible actions. Depending on the previous actions, this set of possible actions changes. Usually the game finishes after a fixed number of rounds, and one then computes the total number of points.
One such game is Nidavellir (draft blog post), where one plays eight rounds. The player effectively has four integer-valued coins, starting with the values 2, 3, 4 and 5. In each round one sets two of these aside, and the higher-valued one gets replaced with the sum of the two. The maximum value of a coin is 25. The final score is the sum of all coins.
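Under this reading of the rule, the per-round transition can be sketched as follows (`moves` and `MAX_COIN` are illustrative names, not from the game's rulebook):

```python
from itertools import combinations

MAX_COIN = 25  # a coin's value is capped at 25


def moves(coins):
    """Yield all distinct successor states: pick two coins, replace the
    higher-valued one with the (capped) sum of the two; the lower stays."""
    coins = sorted(coins)
    seen = set()
    for i, j in combinations(range(len(coins)), 2):
        nxt = list(coins)
        nxt[j] = min(nxt[i] + nxt[j], MAX_COIN)  # index j holds the higher value
        state = tuple(sorted(nxt))
        if state not in seen:
            seen.add(state)
            yield state
```

From the starting coins 2, 3, 4, 5 this yields the six choices per round mentioned below.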
Using a full breadth-first tree traversal I can iterate through all $6^8 = 1\,679\,616$ ways to play the game and find the winning strategy. It yields 58 points, and the coins after each step are:
- 2, 3, 4, 5
- 2, 3, 5, 7
- 2, 5, 5, 7
- 2, 5, 7, 10
- 2, 7, 7, 10
- 2, 7, 10, 14
- 2, 7, 10, 24
- 2, 7, 17, 24
- 2, 7, 24, 25
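The exhaustive enumeration can equivalently be sketched as a memoized depth-first search instead of breadth-first (memoizing on the sorted coin multiset deduplicates repeated states, so far fewer than $6^8$ nodes are actually visited); `best_score` is a hypothetical name, and the code assumes the rule reading above, i.e. the higher coin of the chosen pair is replaced by the capped sum:

```python
from functools import lru_cache
from itertools import combinations

MAX_COIN, ROUNDS = 25, 8


@lru_cache(maxsize=None)
def best_score(coins, rounds_left):
    """Maximum final coin sum reachable from `coins` (a sorted tuple)."""
    if rounds_left == 0:
        return sum(coins)
    best = 0
    for i, j in combinations(range(len(coins)), 2):
        nxt = list(coins)
        nxt[j] = min(nxt[i] + nxt[j], MAX_COIN)  # replace the higher coin
        best = max(best, best_score(tuple(sorted(nxt)), rounds_left - 1))
    return best


print(best_score((2, 3, 4, 5), ROUNDS))
```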
There are other games where the complexity is rather on the order of $20^{30}$ (roughly 20 possible actions per round over 30 rounds), so a full traversal doesn't work. Therefore I want to gather experience with algorithms that can traverse the tree more efficiently.
In both cases I have tried beam search. In the Nidavellir case, with a beam size of 100, the best solution found scores only 55 points. The problem is likely that long-term benefits don't get accounted for properly. This is even more pronounced in Scythe, where the player has to spend resources (short-term loss) to gain achievements (long-term gain). Beam search, at least when used with the current end-game score as the weight, doesn't perform so well there.
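For reference, the beam-search variant I mean can be sketched like this (a minimal sketch for the coin game, assuming the rule reading above; `beam_search` and the current-coin-sum heuristic are my own choices):

```python
from itertools import combinations

MAX_COIN, ROUNDS = 25, 8


def beam_search(start, width):
    """Keep only the `width` highest-scoring states per round,
    scoring each state by its current coin sum (a greedy heuristic)."""
    beam = {tuple(sorted(start))}
    for _ in range(ROUNDS):
        candidates = set()
        for coins in beam:
            for i, j in combinations(range(len(coins)), 2):
                nxt = list(coins)
                nxt[j] = min(nxt[i] + nxt[j], MAX_COIN)
                candidates.add(tuple(sorted(nxt)))
        # prune: keep only the states with the highest current sums
        beam = set(sorted(candidates, key=sum, reverse=True)[:width])
    return max(sum(coins) for coins in beam)
```

With a very large `width` no pruning happens and the search degenerates into the full (deduplicated) traversal, which is a handy sanity check; with a small `width` the pruning by current sum is exactly what discards states whose payoff only materializes later.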
I have also tried Q-learning, but I have a hard time modelling the Q-function for such a huge graph problem. In Scythe, additionally, not all actions are possible in each step, and I am not sure how to model that sensibly.
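One way I considered handling state-dependent action sets is to key a tabular Q-function by (state, action) pairs and only ever maximize over the actions legal in the current state (the tabular analogue of action masking in deep Q-learning). A minimal sketch on the coin game, with hyperparameters and names (`train`, `ALPHA`, etc.) chosen arbitrarily for illustration:

```python
import random
from collections import defaultdict
from itertools import combinations

MAX_COIN, ROUNDS = 25, 8
ALPHA, GAMMA, EPSILON = 0.1, 1.0, 0.2  # illustrative hyperparameters


def legal_actions(coins):
    # here: any index pair; in other games this set varies per state
    return list(combinations(range(len(coins)), 2))


def step(coins, action):
    i, j = action
    nxt = list(coins)
    nxt[j] = min(nxt[i] + nxt[j], MAX_COIN)
    return tuple(sorted(nxt))


Q = defaultdict(float)  # (state, action) -> value; unseen entries are 0.0


def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    for _ in range(episodes):
        coins = (2, 3, 4, 5)
        for t in range(ROUNDS):
            actions = legal_actions(coins)
            if rng.random() < EPSILON:  # epsilon-greedy exploration
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda a: Q[(coins, a)])
            nxt = step(coins, a)
            if t == ROUNDS - 1:
                target = sum(nxt)  # reward only at the end: final coin sum
            else:
                # bootstrap only over actions legal in the successor state
                target = GAMMA * max(Q[(nxt, b)] for b in legal_actions(nxt))
            Q[(coins, a)] += ALPHA * (target - Q[(coins, a)])
            coins = nxt


train()
```

Illegal actions then simply never appear in the table, so nothing special has to be done for them; the open question for me is how to make this scale when the state space is too large for a table.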
Are there other algorithms for this type of problem, where one can iteratively explore a huge graph and improve on the solutions found so far?