I have a few board games where in each round one can perform one action out of a set of possible actions. Depending on the previous actions, this set of possible actions changes. Usually the game finishes after a fixed number of rounds, and one then computes the total number of points.
One such game is Nidavellir (draft blog post), where one plays eight rounds. The player effectively has four integer-valued coins, starting with the values 2, 3, 4 and 5. In each round one sets two of these aside, and the higher-valued one gets replaced with the sum of the two. The maximum value of a coin is 25. The final score is the sum of all coins.
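Under this reading of the rule, the per-round transition can be sketched as follows (`moves` and `MAX_COIN` are illustrative names, not from the game's rulebook):

```python
from itertools import combinations

MAX_COIN = 25  # a coin's value is capped at 25


def moves(coins):
    """Yield all distinct successor states: pick two coins, replace the
    higher-valued one with the (capped) sum of the two; the lower stays."""
    coins = sorted(coins)
    seen = set()
    for i, j in combinations(range(len(coins)), 2):
        nxt = list(coins)
        nxt[j] = min(nxt[i] + nxt[j], MAX_COIN)  # index j holds the higher value
        state = tuple(sorted(nxt))
        if state not in seen:
            seen.add(state)
            yield state
```

From the starting coins 2, 3, 4, 5 this yields the six choices per round mentioned below.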
Using a full breadth-first tree traversal I can iterate through all $6^8 = 1\,679\,616$ ways to play the game and find the winning strategy. It yields 58 points, and the coins after each step are:
- 2, 3, 4, 5
- 2, 3, 5, 7
- 2, 5, 5, 7
- 2, 5, 7, 10
- 2, 7, 7, 10
- 2, 7, 10, 14
- 2, 7, 10, 24
- 2, 7, 17, 24
- 2, 7, 24, 25
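The exhaustive enumeration can equivalently be sketched as a memoized depth-first search instead of breadth-first (memoizing on the sorted coin multiset deduplicates repeated states, so far fewer than $6^8$ nodes are actually visited); `best_score` is a hypothetical name, and the code assumes the rule reading above, i.e. the higher coin of the chosen pair is replaced by the capped sum:

```python
from functools import lru_cache
from itertools import combinations

MAX_COIN, ROUNDS = 25, 8


@lru_cache(maxsize=None)
def best_score(coins, rounds_left):
    """Maximum final coin sum reachable from `coins` (a sorted tuple)."""
    if rounds_left == 0:
        return sum(coins)
    best = 0
    for i, j in combinations(range(len(coins)), 2):
        nxt = list(coins)
        nxt[j] = min(nxt[i] + nxt[j], MAX_COIN)  # replace the higher coin
        best = max(best, best_score(tuple(sorted(nxt)), rounds_left - 1))
    return best


print(best_score((2, 3, 4, 5), ROUNDS))
```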
There are other games where the complexity is rather on the order of $20^{30}$ (roughly 20 possible actions per round over 30 rounds), so a full traversal doesn't work. Therefore I want to gather experience with algorithms that can traverse the tree more efficiently.
In both cases I have tried beam search. In the Nidavellir case, with a beam size of 100, the best solution found scores only 55 points. The problem is likely that long-term benefits don't get accounted for properly. This is even more pronounced in Scythe, where the player has to spend resources (short-term loss) to gain achievements (long-term gain). Beam search, at least when used with the current end-game score as the weight, doesn't perform so well there.
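For reference, the beam-search variant I mean can be sketched like this (a minimal sketch for the coin game, assuming the rule reading above; `beam_search` and the current-coin-sum heuristic are my own choices):

```python
from itertools import combinations

MAX_COIN, ROUNDS = 25, 8


def beam_search(start, width):
    """Keep only the `width` highest-scoring states per round,
    scoring each state by its current coin sum (a greedy heuristic)."""
    beam = {tuple(sorted(start))}
    for _ in range(ROUNDS):
        candidates = set()
        for coins in beam:
            for i, j in combinations(range(len(coins)), 2):
                nxt = list(coins)
                nxt[j] = min(nxt[i] + nxt[j], MAX_COIN)
                candidates.add(tuple(sorted(nxt)))
        # prune: keep only the states with the highest current sums
        beam = set(sorted(candidates, key=sum, reverse=True)[:width])
    return max(sum(coins) for coins in beam)
```

With a very large `width` no pruning happens and the search degenerates into the full (deduplicated) traversal, which is a handy sanity check; with a small `width` the pruning by current sum is exactly what discards states whose payoff only materializes later.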
I have also tried Q-learning, but I have a hard time modelling the Q-function for such a huge graph problem. In Scythe, additionally, not all actions are possible in each step, and I am not sure how to model that sensibly.
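One way I considered handling state-dependent action sets is to key a tabular Q-function by (state, action) pairs and only ever maximize over the actions legal in the current state (the tabular analogue of action masking in deep Q-learning). A minimal sketch on the coin game, with hyperparameters and names (`train`, `ALPHA`, etc.) chosen arbitrarily for illustration:

```python
import random
from collections import defaultdict
from itertools import combinations

MAX_COIN, ROUNDS = 25, 8
ALPHA, GAMMA, EPSILON = 0.1, 1.0, 0.2  # illustrative hyperparameters


def legal_actions(coins):
    # here: any index pair; in other games this set varies per state
    return list(combinations(range(len(coins)), 2))


def step(coins, action):
    i, j = action
    nxt = list(coins)
    nxt[j] = min(nxt[i] + nxt[j], MAX_COIN)
    return tuple(sorted(nxt))


Q = defaultdict(float)  # (state, action) -> value; unseen entries are 0.0


def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    for _ in range(episodes):
        coins = (2, 3, 4, 5)
        for t in range(ROUNDS):
            actions = legal_actions(coins)
            if rng.random() < EPSILON:  # epsilon-greedy exploration
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda a: Q[(coins, a)])
            nxt = step(coins, a)
            if t == ROUNDS - 1:
                target = sum(nxt)  # reward only at the end: final coin sum
            else:
                # bootstrap only over actions legal in the successor state
                target = GAMMA * max(Q[(nxt, b)] for b in legal_actions(nxt))
            Q[(coins, a)] += ALPHA * (target - Q[(coins, a)])
            coins = nxt


train()
```

Illegal actions then simply never appear in the table, so nothing special has to be done for them; the open question for me is how to make this scale when the state space is too large for a table.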
Are there other algorithms for this type of problem, where one can iteratively explore a huge graph and improve on the solutions found so far?