This question was asked so long ago that I think it deserves a better response now. In reinforcement learning, this is known as the "sparse rewards problem": the agent receives a meaningful reward only rarely (e.g., only when a long game is finally won or lost), so most actions produce no learning signal. This Medium article covers the problem and some solutions, and a quick search will turn up many more.
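To make "sparse" concrete, here's a toy illustration (the setup and numbers are mine, not from any of the systems below): a random-exploration agent whose only reward sits at a distant goal almost never sees a nonzero reward, which is exactly what makes credit assignment hard in long games.

```python
import random

random.seed(0)

# Toy sparse-reward setting: a 1-D random walk where the ONLY reward
# is at the goal. Random exploration rarely produces any signal at all.
GOAL = 10

def run_episode(max_steps=20):
    position, total_reward = 0, 0.0
    for _ in range(max_steps):
        position += random.choice([-1, 1])
        if position == GOAL:  # the only reward in the whole episode
            total_reward += 1.0
            break
    return total_reward

episodes = [run_episode() for _ in range(1000)]
avg = sum(episodes) / len(episodes)
print(avg)  # fraction of episodes that saw any learning signal
```

The printed fraction is tiny, so a naive learner spends nearly all of its time with zero feedback; the approaches below are different ways around that.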
Instead of going into the details of algorithms that solve this problem, I'll instead point you to a few key successful applications of solutions to this problem along with short, general descriptions:
AlphaStar by DeepMind
StarCraft II is a real-time strategy game very similar to Age of Empires, and AlphaStar was able to beat some of the best players in the world at it.
Here, they created a league in which copies of the agent competed against one another, seeded with different playing styles and strategies for the agents to follow. A genetic-algorithm-style process then modified the top agents so that they slowly improved, while making sure the best agents kept playing against all the different types.
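The league idea above can be sketched in miniature. Everything here is a stand-in (agents are just skill numbers with a style label; the real system uses neural networks whose "skill" emerges from match results), but it shows the loop: everyone plays everyone, the winners survive, and mutated copies refill the league.

```python
import random

random.seed(0)

# Playing styles the league is seeded with (made-up labels).
STYLES = ["rush", "turtle", "economy"]

def make_agent(style):
    return {"style": style, "skill": random.random()}

def play_match(a, b):
    """True if agent a beats agent b (noisy, skill-based outcome)."""
    return a["skill"] + random.gauss(0, 0.1) > b["skill"] + random.gauss(0, 0.1)

def league_generation(population):
    # 1. Every agent plays every other agent, across all styles.
    wins = {id(a): 0 for a in population}
    for a in population:
        for b in population:
            if a is not b and play_match(a, b):
                wins[id(a)] += 1
    # 2. Keep the top half by win count...
    ranked = sorted(population, key=lambda a: wins[id(a)], reverse=True)
    survivors = ranked[: len(ranked) // 2]
    # 3. ...and refill the league with mutated copies (the "genetic" step).
    children = [
        {"style": p["style"], "skill": min(1.0, p["skill"] + random.uniform(0, 0.05))}
        for p in survivors
    ]
    return survivors + children

population = [make_agent(s) for s in STYLES for _ in range(4)]
for _ in range(10):
    population = league_generation(population)

best = max(population, key=lambda a: a["skill"])
print(best["style"], round(best["skill"], 2))
```

The key property, mirrored from the AlphaStar description: selection pressure comes from matches against the whole diverse league, not against a single fixed opponent.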
OpenAI Five by OpenAI
(Here's a cool clip showing OpenAI Five outsmarting one of the top players in DOTA.)
OpenAI Five plays Dota 2, which is very similar to League of Legends. Here, they hard-coded some of the discrete, high-impact decisions, such as which items the heroes buy, and let the AI choose everything else. (Note, though, that the premade item builds meant it didn't understand or expect some hero-and-item combinations you could play against it.)
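That split between scripted and learned decisions can be sketched like this. All names here (`SCRIPTED_BUILD`, `ITEM_COST`, the flat gold check) are made up for illustration; OpenAI's actual system is far more elaborate.

```python
import random

random.seed(1)

# Fixed, hand-written item build order (hypothetical items).
SCRIPTED_BUILD = ["boots", "wand", "blink"]
ITEM_COST = 500  # assumed flat cost, just for the sketch

def scripted_item(purchases_so_far):
    """Item purchases come from the fixed build, not the learned policy."""
    if purchases_so_far < len(SCRIPTED_BUILD):
        return ("buy", SCRIPTED_BUILD[purchases_so_far])
    return None

def learned_policy(observation):
    # Stand-in for the neural network: picks a movement action.
    return ("move", random.choice(["north", "south", "east", "west"]))

def choose_action(observation, purchases_so_far, gold):
    # Scripted decisions take priority when they apply...
    if gold >= ITEM_COST:
        item_action = scripted_item(purchases_so_far)
        if item_action is not None:
            return item_action
    # ...everything else is left to the learned policy.
    return learned_policy(observation)

print(choose_action({}, 0, gold=600))  # scripted purchase fires first
print(choose_action({}, 3, gold=600))  # build finished -> learned action
```

The benefit is a smaller action space for the learner; the cost, as noted above, is that the agent never learns about item choices outside the script.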
To get the five different heroes to play together, they let copies of the same AI control each hero and rewarded each copy both for how well its own hero did and for how well the team did. This allowed training to start at the individual level and then gradually expand to group play.
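A minimal sketch of that reward blend, assuming a single mixing weight between "my hero's reward" and "the team's average reward" (OpenAI's writeups call this weight "team spirit"; the numbers below are made up):

```python
def blend_rewards(individual_rewards, team_spirit):
    """Blend each hero's own reward with the team's mean reward.

    team_spirit = 0.0 -> purely selfish heroes
    team_spirit = 1.0 -> every hero shares the team's average reward
    Annealing this from 0 toward 1 matches the "start individual,
    expand to group play" progression described above.
    """
    team_mean = sum(individual_rewards) / len(individual_rewards)
    return [
        (1.0 - team_spirit) * r + team_spirit * team_mean
        for r in individual_rewards
    ]

# Five heroes; only one of them earned reward this tick.
rewards = [5.0, 0.0, 0.0, 0.0, 0.0]
print(blend_rewards(rewards, 0.0))  # early training: selfish credit
print(blend_rewards(rewards, 0.8))  # late training: mostly shared credit
```

With a high `team_spirit`, the four heroes who did nothing this tick still get credit when a teammate scores, which is what pushes the copies toward cooperative play.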
The OpenAI Five approach is also nice because it's fairly general: they used the same training process to create Dactyl, a system that manipulates a cube with a robotic hand.
Overall
Other than these two, I haven't seen many other AIs that play complex games with huge action spaces. My guess is that it requires an enormous amount of compute, which only labs like OpenAI and DeepMind have. This article argues that it may also be due to the tree representation computer scientists often use.