A few months ago I made a simple game similar to the dinosaur game in Google Chrome: you jump over grounded obstacles, stay on the ground to pass under levitating ones, and jump to collect bitcoins, which can be placed at 5 different heights. I used a very lightweight NN library written by NYU professor Dan Shiffman, and within a few days the game and AI were done: a population of 200 jumpers evolved with a genetic algorithm (a fitness function that awards points for avoiding obstacles and gathering bitcoins, plus mutation), and it worked as it should.
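(For context, each generation is evolved roughly like this; a simplified sketch, not my actual code, with the "brain" reduced to a plain weight vector:)

```javascript
// Rough sketch of one generation (illustrative, not my actual code);
// each jumper is { weights: number[], score: number }.
function nextGeneration(jumpers) {
  // Fitness = points accumulated for avoiding obstacles and gathering bitcoins
  const totalFitness = jumpers.reduce((sum, j) => sum + j.score, 0);

  const newPopulation = [];
  for (let i = 0; i < 200; i++) {
    // Pick a parent with probability proportional to its fitness
    const parent = pickWeighted(jumpers, totalFitness);
    // Copy the parent's weights and randomly perturb ~10% of them
    const childWeights = parent.weights.map(w =>
      Math.random() < 0.1 ? w + (Math.random() * 2 - 1) * 0.5 : w
    );
    newPopulation.push({ weights: childWeights, score: 0 });
  }
  return newPopulation;
}

function pickWeighted(jumpers, totalFitness) {
  let r = Math.random() * totalFitness;
  for (const j of jumpers) {
    r -= j.score;
    if (r <= 0) return j;
  }
  return jumpers[jumpers.length - 1];
}
```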
However, this only worked when the bitcoins and obstacles were not near each other, and that is what I've been struggling with ever since.
So I made a "training ground": first a levitating obstacle, then a grounded one, then a bitcoin right after it, and finally a bitcoin above a fourth grounded obstacle. No matter how many times or for how long I let it train, I always end up with identical behavior:
The first 3 obstacles are properly avoided and the first bitcoin is collected, but then the jumpers jump too early, land just before the fourth obstacle (the one with the bitcoin above it), jump again, and crash at almost the same spot every time. This happens across all generations: even if I restart training, they crash at the same point on the obstacle, with a deviation of only a few pixels. I added multilayer support to the NN; no improvement.
Today I replaced the NN with tensorflow.js, and I am getting identical behavior.
My inputs are:
- distance to next obstacle
- altitude of next obstacle
- distance to next bitcoin
(for simplicity I removed the bitcoins' altitude from the inputs and keep them at a constant height)
I have 2 hidden layers (5 and 6 neurons) and 1 output neuron, which determines whether the jumper should jump.
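To make the setup concrete, the tf.js model has roughly this shape (a minimal sketch, not my exact code; the 0.5 threshold and the input normalization are just one way to turn the output into a jump decision):

```javascript
// const tf = require('@tensorflow/tfjs'); // or the <script> include in the browser

// Sketch of the network shape: 3 inputs -> 5 -> 6 -> 1
const model = tf.sequential();
model.add(tf.layers.dense({ inputShape: [3], units: 5, activation: 'sigmoid' }));
model.add(tf.layers.dense({ units: 6, activation: 'sigmoid' }));
model.add(tf.layers.dense({ units: 1, activation: 'sigmoid' }));

// Each frame: feed the 3 (normalized) inputs and jump if the output crosses 0.5
function shouldJump(distObstacle, altObstacle, distBitcoin) {
  return tf.tidy(() => {
    const out = model.predict(tf.tensor2d([[distObstacle, altObstacle, distBitcoin]]));
    return out.dataSync()[0] > 0.5;
  });
}
```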
My only idea is that the neuron that triggers a jump because of an obstacle activates at the same time as the neuron that triggers a jump because of a bitcoin; their weighted contributions are summed, and the decision to jump is made too early.
Here's a rough (maybe bad) analogy:
If it takes you 1 month to prepare for one exam, and you have 2 exams on the same day, you start preparing 2 months in advance. That logic works for exams, but not for jumping, yet my AI seems to apply it anyway.
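To put toy numbers on what I suspect (purely illustrative weights, not taken from a trained network):

```javascript
// Toy illustration of the suspected mechanism: two "reasons to jump" summing up
const sigmoid = x => 1 / (1 + Math.exp(-x));

// Pretend the output neuron computes
//   jump = sigmoid(2.0 * obstacleSignal + 1.5 * bitcoinSignal - 1.0) > 0.5
console.log(sigmoid(2.0 * 0.3 + 1.5 * 0.0 - 1.0) > 0.5); // false: obstacle still far, no jump
console.log(sigmoid(2.0 * 0.6 + 1.5 * 0.0 - 1.0) > 0.5); // true:  obstacle close, jump at the right moment
// When an obstacle and a bitcoin are close together, both signals are non-zero
// at the same time, and their sum crosses the threshold too early:
console.log(sigmoid(2.0 * 0.3 + 1.5 * 0.4 - 1.0) > 0.5); // true:  premature jump
```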
In the initial "toy neural network" I even tried 8 layers of 12 neurons each, which I think is overkill for this problem. In tf.js I tried both sigmoid and ReLU activations. No matter what I did, no improvement.
Hope someone has an idea where I'm going wrong.