Monday 19 November 2018

Finally, the Quest for the Holy Grail of (My)AI is nearly over...

Back in 2007 I began a quest to train a neural network to learn the Bellman residual - the error term at the heart of Reinforcement Learning (Q-learning).

As the years went by, Deep Learning took over the quest and completed it (and then some!!).

However, the Quest was still elusive to me, so I went off and did something else with my spare time.

But recently I returned to the quest. I dry-docked an MLP neural network and benchmarked it on XOR. Then I set about applying it to the problem specified here:

{{-1, -1, -1, -1,  0, -1},
 {-1, -1, -1,  0, -1,  1},
 {-1, -1, -1,  0, -1, -1},
 {-1,  0, -1, -1,  0, -1},
 {-1, -1, -1,  0, -1,  1},
 {-1,  0, -1, -1,  0,  1}};

Each of the 6 arrays above is a state with 6 possible actions (a value of -1 marks an action that is not available). It is the Q-learning problem used by Mnemstudio.
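
Before wiring in the network, it helps to see the plain tabular version. Below is a rough sketch (not Mnemstudio's exact code, and not mine) of standard Q-learning on the reward matrix above; the discount factor, the episode count, and the greedy test walk are placeholder choices, and the goal state of 5 is inferred from the test paths further down.

import java.util.Random;

public class QLearningSketch {
    // The reward matrix from above: 6 states x 6 actions, -1 = action not available.
    static final double[][] R = {
        {-1, -1, -1, -1,  0, -1},
        {-1, -1, -1,  0, -1,  1},
        {-1, -1, -1,  0, -1, -1},
        {-1,  0, -1, -1,  0, -1},
        {-1, -1, -1,  0, -1,  1},
        {-1,  0, -1, -1,  0,  1}};

    static final double GAMMA = 0.8; // discount factor (placeholder value)
    static final int GOAL = 5;       // goal state, inferred from the test paths below

    public static void main(String[] args) {
        double[][] q = new double[6][6];
        Random rnd = new Random();

        for (int episode = 0; episode < 1000; episode++) {
            int state = rnd.nextInt(6);
            while (state != GOAL) {
                // Explore: pick any valid action at random.
                int action;
                do { action = rnd.nextInt(6); } while (R[state][action] < 0);

                // Bellman update: Q(s,a) = r + gamma * max_a' Q(s',a')
                double maxNext = 0;
                for (int a = 0; a < 6; a++) maxNext = Math.max(maxNext, q[action][a]);
                q[state][action] = R[state][action] + GAMMA * maxNext;

                state = action; // in this problem the action index is the next state
            }
        }

        // Greedy walk from state 2, printing something like "2, 3, 4, 5".
        int s = 2;
        StringBuilder path = new StringBuilder().append(s);
        while (s != GOAL) {
            int best = 0;
            for (int a = 1; a < 6; a++) if (q[s][a] > q[s][best]) best = a;
            s = best;
            path.append(", ").append(s);
        }
        System.out.println(path);
    }
}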

So, as you may have read in a previous blog post, I tried using Mnemstudio's XOR backpropagating neural network. But alas, I knew it needed another layer.

I tried adding another hidden layer, then a CNN layer, all to no avail - until I rewrote it completely and applied my own learning rules.

I also cheated and used a RANDOM DECAY routine whose magnitude decreased with the error. That means that while it learnt, its TRANSFER FUNCTION was slightly random, meaning it could jump out of SUBOPTIMAL SOLUTIONS and find new ones.
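
To make that concrete, here is a minimal sketch of the idea: a transfer function whose random jitter scales with the current error, so the kick fades away as the network settles. The sigmoid and the noise scale are illustrative assumptions, not my exact learning rules.

import java.util.Random;

// Sketch of an error-scaled "random decay": a sigmoid transfer function with a
// small random jitter whose amplitude shrinks as the training error falls.
class NoisyTransfer {
    private final Random rnd = new Random();
    private final double noiseScale; // maximum jitter, e.g. 0.05 (assumed)

    NoisyTransfer(double noiseScale) { this.noiseScale = noiseScale; }

    // currentError: the network's current training error (e.g. mean squared error).
    // High error -> bigger random kicks, helping the net escape suboptimal solutions;
    // low error -> the output tends back to a plain sigmoid.
    double activate(double x, double currentError) {
        double sigmoid = 1.0 / (1.0 + Math.exp(-x));
        double jitter = (rnd.nextDouble() * 2 - 1) * noiseScale * currentError;
        return sigmoid + jitter;
    }
}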

With 4 layers:
- Input: 6 neurons, fed normalised data
- Hidden layers x 2: 18 neurons, with tanh and sigmoid activations
- Output: 6 neurons, with sigmoid activations
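
For anyone reconstructing the shape of the network, here is a bare-bones sketch of that stack. Which hidden layer gets tanh versus sigmoid, the absence of biases, and the weight initialisation are all my assumptions from the summary above.

import java.util.Random;

// Sketch of the described 4-layer stack: 6 inputs (normalised), two hidden
// layers of 18 neurons (tanh then sigmoid), and 6 sigmoid outputs.
class QNetSketch {
    final double[][] w1 = randomWeights(18, 6);   // input -> hidden 1
    final double[][] w2 = randomWeights(18, 18);  // hidden 1 -> hidden 2
    final double[][] w3 = randomWeights(6, 18);   // hidden 2 -> output

    double[] forward(double[] input) {            // input: 6 normalised values
        double[] h1 = layer(w1, input, true);     // tanh hidden layer
        double[] h2 = layer(w2, h1, false);       // sigmoid hidden layer
        return layer(w3, h2, false);              // sigmoid outputs: 6 Q-values
    }

    private static double[] layer(double[][] w, double[] in, boolean useTanh) {
        double[] out = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            double sum = 0;
            for (int j = 0; j < in.length; j++) sum += w[i][j] * in[j];
            out[i] = useTanh ? Math.tanh(sum) : 1.0 / (1.0 + Math.exp(-sum));
        }
        return out;
    }

    private static double[][] randomWeights(int rows, int cols) {
        Random rnd = new Random();
        double[][] w = new double[rows][cols];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                w[i][j] = rnd.nextGaussian() * 0.1; // small random init (assumed)
        return w;
    }
}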

For this algo the CNN option doesn't help with this problem; however, I am hoping it will work on pixel-to-action problems (where the input is an image and the output is an action).

What was interesting is that the neural network converges to different solutions, whereas the Q-learner stays with one solution - which I think is the point of using a neural network for RL. Its power is that it can find solutions where the Q-learner cannot.

Q-learner test output:
1, 5
3, 4, 5
5, 5
2, 3, 4, 5
4, 5
0, 4, 5
Q neural network test output:
1, 5
3, 1, 5
5, 4, 5
2, 3, 1, 5
4, 5
0, 4, 5

Two different solutions above, both equally valid.
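
As a footnote on how those test listings can be produced: from each start state, repeatedly take the action with the highest Q-value until the goal is reached. The sketch below works for either the Q-table or the network's outputs; the qValues function, the goal parameter, and the step cap are placeholders of mine.

import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Sketch: extract a greedy path from any source of Q-values (table or network).
// qValues.apply(state) is assumed to return the 6 Q-values for that state.
class GreedyPath {
    static List<Integer> walk(IntFunction<double[]> qValues, int start, int goal) {
        List<Integer> path = new ArrayList<>();
        int state = start;
        path.add(state);
        for (int step = 0; step < 10 && state != goal; step++) { // cap to avoid loops
            double[] q = qValues.apply(state);
            int best = 0;
            for (int a = 1; a < q.length; a++) if (q[a] > q[best]) best = a;
            state = best;   // in this problem the chosen action is the next state
            path.add(state);
        }
        return path;        // e.g. [2, 3, 4, 5] or [2, 3, 1, 5]
    }
}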


How did we get an error curve like that?!

Sneak a peek @ Shepherd's latest work, DarkSide - Chapter 2 - The Watchers


Or Read Kromos now
