Thursday 29 November 2018

Seed Recall

Ok so it's time to fess up about what I've discovered. OMG. Seed recall is a discovery I made whilst experimenting with the RLnet - Neural Q Learner.

I noticed that the solution depended on the random seed. So I decided to test the theory that the randomly perturbed activation functions themselves encoded different solutions (and here's the big one) for the same pre-learnt weights.

Here's what I did (rough code sketches of both stages follow the step lists below):

Learning:
seed random generator
init random weights or load previous weights
create seed during learning - using Random Decay (See previous post)
backpropagate solution for 178 iterations
save seed
save weights

repeat for all solutions
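A minimal sketch of the learning stage, assuming a stub in place of the real RLnet - NeuralQLearner, setPerturbationSeed, backpropagate and saveWeights here are hypothetical stand-ins, since the actual class isn't listed in this post. The point is only that each solution gets its own perturbation seed, saved alongside the weights:

import java.util.Random;

// Hypothetical stand-in for the RLnet Neural Q Learner - only the seed
// bookkeeping matters for this sketch, so the learning methods are stubs.
class NeuralQLearner {
    Random noise;                                     // drives the Random Decay perturbation
    void setPerturbationSeed(long s) { noise = new Random(s); }
    void initRandomWeights(long s)   { /* init random weights (or load previous) */ }
    void backpropagate(int iters)    { /* learn with Random Decay active */ }
    void saveWeights(String file)    { /* persist weights */ }
}

public class LearnWithSeeds {
    public static void main(String[] args) {
        long[] solutionSeeds = {101L, 202L, 303L};    // one seed per solution (values illustrative)
        NeuralQLearner net = new NeuralQLearner();
        net.initRandomWeights(42L);                   // init random weights or load previous weights

        for (long seed : solutionSeeds) {             // repeat for all solutions
            net.setPerturbationSeed(seed);            // create seed for this solution
            net.backpropagate(178);                   // backpropagate solution for 178 iterations
            net.saveWeights("weights_" + seed + ".dat"); // save weights
            System.out.println("saved seed " + seed);    // save seed
        }
    }
}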

Recall:
set random generator to null
load saved weights
load seed for solution - Apply seed directly to Activation function
backpropagate for 58 iterations
view seed-dependent solution
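Why applying the saved seed can reconstruct the same perturbed activation function at all: java.util.Random is deterministic for a given seed, so re-seeding it replays exactly the noise stream the network saw during learning. A tiny demo of that property (the seed value is just an example):

import java.util.Random;

// Re-seeding java.util.Random with the saved seed replays the identical
// perturbation stream, which is what lets the short recall pass settle back
// into the same seed-dependent solution.
public class SeedReplay {
    public static void main(String[] args) {
        long savedSeed = 202L;                  // example seed saved during learning
        Random learn  = new Random(savedSeed);  // stream seen during learning
        Random recall = new Random(savedSeed);  // stream regenerated at recall
        for (int i = 0; i < 5; i++) {
            double a = learn.nextGaussian();
            double b = recall.nextGaussian();
            System.out.printf("%d: %.6f %.6f match=%b%n", i, a, b, a == b);
        }
    }
}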

For each seed I got a unique solution. Of course it's early days, but it may just be that different solutions to the same problem can be learnt and encoded by the same neural network using the different random seeds applied to the activation function of each neuron.

Whether this extends to other problems I have yet to discover. If the problems are similar and have the same dimensions then it is possible.

The problem I am using is the one detailed in previous posts:

{{-1, -1, -1, -1,  0, -1},
 {-1, -1, -1,  0, -1,  1},
 {-1, -1, -1,  0, -1, -1},
 {-1,  0, -1, -1,  0, -1},
 {-1, -1, -1,  0, -1,  1},
 {-1,  0, -1, -1,  0,  1}};
6 States with 6 Actions

I now plan to upscale the RLnet to handle 120x120 inputs and use a CNN layer.





Cumin Tikka

Invented a new dish, calling it CuminTikka. Marinate chicken in yoghurt, cumin, coriander, paprika, garam masala, ginger and garlic. Overnight.
Roast the chicken in its marinade. (Or you can just add the marinade without cooking it.)
Add 2 tsp cumin and freshly chopped mint to onions and cook down.
Brown blanched potatoes with butter, herbs and garlic.
Add tomato to the onions, then add stock and lemon juice.
Add it all to the sauce and then to a casserole.
It is very refreshing: the cumin, mint and lemon sauce is delicious and fresh, and the marinade is spicy and fun.
CuminTikka.

You can roast the marinade and then add it, or add it without roasting or frying. The flavour locks into the chicken from the marinade. Even with adding the marinade to the sauce, the lovely fresh taste of the cumin and mint still comes through.

Monday 19 November 2018

Finally the Quest for the Holy Grail of (My)AI is nearly over...

Back in 2007 I began a quest to train a neural network to learn
the Bellman Residual - the equation used in Reinforcement Learning (Q learning).
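For reference, that residual written out as a small helper (a sketch in the standard Q-learning form; the array layout is mine, not RLnet's):

public class Bellman {
    // Bellman residual: the gap between the current estimate Q(s,a) and the
    // one-step target r + gamma * max_a' Q(s',a'). Q learning - and a neural
    // Q learner - tries to drive this quantity towards zero.
    static double residual(double[][] Q, int s, int a, double r, int sNext, double gamma) {
        double maxNext = Double.NEGATIVE_INFINITY;
        for (double q : Q[sNext]) maxNext = Math.max(maxNext, q);
        return r + gamma * maxNext - Q[s][a];
    }
}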

As the years went by Deep Learning took over the quest and completed it (and some!!).

However the Quest was still elusive to me, and so I went off and did something else with my spare time.

But recently I returned to the quest. I dry-docked an MLP neural network and benchmarked it on XOR. Then I set about applying it to the problem specified here:

{{-1, -1, -1, -1,  0, -1},
 {-1, -1, -1,  0, -1,  1},
 {-1, -1, -1,  0, -1, -1},
 {-1,  0, -1, -1,  0, -1},
 {-1, -1, -1,  0, -1,  1},
 {-1,  0, -1, -1,  0,  1}};

Each of the 6 arrays above is a state with 6 actions. It is a Q-learning problem used by Mnemstudio.
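For comparison, plain tabular Q-learning on that reward matrix looks roughly like this - a sketch along the lines of the Mnemstudio tutorial, not my RLnet code; alpha, gamma, the episode count and the seed are illustrative values of mine. State 5 is the goal, since that is where the rewards of 1 sit, and in this problem the chosen action index is also the next state:

import java.util.Random;

// Plain tabular Q-learning on the 6-state / 6-action reward matrix above:
// -1 marks an invalid action, 0 a valid move, 1 a move into the goal state (5).
public class TabularQ {
    static final double[][] R = {
        {-1, -1, -1, -1,  0, -1},
        {-1, -1, -1,  0, -1,  1},
        {-1, -1, -1,  0, -1, -1},
        {-1,  0, -1, -1,  0, -1},
        {-1, -1, -1,  0, -1,  1},
        {-1,  0, -1, -1,  0,  1}};

    public static void main(String[] args) {
        double[][] Q = new double[6][6];
        double alpha = 1.0, gamma = 0.8;          // illustrative parameters
        Random rng = new Random(1);

        for (int episode = 0; episode < 1000; episode++) {
            int s = rng.nextInt(6);               // random start state
            while (s != 5) {                      // until the goal is reached
                int a;
                do { a = rng.nextInt(6); } while (R[s][a] < 0);   // pick any valid action
                double maxNext = 0;
                for (double q : Q[a]) maxNext = Math.max(maxNext, q);
                // Q update: move Q(s,a) towards r + gamma * max_a' Q(s',a'),
                // i.e. drive the Bellman residual towards zero
                Q[s][a] += alpha * (R[s][a] + gamma * maxNext - Q[s][a]);
                s = a;                            // the action index is the next state
            }
        }

        // Greedy walk from each start state to the goal, like the test output below
        for (int start = 0; start < 6; start++) {
            StringBuilder path = new StringBuilder().append(start);
            int s = start;
            while (s != 5) {
                int best = 0;
                for (int a = 1; a < 6; a++) if (Q[s][a] > Q[s][best]) best = a;
                s = best;
                path.append(", ").append(s);
            }
            System.out.println(path);
        }
    }
}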

So as you may have read in a previous blog, I tried using Mnemstudio's XOR Backpropagating Neural Network. But alas, I knew it needed another layer.

I tried adding another hidden layer, then a CNN layer, all to no avail - until I rewrote it completely and applied my own learning rules.

I also cheated and used a RANDOM DECAY routine whose randomness decreased with the Error. That means that while it learnt, its TRANSFER FUNCTION was slightly random, meaning that it could jump out of SUBOPTIMAL SOLUTIONS and find new ones.
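A minimal sketch of that idea, assuming a sigmoid transfer function plus additive Gaussian noise whose amplitude scales with the current error - the exact noise shape and scale used in RLnet aren't recorded in this post, so noiseScale and the Gaussian choice are assumptions:

import java.util.Random;

// Sketch of a "Random Decay" transfer function: a sigmoid with a small random
// perturbation whose amplitude shrinks as the network error shrinks, so the
// noise dies away as learning converges.
public class RandomDecaySigmoid {
    private final Random rng;
    private double error = 1.0;                  // updated from the training loop
    private final double noiseScale = 0.05;      // assumed scale, not from RLnet

    RandomDecaySigmoid(long seed) { this.rng = new Random(seed); }

    void setError(double e) { this.error = e; }

    double activate(double x) {
        double y = 1.0 / (1.0 + Math.exp(-x));                  // plain sigmoid
        double noise = noiseScale * error * rng.nextGaussian(); // decays with error
        return y + noise;                                       // slightly random transfer function
    }

    public static void main(String[] args) {
        RandomDecaySigmoid f = new RandomDecaySigmoid(202L);    // seeded, hence recallable
        f.setError(0.5);                       // early in learning: visible jitter
        System.out.println(f.activate(0.3));
        f.setError(0.01);                      // near convergence: almost no jitter
        System.out.println(f.activate(0.3));
    }
}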

With 4 layers:
INPUT: 6 neurons, with normalised data
HIDDEN: 2 layers of 18 neurons, with tanh and sigmoid activations
OUTPUT: 6 neurons, with sigmoid activations
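A shape-only sketch of that layout so the sizes are concrete - the assignment of tanh to the first hidden layer and sigmoid to the second is an assumption (the post doesn't say which is which), the weights are random just to make the forward pass run, and the one-hot style input is my own illustration of "normalised data":

import java.util.Random;

// Shape-only sketch of the 6 -> 18 -> 18 -> 6 network described above.
public class RLnetShape {
    static double[] layer(double[] in, double[][] w, boolean useTanh) {
        double[] out = new double[w.length];
        for (int j = 0; j < w.length; j++) {
            double sum = 0;
            for (int i = 0; i < in.length; i++) sum += w[j][i] * in[i];
            out[j] = useTanh ? Math.tanh(sum) : 1.0 / (1.0 + Math.exp(-sum));
        }
        return out;
    }

    static double[][] randomWeights(int rows, int cols, Random r) {
        double[][] w = new double[rows][cols];
        for (double[] row : w) for (int i = 0; i < cols; i++) row[i] = r.nextGaussian() * 0.1;
        return w;
    }

    public static void main(String[] args) {
        Random r = new Random(42);
        double[][] w1 = randomWeights(18, 6, r);   // input (normalised) -> hidden 1
        double[][] w2 = randomWeights(18, 18, r);  // hidden 1 -> hidden 2
        double[][] w3 = randomWeights(6, 18, r);   // hidden 2 -> output
        double[] state = {1, 0, 0, 0, 0, 0};       // one-hot style normalised input
        double[] h1 = layer(state, w1, true);      // tanh hidden layer
        double[] h2 = layer(h1, w2, false);        // sigmoid hidden layer
        double[] q  = layer(h2, w3, false);        // 6 sigmoid outputs (one per action)
        for (double v : q) System.out.printf("%.3f ", v);
        System.out.println();
    }
}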

For this algo the CNN option doesn't help on this problem; however I am hoping it will work on pixel-to-action problems (where the input is an image and the output is an action).

What was interesting is that the neural network converges to different solutions whereas the Q learner stays in one solution - which I think is the purpose of using a Neural Network for RL. Its power is that it can find solutions where the Q learner cannot.

Q test output
1, 5
3, 4, 5
5, 5
2, 3, 4, 5
4, 5
0, 4, 5
Q Neural test output
1, 5
3, 1, 5
5, 4, 5
2, 3, 1, 5
4, 5
0, 4, 5

Two different solutions above, both equally valid.


How did we get an error curve like that!!!

Sneak a peek @ Shepherd's latest works: DarkSide - Chapter 2 - The Watchers


Or Read Kromos now