Sunday, 11 June 2017

Long Term Short Memory (LTSM), or FICO for short.

The acronym FICO stands for Forget, Include, Copy, Out. These are the gates of the LTSM module, which is based on the famous LSTM module.

These are neurons with gates controlled by weights and sigmoid activation functions. An input (Xt) is combined with the previous output of the cell (Hin, i.e. Hout at t-1). The gates determine whether the previous cell activation (Ctin) is forgotten or not (F - the Forget gate), whether the input combined with Hin is included when computing the unit's activation (I - the Include gate), and whether the activation is passed on to Hout (O - the Output gate).



1. Keep the previous state? (F - Forget gate)
ht-1 + xt -> sigmoid -> forget 0/1 -> scales how much of the previous state (Ct-1) is carried into Ct

ft = sigmoid(Wf.[ht-1,xt] + bf)

2. Include the new input? (I - Include gate)
ht-1 + xt -> sigmoid -> include 0/1 -> scales how much of tanh(input + previous hidden) is added to Ct

it = sigmoid(Wi.[ht-1,xt] + bi)
~Ct = tanh(Wc.[ht-1,xt] + bc)
Ct = ft*Ct-1 + it*~Ct

3. Pass the state to the hidden output? (O - Output gate)
ht-1 + xt -> sigmoid -> output 0/1 -> scales how much of tanh(Ct) becomes the hidden output (ht)

ot = sigmoid(Wo.[ht-1,xt] + bo)
ht = ot * tanh(Ct)

Four weights: Wf, Wi, Wc, Wo - or F.I.C.O. for short!
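As a minimal sketch of the four FICO gates in action (the names and layout here are my own assumptions for illustration, not the exact code from this project), a single scalar LTSM cell step could look like:

```cpp
#include <cmath>

// One scalar LTSM (LSTM) cell step with the four FICO weight sets.
// Each gate has a weight for h_prev, a weight for x, and a bias,
// matching W.[ht-1,xt] + b in the equations above.
struct Gate {
    double wh, wx, b;
    double pre(double h_prev, double x) const { return wh * h_prev + wx * x + b; }
};

static double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

struct LTSMCell {
    Gate F, I, C, O;  // Forget, Include, Copy (candidate), Out

    // Advances the cell: updates the state c and hidden output h in place.
    void step(double x, double &c, double &h) const {
        double f  = sigmoid(F.pre(h, x));    // forget gate ft
        double i  = sigmoid(I.pre(h, x));    // include gate it
        double cc = std::tanh(C.pre(h, x));  // candidate state ~Ct
        double o  = sigmoid(O.pre(h, x));    // output gate ot
        c = f * c + i * cc;                  // Ct = ft*Ct-1 + it*~Ct
        h = o * std::tanh(c);                // ht = ot*tanh(Ct)
    }
};
```

Feeding c and h from one step into the next gives the recurrence over time.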

I connected everything up in this way and got it to learn an identity function with one module, and I am now poised to use it on a time series, which I have read is a good application for the LSTM.

I have found, however, that for the identity mapping (a learnt mapping from input to output) sigmoids work better than the tanh functions.

Also, for the identity-mapping test I fed the previous outputs back to their inputs (Cout -> Cin, Hout -> Hin) with a constant input Xt = X.

We shall see if it works out of the box for time series: will it learn how to count, and will it predict a sequence of numbers?

I have written the whole unit using vectors, making it suitable for taking inputs and producing outputs in a convolutional process, and so perhaps suitable for image processing.

The std::vector container is simply a convenient and safe way of handling 2d arrays.

Also, I have made the weight arrays flexible enough to be 1d or 2d. So it's all go for Long Term Short Memory!!
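To illustrate the vector approach (this is a sketch of my own, not the exact layout used in the project), a 2d weight array stored as a nested std::vector can be applied to an input vector like so, with a 1d weight array being just the single-row case:

```cpp
#include <vector>

// A 2d weight array as a vector of rows; multiplying it by an input
// vector gives one pre-activation per row (i.e. one per unit).
using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

Vec matVec(const Mat &W, const Vec &x) {
    Vec out(W.size(), 0.0);
    for (std::size_t r = 0; r < W.size(); ++r)
        for (std::size_t c = 0; c < W[r].size(); ++c)
            out[r] += W[r][c] * x[c];
    return out;
}
```

A 1d weight vector w is then handled by wrapping it as Mat{w}.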