Wednesday 26 July 2017

LSTM with no learning - recall and count ahead.

Without any training, my LSTM remembers an input for a specified delay and counts ahead. It also has a go at predicting the future for some rudimentary time series.

But it's not been trained - the weight updates don't affect this output. It does all this off the cuff!

What's going on? I hope I can fix this.

Oh, the memory behaviour is affected by training in one respect: it specifies when to recall. I am uploading this doozy for anyone to play with and maybe get it to do the right thing!

This upload remembers stuff - you can specify when to recall by setting ahead = 1, 2 or 3 steps ahead.

This upload counts ahead.

All with no learning - what's going on!

Here is roughly what I've implemented:


 Forward Pass

Act_FGate = ft = sigmoid_(Wgt[0]*(Hin+In)+Bias[0],1,0);   //forget gate
Act_IGate = It = sigmoid_(Wgt[1]*(Hin+In)+Bias[1],1,0);   //include (input) gate
Ct_ = tanh_(Wgt[2]*(Hin+In)+Bias[2],1,0);                 //candidate cell state
Act_CGate = Ct = ft*Ctin + It*Ct_;                        //new cell state
Act_OGate = Ot = sigmoid_(Wgt[3]*(Hin+In)+Bias[3],1,0);   //output gate
Hout = Ot * tanh(Ct);                                     //output
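The forward pass above, sketched as a scalar Python step (all names follow the pseudocode; note the real LSTM keeps separate weights for Hin and In rather than summing them):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(In, Hin, Ctin, Wgt, Bias):
    """One scalar LSTM step as written above."""
    ft  = sigmoid(Wgt[0] * (Hin + In) + Bias[0])     # forget gate
    It  = sigmoid(Wgt[1] * (Hin + In) + Bias[1])     # include (input) gate
    Ct_ = math.tanh(Wgt[2] * (Hin + In) + Bias[2])   # candidate cell state
    Ct  = ft * Ctin + It * Ct_                       # new cell state
    Ot  = sigmoid(Wgt[3] * (Hin + In) + Bias[3])     # output gate
    Hout = Ot * math.tanh(Ct)                        # output
    return Hout, Ct
```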

Backward Pass

***Backprop error:

Hout_Err = Out - Hout

Ctin_Err  = Inv_Tanh(Wgt_O * Hout_Err)

Err_FGate = Inv_Sigmoid(Wgt_F * Hout_Err)

Err_IGate = Inv_Sigmoid(Wgt_I * Hout_Err)

Err_CGate = Inv_Tanh(Wgt_C * Hout_Err)

Hin_Err = Err_CGate + Err_IGate + Err_FGate

Next layer down Hout_Err = Hin_Err
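Taken literally (reading Inv_Sigmoid/Inv_Tanh as the actual inverse functions, logit and artanh), the error backprop above would look like the sketch below - which is itself suspect, since standard backprop multiplies the weighted error by the activation *derivative* rather than applying the inverse function. All helper names here are mine:

```python
import math

def inv_sigmoid(y):
    # logit: the literal inverse of the logistic sigmoid, defined for 0 < y < 1
    return math.log(y / (1.0 - y))

def inv_tanh(y):
    # artanh: the literal inverse of tanh, defined for -1 < y < 1
    return 0.5 * math.log((1.0 + y) / (1.0 - y))

def backprop_errors(Out, Hout, Wgt_F, Wgt_I, Wgt_C, Wgt_O):
    """The post's backprop scheme transcribed literally. The restricted
    domains of the inverse functions are one hint that derivatives were
    probably intended instead."""
    Hout_Err  = Out - Hout
    Ctin_Err  = inv_tanh(Wgt_O * Hout_Err)
    Err_FGate = inv_sigmoid(Wgt_F * Hout_Err)
    Err_IGate = inv_sigmoid(Wgt_I * Hout_Err)
    Err_CGate = inv_tanh(Wgt_C * Hout_Err)
    Hin_Err = Err_CGate + Err_IGate + Err_FGate  # becomes the next layer down's Hout_Err
    return Hout_Err, Ctin_Err, Hin_Err
```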

***Update Wgts (For each Wgt_F,Wgt_I,Wgt_C,Wgt_O):

WgtDelta = (Hin+In)*Err_Gate*Act_Gate*Lrt + Momentum*PreDelta - Decay*PreWgt

PreWgt = Wgt
PreDelta = WgtDelta
Wgt += WgtDelta
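The update step above, as a small Python sketch. The gradient term (Err_Gate*Act_Gate rather than the error times the activation derivative at the pre-activation) is the post's own formulation, kept as written:

```python
def update_weight(Wgt, Hin, In, Err_Gate, Act_Gate, PreDelta,
                  Lrt=0.1, Momentum=0.9, Decay=0.001):
    """Weight update as written above: learning-rate-scaled gradient term,
    plus momentum on the previous delta, minus decay on the previous weight.
    The hyperparameter values are illustrative, not the post's."""
    WgtDelta = (Hin + In) * Err_Gate * Act_Gate * Lrt \
               + Momentum * PreDelta - Decay * Wgt
    return Wgt + WgtDelta, WgtDelta   # new weight, delta saved as next PreDelta
```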

Here is the correct pseudocode, borrowed/interpreted from a translation found online:

Forward

PreAct_FGate = U_FGate*(Hin+In) + W_FGate*Hout(t-1) + V_FGate*C(t-1)
PreAct_IGate = U_IGate*(Hin+In) + W_IGate*Hout(t-1) + V_IGate*C(t-1)

PreAct_CGate = U_CGate*(Hin+In) + W_CGate*Hout(t-1)

Act_IGate = Sigmoid(PreAct_IGate)
Act_FGate = Sigmoid(PreAct_FGate)
Ct_       = LSigmoid(PreAct_CGate)

Ct = Act_FGate * C(t-1) + Act_IGate * Ct_

PreAct_OGate = U_OGate*(Hin+In) + W_OGate*Hout(t-1) + V_OGate*C(t)

Act_OGate = Sigmoid(PreAct_OGate)

Hout = Act_OGate * tanh(Ct)
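That forward pass in scalar Python form. U/W/V are the input, recurrent and peephole weights per gate; biases are omitted for brevity, and I've kept the post's "LSigmoid" as a logistic sigmoid for the candidate (standard LSTMs usually put tanh there). Function and dict names are mine:

```python
import math

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward_peephole(x, h_prev, c_prev, U, W, V):
    """One scalar peephole-LSTM step following the corrected pseudocode."""
    pre_f = U['f'] * x + W['f'] * h_prev + V['f'] * c_prev
    pre_i = U['i'] * x + W['i'] * h_prev + V['i'] * c_prev
    pre_c = U['c'] * x + W['c'] * h_prev          # no peephole on the candidate
    f, i = sig(pre_f), sig(pre_i)
    c_ = sig(pre_c)                               # post's "LSigmoid" candidate
    c = f * c_prev + i * c_
    pre_o = U['o'] * x + W['o'] * h_prev + V['o'] * c   # peephole from the *current* cell
    o = sig(pre_o)
    h = o * math.tanh(c)
    return h, c
```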



Backpass:


Hin_Err = Sum U_Gate*Err_Gate(t) + Sum W_Gate*Err_Gate(t+1)   <---- For Layer Above

Err_OGate(t) = Inv_Sig(PreAct_OGate(t))*tanh(Ct)*Hout_Err(t)

Ct_Err(t) = Act_OGate(t)*Inv_Tanh(Ct)*Hout_Err(t)
            + Act_FGate(t+1)*Ct_Err(t+1) + WgtV_I*Err_IGate(t+1)
            + WgtV_F*Err_FGate(t+1) + WgtV_O*Err_OGate(t)

Err_CGate = Inv_LSig(PreAct_CGate(t))*Act_IGate(t)*Ct_Err(t)

Err_FGate = Inv_Sig(PreAct_FGate(t))*C(t-1)*Ct_Err(t)

Err_IGate = Inv_Sig(PreAct_IGate(t))*LSig(PreAct_CGate(t))*Ct_Err(t)
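One step of that backward pass in scalar Python, reading Inv_Sig/Inv_Tanh/Inv_LSig as the activation *derivatives* evaluated at the stated arguments (the usual meaning in backprop). The t+1 quantities arrive through `nxt`, which is None at the final time step; the dict layout and names are my own reading of the equations:

```python
import math

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

def dsig(pre):                       # "Inv_Sig": sigmoid derivative at the pre-activation
    s = sig(pre)
    return s * (1.0 - s)

def dtanh(x):                        # "Inv_Tanh": tanh derivative
    return 1.0 - math.tanh(x) ** 2

def backward_step(pre, act, c, c_prev, hout_err, V, nxt=None):
    """Gate errors at time t per the equations above. pre/act hold the
    gate pre-activations/activations; V holds the peephole weights."""
    err_o = dsig(pre['o']) * math.tanh(c) * hout_err
    c_err = act['o'] * dtanh(c) * hout_err + V['o'] * err_o
    if nxt is not None:              # recurrent terms from t+1
        c_err += (nxt['act_f'] * nxt['c_err']
                  + V['i'] * nxt['err_i'] + V['f'] * nxt['err_f'])
    err_c = dsig(pre['c']) * act['i'] * c_err      # candidate uses LSigmoid per the post
    err_f = dsig(pre['f']) * c_prev * c_err
    err_i = dsig(pre['i']) * sig(pre['c']) * c_err  # LSig(PreAct_CGate) = candidate value
    return {'o': err_o, 'c': err_c, 'f': err_f, 'i': err_i, 'c_err': c_err}
```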


*(Hout_Err(prev layer) = Hin_Err;
  at the output layer, Hout_Err = Out - Hout)
 
*(Three activation functions: Logistic Sigmoid, Tanh, Sigmoid)

*(Three weight sets per gate: U, W, V)



