Without any training, my LSTM remembers an input for a specified delay and counts ahead. It also has a go at predicting the future for some rudimentary time series.
But it has not been trained, because the weight updates don't affect this output. It does all this off the cuff!
What's going on? I hope I can fix this.
Oh, the memory behaviour is affected by the training, which specifies when to recall. I am uploading this doozy for anyone to play with, and maybe get it to do the right thing!
This upload remembers stuff; you can specify when to recall it by setting ahead=1,2,3 steps ahead.
This upload counts ahead.
All with no learning. What's going on?!
Here is roughly what I've implemented:
Forward Pass
Act_FGate = ft = sigmoid_(Wgt[0]*(Hin+In)+Bias[0],1,0); //forget gate
Act_IGate = It = sigmoid_(Wgt[1]*(Hin+In)+Bias[1],1,0); //Include Gate
Ct_= tanh_(Wgt[2]*(Hin+In)+Bias[2],1,0);
Act_CGate = Ct = ft*Ctin+It*Ct_;
//Out gate
Act_OGate = Ot = sigmoid_(Wgt[3]*(Hin+In)+Bias[3],1,0);
Hout = Ot * tanh(Ct); //Outputs
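The forward pass above can be sketched as a single scalar step in C++. This is only a minimal sketch: the struct names CellState and GateActs are made up for illustration, and sigmoid() is assumed to behave like the sigmoid_(x,1,0) helper in the listing.

```cpp
#include <cassert>
#include <cmath>

// Logistic sigmoid; assumed equivalent to the post's sigmoid_(x,1,0).
inline double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Scalar state carried between time steps (hypothetical struct name).
struct CellState {
    double C;  // cell memory Ct
    double H;  // cell output Hout
};

// Gate activations, kept so a backward pass could reuse them.
struct GateActs {
    double f, i, g, o;  // forget, input, candidate (Ct_), output
};

// One forward step. Wgt[0..3] and Bias[0..3] are ordered
// forget, input, candidate, output, matching the listing.
inline CellState lstm_forward(const double Wgt[4], const double Bias[4],
                              double In, const CellState& prev,
                              GateActs* acts = nullptr) {
    assert(Wgt != nullptr && Bias != nullptr);
    const double x = prev.H + In;  // every gate sees the sum (Hin+In)
    GateActs a;
    a.f = sigmoid(Wgt[0] * x + Bias[0]);
    a.i = sigmoid(Wgt[1] * x + Bias[1]);
    a.g = std::tanh(Wgt[2] * x + Bias[2]);
    a.o = sigmoid(Wgt[3] * x + Bias[3]);

    CellState next;
    next.C = a.f * prev.C + a.i * a.g;   // Ct = ft*Ctin + It*Ct_
    next.H = a.o * std::tanh(next.C);    // Hout = Ot*tanh(Ct)
    if (acts) *acts = a;
    return next;
}
```

With every weight and bias at zero, the forget gate sits at 0.5, so half of the previous Ct is carried forward before any training has happened. That may be part of why an untrained net appears to remember, though that is only a guess.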
Backward Pass
***Backprop error:
Hout_Err = Out - Hout
Ctin_Err = Inv_Tanh(Wgt_O * Hout_Err)
Err_FGate = Inv_Sigmoid(Wgt_F * Hout_Err)
Err_IGate = Inv_Sigmoid(Wgt_I * Hout_Err)
Err_CGate = Inv_Tanh(Wgt_C * Hout_Err)
Hin_Err = Err_CGate + Err_IGate + Err_FGate
Next layer down: Hout_Err = Hin_Err
***Update Wgts (For each Wgt_F,Wgt_I,Wgt_C,Wgt_O):
WgtDelta = (Hin+In)*Err_Gate*Act_Gate*Lrt + Momentum*PreDelta - Decay*PreWgt
PreWgt = Wgt
PreDelta = WgtDelta
Wgt += WgtDelta
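The update rule above, written out as a small C++ helper. It is a sketch under two assumptions: grad stands for the already computed (Hin+In)*Err_Gate*Act_Gate term, and the decay is applied to the current Wgt rather than to the separately stored PreWgt. WeightState is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Per-weight bookkeeping (hypothetical struct name).
struct WeightState {
    double Wgt;       // current weight
    double PreDelta;  // delta applied on the previous update (momentum memory)
};

// Apply one update: learning-rate step + momentum - weight decay.
// ASSUMPTION: decay uses the current weight, not the post's PreWgt.
inline void update_weight(WeightState& w, double grad,
                          double Lrt, double Momentum, double Decay) {
    assert(Lrt >= 0.0 && Momentum >= 0.0 && Decay >= 0.0);
    const double WgtDelta = Lrt * grad + Momentum * w.PreDelta - Decay * w.Wgt;
    w.PreDelta = WgtDelta;  // remembered for the next momentum term
    w.Wgt += WgtDelta;
}
```

Logging Wgt before and after a call like this is a cheap way to see whether the updates are actually reaching the weights that the forward pass reads.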
Here is the correct pseudocode, borrowed and interpreted from a translation online:
Forward
PreAct_FGate = U_FGate*(Hin+In) + W_FGate*Hout(t-1) + V_FGate*C(t-1)
PreAct_IGate = U_IGate*(Hin+In) + W_IGate*Hout(t-1) + V_IGate*C(t-1)
PreAct_CGate = U_CGate*(Hin+In) + W_CGate*Hout(t-1)
Act_IGate = Sigmoid(PreAct_IGate)
Act_FGate = Sigmoid(PreAct_FGate)
Ct_ = LSigmoid(PreAct_CGate)
Ct = Act_FGate * C(t-1) + Act_IGate * Ct_
PreAct_OGate = U_OGate*(Hin+In) + W_OGate*Hout(t-1) + V_OGate*C(t)
Act_OGate = Sigmoid(PreAct_OGate)
Hout = Act_OGate *tanh(Ct)
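A scalar C++ sketch of this forward step with the three weight types (U for input, W for recurrent, V for peephole). The struct names are made up for illustration, and tanh is used for the candidate Ct_ as a stand-in for LSigmoid, which is an assumption on my part. The output gate is given its own U_OGate and W_OGate weights.

```cpp
#include <cassert>
#include <cmath>

inline double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// U: input weight, W: recurrent weight, V: peephole weight (hypothetical names).
struct GateWgt { double U, W, V; };

struct PeepholeCell {
    GateWgt F, I, C, O;  // C.V is unused: the candidate has no peephole
};

// Everything the backward pass would need from one time step.
struct StepResult {
    double PreF, PreI, PreC, PreO;  // pre-activations
    double ActF, ActI, ActO, Cand;  // gate activations and candidate Ct_
    double Ct, Hout;
};

inline StepResult peephole_forward(const PeepholeCell& p, double x,
                                   double Hprev, double Cprev) {
    StepResult r;
    r.PreF = p.F.U * x + p.F.W * Hprev + p.F.V * Cprev;
    r.PreI = p.I.U * x + p.I.W * Hprev + p.I.V * Cprev;
    r.PreC = p.C.U * x + p.C.W * Hprev;
    r.ActF = sigmoid(r.PreF);
    r.ActI = sigmoid(r.PreI);
    r.Cand = std::tanh(r.PreC);  // ASSUMPTION: tanh in place of LSigmoid
    r.Ct = r.ActF * Cprev + r.ActI * r.Cand;
    // The output gate peeks at the freshly computed C(t), not C(t-1).
    r.PreO = p.O.U * x + p.O.W * Hprev + p.O.V * r.Ct;
    r.ActO = sigmoid(r.PreO);
    r.Hout = r.ActO * std::tanh(r.Ct);
    assert(std::fabs(r.Hout) <= 1.0);  // sigmoid * tanh stays in [-1, 1]
    return r;
}
```

Here x plays the role of (Hin+In) from the layer below, while Hprev and Cprev are Hout(t-1) and C(t-1) of the same cell.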
Backpass:
Hin_Err = Sum U_Gate*Err_Gate(t) + Sum W_Gate*Err_Gate(t+1) <---- For Layer Above
Err_OGate(t) = Inv_Sig(PreAct_OGate(t))*tanh(Ct)*Hout_Err(t)
Ct_Err(t) = Act_OGate(t)*Inv_Tanh(Ct(t))*Hout_Err(t)
+ Act_FGate(t+1)*Ct_Err(t+1) + WgtV_I*Err_IGate(t+1)
+ WgtV_F*Err_FGate(t+1) + WgtV_O*Err_OGate(t)
Err_CGate = Inv_LSig(PreAct_CGate(t))*Act_IGate(t)*Ct_Err(t)
Err_FGate = Inv_Sig(PreAct_FGate(t))*Act_CGate(t-1)*Ct_Err(t)
Err_IGate = Inv_Sig(PreAct_IGate(t))*LSig(PreAct_CGate)*Ct_Err(t)
*(Hout_Err(prev) = Hin_Err
Hout_Err = Out - Hout )
*(Three Activation Functions
Logistic Sigmoid, Tanh, Sigmoid)
*(Three Wgts - U,W,V)
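One way to sanity-check the backpass is to compare it with a finite-difference gradient. The sketch below does this for a single scalar cell at the final time step only, so every (t+1) term is dropped. It assumes Inv_Sig and Inv_Tanh mean the derivatives of sigmoid and tanh, uses tanh for the candidate in place of LSigmoid, and evaluates the tanh derivative in Ct_Err at Ct, since Hout = Act_OGate*tanh(Ct). All struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

inline double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }
// Derivatives; assumed to be what the post calls Inv_Sig and Inv_Tanh.
inline double dsigmoid(double x) { const double s = sigmoid(x); return s * (1.0 - s); }
inline double dtanh(double x) { const double t = std::tanh(x); return 1.0 - t * t; }

struct Gate { double U, W, V; };   // input, recurrent, peephole weights
struct Cell { Gate F, I, C, O; };  // C.V is unused
struct Fwd { double PreF, PreI, PreC, PreO, ActF, ActI, ActO, Cand, Ct, Hout; };
struct Errs { double F, I, C, O, Ct; };

inline Fwd forward(const Cell& p, double x, double Hprev, double Cprev) {
    Fwd r;
    r.PreF = p.F.U * x + p.F.W * Hprev + p.F.V * Cprev;
    r.PreI = p.I.U * x + p.I.W * Hprev + p.I.V * Cprev;
    r.PreC = p.C.U * x + p.C.W * Hprev;
    r.ActF = sigmoid(r.PreF);
    r.ActI = sigmoid(r.PreI);
    r.Cand = std::tanh(r.PreC);
    r.Ct = r.ActF * Cprev + r.ActI * r.Cand;
    r.PreO = p.O.U * x + p.O.W * Hprev + p.O.V * r.Ct;
    r.ActO = sigmoid(r.PreO);
    r.Hout = r.ActO * std::tanh(r.Ct);
    return r;
}

// Gate errors for the last time step (no contributions from t+1).
inline Errs backward_last_step(const Cell& p, const Fwd& r,
                               double Cprev, double Hout_Err) {
    Errs e;
    e.O = dsigmoid(r.PreO) * std::tanh(r.Ct) * Hout_Err;
    e.Ct = r.ActO * dtanh(r.Ct) * Hout_Err + p.O.V * e.O;
    e.C = dtanh(r.PreC) * r.ActI * e.Ct;
    e.F = dsigmoid(r.PreF) * Cprev * e.Ct;
    e.I = dsigmoid(r.PreI) * r.Cand * e.Ct;
    return e;
}

// Central finite difference of Hout with respect to one chosen weight.
inline double numeric_grad(Cell p, Gate Cell::*gate, double Gate::*wgt,
                           double x, double Hprev, double Cprev,
                           double eps = 1e-6) {
    assert(eps > 0.0);
    double& w = (p.*gate).*wgt;  // p is a local copy, safe to perturb
    const double w0 = w;
    w = w0 + eps;
    const double hi = forward(p, x, Hprev, Cprev).Hout;
    w = w0 - eps;
    const double lo = forward(p, x, Hprev, Cprev).Hout;
    return (hi - lo) / (2.0 * eps);
}
```

With Hout_Err set to 1, a gate error multiplied by that weight's input (x for U, Hprev for W, the relevant C for V) should agree with numeric_grad for the same weight. A mismatch would point at the backward formulas rather than at the weight update.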