LSTM params and code summary

LSTM

1. Parameter descriptions


(figure: LSTM cell diagram; image link: https://sergioskar.github.io/Bitcon_prediction_LSTM/)

  • $c_t$: the memory cell; it stores all information from the past up to time step t.
  • $h_t$: the hidden state; obtained by applying tanh to the memory cell $c_t$ and gating it with o.
  • o: output gate. Produces the next hidden state $h_t$. Takes $x_t$ and $h_{t-1}$ as inputs.
    • formula: $o = \sigma(x_t W_{xo} + h_{t-1} W_{ho} + b_o)$
    • return: $h_t = o \odot tanh(c_t)$
  • f: forget gate. Decides how much of the previous memory cell $c_{t-1}$ to carry over into $c_t$. Takes $x_t$ and $h_{t-1}$ as inputs.
    • formula: $f = \sigma(x_t W_{xf} + h_{t-1} W_{hf} + b_f)$
    • return: $c_t = f \odot c_{t-1}$ (the forget part only; the new information from g and i is added below)
  • g: tanh node. Computes the new information to be written into the memory cell; this is $\tilde{c}_t$ in the figure above.
    • formula: $g = tanh(x_t W_{xg} + h_{t-1} W_{hg} + b_g)$
  • i: input gate. Weighs how important the new information (the output of g) is, i.e. gates g.
    • formula: $i = \sigma(x_t W_{xi} + h_{t-1} W_{hi} + b_i)$
  • Summary of the equations (a minimal one-step sketch follows below)
    $f = \sigma(x_t W_{xf} + h_{t-1} W_{hf} + b_f)$
    $g = tanh(x_t W_{xg} + h_{t-1} W_{hg} + b_g)$
    $i = \sigma(x_t W_{xi} + h_{t-1} W_{hi} + b_i)$
    $o = \sigma(x_t W_{xo} + h_{t-1} W_{ho} + b_o)$
    $c_t = f \odot c_{t-1} + g \odot i$
    $h_t = o \odot tanh(c_t)$
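
The equations above map directly onto a single step of computation. Below is a minimal NumPy sketch of one LSTM step; the function name `lstm_step`, the dict-based parameters `W_x`/`W_h`/`b`, and the toy dimensions are assumptions made for illustration, not taken from the code in section 2.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_x, W_h, b):
    """One LSTM step following the equations above.
    W_x[k]: (D, H), W_h[k]: (H, H), b[k]: (H,) for k in {'f', 'g', 'i', 'o'}."""
    f = sigmoid(x_t @ W_x['f'] + h_prev @ W_h['f'] + b['f'])   # forget gate
    g = np.tanh(x_t @ W_x['g'] + h_prev @ W_h['g'] + b['g'])   # new candidate info
    i = sigmoid(x_t @ W_x['i'] + h_prev @ W_h['i'] + b['i'])   # input gate
    o = sigmoid(x_t @ W_x['o'] + h_prev @ W_h['o'] + b['o'])   # output gate
    c_t = f * c_prev + g * i        # c_t = f (*) c_{t-1} + g (*) i
    h_t = o * np.tanh(c_t)          # h_t = o (*) tanh(c_t)
    return h_t, c_t

# toy usage: input dim D=3, hidden dim H=4, zero initial state
rng = np.random.default_rng(0)
D, H = 3, 4
W_x = {k: 0.1 * rng.standard_normal((D, H)) for k in 'fgio'}
W_h = {k: 0.1 * rng.standard_normal((H, H)) for k in 'fgio'}
b = {k: np.zeros(H) for k in 'fgio'}
h_t, c_t = lstm_step(rng.standard_normal(D), np.zeros(H), np.zeros(H), W_x, W_h, b)
print(h_t.shape, c_t.shape)  # (4,) (4,)
```

Note that this follows the row-vector convention of the formulas ($x_t W_{xf}$); the excerpt in section 2 uses the transposed convention (np.dot(W_xf, x_t)).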

2. code

# original code link: https://github.com/ma2rten/seq2seq/blob/master/seq2seq/seq2seq.py
# Excerpt: forward/backward of one LSTM cell. Assumes numpy imported as np and
# sigmoid / tanh / sigmoid_grad / tanh_grad helpers, where the *_grad functions
# take the activation output (sigmoid_grad(y) = y * (1 - y), tanh_grad(y) = 1 - y**2),
# consistent with how they are applied to the stored gate values below.

def forward(self, x_t):
    self.t += 1

    t = self.t
    h = self.h[t-1]

    # gates f, i, o (sigmoid) and the candidate memory g (tanh), here called cell_update
    self.forget_gate[t] = \
        sigmoid(np.dot(self.W_hf, h) + np.dot(self.W_xf, x_t) + self.b_f)
    self.input_gate[t] = \
        sigmoid(np.dot(self.W_hi, h) + np.dot(self.W_xi, x_t) + self.b_i)
    self.output_gate[t] = \
        sigmoid(np.dot(self.W_ho, h) + np.dot(self.W_xo, x_t) + self.b_o)

    self.cell_update[t] = \
        tanh(np.dot(self.W_hj, h) + np.dot(self.W_xj, x_t) + self.b_j)

    # c_t = f * c_{t-1} + i * g,  h_t = o * tanh(c_t)
    self.c[t] = \
        self.forget_gate[t] * self.c[t-1] + self.input_gate[t] * self.cell_update[t]
    self.h[t] = self.output_gate[t] * tanh(self.c[t])

    self.x[t] = x_t
    return self.h[t]

def backward(self, dh):
    t = self.t

    # add the hidden-state gradient flowing back from the next time step
    dh = dh + self.dh_prev
    dC = tanh_grad(tanh(self.c[t])) * self.output_gate[t] * dh + self.dc_prev

    # gate backprop (gradients w.r.t. the gate pre-activations)
    d_input = sigmoid_grad(self.input_gate[t]) * self.cell_update[t] * dC
    d_forget = sigmoid_grad(self.forget_gate[t]) * self.c[t-1] * dC
    d_output = sigmoid_grad(self.output_gate[t]) * tanh(self.c[t]) * dh
    d_update = tanh_grad(self.cell_update[t]) * self.input_gate[t] * dC

    # cell-state gradient passed to the previous time step
    self.dc_prev = self.forget_gate[t] * dC

    # bias backprop
    self.db_i += d_input
    self.db_f += d_forget
    self.db_o += d_output
    self.db_j += d_update

    h_in = self.h[t-1]

    # input-to-hidden weight gradients
    self.dW_xi += np.outer(d_input, self.x[t])
    self.dW_xf += np.outer(d_forget, self.x[t])
    self.dW_xo += np.outer(d_output, self.x[t])
    self.dW_xj += np.outer(d_update, self.x[t])

    # hidden-to-hidden weight gradients
    self.dW_hi += np.outer(d_input, h_in)
    self.dW_hf += np.outer(d_forget, h_in)
    self.dW_ho += np.outer(d_output, h_in)
    self.dW_hj += np.outer(d_update, h_in)

    # hidden-state gradient passed to the previous time step
    self.dh_prev = np.dot(self.W_hi.T, d_input)
    self.dh_prev += np.dot(self.W_hf.T, d_forget)
    self.dh_prev += np.dot(self.W_ho.T, d_output)
    self.dh_prev += np.dot(self.W_hj.T, d_update)

    # gradient with respect to the input x_t
    dX = np.dot(self.W_xi.T, d_input)
    dX += np.dot(self.W_xf.T, d_forget)
    dX += np.dot(self.W_xo.T, d_output)
    dX += np.dot(self.W_xj.T, d_update)

    self.t -= 1

    return dX
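
The gate gradients in backward() can be sanity-checked numerically. The sketch below is self-contained and illustrative: the helper names (step, backward_x), the dict-based parameters, and the toy dimensions are assumptions of this note, not part of the linked repository. It compares the analytic dL/dx_t for a single step (with a simple dot-product loss) against finite differences.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def step(x, h_prev, c_prev, p):
    # same computation as forward() above, for one step
    f = sigmoid(p['W_hf'] @ h_prev + p['W_xf'] @ x + p['b_f'])
    i = sigmoid(p['W_hi'] @ h_prev + p['W_xi'] @ x + p['b_i'])
    o = sigmoid(p['W_ho'] @ h_prev + p['W_xo'] @ x + p['b_o'])
    g = np.tanh(p['W_hj'] @ h_prev + p['W_xj'] @ x + p['b_j'])
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c, (f, i, o, g)

def backward_x(x, h_prev, c_prev, p, dh):
    # analytic dL/dx for a single step, mirroring backward() with dh_prev = dc_prev = 0
    h, c, (f, i, o, g) = step(x, h_prev, c_prev, p)
    dC = (1 - np.tanh(c) ** 2) * o * dh
    d_i = i * (1 - i) * g * dC
    d_f = f * (1 - f) * c_prev * dC
    d_o = o * (1 - o) * np.tanh(c) * dh
    d_g = (1 - g ** 2) * i * dC
    return (p['W_xi'].T @ d_i + p['W_xf'].T @ d_f +
            p['W_xo'].T @ d_o + p['W_xj'].T @ d_g)

rng = np.random.default_rng(0)
D, H = 3, 4
p = {}
for gate in 'fioj':
    p[f'W_x{gate}'] = 0.1 * rng.standard_normal((H, D))
    p[f'W_h{gate}'] = 0.1 * rng.standard_normal((H, H))
    p[f'b_{gate}'] = np.zeros(H)
x = rng.standard_normal(D)
h_prev, c_prev = rng.standard_normal(H), rng.standard_normal(H)
dh = rng.standard_normal(H)  # upstream gradient dL/dh_t for the loss L = dh . h_t

analytic = backward_x(x, h_prev, c_prev, p, dh)
numeric = np.zeros(D)
eps = 1e-6
for k in range(D):
    xp, xm = x.copy(), x.copy()
    xp[k] += eps
    xm[k] -= eps
    numeric[k] = (dh @ step(xp, h_prev, c_prev, p)[0] -
                  dh @ step(xm, h_prev, c_prev, p)[0]) / (2 * eps)
print(np.max(np.abs(analytic - numeric)))  # should be around 1e-8 or smaller
```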