cosine annealing implementation

2020-4-11 Sat 11:01


1. Learning rate warm up

Warm-up means gradually increasing the learning rate at the start of training.
Using warm-up can help keep the model from overfitting early to features with strong explanatory power.
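A minimal sketch of what a warm-up could look like. The note does not say which warm-up shape is meant, so the linear ramp here is an assumption, and the names `linear_warmup`, `warmup_epochs`, and `lr_target` are hypothetical:

```python
def linear_warmup(epoch, warmup_epochs, lr_target):
    """Linearly ramp the learning rate up to lr_target (assumed warm-up shape).

    `epoch` is 1-indexed. Once `warmup_epochs` have passed, the rate
    stays at lr_target.
    """
    if warmup_epochs <= 0:
        return lr_target
    return lr_target * min(epoch, warmup_epochs) / warmup_epochs


# Learning rates for the first 8 epochs with a 5-epoch warm-up.
warmup_lrs = [linear_warmup(e, warmup_epochs=5, lr_target=1e-2) for e in range(1, 9)]
```

The rate grows in equal steps for the first 5 epochs and is then held constant, so the earliest updates are the smallest ones.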

2. Cosine annealing

2.1. Basic properties of the cosine function

$y = a \times \cos{(b x)} + c$
- Period: $\dfrac{2 \pi}{|b|}$
- Maximum and minimum: $|a| + c$, $-|a| + c$
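The period and bounds can be checked numerically. This is a small sketch using the cosine form (the same period and bounds hold for sine); the values of `a`, `b`, and `c` are arbitrary picks for illustration:

```python
from math import cos, pi

a, b, c = 3.0, 0.5, 1.0  # arbitrary example values (assumed)
period = 2 * pi / abs(b)


def y(x):
    return a * cos(b * x) + c


# Sample one full period densely and read off the extremes.
samples = [y(period * k / 1000) for k in range(1001)]
y_max, y_min = max(samples), min(samples)
```

Here `y(x)` and `y(x + period)` agree up to floating-point error, and `y_max`, `y_min` match `|a| + c` and `-|a| + c`.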

2.2. cosine annealing

$lr_{current} = \dfrac{lr_{max}}{2} \times \left( \cos{\left( \pi \cdot \dfrac{\text{mod} (\text{current epoch} - 1, \lfloor \text{total epochs} / \text{num of cycle} \rfloor )}{\lfloor \text{total epochs} / \text{num of cycle} \rfloor } \right)} + 1 \right)$

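num of cycle: the number of times the cycle repeats over total epochs
total epochs/num of cycle: the number of epochs in one cycle
- eg. 100 / 5: the learning rate decays and restarts 5 times, with 20 epochs per cycle
x corresponds to the current epoch, but the value actually used is (current epoch - 1) % (total epochs/num of cycle).
- so 0 <= x < (total epochs/num of cycle)
Period: $2 \pi / \text{inner cosine}$ (inner cosine = pi / (epochs per cycle)), which gives 2 × (epochs per cycle) for the cosine itself
- Because x never reaches epochs per cycle, only the decreasing half of the cosine is used, so the schedule repeats every epochs per cycle
- eg. total epochs: 100, cycle: 5
  - total epochs/num of cycle: 20 (constant)
  - the mod result here is the remainder of (current epoch - 1) divided by 20
  - $lr_{current}=\dfrac{lr_{max}}{2} \times (\cos(\pi \cdot \dfrac{0}{20}) + 1), \dfrac{lr_{max}}{2} \times (\cos(\pi \cdot \dfrac{1}{20}) + 1), \dfrac{lr_{max}}{2} \times (\cos(\pi \cdot \dfrac{2}{20}) + 1), \cdots$
Maximum, minimum: $\dfrac{lr_{max}}{2} + \dfrac{lr_{max}}{2} = lr_{max}, -\dfrac{lr_{max}}{2} + \dfrac{lr_{max}}{2} = 0$ (the maximum is reached at x = 0; the minimum 0 is only approached, since x stops at epochs per cycle - 1)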
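
As a worked check of these bounds, here are three points in one cycle of the 100 / 5 example, with x = mod(current epoch - 1, 20). The sample points are only illustrative:

- $x = 0$: $\dfrac{lr_{max}}{2} \times (\cos(0) + 1) = lr_{max}$
- $x = 10$: $\dfrac{lr_{max}}{2} \times (\cos(\pi \cdot \dfrac{10}{20}) + 1) = \dfrac{lr_{max}}{2}$
- $x = 19$: $\dfrac{lr_{max}}{2} \times (\cos(\pi \cdot \dfrac{19}{20}) + 1) \approx 0.0062 \times lr_{max}$

At the next epoch the mod wraps back to x = 0 and the learning rate jumps back to $lr_{max}$.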

3. Example code

from math import cos, floor, pi

def cosine_annealing(epoch, n_epochs, n_cycles, lrate_max):
    """Return the learning rate for a 1-indexed epoch."""
    epochs_per_cycle = floor(n_epochs / n_cycles)
    # position inside the current cycle: 0 .. epochs_per_cycle - 1
    cos_inner = (pi * ((epoch - 1) % epochs_per_cycle)) / epochs_per_cycle
    return lrate_max / 2 * (cos(cos_inner) + 1)

n_epochs = 100
n_cycles = 5
lr_max = 1e-2
out_ls = []
for i in range(n_epochs):
    # epochs are counted from 1, matching "current epoch - 1" in the formula
    out = cosine_annealing(i + 1, n_epochs, n_cycles, lr_max)
    out_ls.append(out)

%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(out_ls)

lr cosine annealing

Henry's blog
