# [Self-study] Introduction to Deep Learning: Python-Based Theory and Implementation, Lesson 10

2022-09-09 01:24:58

1. Drawbacks of SGD

2. Momentum

5. Comparing update methods on the MNIST dataset

1. The initial weight values must not be set to 0

# Preface

This section describes methods for optimizing the weight parameters, that is, methods for searching for the optimal weight parameters.

# I. Updating the Parameters

## 1. Drawbacks of SGD

If the shape of the function is anisotropic, for example elongated, the search path of SGD becomes very inefficient (a zigzag). The root cause is that the direction of the gradient does not point toward the direction of the minimum.

## 2. Momentum

The Momentum update rule is

v ← αv − η ∂L/∂W
W ← W + v

where W denotes the weight parameters, η (eta) the learning rate, and v the velocity, i.e. the force the object receives in the gradient direction.

The αv term plays the role of gradually decelerating the object when it receives no other force (α is set to a value such as 0.9), corresponding to friction or air resistance in physics.

```python
import numpy as np

class Momentum:
    """Momentum SGD"""
    def __init__(self, lr=0.01, momentum=0.9):
        self.lr = lr
        self.momentum = momentum
        self.v = None

    def update(self, params, grads):
        if self.v is None:
            self.v = {}
            for key, val in params.items():
                # v mirrors the structure of params, initialized to zeros
                self.v[key] = np.zeros_like(val)

        for key in params.keys():
            self.v[key] = self.momentum * self.v[key] - self.lr * grads[key]
            params[key] += self.v[key]
```
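A quick usage sketch (toy values; the key name `'W1'` is hypothetical) showing how a single Momentum-style step mutates the parameter dictionary in place:

```python
import numpy as np

lr, momentum = 0.01, 0.9
params = {'W1': np.array([1.0, 2.0])}
grads = {'W1': np.array([0.5, -0.5])}
v = {'W1': np.zeros_like(params['W1'])}

for key in params.keys():
    v[key] = momentum * v[key] - lr * grads[key]  # v <- alpha*v - lr*grad
    params[key] += v[key]                         # W <- W + v

print(params['W1'])  # the very first step equals plain SGD because v started at zero
```

On later steps v is no longer zero, so past gradients keep pushing the parameters in the accumulated direction.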

Analysis:

(1) At initialization, v holds nothing; when update() is called for the first time, v stores, as a dictionary, arrays with the same structure as the parameters.

(2) np.zeros_like(val):

```python
import numpy as np

a = np.arange(12)
a = a.reshape(2, 2, 3)
b = np.zeros_like(a)  # same shape and dtype as a, filled with zeros
print(a)
print(b)
```

```
[[[ 0  1  2]
  [ 3  4  5]]

 [[ 6  7  8]
  [ 9 10 11]]]

[[[0 0 0]
  [0 0 0]]

 [[0 0 0]
  [0 0 0]]]
```

(3) What is the significance of this part of the code? What is params.items()? (I was unsure here.)
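To partly answer that question: `params.items()` is plain Python, namely `dict.items()`, which yields (key, value) pairs; `update()` uses it to walk every parameter array. A small sketch with hypothetical keys mirroring the network's parameters:

```python
import numpy as np

params = {'W1': np.zeros((2, 3)), 'b1': np.zeros(3)}
shapes = {}
for key, val in params.items():  # iterates over ('W1', array), ('b1', array)
    shapes[key] = val.shape

print(shapes)  # {'W1': (2, 3), 'b1': (3,)}
```

So the loop in `update()` simply creates a zero array of matching shape for every parameter in the network.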


Advantage of Momentum: compared with SGD, the zigzag of the update path is reduced.

## 3. AdaGrad

AdaGrad adapts the learning rate for each individual element of the parameters while learning proceeds; that is, the learning rate gradually decreases as learning progresses. In its update rule, ⊙ denotes element-wise multiplication of matrices:

h ← h + ∂L/∂W ⊙ ∂L/∂W
W ← W − η · (1/√h) · ∂L/∂W

Elements of the parameters that have been updated by large amounts see their learning rate reduced the most.

```python
import numpy as np

class AdaGrad:
    def __init__(self, lr=0.01):
        self.lr = lr
        self.h = None

    def update(self, params, grads):
        if self.h is None:
            self.h = {}
            for key, val in params.items():
                self.h[key] = np.zeros_like(val)

        for key in params.keys():
            self.h[key] += grads[key] * grads[key]  # accumulate squared gradients
            params[key] -= self.lr * grads[key] / (np.sqrt(self.h[key]) + 1e-7)
```
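A tiny numeric sketch (toy values, not from the book) of what the accumulated h does: with a constant gradient, the effective step shrinks at every iteration as h grows:

```python
import numpy as np

lr = 0.1
h = np.zeros(2)
grad = np.array([5.0, 0.1])  # one large-gradient element, one small

steps = []
for _ in range(10):
    h += grad * grad                               # accumulate squared gradients
    steps.append(lr * grad / (np.sqrt(h) + 1e-7))  # per-element scaled step

print(steps[0], steps[-1])  # the step size has shrunk for both elements
```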

The weight update path is shown in the figure below (figure omitted here): the function moves efficiently toward the minimum value.

## 4. Adam

Adam intuitively combines the ideas of Momentum and AdaGrad, and additionally performs bias correction of its moment estimates.

```python
import numpy as np

class Adam:
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999):
        self.lr = lr
        self.beta1 = beta1
        self.beta2 = beta2
        self.iter = 0
        self.m = None
        self.v = None

    def update(self, params, grads):
        if self.m is None:
            self.m, self.v = {}, {}
            for key, val in params.items():
                self.m[key] = np.zeros_like(val)
                self.v[key] = np.zeros_like(val)

        self.iter += 1
        # fold the bias corrections of m and v into the learning rate
        lr_t = self.lr * np.sqrt(1.0 - self.beta2**self.iter) / (1.0 - self.beta1**self.iter)

        for key in params.keys():
            # self.m[key] = self.beta1*self.m[key] + (1-self.beta1)*grads[key]
            # self.v[key] = self.beta2*self.v[key] + (1-self.beta2)*(grads[key]**2)
            self.m[key] += (1 - self.beta1) * (grads[key] - self.m[key])
            self.v[key] += (1 - self.beta2) * (grads[key]**2 - self.v[key])

            params[key] -= lr_t * self.m[key] / (np.sqrt(self.v[key]) + 1e-7)
```

Adam has three hyperparameters: the learning rate, beta1, and beta2. According to the paper, the standard settings are beta1 = 0.9 and beta2 = 0.999. In most cases these defaults work well.
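The `lr_t` line in the code above folds the bias corrections of m and v into the learning rate. A quick arithmetic sketch (using the default hyperparameters) of how this factor evolves: it deviates from lr in the early iterations and converges to lr as the iteration count grows:

```python
import numpy as np

lr, beta1, beta2 = 0.001, 0.9, 0.999
for t in (1, 10, 100, 10000):
    lr_t = lr * np.sqrt(1.0 - beta2**t) / (1.0 - beta1**t)
    print(t, lr_t)  # approaches lr = 0.001 for large t
```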

## 5. Comparing update methods on the MNIST dataset

```python
# coding: utf-8
import os
import sys
sys.path.append(os.pardir)  # make the parent directory importable
import numpy as np
import matplotlib.pyplot as plt
from dataset.mnist import load_mnist
from common.util import smooth_curve
from common.multi_layer_net import MultiLayerNet
from common.optimizer import *

# 0: read in the MNIST data ==========
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True)

train_size = x_train.shape[0]  # number of training examples
batch_size = 128
max_iterations = 2000

# 1: experiment setup ==========
optimizers = {}
optimizers['SGD'] = SGD()
optimizers['Momentum'] = Momentum()
optimizers['AdaGrad'] = AdaGrad()
optimizers['Adam'] = Adam()
#optimizers['RMSprop'] = RMSprop()

networks = {}
train_loss = {}
for key in optimizers.keys():
    networks[key] = MultiLayerNet(
        input_size=784, hidden_size_list=[100, 100, 100, 100],
        output_size=10)
    train_loss[key] = []

# 2: training ==========
for i in range(max_iterations):
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]

    for key in optimizers.keys():
        grads = networks[key].gradient(x_batch, t_batch)
        optimizers[key].update(networks[key].params, grads)

        loss = networks[key].loss(x_batch, t_batch)
        train_loss[key].append(loss)

    if i % 100 == 0:  # print every 100 iterations
        print("===========" + "iteration:" + str(i) + "===========")
        for key in optimizers.keys():
            loss = networks[key].loss(x_batch, t_batch)
            print(key + ":" + str(loss))

# 3: plot ==========
markers = {"SGD": "o", "Momentum": "x", "AdaGrad": "s", "Adam": "D"}
x = np.arange(max_iterations)
for key in optimizers.keys():
    plt.plot(x, smooth_curve(train_loss[key]), marker=markers[key], markevery=100, label=key)
plt.xlabel("iterations")
plt.ylabel("loss")
plt.ylim(0, 1)
plt.legend()
plt.show()
```

(1) np.random.choice():

```python
import numpy as np

a = np.random.choice(10, 8)
# draws 8 numbers from [0, 10) to form the one-dimensional array a
print(a)

b = np.random.choice(a, 5)
# draws 5 numbers from the one-dimensional array a to form b
# note: a must be one-dimensional
print(b)
```

One possible output (the draws are random):

```
[1 3 7 5 7 5 1 7]
[5 7 7 5 1]
```
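One caveat worth noting (standard NumPy behavior, not specific to the book): np.random.choice samples with replacement by default, which is why duplicates can appear in the outputs above; pass replace=False to get distinct picks:

```python
import numpy as np

c = np.random.choice(10, 8, replace=False)  # 8 distinct numbers from [0, 10)
print(c)
```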

# II. Initial Weight Values

The choice of initial weight values is closely tied to whether neural network learning succeeds.

## 1. The initial weight values must not be set to 0

If the weights are initialized to 0, or more generally to any identical value, all weights receive identical updates during backpropagation and stay identical forever, so the extra units add no expressive power. The usual fix is small random initial values, as in the snippets below.
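A minimal sketch (toy two-unit linear network, hypothetical values) of why identical initial weights fail: both hidden units compute the same activation and therefore receive the same gradient, so they can never differentiate:

```python
import numpy as np

x = np.array([1.0, 2.0])   # input
W1 = np.full((2, 2), 0.5)  # both hidden units start with identical weights
w2 = np.array([0.3, 0.3])  # second layer, also identical

h = x @ W1                 # both hidden activations come out identical
y = h @ w2

# Backprop with dL/dy = 1 for this linear network:
dh = w2 * 1.0              # identical gradient flows to both hidden units
dW1 = np.outer(x, dh)      # both columns of dW1 are identical

print(dW1)                 # the two hidden units receive exactly the same update
```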

```python
import numpy as np

# weights drawn from a Gaussian with standard deviation 0.01
a = 0.01 * np.random.randn(10, 100)
```

```python
import numpy as np

a = 0.01 * np.random.randn(2, 4, 3)  # array of shape (2, 4, 3); the arguments give the dimensions
b = np.random.randn(2, 4)
print(f'a is {a}')
print(f'b is {b}')
```

One possible result (values are random):

```
a is [[[-0.01141521  0.00021992 -0.00668211]
  [-0.00799102 -0.01430591  0.00065054]
  [ 0.00253524 -0.01118892 -0.01097236]
  [-0.00580513  0.00963655 -0.00336067]]

 [[ 0.00232957 -0.00983508  0.00066577]
  [-0.01303359  0.02022611 -0.00138892]
  [-0.00026297 -0.00356707 -0.01244644]
  [ 0.00965091  0.00946335  0.00834518]]]
b is [[ 1.3743193  -1.40996427  0.11132154 -0.37661421]
 [ 0.61963745 -0.37448273 -0.69203084 -1.4140828 ]]
```

# Summary

I personally don't fully understand the parameter update methods yet, but in concrete applications it does not seem necessary to understand their inner mechanics, so I won't dwell on them further here. The next section focuses on the initial values of the weights.