# Mathematical Derivation and Pure Python Implementation of Machine Learning Algorithm 14: Ridge Regression

2022-08-06 12:34:21

In the previous lecture, we covered the Lasso regression model, a linear regression based on L1 regularization, as a way to prevent overfitting. In this lecture, we continue with the linear regression model based on L2 regularization.

## L2 Regularization

Compared with L0 and L1, L2 is the workhorse of regularization: in most settings where we need to prevent overfitting, L2 regularization is the first candidate. The L2 norm of a matrix is the square root of the sum of the squares of its elements. Regularizing with the L2 norm shrinks every element of the parameter matrix toward 0, but unlike L1 it does not force any element to exactly 0. You may ask: why does making every element of the parameter matrix small prevent overfitting? Take a deep neural network as an example. Under L2 regularization, as the regularization coefficient grows, every element of the weight matrix W shrinks, so the linear combination Z also shrinks; the activation function then operates in its near-linear region, which greatly reduces the effective complexity of the network and thus prevents overfitting.
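To make the shrinkage effect concrete, here is a small sketch (synthetic data, illustrative variable names) that solves ridge regression in closed form, w = (XᵀX + αI)⁻¹Xᵀy, and shows that the weight norm shrinks as the regularization coefficient α grows:

```python
import numpy as np

# Illustrative sketch: closed-form ridge solutions on synthetic data,
# showing that a larger regularization coefficient alpha shrinks the weights.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
true_w = np.array([3.0, -2.0, 1.5, 0.5, -1.0])
y = X @ true_w + rng.normal(scale=0.1, size=100)

for alpha in [0.0, 1.0, 10.0, 100.0]:
    # w = (X^T X + alpha * I)^{-1} X^T y
    w = np.linalg.solve(X.T @ X + alpha * np.eye(5), X.T @ y)
    print('alpha=%6.1f  ||w||_2=%.4f' % (alpha, np.linalg.norm(w)))
```

The printed norms decrease monotonically with α, which is exactly the "every element gets smaller" behavior described above.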

The loss function of linear regression with an L2 penalty is shown below, where the first term is the MSE loss and the second term is the L2 regularization term:

$$L(w, b) = \frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2 + \alpha \lVert w \rVert_2^2$$

Compared with L1 regularization, L2 is simpler when computing gradients: we can differentiate the loss with respect to w directly. This L2-regularized regression model is known as ridge regression (Ridge Regression).

Reusing the code framework of the previous lecture, we only need to modify the loss function and the gradient formulas. The concrete code follows.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
```

Read in the sample data and split it:

```python
data = pd.read_csv('./abalone.csv')
data['Sex'] = data['Sex'].map({'M': 0, 'F': 1, 'I': 2})
X = data.drop(['Rings'], axis=1)
y = data[['Rings']]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
X_train, X_test, y_train, y_test = X_train.values, X_test.values, y_train.values, y_test.values
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
```

```python
# Parameter initialization
def initialize(dims):
    w = np.zeros((dims, 1))
    b = 0
    return w, b
```

```python
# Ridge loss function and gradients
def l2_loss(X, y, w, b, alpha):
    num_train = X.shape[0]
    num_feature = X.shape[1]
    y_hat = np.dot(X, w) + b
    # MSE loss plus the L2 penalty alpha * ||w||^2
    loss = np.sum((y_hat - y) ** 2) / num_train + alpha * np.sum(np.square(w))
    dw = np.dot(X.T, (y_hat - y)) / num_train + 2 * alpha * w
    db = np.sum(y_hat - y) / num_train
    return y_hat, loss, dw, db
```
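Analytic gradients like the ones in `l2_loss` are easy to check numerically with finite differences. One caveat: the exact gradient of the loss as written carries a factor of 2 on the data term, which tutorial code (including the code above) often folds into the learning rate. The sketch below, on a tiny random problem, checks the exact gradient:

```python
import numpy as np

# Sanity-check sketch: compare the analytic ridge gradient with a
# central-difference estimate on a tiny random problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=(20, 1))
w = rng.normal(size=(3, 1))
b, alpha = 0.5, 0.1

def ridge_loss(w, b):
    y_hat = X @ w + b
    return np.sum((y_hat - y) ** 2) / X.shape[0] + alpha * np.sum(w ** 2)

# Exact analytic gradient of the loss above (note the factor of 2
# on the data term)
y_hat = X @ w + b
dw = 2 * X.T @ (y_hat - y) / X.shape[0] + 2 * alpha * w

# Numerical gradient via central differences
eps = 1e-6
dw_num = np.zeros_like(w)
for j in range(w.shape[0]):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[j] += eps
    w_minus[j] -= eps
    dw_num[j] = (ridge_loss(w_plus, b) - ridge_loss(w_minus, b)) / (2 * eps)

print('max abs diff:', np.max(np.abs(dw - dw_num)))
```

The maximum absolute difference should be on the order of 1e-9, confirming the derivation.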

```python
# Training procedure
def ridge_train(X, y, learning_rate=0.001, epochs=5000):
    loss_list = []
    w, b = initialize(X.shape[1])
    for i in range(1, epochs):
        y_hat, loss, dw, db = l2_loss(X, y, w, b, 0.1)
        # Gradient descent update
        w += -learning_rate * dw
        b += -learning_rate * db
        loss_list.append(loss)

        if i % 100 == 0:
            print('epoch %d loss %f' % (i, loss))
        params = {
            'w': w,
            'b': b
        }
        grads = {
            'dw': dw,
            'db': db
        }
    return loss, loss_list, params, grads
```

Run an example training session:

```python
# Run an example training session
loss, loss_list, params, grads = ridge_train(X_train, y_train, 0.01, 1000)
```

```python
# Prediction function
def predict(X, params):
    w = params['w']
    b = params['b']
    y_pred = np.dot(X, w) + b
    return y_pred

y_pred = predict(X_test, params)
y_pred[:5]
```

Plot the test-set labels against the model predictions:

```python
# Simple plot
import matplotlib.pyplot as plt

f = X_test.dot(params['w']) + params['b']

plt.scatter(range(X_test.shape[0]), y_test)
plt.plot(f, color='darkorange')
plt.xlabel('X')
plt.ylabel('y')
plt.show()
```

You can see that the model fits the extreme high and low values poorly, but fits most values well. Such a model generalizes reasonably well and shows no serious overfitting.

Finally, wrap everything up in a simple class:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


class Ridge():
    def __init__(self):
        pass

    def prepare_data(self):
        data = pd.read_csv('./abalone.csv')
        data['Sex'] = data['Sex'].map({'M': 0, 'F': 1, 'I': 2})
        X = data.drop(['Rings'], axis=1)
        y = data[['Rings']]
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
        X_train, X_test, y_train, y_test = X_train.values, X_test.values, y_train.values, y_test.values
        return X_train, y_train, X_test, y_test

    def initialize(self, dims):
        w = np.zeros((dims, 1))
        b = 0
        return w, b

    def l2_loss(self, X, y, w, b, alpha):
        num_train = X.shape[0]
        num_feature = X.shape[1]
        y_hat = np.dot(X, w) + b
        loss = np.sum((y_hat - y) ** 2) / num_train + alpha * np.sum(np.square(w))
        dw = np.dot(X.T, (y_hat - y)) / num_train + 2 * alpha * w
        db = np.sum(y_hat - y) / num_train
        return y_hat, loss, dw, db

    def ridge_train(self, X, y, learning_rate=0.01, epochs=1000):
        loss_list = []
        w, b = self.initialize(X.shape[1])
        for i in range(1, epochs):
            y_hat, loss, dw, db = self.l2_loss(X, y, w, b, 0.1)
            w += -learning_rate * dw
            b += -learning_rate * db
            loss_list.append(loss)

            if i % 100 == 0:
                print('epoch %d loss %f' % (i, loss))
            params = {
                'w': w,
                'b': b
            }
            grads = {
                'dw': dw,
                'db': db
            }
        return loss, loss_list, params, grads

    def predict(self, X, params):
        w = params['w']
        b = params['b']
        y_pred = np.dot(X, w) + b
        return y_pred


if __name__ == '__main__':
    ridge = Ridge()
    X_train, y_train, X_test, y_test = ridge.prepare_data()
    loss, loss_list, params, grads = ridge.ridge_train(X_train, y_train, 0.01, 1000)
    print(params)
```

sklearn also provides a Ridge implementation:

```python
# Import the linear model module
from sklearn.linear_model import Ridge
# Create a Ridge model instance
clf = Ridge(alpha=1.0)
# Fit on the training set
clf.fit(X_train, y_train)
# Print the fitted coefficients
print("sklearn Ridge intercept :", clf.intercept_)
print("\nsklearn Ridge coefficients :\n", clf.coef_)
```

That concludes this lecture. In the next lecture we move on to tree models, focusing on ensemble learning and the GBDT family.

https://github.com/luwill/machine-learning-code-writing

The code as a whole is rough; corrections and suggestions are welcome.