
Mathematical derivation + pure Python implementation of machine learning algorithm 14: Ridge ridge regression

2022-08-06 12:34:21 · Xiaobai learns vision


In the previous section we covered the Lasso regression model for preventing overfitting, that is, linear regression with L1 regularization. In this section we continue with the linear regression model based on L2 regularization.

L2 Regularization

Compared with L0 and L1, L2 is the favored child of regularization: in most overfitting-prevention and regularization settings, L2 regularization is the first candidate. The L2 norm is the square root of the sum of the squares of the elements of the parameter matrix. Regularizing with the L2 norm shrinks every element of the parameter matrix toward 0, but unlike L1 it does not force them exactly to 0. You may ask: why does making every element of the parameter matrix small prevent overfitting? Take a deep neural network as an example. Under L2 regularization, as the regularization coefficient grows, every element of the parameter matrix W becomes smaller, so the linear combination Z also becomes smaller; the activation function then operates in its approximately linear region, which greatly reduces the effective complexity of the deep network and thereby prevents overfitting.
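To make this concrete, here is a minimal NumPy sketch (added for illustration, not part of the original article) that computes the L2 norm of a small weight vector and the corresponding penalty term; the weight values and the coefficient alpha below are arbitrary example values:

import numpy as np

w = np.array([[0.5], [-1.2], [0.3]])        # example weight vector (arbitrary values)
alpha = 0.1                                 # hypothetical regularization coefficient

l2_norm = np.sqrt(np.sum(np.square(w)))     # L2 norm: square root of the sum of squared elements
l2_penalty = alpha * np.sum(np.square(w))   # penalty term added to the loss (squared L2 norm times alpha)
print(l2_norm, l2_penalty)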

The linear regression loss function with the L2 penalty added is shown below, where the first term is the MSE loss and the second term is the L2 regularization term.

L(w, b) = (1/N) Σ_{i=1}^{N} (ŷ_i − y_i)² + α‖w‖₂²,  where ŷ_i = wᵀx_i + b and N is the number of training samples.

Compared with L1 regularization, the gradient of the L2-regularized loss is simpler to compute: we can differentiate the loss function with respect to w directly. The regression model based on this L2-regularized loss is known as ridge regression (Ridge Regression).

∂L/∂w = (2/N) Xᵀ(ŷ − y) + 2αw
∂L/∂b = (2/N) Σ_{i=1}^{N} (ŷ_i − y_i)

(In the implementation below, the constant factor 2 on the squared-error term is dropped; this only rescales the gradient and is absorbed by the learning rate.)

Ridge

With the code framework from the previous lecture, we can directly modify the loss function and the gradient formulas on top of the existing code. The details are shown below.

Import the relevant modules:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

Read in the sample data and split it into training and test sets:

data = pd.read_csv('./abalone.csv')
data['Sex'] = data['Sex'].map({'M':0, 'F':1, 'I':2})
X = data.drop(['Rings'], axis=1)
y = data[['Rings']]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
X_train, X_test, y_train, y_test = X_train.values, X_test.values, y_train.values, y_test.values
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)


Model parameter initialization:

# Define the parameter initialization function
def initialize(dims):
    w = np.zeros((dims, 1))
    b = 0
    return w, b

Define the L2 loss function and gradient computation:

# Define the ridge (L2-regularized) loss function and its gradients
def l2_loss(X, y, w, b, alpha):
    num_train = X.shape[0]
    num_feature = X.shape[1]
    y_hat = np.dot(X, w) + b
    loss = np.sum((y_hat-y)**2)/num_train + alpha*(np.sum(np.square(w)))
    dw = np.dot(X.T, (y_hat-y)) /num_train + 2*alpha*w
    db = np.sum((y_hat-y)) /num_train
    return y_hat, loss, dw, db
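
As a quick sanity check (an added sketch, not from the original article), l2_loss can be called on a small random problem to confirm the shapes it returns; the sizes here are arbitrary:

# Hypothetical smoke test for l2_loss on random data
X_demo = np.random.randn(10, 3)
y_demo = np.random.randn(10, 1)
w_demo, b_demo = initialize(3)
y_hat, loss, dw, db = l2_loss(X_demo, y_demo, w_demo, b_demo, alpha=0.1)
print(y_hat.shape, loss, dw.shape, db)   # expect (10, 1), a scalar loss, (3, 1), a scalar gradient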

Define the Ridge training process:

# Define the training process
def ridge_train(X, y, learning_rate=0.001, epochs=5000):
    loss_list = []
    w, b = initialize(X.shape[1])
    for i in range(1, epochs):
        y_hat, loss, dw, db = l2_loss(X, y, w, b, 0.1)
        w += -learning_rate * dw
        b += -learning_rate * db
        loss_list.append(loss)
        
        if i % 100 == 0:
            print('epoch %d loss %f' % (i, loss))
        params = {
            'w': w,
            'b': b
        }
        grads = {
            'dw': dw,
            'db': db
        }
    return loss, loss_list, params, grads

Run an example training:

# Run a training example
loss, loss_list, params, grads = ridge_train(X_train, y_train, 0.01, 1000)
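
To see how training converges, the recorded loss_list can be plotted; this small snippet is an addition for illustration and only reuses variables defined above:

import matplotlib.pyplot as plt

# Plot the training loss recorded at every iteration
plt.plot(loss_list, color='blue')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.show()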


The trained model parameters can be inspected by printing the returned params dictionary.

Define the model prediction function:

# Define the prediction function
def predict(X, params):
    w = params['w']
    b = params['b']
    y_pred = np.dot(X, w) + b
    return y_pred


y_pred = predict(X_test, params)
y_pred[:5]


Plot the test-set values together with the model predictions:

# Simple plot of test labels and fitted values
import matplotlib.pyplot as plt
f = X_test.dot(params['w']) + params['b']


plt.scatter(range(X_test.shape[0]), y_test)
plt.plot(f, color = 'darkorange')
plt.xlabel('X')
plt.ylabel('y')
plt.show();

(Figure: test-set values and ridge regression predictions)

As the figure shows, the model fits the extreme high and low values poorly but fits the bulk of the data reasonably well. Such a model generalizes relatively well and does not suffer from serious overfitting.
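
To put a number on this (an added sketch, not in the original article), the test-set MSE and R² can be computed directly from y_pred and y_test obtained above:

# Evaluate the predictions on the test set
mse = np.mean((y_pred - y_test) ** 2)                 # mean squared error
ss_res = np.sum((y_test - y_pred) ** 2)               # residual sum of squares
ss_tot = np.sum((y_test - np.mean(y_test)) ** 2)      # total sum of squares
r2 = 1 - ss_res / ss_tot                              # coefficient of determination
print('test MSE: %.4f, R^2: %.4f' % (mse, r2))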

Finally, we wrap everything up in a simple class:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


class Ridge():
    def __init__(self):
        pass
        
    def prepare_data(self):
        data = pd.read_csv('./abalone.csv')
        data['Sex'] = data['Sex'].map({'M': 0, 'F': 1, 'I': 2})
        X = data.drop(['Rings'], axis=1)
        y = data[['Rings']]
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
        X_train, X_test, y_train, y_test = X_train.values, X_test.values, y_train.values, y_test.values
        return X_train, y_train, X_test, y_test
    
    def initialize(self, dims):
        w = np.zeros((dims, 1))
        b = 0
        return w, b


    def l2_loss(self, X, y, w, b, alpha):
        num_train = X.shape[0]
        num_feature = X.shape[1]
        y_hat = np.dot(X, w) + b
        loss = np.sum((y_hat - y) ** 2) / num_train + alpha * (np.sum(np.square(w)))
        dw = np.dot(X.T, (y_hat - y)) / num_train + 2 * alpha * w
        db = np.sum((y_hat - y)) / num_train
        return y_hat, loss, dw, db


    def ridge_train(self, X, y, learning_rate=0.01, epochs=1000):
        loss_list = []
        w, b = self.initialize(X.shape[1])
        for i in range(1, epochs):
            y_hat, loss, dw, db = self.l2_loss(X, y, w, b, 0.1)
            w += -learning_rate * dw
            b += -learning_rate * db
            loss_list.append(loss)
        
            if i % 100 == 0:
                print('epoch %d loss %f' % (i, loss))
            params = {
                'w': w,
                'b': b
            }
            grads = {
                'dw': dw,
                'db': db
            }
        return loss, loss_list, params, grads
    
    
    def predict(self, X, params):
        w = params['w']
        b = params['b']
        y_pred = np.dot(X, w) + b
        return y_pred
    
 
if __name__ == '__main__':
    ridge = Ridge()
    X_train, y_train, X_test, y_test = ridge.prepare_data()
    loss, loss_list, params, grads = ridge.ridge_train(X_train, y_train, 0.01, 1000)
    print(params)
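    # (Added sketch, not in the original:) use the trained parameters to predict on the test set
    y_pred = ridge.predict(X_test, params)
    print(y_pred[:5])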

sklearn also provides an implementation of Ridge:

# Import the linear model module
from sklearn.linear_model import Ridge
# Create a Ridge model instance
clf = Ridge(alpha=1.0)
# Fit on the training set
clf.fit(X_train, y_train)
# Print the model coefficients
print("sklearn Ridge intercept :", clf.intercept_)
print("\nsklearn Ridge coefficients :\n", clf.coef_)


That is all for this section. In the next section we extend to tree models, focusing on ensemble learning and the GBDT family.

For more information, please refer to the author's GitHub repository:

https://github.com/luwill/machine-learning-code-writing

The code is rough overall; comments and suggestions are welcome.

