Background#
In 2022, I wrote a short article for the Data Talks Club in which I explained the essence and benefits of regularization for linear models in machine learning using a fictional yet vivid example. If you’re interested in the topic, you can read the article and even recreate in Python the toy example described there to see how it works in practice.
Link#
https://datatalks.club/blog/regularization-in-regression.html
Explain the math#
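The snippet below walks through the closed-form (normal-equation) solution of linear regression and its regularized version. With feature matrix X and target vector y, the ordinary least-squares weights are

$$ w = (X^{\top} X)^{-1} X^{\top} y $$

and adding a regularization factor λ (0.01 in the code) to the diagonal of the Gram matrix gives

$$ w_{\text{reg}} = (X^{\top} X + \lambda I)^{-1} X^{\top} y $$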
import numpy as np

# define feature matrix X of size 6x3 whose second and third columns are nearly identical
X = np.array([[4, 4, 4],
              [3, 5, 5],
              [5, 1, 1],
              [5, 4, 4],
              [7, 5, 5],
              [4, 5, 5.00000001]])
# define target vector y of length 6
y = np.array([1, 2, 3, 1, 2, 3])
# calculate the Gram matrix of X
XTX = X.T.dot(X)
XTX
array([[140.        , 111.        , 111.00000004],
       [111.        , 108.        , 108.00000005],
       [111.00000004, 108.00000005, 108.0000001 ]])
# invert the Gram matrix
XTX_inv = np.linalg.inv(XTX)
XTX_inv
array([[ 3.86409478e-02, -1.26839821e+05,  1.26839770e+05],
       [-1.26839767e+05,  2.88638033e+14, -2.88638033e+14],
       [ 1.26839727e+05, -2.88638033e+14,  2.88638033e+14]])
# calculate the weights vector w
w = XTX_inv.dot(X.T).dot(y)
w
array([-1.93908875e-01, -3.61854375e+06,  3.61854643e+06])
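The weights explode into the millions because the nearly identical second and third columns make the Gram matrix almost singular, so its inverse, and every quantity derived from it, is dominated by numerical noise. As a quick illustrative check (not part of the original snippet), the condition number of XTX makes this visible:

# a huge condition number means XTX is nearly singular, so its inverse is numerically unreliable
np.linalg.cond(XTX)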
# add a regularization factor of 0.01 to the main diagonal of the Gram matrix
XTX = XTX + 0.01 * np.eye(3)
# invert the regularized Gram matrix
XTX_inv = np.linalg.inv(XTX)
XTX_inv
array([[ 3.85624712e-02, -1.98159300e-02, -1.98158861e-02],
       [-1.98159300e-02,  5.00124975e+01, -4.99875026e+01],
       [-1.98158861e-02, -4.99875026e+01,  5.00124974e+01]])
# calculate the regularized weights vector w
w = XTX_inv.dot(X.T).dot(y)
w
array([0.33643484, 0.04007035, 0.04007161])
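The regularized weights are small and stable, and the two nearly duplicate features now receive almost equal coefficients instead of huge opposite-signed ones. As an optional sanity check (a sketch that assumes scikit-learn is available; it is not part of the original example), the same weights can be reproduced with Ridge, because with no intercept it solves exactly the regularized normal equation used above:

# optional cross-check (assumes scikit-learn is installed, not used in the original example);
# Ridge(alpha=0.01, fit_intercept=False) minimizes ||y - Xw||^2 + 0.01*||w||^2,
# whose solution is (X^T X + 0.01*I)^(-1) X^T y, so coef_ should closely match w above
from sklearn.linear_model import Ridge

ridge = Ridge(alpha=0.01, fit_intercept=False)
ridge.fit(X, y)
ridge.coef_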