
Regularization & Ill-Conditioned Problems

Regularization helps with ill-conditioning

Although regularization is primarily a tool to prevent overfitting and yield models that generalize better, it also makes problems easier to optimize by improving the condition number $\kappa$.

$$\kappa(\boldsymbol{A}) = \left| \frac{\lambda_{\text{max}}}{\lambda_{\text{min}}} \right| \tag{Condition number}$$
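As a quick numerical illustration, here is a minimal NumPy sketch; the matrix `A` is an arbitrary example, not taken from the text:

```python
import numpy as np

# Hypothetical symmetric example matrix; for a symmetric positive definite
# matrix, the 2-norm condition number is the ratio of extreme eigenvalues.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

eigvals = np.linalg.eigvalsh(A)        # eigenvalues in ascending order
kappa = abs(eigvals[-1] / eigvals[0])  # |lambda_max / lambda_min|
print(kappa)                           # agrees with np.linalg.cond(A)
```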

Let $\boldsymbol{X} \in \mathbb{R}^{m \times d}$, $\mathbf{y} \in \mathbb{R}^{m}$, and $\mathbf{w} \in \mathbb{R}^{d}$. The regularized least squares objective is to minimize

$$\mathbf{w}^\star = \mathop{\mathrm{arg\,min}}_{\mathbf{w}} \frac{1}{2} \left\lVert \boldsymbol{X}\mathbf{w} - \mathbf{y} \right\rVert_2^2 + \frac{\alpha}{2} \left\lVert \mathbf{w} \right\rVert_2^2$$

with regularization parameter $\alpha \ge 0$.
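Setting the gradient (derived below) to zero yields the normal equations $(\boldsymbol{X}^T\boldsymbol{X} + \alpha \boldsymbol{I})\mathbf{w} = \boldsymbol{X}^T\mathbf{y}$. A minimal NumPy sketch, assuming arbitrary example data and an illustrative value for $\alpha$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 100, 5                      # assumed example sizes
X = rng.normal(size=(m, d))
y = rng.normal(size=m)
alpha = 0.1                        # assumed regularization strength

# Solve the normal equations (X^T X + alpha I) w = X^T y directly.
H = X.T @ X + alpha * np.eye(d)    # Hessian of the objective
w_star = np.linalg.solve(H, X.T @ y)
print(w_star)
```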

The gradient of this objective is $\boldsymbol{X}^T(\boldsymbol{X}\mathbf{w} - \mathbf{y}) + \alpha \mathbf{w}$ and the Hessian is $\boldsymbol{X}^T\boldsymbol{X} + \alpha \boldsymbol{I}$. Let $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d$ be the eigenvalues of $\boldsymbol{X}^T\boldsymbol{X}$ sorted in descending order. Because $\boldsymbol{X}^T\boldsymbol{X}$ is positive semi-definite, all eigenvalues are non-negative, and the matrix has the eigenvalue decomposition $\boldsymbol{X}^T\boldsymbol{X} = \boldsymbol{Q} \boldsymbol{\Lambda} \boldsymbol{Q}^{-1}$ with $\boldsymbol{\Lambda} = \operatorname{diag}(\lambda_1, \dots, \lambda_d)$. In the case $\alpha = 0$, the eigenvalues of the Hessian are simply the eigenvalues of $\boldsymbol{X}^T\boldsymbol{X}$, and, assuming $\lambda_d > 0$, the condition number is $\frac{\lambda_1}{\lambda_d}$. For the case $\alpha > 0$, we can decompose the Hessian as follows

$$\begin{aligned} \boldsymbol{X}^T\boldsymbol{X} + \alpha \boldsymbol{I} &= \boldsymbol{Q} \boldsymbol{\Lambda} \boldsymbol{Q}^{-1} + \alpha \boldsymbol{Q}\boldsymbol{Q}^{-1}\\ &= \boldsymbol{Q} \left(\boldsymbol{\Lambda} + \alpha \boldsymbol{I} \right) \boldsymbol{Q}^{-1} \end{aligned}$$

The eigenvalues of the regularized objective's Hessian are therefore $\tilde{\lambda}_i = \lambda_i + \alpha$, and the condition number of the regularized objective is given by $\frac{\lambda_1 + \alpha}{\lambda_d + \alpha}$.
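The eigenvalue shift is easy to verify numerically; a sketch with assumed example data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))      # assumed example data
alpha = 0.1

lam = np.linalg.eigvalsh(X.T @ X)
lam_reg = np.linalg.eigvalsh(X.T @ X + alpha * np.eye(5))

# Every eigenvalue of the regularized Hessian is shifted by exactly alpha.
print(np.allclose(lam_reg, lam + alpha))   # True
```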
To conclude that regularization improves the condition number, we show that $\frac{\lambda_1}{\lambda_d} > \frac{\lambda_1 + \alpha}{\lambda_d + \alpha}$ whenever $\lambda_1 > \lambda_d$ and $\alpha > 0$. Cross-multiplying (both denominators are positive) gives $\lambda_1 (\lambda_d + \alpha) > \lambda_d (\lambda_1 + \alpha)$, which simplifies to $\alpha \lambda_1 > \alpha \lambda_d$, i.e. $\lambda_1 > \lambda_d$, which holds by assumption.
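As a numerical illustration of this result, the condition number $\frac{\lambda_1 + \alpha}{\lambda_d + \alpha}$ shrinks toward 1 as $\alpha$ grows; again a sketch with assumed example data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))      # assumed example data
lam = np.linalg.eigvalsh(X.T @ X)
lam_min, lam_max = lam[0], lam[-1]

# The condition number (lam_max + alpha) / (lam_min + alpha)
# decreases monotonically in alpha.
for alpha in [0.0, 0.1, 1.0, 10.0, 100.0]:
    kappa = (lam_max + alpha) / (lam_min + alpha)
    print(f"alpha={alpha:7.1f}  kappa={kappa:8.3f}")
```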