
Regularization & Ill-Conditioned Problems

Regularization helps with ill-conditioning

Although regularization is primarily a tool to prevent overfitting and yield models that generalize better, it also makes problems easier to optimize by improving the condition number $\kappa$.

$$\kappa(\boldsymbol{A}) = \left| \frac{\lambda_{\text{max}}}{\lambda_{\text{min}}} \right| \tag{Condition number}$$
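As a quick numerical illustration, here is a minimal NumPy sketch; the matrix `A` is an arbitrary example, not taken from the text:

```python
import numpy as np

# Hypothetical symmetric example matrix; for a symmetric positive definite
# matrix, the 2-norm condition number is the ratio of extreme eigenvalues.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

eigvals = np.linalg.eigvalsh(A)        # eigenvalues in ascending order
kappa = abs(eigvals[-1] / eigvals[0])  # |lambda_max / lambda_min|
print(kappa)                           # agrees with np.linalg.cond(A)
```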

Let $\boldsymbol{X} \in \mathbb{R}^{m \times d}$, $\mathbf{y} \in \mathbb{R}^{m}$, and $\mathbf{w} \in \mathbb{R}^{d}$. The regularized least squares objective is to minimize

$$\mathbf{w}^\star = \mathop{\mathrm{arg\,min}}_{\mathbf{w}} \frac{1}{2} \left\lVert \boldsymbol{X}\mathbf{w} - \mathbf{y} \right\rVert_2^2 + \frac{\alpha}{2} \left\lVert \mathbf{w} \right\rVert_2^2$$

with regularization parameter $\alpha \ge 0$.
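Setting the gradient (derived below) to zero yields the normal equations $(\boldsymbol{X}^T\boldsymbol{X} + \alpha \boldsymbol{I})\mathbf{w} = \boldsymbol{X}^T\mathbf{y}$. A minimal NumPy sketch, assuming arbitrary example data and an illustrative value for $\alpha$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 100, 5                      # assumed example sizes
X = rng.normal(size=(m, d))
y = rng.normal(size=m)
alpha = 0.1                        # assumed regularization strength

# Solve the normal equations (X^T X + alpha I) w = X^T y directly.
H = X.T @ X + alpha * np.eye(d)    # Hessian of the objective
w_star = np.linalg.solve(H, X.T @ y)
print(w_star)
```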

The gradient of this objective is $\boldsymbol{X}^T(\boldsymbol{X}\mathbf{w} - \mathbf{y}) + \alpha \mathbf{w}$ and the Hessian is $\boldsymbol{X}^T\boldsymbol{X} + \alpha \boldsymbol{I}$. Let $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d$ be the eigenvalues of $\boldsymbol{X}^T\boldsymbol{X}$ sorted in descending order. Because $\boldsymbol{X}^T\boldsymbol{X}$ is positive semi-definite, all eigenvalues are non-negative, and the matrix has the eigenvalue decomposition $\boldsymbol{X}^T\boldsymbol{X} = \boldsymbol{Q} \boldsymbol{\Lambda} \boldsymbol{Q}^{-1}$ with $\boldsymbol{\Lambda} = \operatorname{diag}(\lambda_1, \dots, \lambda_d)$. In the case $\alpha = 0$, the eigenvalues of the Hessian are simply the eigenvalues of $\boldsymbol{X}^T\boldsymbol{X}$, and, assuming $\lambda_d > 0$, the condition number is $\frac{\lambda_1}{\lambda_d}$. For the case $\alpha > 0$, we can decompose the Hessian as follows

$$\begin{aligned} \boldsymbol{X}^T\boldsymbol{X} + \alpha \boldsymbol{I} &= \boldsymbol{Q} \boldsymbol{\Lambda} \boldsymbol{Q}^{-1} + \alpha \boldsymbol{Q}\boldsymbol{Q}^{-1}\\ &= \boldsymbol{Q} \left(\boldsymbol{\Lambda} + \alpha \boldsymbol{I} \right) \boldsymbol{Q}^{-1} \end{aligned}$$

The eigenvalues of the regularized objective's Hessian are therefore $\tilde{\lambda}_i = \lambda_i + \alpha$, and the condition number of the regularized objective is given by $\frac{\lambda_1 + \alpha}{\lambda_d + \alpha}$.
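The eigenvalue shift is easy to verify numerically; a sketch with assumed example data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))      # assumed example data
alpha = 0.1

lam = np.linalg.eigvalsh(X.T @ X)
lam_reg = np.linalg.eigvalsh(X.T @ X + alpha * np.eye(5))

# Every eigenvalue of the regularized Hessian is shifted by exactly alpha.
print(np.allclose(lam_reg, lam + alpha))   # True
```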
To conclude that regularization improves the condition number, we show that $\frac{\lambda_1}{\lambda_d} > \frac{\lambda_1 + \alpha}{\lambda_d + \alpha}$ whenever $\lambda_1 > \lambda_d$ and $\alpha > 0$. Cross-multiplying (both denominators are positive) gives $\lambda_1 (\lambda_d + \alpha) > \lambda_d (\lambda_1 + \alpha)$, which simplifies to $\alpha \lambda_1 > \alpha \lambda_d$, i.e. $\lambda_1 > \lambda_d$, which holds by assumption.
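As a numerical illustration of this result, the condition number $\frac{\lambda_1 + \alpha}{\lambda_d + \alpha}$ shrinks toward 1 as $\alpha$ grows; again a sketch with assumed example data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))      # assumed example data
lam = np.linalg.eigvalsh(X.T @ X)
lam_min, lam_max = lam[0], lam[-1]

# The condition number (lam_max + alpha) / (lam_min + alpha)
# decreases monotonically in alpha.
for alpha in [0.0, 0.1, 1.0, 10.0, 100.0]:
    kappa = (lam_max + alpha) / (lam_min + alpha)
    print(f"alpha={alpha:7.1f}  kappa={kappa:8.3f}")
```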