Gradient descent: $x_0$ given; $\varepsilon > 0$ convergence tolerance
for $k = 0,1,2,\dots$: $x_{k+1} = x_k - \alpha_k \nabla f(x_k)$; stop when $\|\nabla f(x_k)\| \le \varepsilon$
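The loop above can be sketched as follows; this is a minimal version that uses a constant step size $\alpha$ in place of the per-iteration $\alpha_k$ (the function names and defaults here are illustrative, not from the notes):

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, eps=1e-8, max_iter=10_000):
    """Plain gradient descent: x_{k+1} = x_k - alpha * grad_f(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:  # convergence test ||grad f(x_k)|| <= eps
            break
        x = x - alpha * g
    return x

# Example: f(x) = 0.5 * ||x||^2 has gradient x and unique minimizer 0.
x_star = gradient_descent(lambda x: x, x0=[1.0, -2.0])
```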
$(P)$ $\min_{x \in \mathbb{R}^n} f(x)$, where $f:\mathbb{R}^n \to \mathbb{R}$ is continuously differentiable
Make a change of variables:
$x:=Sy$ or $y=S^{-1}x$
where $S$ is square and non-singular (invertible)
Ex: a diagonal scaling matrix with every $s_i \neq 0$ (so $S$ is non-singular):
$S = \begin{bmatrix} s_1 & & & &\\ &s_2&&&\\ & &s_3&& \\ & & & \ddots & \\ &&&&s_n \end{bmatrix}$
Non-singularity of $S$ means:
$Sw=0$ iff $w=0$
$S^Tv = 0$ iff $v=0$
$x=\begin{bmatrix}s_1 & & \\ & \ddots & \\&&s_n\end{bmatrix}\begin{bmatrix}y_1 \\ \vdots \\ y_n\end{bmatrix} \;\Rightarrow\; x_i=s_iy_i$
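For a diagonal $S$, the change of variables $x = Sy$ is just componentwise scaling, and $y = S^{-1}x$ recovers $y$. A small check (the specific values of $s$ and $y$ are arbitrary):

```python
import numpy as np

s = np.array([2.0, 5.0, 0.5])  # diagonal entries s_1..s_n, all nonzero
S = np.diag(s)
y = np.array([1.0, -2.0, 4.0])

x = S @ y                       # change of variables x = S y
# For diagonal S this is componentwise: x_i = s_i * y_i
assert np.allclose(x, s * y)

# S is non-singular since no s_i is zero, so y = S^{-1} x is well defined
assert np.allclose(np.linalg.solve(S, x), y)
```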
$(P_{scaled})$ $\min_{y\in \mathbb{R}^n} g(y):=f(Sy)$
Apply gradient descent to $(P_{scaled})$:
$y_{k+1}=y_k-\alpha_k \nabla g(y_k)=y_k-\alpha_kS^T\nabla f(Sy_k)$
since $\nabla g(y) = \nabla_y f(Sy) = S^T\nabla f(Sy)$
by the chain rule: the inner map $y \mapsto Sy$ is linear with Jacobian $S$ (cf. $\nabla_x (a^Tx)=a$), so the transpose $S^T$ multiplies the outer gradient
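The identity $\nabla g(y) = S^T\nabla f(Sy)$ can be verified numerically against a central finite-difference approximation; the quadratic $f$ below is an illustrative choice, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
S = rng.standard_normal((n, n)) + n * np.eye(n)  # a generic non-singular S

# f(x) = 0.5 x^T A x + b^T x, so grad f(x) = A x + b
A = 2.0 * np.eye(n)
b = rng.standard_normal(n)
grad_f = lambda x: A @ x + b

# g(y) = f(S y), and the chain rule gives grad g(y) = S^T grad f(S y)
g = lambda y: 0.5 * (S @ y) @ A @ (S @ y) + b @ (S @ y)
grad_g = lambda y: S.T @ grad_f(S @ y)

# Central finite differences on g, coordinate by coordinate
y = rng.standard_normal(n)
eps = 1e-6
fd = np.array([(g(y + eps * np.eye(n)[i]) - g(y - eps * np.eye(n)[i])) / (2 * eps)
               for i in range(n)])
assert np.allclose(fd, grad_g(y), atol=1e-5)
```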