$\min_{x\in C}f(x)$
$f$: convex, differentiable; $C$: convex set
The objective function is an “expectation”:
$f(x)=\frac{1}{m}\left[f_1(x)+f_2(x)+\cdots+f_m(x)\right]=\mathbb{E}_i f_i(x)$,
where $i$ is a random variable uniformly distributed over $\left\{1,\cdots,m\right\}$.
Example: least-squares
$\begin{aligned}f(x)&=\frac{1}{m}\left[\frac{1}{2}\lVert Ax-b\rVert_2^2\right]\\&=\frac{1}{m}\left[\frac{1}{2}\lVert \begin{bmatrix}a_1^T\\ \vdots\\a_m^T\end{bmatrix}x-\begin{bmatrix}b_1\\ \vdots\\b_m\end{bmatrix}\rVert^2\right]\\&=\frac{1}{m}\sum_{i=1}^m \frac{1}{2}(a_i^Tx-b_i)^2\\&=\frac{1}{m}\sum_{i=1}^mf_i(x)\end{aligned}$
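A quick numerical check of the decomposition above, on made-up data (the matrix $A$, vector $b$, and point $x$ are random, purely for illustration): the full objective $\frac{1}{2m}\lVert Ax-b\rVert^2$ equals the average of the component losses $f_i(x)=\frac{1}{2}(a_i^Tx-b_i)^2$.

```python
import numpy as np

# Hypothetical small least-squares instance (A, b, x are made-up data)
rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)

# Full form: f(x) = (1/m) * (1/2) ||Ax - b||^2
f_full = 0.5 * np.linalg.norm(A @ x - b) ** 2 / m

# Component form: f(x) = (1/m) sum_i (1/2)(a_i^T x - b_i)^2
f_sum = np.mean([0.5 * (A[i] @ x - b[i]) ** 2 for i in range(m)])

print(np.isclose(f_full, f_sum))  # the two forms agree
```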
Gradient descent: $x_{k+1}=x_k-\alpha_k\nabla f(x_k)$
The main cost at each iteration $k$ is computing $\nabla f(x_k)=\frac{1}{m}\sum_{i=1}^m\nabla f_i(x_k)$.
Ex: for LS,
$\begin{aligned}\nabla f(x)&=\nabla \left[ \frac{1}{2m}\lVert Ax-b\rVert^2\right] \\ &=\frac{1}{m}A^T(Ax-b)\\ &=\frac{1}{m}A^Tr,\qquad r:=Ax-b\\ &=\frac{1}{m}\begin{bmatrix}a_1 \cdots a_m\end{bmatrix}\begin{bmatrix}r_1\\\vdots\\r_m\end{bmatrix}=\frac{1}{m}\sum_{i=1}^m a_ir_i\end{aligned}$
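The key identity in the derivation is that the matrix-vector product $A^Tr$ equals the sum of the columns $a_i$ of $A^T$ weighted by the residual entries $r_i$. A sketch verifying this on made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)

r = A @ x - b                              # residual vector r = Ax - b
g_matrix = A.T @ r                         # A^T r
g_sum = sum(A[i] * r[i] for i in range(m)) # sum_i a_i r_i (a_i = i-th row of A)

print(np.allclose(g_matrix, g_sum))
```

Computing the full sum costs $O(mn)$ per iteration, which motivates sampling a single term instead.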
Randomly sample $i\in\left\{1,\cdots,m\right\}$ (uniformly).
The stochastic gradient approximation to $\nabla f(x)$ is $\nabla f_i(x)$, which is unbiased: $\mathbb{E}_i\left[\nabla f_i(x)\right]=\frac{1}{m}\sum_{i=1}^m\nabla f_i(x)=\nabla f(x)$.
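A minimal stochastic gradient sketch for the least-squares example, under made-up assumptions: a constant step size $\alpha$, synthetic data with a consistent system $b=Ax_{\text{true}}$ (so every $f_i$ is minimized at the same point), and a fixed iteration budget. Each step uses only one row $a_i$, costing $O(n)$ instead of $O(mn)$.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 200, 5
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true                      # consistent system: minimizer is x_true

x = np.zeros(n)
alpha = 0.01                        # constant step size (illustrative choice)
for k in range(20000):
    i = rng.integers(m)             # sample i uniformly from {1,...,m}
    r_i = A[i] @ x - b[i]           # scalar residual a_i^T x - b_i
    x = x - alpha * r_i * A[i]      # x_{k+1} = x_k - alpha * grad f_i(x_k)

print(np.linalg.norm(x - x_true))   # distance to the true minimizer
```

Because the system here is consistent, the gradient noise vanishes at the solution and a constant step suffices; for general (noisy) problems a decaying step size $\alpha_k$ is needed for convergence.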