Gradient Descent Explained
What gradient descent does
Gradient descent is a method to minimize a cost function.
It works by repeating:
- compute direction of steepest increase (gradient)
- step in the opposite direction (downhill)
```mermaid
flowchart TD
    A[Start with random parameters] --> B[Compute predictions]
    B --> C[Compute cost]
    C --> D[Compute gradient]
    D --> E[Update parameters]
    E --> B
```
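The loop above can be sketched in a few lines. This is a minimal illustration, not a library implementation; the cost function, starting point, and learning rate are made-up choices.

```python
# Minimize cost(w) = (w - 3)^2 with plain gradient descent.

def gradient(w):
    # d/dw (w - 3)^2 = 2 * (w - 3): the direction of steepest increase
    return 2 * (w - 3)

w = 0.0          # start from an arbitrary parameter value
alpha = 0.1      # learning rate (an illustrative choice)

for _ in range(100):
    w -= alpha * gradient(w)  # step opposite the gradient (downhill)

print(round(w, 4))  # approaches the minimum at w = 3
```

Each pass through the loop plays the roles in the flowchart: evaluate the gradient at the current parameters, then move a small step downhill.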
The update rule
For parameter w:

w := w - α * (∂Cost/∂w)

Where:
- α is the learning rate
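Plugged into code, a single application of the rule looks like this. The values of α, w, and the gradient here are made-up numbers for illustration.

```python
alpha = 0.1   # learning rate α (illustrative)
w = 2.0       # current parameter value (illustrative)
grad = 4.0    # assumed value of ∂Cost/∂w at the current w

# w := w - α * (∂Cost/∂w)
w = w - alpha * grad
print(w)  # roughly 1.6
```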
Learning rate intuition
- too small → slow training
- too large → overshoot / diverge
```mermaid
flowchart LR
    S[Small α] --> Slow[Slow but stable]
    L[Large α] --> Boom[May diverge]
```
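You can see both regimes with a tiny experiment. This sketch uses cost(w) = w², whose gradient is 2w; the specific learning rates and step count are illustrative assumptions.

```python
# Compare learning rates on cost(w) = w^2 (gradient = 2w).

def run(alpha, steps=50, w=1.0):
    for _ in range(steps):
        w -= alpha * 2 * w   # update rule with gradient 2w
    return w

small = run(0.01)   # slow but stable: w shrinks toward 0, a little per step
large = run(1.1)    # too large: each step overshoots 0 and |w| grows

print(abs(small) < 1.0, abs(large) > 100.0)
```

With α = 0.01 each step multiplies w by 0.98, so progress is steady but slow; with α = 1.1 each step multiplies w by -1.2, so the iterates oscillate with growing magnitude and diverge.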
Why it matters for regression
Linear regression can be solved analytically (via the normal equation), but gradient descent:
- generalizes to many models (logistic regression, neural nets)
- scales to large datasets
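Putting the pieces together, here is a hedged sketch of gradient descent fitting a simple linear regression y ≈ w·x + b. The tiny dataset is invented for illustration (generated from y = 2x + 1), and the learning rate and iteration count are arbitrary choices.

```python
# Fit y ≈ w*x + b by gradient descent on mean squared error.

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # made-up data from y = 2x + 1

w, b = 0.0, 0.0
alpha = 0.05
n = len(xs)

for _ in range(2000):
    # Gradients of Cost = (1/n) * sum((w*x + b - y)^2)
    dw = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
    db = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
    w -= alpha * dw
    b -= alpha * db

print(round(w, 2), round(b, 2))  # approaches w ≈ 2, b ≈ 1
```

The same loop structure carries over to logistic regression or neural networks: only the cost function and its gradients change.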
Mini-checkpoint
If training is unstable:
- reduce learning rate
- scale features
- check for exploding gradients (in deep learning)
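Two of those fixes can be sketched directly. These helpers are illustrative, not from any particular library; the input values and the clipping threshold are made-up.

```python
# 1) Feature scaling: standardize to zero mean, unit variance.
def standardize(xs):
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / var ** 0.5 for x in xs]

scaled = standardize([10.0, 20.0, 30.0])  # made-up feature values

# 2) Gradient clipping: cap a gradient's magnitude (exploding-gradient guard).
def clip(grad, max_norm=5.0):
    return max(-max_norm, min(max_norm, grad))

print(scaled, clip(123.0))
```

Scaling keeps all features on comparable ranges so one learning rate works for every parameter; clipping bounds the step size when a gradient blows up.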
