Gradient Descent Explained

What gradient descent does

Gradient descent is an iterative method for minimizing a cost function.

It works by repeating two steps:

  1. compute direction of steepest increase (gradient)
  2. step in the opposite direction (downhill)


  flowchart TD
  A[Start with random parameters] --> B[Compute predictions]
  B --> C[Compute cost]
  C --> D[Compute gradient]
  D --> E[Update parameters]
  E --> B

The update rule

For a parameter w:

w := w - α * (∂Cost/∂w)

Where:

  • α is the learning rate
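The update rule can be sketched in a few lines of Python. The cost function here (a made-up one-dimensional quadratic) is purely for illustration:

```python
# Hypothetical example: minimize Cost(w) = (w - 3)**2,
# whose gradient is dCost/dw = 2 * (w - 3).
def gradient(w):
    return 2 * (w - 3)

alpha = 0.1   # learning rate
w = 0.0       # arbitrary starting point

for _ in range(100):
    w = w - alpha * gradient(w)   # the update rule above

# After enough steps, w is very close to the minimum at 3.
```

Each iteration moves w against the sign of the gradient, so the cost can only shrink as long as α is small enough.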

Learning rate intuition

  • too small → slow training
  • too large → overshoot / diverge
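Both behaviors can be seen on a toy cost, Cost(w) = w² (gradient 2w), chosen here only to make the effect visible. The same loop converges or diverges depending solely on α:

```python
# Run gradient descent on Cost(w) = w**2 with a given learning rate.
def run(alpha, steps=20):
    w = 1.0
    for _ in range(steps):
        w = w - alpha * 2 * w   # gradient of w**2 is 2*w
    return w

small = run(0.01)   # too small: barely moved toward the minimum at 0
good  = run(0.1)    # reasonable: close to 0 after 20 steps
big   = run(1.5)    # too large: |w| grows every step — divergence
```

With α = 1.5 each update multiplies w by (1 − 2α) = −2, so the iterates alternate sign and explode, which is exactly the overshoot case in the diagram below.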


  flowchart LR
  S[Small α] --> Slow[Slow but stable]
  L[Large α] --> Boom[May diverge]

Why it matters for regression

Linear regression can be solved analytically, but gradient descent:

  • generalizes to many models (logistic regression, neural nets)
  • scales to large datasets
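As a concrete sketch, here is gradient descent fitting simple linear regression y ≈ w·x + b under mean squared error. The data is made up (generated from y = 2x + 1) just to show the loop recovering the true parameters:

```python
# Made-up data from y = 2x + 1 (assumed for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w, b, alpha = 0.0, 0.0, 0.05
n = len(xs)

for _ in range(2000):
    # Gradients of MSE = (1/n) * sum((w*x + b - y)**2)
    grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
    grad_b = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
    w -= alpha * grad_w
    b -= alpha * grad_b

# w converges toward 2 and b toward 1.
```

The same loop works unchanged if the model or cost is swapped out — that is the "generalizes to many models" point: only the gradient computation changes.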

Mini-checkpoint

If training is unstable:

  • reduce learning rate
  • scale features
  • check for exploding gradients (in deep learning)
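The "scale features" fix is often just standardization: shift each feature to zero mean and unit variance so no single feature dominates the gradient. A minimal sketch (the raw values are made up):

```python
# Standardize a feature to zero mean and unit variance.
def standardize(values):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5
    return [(v - mean) / std for v in values]

raw = [100.0, 200.0, 300.0]   # hypothetical feature on a large scale
scaled = standardize(raw)     # now mean 0, variance 1
```

With features on comparable scales, a single learning rate works well for all parameters, which is usually enough to stabilize training.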
