Gradient Descent Explained

What gradient descent does

Gradient descent is an iterative method for minimizing a cost function.

It works by repeating two steps:

  1. compute direction of steepest increase (gradient)
  2. step in the opposite direction (downhill)


  flowchart TD
  A[Start with random parameters] --> B[Compute predictions]
  B --> C[Compute cost]
  C --> D[Compute gradient]
  D --> E[Update parameters]
  E --> B

The update rule

For a parameter w:

w := w - α * (∂Cost/∂w)

Where:

  • α is the learning rate
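The update rule can be sketched in a few lines of Python. The cost function here (a made-up one-dimensional quadratic) is purely for illustration:

```python
# Hypothetical example: minimize Cost(w) = (w - 3)**2,
# whose gradient is dCost/dw = 2 * (w - 3).
def gradient(w):
    return 2 * (w - 3)

alpha = 0.1   # learning rate
w = 0.0       # arbitrary starting point

for _ in range(100):
    w = w - alpha * gradient(w)   # the update rule above

# After enough steps, w is very close to the minimum at 3.
```

Each iteration moves w against the sign of the gradient, so the cost can only shrink as long as α is small enough.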

Learning rate intuition

  • too small → slow training
  • too large → overshoot / diverge
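Both behaviors can be seen on a toy cost, Cost(w) = w² (gradient 2w), chosen here only to make the effect visible. The same loop converges or diverges depending solely on α:

```python
# Run gradient descent on Cost(w) = w**2 with a given learning rate.
def run(alpha, steps=20):
    w = 1.0
    for _ in range(steps):
        w = w - alpha * 2 * w   # gradient of w**2 is 2*w
    return w

small = run(0.01)   # too small: barely moved toward the minimum at 0
good  = run(0.1)    # reasonable: close to 0 after 20 steps
big   = run(1.5)    # too large: |w| grows every step — divergence
```

With α = 1.5 each update multiplies w by (1 − 2α) = −2, so the iterates alternate sign and explode, which is exactly the overshoot case in the diagram below.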


  flowchart LR
  S[Small α] --> Slow[Slow but stable]
  L[Large α] --> Boom[May diverge]

Why it matters for regression

Linear regression can be solved analytically, but gradient descent:

  • generalizes to many models (logistic regression, neural nets)
  • scales to large datasets
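As a concrete sketch, here is gradient descent fitting simple linear regression y ≈ w·x + b under mean squared error. The data is made up (generated from y = 2x + 1) just to show the loop recovering the true parameters:

```python
# Made-up data from y = 2x + 1 (assumed for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w, b, alpha = 0.0, 0.0, 0.05
n = len(xs)

for _ in range(2000):
    # Gradients of MSE = (1/n) * sum((w*x + b - y)**2)
    grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
    grad_b = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
    w -= alpha * grad_w
    b -= alpha * grad_b

# w converges toward 2 and b toward 1.
```

The same loop works unchanged if the model or cost is swapped out — that is the "generalizes to many models" point: only the gradient computation changes.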

Mini-checkpoint

If training is unstable:

  • reduce learning rate
  • scale features
  • check for exploding gradients (in deep learning)
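The "scale features" fix is often just standardization: shift each feature to zero mean and unit variance so no single feature dominates the gradient. A minimal sketch (the raw values are made up):

```python
# Standardize a feature to zero mean and unit variance.
def standardize(values):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5
    return [(v - mean) / std for v in values]

raw = [100.0, 200.0, 300.0]   # hypothetical feature on a large scale
scaled = standardize(raw)     # now mean 0, variance 1
```

With features on comparable scales, a single learning rate works well for all parameters, which is usually enough to stabilize training.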
