Backpropagation and Optimizers (Adam, SGD)

The training loop

Neural networks learn by minimizing a loss function.

Loop:

  1. forward pass → compute predictions
  2. compute loss
  3. backpropagation → compute gradients
  4. optimizer step → update weights
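The four steps above can be sketched end to end with a toy one-parameter linear model (a hypothetical example, not a framework API): predictions are `w * x`, the loss is mean squared error, and the "backprop" step is the hand-derived gradient of that loss.

```python
# Minimal training loop for y_hat = w * x, minimizing mean squared error
# with plain gradient descent. Real frameworks automate steps 1-4.

def train(xs, ys, lr=0.1, epochs=100):
    w = 0.0  # initial weight
    for _ in range(epochs):
        # 1. forward pass -> compute predictions
        preds = [w * x for x in xs]
        # 2. compute loss (mean squared error)
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
        # 3. backprop -> gradient of the loss w.r.t. w
        grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
        # 4. optimizer step -> update the weight
        w -= lr * grad
    return w

w = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # data follows y = 2x
```

After training, `w` should be very close to the true slope of 2.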


  flowchart TD
  X[Batch of data] --> F[Forward pass]
  F --> L[Loss]
  L --> B[Backprop: gradients]
  B --> O[Optimizer update]
  O --> F

Backpropagation (intuition)

Backprop applies the chain rule to compute:

  • the gradient of the loss with respect to each weight (how much that weight contributed to the loss)

The optimizer then adjusts each weight in the direction that reduces the loss.
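To make the chain rule concrete, here is a single-neuron sketch (a hypothetical example): the loss is `(sigmoid(w*x) - y)^2`, its gradient is the product of three local derivatives, and a finite-difference check confirms the result.

```python
import math

def sigma(z):
    # logistic sigmoid activation
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x, y):
    return (sigma(w * x) - y) ** 2

def grad(w, x, y):
    # chain rule: dL/dw = (dL/da) * (da/dz) * (dz/dw)
    a = sigma(w * x)        # activation
    dL_da = 2 * (a - y)     # derivative of the squared error
    da_dz = a * (1 - a)     # derivative of the sigmoid
    dz_dw = x               # derivative of the linear step z = w * x
    return dL_da * da_dz * dz_dw

w, x, y = 0.5, 1.5, 1.0
analytic = grad(w, x, y)
eps = 1e-6
numeric = (loss(w + eps, x, y) - loss(w - eps, x, y)) / (2 * eps)
```

The analytic and numeric gradients agree to several decimal places, which is the standard sanity check for a backprop implementation.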

SGD (Stochastic Gradient Descent)

SGD updates each weight by stepping against its gradient, scaled by the learning rate: w ← w − lr · ∂L/∂w. "Stochastic" refers to estimating the gradient on a mini-batch rather than the full dataset.

Pros:

  • simple and reliable

Cons:

  • can be slow
  • sensitive to learning rate
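The SGD update rule itself is one line per weight; a minimal sketch (function name is illustrative, not a library API):

```python
# Plain SGD step: w <- w - lr * g for each (weight, gradient) pair.
def sgd_step(weights, grads, lr=0.01):
    return [w - lr * g for w, g in zip(weights, grads)]

updated = sgd_step([1.0, 2.0], [0.5, -0.5], lr=0.1)
```

With `lr=0.1`, the first weight decreases by 0.05 and the second increases by 0.05, illustrating the learning-rate sensitivity noted above: the same gradients with a 10x larger `lr` would overshoot 10x as far.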

Adam

Adam adapts the learning rate for each parameter using exponential moving averages of the gradient (a momentum-like first moment) and of its square (a second moment).

Pros:

  • usually faster convergence
  • strong default choice
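A minimal single-parameter sketch of the Adam update (following Kingma & Ba's published algorithm; the function name and calling convention here are illustrative):

```python
def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # m: running mean of gradients (first moment, momentum-like)
    m = b1 * m + (1 - b1) * g
    # v: running mean of squared gradients (second moment, per-parameter scale)
    v = b2 * v + (1 - b2) * g * g
    # bias correction: the running means start at zero, so early
    # estimates are rescaled by 1 / (1 - beta^t)
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    # adaptive update: step size shrinks where gradients are large/noisy
    w = w - lr * m_hat / (v_hat ** 0.5 + eps)
    return w, m, v

# usage: minimize (w - 3)^2 for a few steps
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 1001):
    g = 2 * (w - 3)                 # gradient of (w - 3)^2
    w, m, v = adam_step(w, g, m, v, t)
```

Note that on the very first step the bias-corrected update is roughly `lr` in magnitude regardless of the gradient's scale, which is part of why Adam is forgiving about the initial learning rate.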

Mini-checkpoint

If training loss oscillates wildly:

  • try lowering the learning rate.
