Backpropagation and Optimizers (Adam, SGD)
The training loop
Neural networks learn by minimizing a loss function.
Loop:
- forward pass → compute predictions
- compute loss
- backpropagation → compute gradients
- optimizer step → update weights
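The four steps above can be sketched end to end on a toy linear-regression problem. This is a minimal NumPy sketch with hand-derived gradients; the data, the learning rate `lr`, and the step count are all illustrative:

```python
import numpy as np

# Toy data: y = 3x + 1, no noise, so the loop can recover w and b exactly
X = np.linspace(-1, 1, 64).reshape(-1, 1)
y = 3.0 * X + 1.0

w, b = 0.0, 0.0   # parameters to learn
lr = 0.1          # learning rate (illustrative value)

for step in range(200):
    # forward pass -> predictions
    pred = w * X + b
    # compute loss (mean squared error)
    loss = np.mean((pred - y) ** 2)
    # backprop -> gradients of the loss w.r.t. w and b (chain rule, by hand)
    grad_w = np.mean(2 * (pred - y) * X)
    grad_b = np.mean(2 * (pred - y))
    # optimizer step -> update weights (plain gradient descent)
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # → 3.0 1.0
```

In a framework like PyTorch the gradient lines are replaced by `loss.backward()` and the update by `optimizer.step()`, but the loop has the same shape.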
```mermaid
flowchart TD
  X[Batch of data] --> F[Forward pass]
  F --> L[Loss]
  L --> B[Backprop: gradients]
  B --> O[Optimizer update]
  O --> F
```
Backpropagation (intuition)
Backprop applies the chain rule to compute:
- the gradient of the loss with respect to every weight, i.e. how much each weight contributed to the loss
The optimizer then adjusts each weight in the direction that reduces the loss.
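To make the chain rule concrete, here is a hand-derived gradient for a one-weight "network" (sigmoid of a single product), checked against a finite-difference estimate; all values are illustrative:

```python
import numpy as np

# Tiny model: loss = (sigmoid(w * x) - target)^2
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, w, target = 2.0, 0.5, 1.0

# Forward pass, keeping the intermediates
z = w * x               # pre-activation
a = sigmoid(z)          # activation
loss = (a - target) ** 2

# Backward pass: chain rule, one local derivative per step
dloss_da = 2 * (a - target)        # d loss / d a
da_dz = a * (1 - a)                # d sigmoid / d z
dz_dw = x                          # d z / d w
grad_w = dloss_da * da_dz * dz_dw  # how much w contributed to the loss

# Sanity check against a central finite difference
eps = 1e-6
loss_plus = (sigmoid((w + eps) * x) - target) ** 2
loss_minus = (sigmoid((w - eps) * x) - target) ** 2
numeric = (loss_plus - loss_minus) / (2 * eps)
print(abs(grad_w - numeric) < 1e-6)  # → True
```

Autograd engines do exactly this bookkeeping automatically, one local derivative per operation.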
SGD (Stochastic Gradient Descent)
Update each weight by stepping opposite its gradient: w ← w − η · ∇L(w), where η is the learning rate.
Pros:
- simple and reliable
Cons:
- can be slow
- sensitive to learning rate
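The update rule above is one line of code. A minimal sketch (the `sgd_step` helper and its values are illustrative):

```python
def sgd_step(params, grads, lr=0.01):
    """One SGD update per parameter: w <- w - lr * gradient."""
    return [w - lr * g for w, g in zip(params, grads)]

params = [1.0, -2.0]
grads = [0.5, -0.5]
print(sgd_step(params, grads, lr=0.1))  # → [0.95, -1.95]
```

The learning-rate sensitivity listed above comes directly from this rule: `lr` scales every step, with no per-parameter adaptation.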
Adam
Adam adapts the learning rate for each parameter using running estimates of the gradient's first moment (a momentum-like average) and second moment (average of squared gradients).
Pros:
- usually faster convergence
- strong default choice
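A sketch of a single Adam update using the standard moment estimates and bias correction; the `adam_step` helper and all values are illustrative:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update. m, v: running moment estimates; t: step count (>= 1)."""
    m = b1 * m + (1 - b1) * grad       # first moment (momentum-like mean of gradients)
    v = b2 * v + (1 - b2) * grad ** 2  # second moment (mean of squared gradients)
    m_hat = m / (1 - b1 ** t)          # bias correction for zero-initialized moments
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return w, m, v

# One step from w = 1.0 with gradient 2.0
w, m, v = adam_step(1.0, 2.0, m=0.0, v=0.0, t=1, lr=0.1)
print(round(w, 4))  # → 0.9
```

Note that the first step moves by roughly `lr` regardless of the gradient's magnitude, because the step is normalized by the square root of the second moment; this is part of why Adam is forgiving about learning-rate choice.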
Mini-checkpoint
If training loss oscillates wildly:
- try lowering the learning rate.
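To see why: on the toy loss f(w) = w², each gradient-descent update multiplies w by (1 − 2·lr), so a too-large learning rate makes |w| grow instead of shrink. A small illustrative check:

```python
def run(lr, steps=20, w=1.0):
    """Gradient descent on f(w) = w^2; returns |w| after `steps` updates."""
    for _ in range(steps):
        w -= lr * 2 * w  # gradient of w^2 is 2w
    return abs(w)

print(run(lr=1.1) > 1 > run(lr=0.1))  # → True: large step diverges, small step converges
```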
