Backpropagation is the algorithm that allows neural networks to learn from data. At its core, it is simply the chain rule of calculus applied repeatedly.

The intuition first

Imagine adjusting the tuning knobs on a radio to get the clearest signal. Each small turn gives you feedback: better or worse. Backpropagation is the mathematical equivalent: given the error a network made, it tells every weight which direction to shift, and how strongly, to reduce that error.

The chain rule

If the output depends on layer 3, which depends on layer 2, which depends on layer 1, then the rate of change of the output with respect to layer 1 is the product of all those individual rates. Backpropagation applies this systematically, layer by layer, working backward from the output toward the input.
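To make this concrete, here is a minimal numeric sketch in Python. The three functions are arbitrary stand-ins for layers, chosen only for this illustration (multiply by 3, square, take the sine); they are not from any particular network. Multiplying the local derivatives together, from the output back to the input, gives the same answer as a finite-difference estimate of the overall rate of change.

```python
import math

# Three stacked functions standing in for three layers:
# layer 1: a = 3*x, layer 2: b = a**2, layer 3: y = sin(b)
def forward(x):
    a = 3.0 * x          # layer 1
    b = a ** 2           # layer 2
    y = math.sin(b)      # layer 3
    return a, b, y

x = 0.5
a, b, y = forward(x)

# Local rates of change, one per layer
da_dx = 3.0              # d(3x)/dx
db_da = 2.0 * a          # d(a^2)/da
dy_db = math.cos(b)      # d(sin b)/db

# Chain rule: multiply the local rates, working backward from the output
dy_dx = dy_db * db_da * da_dx

# Sanity check against a finite-difference estimate
eps = 1e-6
_, _, y_plus = forward(x + eps)
numeric = (y_plus - y) / eps
print(dy_dx, numeric)    # the two values should agree closely
```

A real network does the same thing, except each "rate" is a matrix of partial derivatives and the products are accumulated layer by layer in a single backward pass.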

What the gradient tells you

For each weight, the gradient tells you the direction to move (increase or decrease) and how sensitive the error is to that weight. A large gradient means this weight matters a lot right now.
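As a toy illustration (the weight, input, target, and learning rate here are made up for this sketch), the following shows one gradient-descent step on a single weight with a squared-error loss. The sign of the gradient picks the direction of the update, and its magnitude scales how big the step is.

```python
# One weight, one data point, squared-error loss
w = 2.0                  # current weight
x, target = 1.5, 0.0     # input and desired output

prediction = w * x
error = prediction - target
loss = error ** 2        # 9.0 before the update

# dLoss/dw = 2 * error * x  (chain rule applied to the squared error)
grad = 2.0 * error * x

learning_rate = 0.1
w_new = w - learning_rate * grad   # step against the gradient

print(f"gradient = {grad:.3f}")            # large magnitude: this weight matters a lot right now
print(f"w: {w:.3f} -> {w_new:.3f}")        # 2.000 -> 1.100
print(f"new loss = {(w_new * x - target) ** 2:.3f}")   # about 2.72, down from 9.0
```

Stepping against the gradient drops the loss from 9.0 to about 2.7 in a single update, which is the "better or worse" feedback from the radio analogy made precise.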