Exact vs approximate · stability of the guidance gradient

X axis = noisy point a_t, Y axis = the guidance gradient computed there. Watch which curve is smooth and which jitters.

Curves: QGF (approximate: â₁ + identity Jacobian); BPTT (exact: run the whole chain); OOD (at a_t).
Readouts: jitter |Δg| for QGF, BPTT, and OOD.
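
The three curves named in the legend can be compared on a toy 1-D problem. The sketch below is only an illustration under stated assumptions, not the figure's actual implementation: the critic `q`, the denoiser `velocity`, the chain length `STEPS`, the one-step estimate used for â₁, and the definition of jitter as the mean absolute difference between gradients at neighbouring grid points are all hypothetical choices made here.

```python
import math

STEPS = 8  # hypothetical chain length


def q(a):
    # Hypothetical critic: a smooth bowl plus a small high-frequency ripple.
    return -(a - 1.0) ** 2 + 0.1 * math.sin(20.0 * a)


def q_grad(a):
    # Analytic derivative of q.
    return -2.0 * (a - 1.0) + 2.0 * math.cos(20.0 * a)


def velocity(a):
    # Hypothetical toy denoiser that pulls the action toward 1.0.
    return 0.5 * (1.0 - a)


def velocity_grad(a):
    # Derivative of velocity with respect to a (constant for this toy model).
    return -0.5


def rollout(a_t):
    """Run the whole chain; return the final action and d(final)/d(a_t)."""
    h = 1.0 / STEPS
    a, jac = a_t, 1.0
    for _ in range(STEPS):
        jac *= 1.0 + h * velocity_grad(a)  # chain rule through one step
        a = a + h * velocity(a)
    return a, jac


def grad_exact(a_t):
    # BPTT-style: differentiate Q through every step of the chain.
    # In 1-D the accumulated Jacobian product equals the backpropagated one.
    a_1, jac = rollout(a_t)
    return q_grad(a_1) * jac


def grad_identity(a_t):
    # QGF-style (assumed form): evaluate dQ/da at a one-step estimate of the
    # clean action and treat the Jacobian d(estimate)/d(a_t) as 1.
    a_1_hat = a_t + velocity(a_t)
    return q_grad(a_1_hat)


def grad_ood(a_t):
    # OOD: evaluate dQ/da directly at the noisy point a_t.
    return q_grad(a_t)


def jitter(values):
    # Assumed metric: mean |g[i+1] - g[i]| over neighbouring grid points.
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(diffs) / len(diffs)


grid = [i / 50.0 for i in range(51)]  # sweep of noisy points a_t
for name, fn in [("QGF", grad_identity), ("BPTT", grad_exact), ("OOD", grad_ood)]:
    print(name, "jitter:", jitter([fn(a) for a in grid]))
```

Which curve comes out smoother depends entirely on the toy critic and denoiser chosen here, so the printed values say nothing about the figure's own readouts.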