Sander Dieleman posted a long, mathematically careful piece on flow maps yesterday that’s worth reading for anyone who has been watching the diffusion model acceleration literature pile up and wondering how the pieces fit together.
The standard story of a diffusion model is that it learns a velocity field: a function that, for any point $x_t$ at any noise level $t$, tells you which direction to step to move from noise toward data. Sampling means following that velocity field by numerically integrating a differential equation, which typically requires dozens of network evaluations. The acceleration literature is full of papers with names like consistency models, flow matching distillation, progressive distillation, and so on, each claiming to reduce the step count while preserving quality. They’re related, but the relationships aren’t always made explicit.
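To make the cost concrete, here is a minimal sketch of that integration loop, assuming a trained velocity network exposed as a callable `velocity(x, t)`; the name and the plain Euler stepping are simplifications for illustration, not anything from the post:

```python
import numpy as np

def sample_by_integration(velocity, x_noise, n_steps=50):
    """Generate a sample by numerically integrating the learned velocity field.

    velocity(x, t) is assumed to be a trained network returning the direction
    of motion at point x and noise level t (t = 1.0 is pure noise, t = 0.0 is data).
    Plain Euler steps for illustration; real samplers use better solvers, but the
    cost structure is the same: one network evaluation per step.
    """
    x = x_noise
    levels = np.linspace(1.0, 0.0, n_steps + 1)
    for t_cur, t_next in zip(levels[:-1], levels[1:]):
        x = x + (t_next - t_cur) * velocity(x, t_cur)  # one network call per step
    return x
```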
Dieleman’s framing starts one level up. A flow map is a function that directly predicts any point on the diffusion path: given a starting point $x_t$ at noise level $t$, it tells you where you’d be at noise level $s$ after following the velocity field exactly. Formally:

$$f(x_t, t, s) = x_t + \int_t^s v(x_u, u)\,\mathrm{d}u,$$

where $v$ is the velocity field and $x_u$ is the trajectory obtained by following it from $x_t$.
The denoiser, which is what standard diffusion models learn, is just the special case $s = 0$: go from noise level $t$ all the way to data, $f(x_t, t, 0)$. A flow map is the generalization: jump between any two points on the path, not just from the current noise level to data.
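In code, the relationship is just a pinned argument. A minimal sketch, assuming a trained flow map exposed as a hypothetical callable `flow_map(x, t, s)` with $t = 1$ for pure noise and $s = 0$ for data:

```python
def denoiser_from_flow_map(flow_map, x_t, t):
    """The standard denoiser is the special case s = 0: jump from noise level t straight to data.

    flow_map(x, t, s) is assumed to map a point x at noise level t to the
    corresponding point at noise level s on the same trajectory.
    """
    return flow_map(x_t, t, 0.0)

# The generalization is the same callable with s left free: flow_map(x_t, t, s)
# jumps between any two noise levels, and flow_map(x_t, t, t) should return x_t.
```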
This generalization enables single-step sampling: set $t$ to the pure-noise level and $s$ to the data level, evaluate once. The reason that doesn’t just give you a noisy shortcut is the consistency constraint: a valid flow map must satisfy

$$f(f(x_t, t, s), s, u) = f(x_t, t, u)$$

for all triples $(t, s, u)$. That is, taking two jumps in sequence must equal taking the direct jump: the map is self-consistent. This is called the compositional consistency property. Dieleman also presents a Lagrangian version (compute the flow map by composing many small steps) and an Eulerian version (the flow map satisfies a partial differential equation relating its derivatives), showing that all three are equivalent characterizations of the same underlying object.
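A small sketch of what single-step sampling and the compositional constraint look like in practice, again with the hypothetical `flow_map(x, t, s)` callable (here assumed to return NumPy arrays):

```python
import numpy as np

def one_step_sample(flow_map, x_noise):
    """Single-step generation: evaluate once, jumping from pure noise (t = 1) to data (s = 0)."""
    return flow_map(x_noise, 1.0, 0.0)

def consistency_gap(flow_map, x_t, t, s, u):
    """Numerical check of compositional consistency on one triple t > s > u.

    A valid flow map satisfies flow_map(flow_map(x_t, t, s), s, u) == flow_map(x_t, t, u),
    so this gap should be close to zero for every triple.
    """
    two_jumps = flow_map(flow_map(x_t, t, s), s, u)
    direct_jump = flow_map(x_t, t, u)
    return float(np.linalg.norm(two_jumps - direct_jump))
```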
The unified framework turns out to explain a lot of recent papers. Consistency models enforce exactly this compositional constraint during distillation. Shortcut models learn parametric shortcuts between arbitrary noise levels. Various progressive distillation schemes approximate the integral by chaining smaller learned jumps. Once you see the flow map as the thing being learned, the differences between these approaches are mostly about which $(t, s)$ pairs you sample during training and which consistency formulation you use as a loss.
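To make the "which pairs, which loss" framing concrete, here is a schematic sketch of one distillation-style training objective in PyTorch. The function names, the teacher-step interface, and the use of the student itself under stop-gradient as the target are illustrative simplifications of the general pattern, not any specific paper's recipe:

```python
import torch

def flow_map_distillation_loss(student, teacher_step, x_t, t, s):
    """One schematic consistency-style objective.

    The student's direct jump t -> s is trained to match a small, accurate
    teacher step t -> t_mid followed by the student's own jump t_mid -> s
    (a Lagrangian-style target; Eulerian variants match derivatives instead).
    student(x, t, s) is the flow map being trained; teacher_step(x, t) returns
    (x_mid, t_mid) after one solver step of a pretrained diffusion model.
    In practice the target usually comes from an EMA or frozen copy of the student.
    """
    with torch.no_grad():
        x_mid, t_mid = teacher_step(x_t, t)   # small, accurate step along the path
        target = student(x_mid, t_mid, s)     # stop-gradient target that finishes the jump
    pred = student(x_t, t, s)                 # direct jump being trained
    return torch.mean((pred - target) ** 2)
```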
What’s genuinely new in Dieleman’s synthesis isn’t a single novel method; it’s the catalog. He walks through 20+ training variants as instances of the general framework, which makes it possible to compare them structurally rather than just empirically. Some distill existing diffusion models into flow maps; some train flow maps from scratch with flow map objectives. Some use the Lagrangian path (compose small steps) as a consistency target; others use the Eulerian path (match derivatives). The taxonomy makes it clearer which design choices are independent and which are coupled.
There are practical advantages to flow maps over standard denoisers that go beyond step count. Because a flow map is invertible between any two noise levels (it is the solution map of an ODE, so the jump from $t$ to $s$ is undone by the jump from $s$ back to $t$), it naturally supports running the diffusion in reverse: inverting a sample back to its noise representation, which is useful for editing, reward learning, and guidance. Standard denoisers don’t give you this directly. Flow maps also compose cleanly: you can chain maps trained at different resolutions or with different conditioning signals, which opens up structured generation pipelines.
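A minimal sketch of that inversion round trip, with the same hypothetical `flow_map(x, t, s)` callable and $t = 1$ as pure noise; `edit_in_noise_space` stands in for whatever manipulation you want to apply:

```python
def invert_edit_regenerate(flow_map, x_data, edit_in_noise_space):
    """Run the map backwards to noise, intervene there, then run it forwards again.

    edit_in_noise_space is any function acting on the noise representation
    (e.g. interpolating toward another sample's noise, or applying guidance).
    A standard denoiser only gives you the t -> 0 direction; the backwards
    jump is what the flow map formulation adds.
    """
    z = flow_map(x_data, 0.0, 1.0)        # data -> noise: invert the sample
    z_edited = edit_in_noise_space(z)     # manipulate in noise space
    return flow_map(z_edited, 1.0, 0.0)   # noise -> data: regenerate
```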
The post doesn’t introduce a new model. It’s a synthesis, and explicitly positions itself as one. But in a subfield where the literature has grown faster than the theoretical framework, a careful synthesis that reveals the underlying structure can matter as much as a new method. The flow map formulation gives practitioners a way to think about what they’re actually training when they distill a diffusion model, and what properties the result should have.
