The dominant frame for AI progress in 2026 is still scaling — more parameters, more compute, longer context. A piece in The Register last week described a system that runs at the opposite extreme, and it’s worth understanding how far the other end of the spectrum actually goes.
CERN’s Large Hadron Collider produces roughly one billion proton-proton collisions per second: proton bunches cross at 40 MHz, and each crossing yields dozens of overlapping collisions. The physics of interest — Higgs decays, rare flavor violations, possible new particles — is buried in that stream at a rate of maybe one interesting event per billion collisions. Storing all of it is physically impossible: each collision event is kilobytes of data, and at 40 MHz you’d need sustained write capacity that doesn’t exist. So the data must be filtered in real time, irreversibly, before it’s written anywhere.
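Taking the article’s figures at face value, the storage claim checks out on the back of an envelope. The per-event size below is an assumed round number, since “kilobytes” is all we’re given:

```python
# Back-of-envelope data rate from the stated figures.
COLLISIONS_PER_SECOND = 1_000_000_000  # ~1 billion collisions/s
EVENT_SIZE_BYTES = 5_000               # assumed: "kilobytes" per event

bytes_per_second = COLLISIONS_PER_SECOND * EVENT_SIZE_BYTES
terabytes_per_second = bytes_per_second / 1e12
exabytes_per_day = bytes_per_second * 86_400 / 1e18

print(f"{terabytes_per_second:.0f} TB/s, {exabytes_per_day:.2f} EB/day")
```

At a few kilobytes per event that is terabytes per second of sustained writes, or close to half an exabyte per day, before the experiment has run for 24 hours.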
That filtering happens in a system called the Level-1 Trigger. It evaluates each collision in under 50 nanoseconds and decides whether the event is worth keeping. That’s less time than light needs to travel 15 meters. The trigger must make this decision using only the data the detectors have already produced — no lookups, no network calls, no waiting for context. Approximately 1,000 FPGAs implement the trigger for CMS, one of the two general-purpose LHC experiments.
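The 15-meter figure is just the speed of light times the decision budget:

```python
C = 299_792_458   # speed of light in vacuum, m/s
BUDGET_S = 50e-9  # 50 ns decision budget

distance_m = C * BUDGET_S
print(f"light travels {distance_m:.1f} m in 50 ns")
```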
The tool that makes neural networks viable at this timescale is HLS4ML — High Level Synthesis for Machine Learning. It’s an open-source transpiler that takes a model defined in PyTorch or Keras and converts it to synthesizable C++ that the FPGA design tools can compile down to gates. The key insight is that all weights stay on-chip. Off-chip memory access takes on the order of a hundred nanoseconds or more; a single access would consume the entire latency budget. HLS4ML produces circuits where the entire computation happens in local registers and look-up tables, with no memory bus in the critical path.
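What the generated circuit computes is essentially fixed-point multiply-accumulate logic with the weights frozen in as constants. Here is a minimal Python sketch of that idea; the layer shape, weight values, and fixed-point precision are all invented for illustration, not taken from any real trigger model:

```python
# Illustrative 3-input, 2-output dense layer with ReLU, weights baked in
# as constants the way synthesized designs hard-code them into the fabric.
# Values live on a fixed-point grid with 5 fractional bits.
SCALE = 2 ** 5

def q(x):
    """Quantize to the fixed-point grid (nearest representable value)."""
    return round(x * SCALE) / SCALE

# "On-chip" constants: no memory bus, no fetch at inference time.
W = [[q(0.50), q(-0.25), q(0.75)],
     [q(-1.00), q(0.125), q(0.25)]]
B = [q(0.125), q(-0.25)]

def dense_relu(x):
    # Fully unrolled multiply-accumulate: in hardware each product gets
    # its own multiplier and the loop disappears into combinational logic.
    return [max(q(sum(q(w * xi) for w, xi in zip(row, x)) + b), 0.0)
            for row, b in zip(W, B)]

print(dense_relu([q(1.0), q(0.5), q(-0.5)]))
```

The real toolchain emits C++ with arbitrary-precision fixed-point types rather than Python, but the structure is the same: constants, multiplies, adds, and nothing else in the critical path.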
For the simplest models, CERN goes further: precomputing every possible input-output pair and storing the results as an on-chip lookup table. The model doesn’t run an inference pass at all — it does a table lookup. This only works if the input space is small enough, but for the structured detector outputs that feed the trigger, it often is. CERN trains its models to be “small from the get-go,” with quantization and pruning applied from the start rather than as a post-training afterthought.
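The lookup-table degenerate case fits in a few lines. The “model” below is a toy stand-in and the 4-bit input space is invented; the point is that inference becomes an indexed read:

```python
from itertools import product

def toy_model(bits):
    # Stand-in for a tiny trained model: any pure function of the inputs works.
    return sum(b << i for i, b in enumerate(bits)) % 3 == 0

N_INPUTS = 4  # 2^4 = 16 table entries

# "Synthesis time": evaluate the model once on every possible input.
LUT = {bits: toy_model(bits) for bits in product((0, 1), repeat=N_INPUTS)}

# "Run time": no inference pass at all, just a table lookup.
def predict(bits):
    return LUT[bits]

# The table reproduces the model exactly over its whole input space.
assert all(predict(b) == toy_model(b) for b in LUT)
```

The cost is memory exponential in the number of input bits, which is exactly why this only applies when the input space is small.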
One of their deployed algorithms is called AXOL1TL — an anomaly detector that flags unusual topologies for preservation without knowing ahead of time what the anomaly looks like. It rejects over 99.7% of inputs. Combined with the full trigger chain, the system reduces 40 million events per second to about 100,000. The discarded 99.75% is gone forever.
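Anomaly triggers of this kind typically score each event against a learned notion of “typical” and keep everything above a threshold, so the decision itself is a single comparison. A schematic sketch, with the scoring function, template, and threshold all invented for illustration:

```python
def anomaly_score(event):
    # Stand-in for reconstruction error: distance from a "typical" template.
    # A real anomaly detector learns this notion of typicality from data.
    TYPICAL = (1.0, 1.0, 0.0)  # invented template
    return sum((x - t) ** 2 for x, t in zip(event, TYPICAL))

THRESHOLD = 2.0  # in practice tuned so the vast majority of events fall below

def keep(event):
    # High score = poorly explained by "typical" = unusual topology = keep.
    return anomaly_score(event) > THRESHOLD

events = [(1.0, 1.1, 0.0), (0.9, 1.0, 0.1), (4.0, -2.0, 3.0)]
print([keep(e) for e in events])
```

Nothing in the keep rule names what the anomaly is; the detector only has to know what ordinary looks like.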
A few things stand out here beyond the raw numbers.
CERN’s team has found that tree-based models outperform deep learning for their specific problem. The detector data is structured and tabular; there’s no spatial hierarchy or long-range dependency that rewards the inductive biases of neural networks. Sometimes the simpler model wins, and “simpler” here means simple enough to burn into combinational logic on silicon.
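Part of why trees suit this substrate: a trained tree compiles down to nothing but comparisons and constants, which map directly onto comparators and multiplexers. A hand-written depth-2 example, with features and thresholds invented for illustration:

```python
def tree_predict(pt, eta):
    # A depth-2 decision tree is three comparisons and four constants.
    # In hardware, each comparison is one comparator feeding a mux tree,
    # so the whole model evaluates in a handful of gate delays.
    if pt < 20.0:
        return 0.1 if abs(eta) < 1.5 else 0.3
    else:
        return 0.7 if abs(eta) < 1.5 else 0.9

print(tree_predict(25.0, 0.8))
```

A boosted ensemble is just many of these evaluated in parallel and summed, which is still purely combinational.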
The other thing is what this represents as a point on the inference-efficiency curve. The AI industry has been rightly focused on making large models faster and cheaper — quantization, speculative decoding, MoE architectures, distillation. All of that is real and important. But the FPGA approach occupies a separate regime entirely: latencies so short that the model must fit in silicon combinational logic, with no memory access, no batch dimension, no OS scheduler anywhere in the path. HLS4ML is the compiler that bridges that gap from the researcher’s Jupyter notebook to the FPGA bitstream.
The code is on GitHub and has been under active development since 2018. It’s one of those projects that’s been solving a real problem quietly for years while the rest of the field chases the next order of magnitude in parameters.
