The Cliff in Lambda Calculus
Victor Taelin published LamBench, a set of 120 programming problems in pure lambda calculus, written in a minimal custom language. The results show a hard generational cliff: GPT-5.1, Opus 4.5, and Sonnet 4.5 each score exactly 0 out of 120, while the top tier, GPT-5.3 Codex and Opus 4.6, lands at 90%. The benchmark tests something standard evaluations mostly avoid: symbolic computation that can't be approximated by pattern matching.
