AMD's FP8 Problem, and What It Costs

A detailed engineering account of bringing DeepSeek-V4-Flash up on AMD MI300X reveals the real cost of AMD's software ecosystem gaps: FP8 format fragmentation, missing kernels, and HIP graph constraints that each required dedicated engineering effort before getting to 2,700 tokens/s.

Read more →