DeepSeek Releases DSpark, a Speculative Decoding Framework That Accelerates DeepSeek-V4 Per-User Generation 60–85% Over MTP-1
DeepSeek released DSpark, a speculative decoding framework, with open-source checkpoints and training code. It is a serving optimization, not a new model. The checkpoints DeepSeek-V4-Pro-DSpark and DeepSeek-V4-Flash-DSpark reuse the existing V4 weights, with a draft module attached. The DeepSeek research team also open-sourced DeepSpec, an MIT-licensed codebase for training and evaluating speculative decoding drafters. The work targets one problem: faster large-model inference in busy production serving.

TL;DR
- DSpark pairs a parallel draft backbone with a tiny sequential head to cut suffix decay.
- A confidence head and load-aware scheduler verify more tokens when GPUs are idle, fewer when busy.
- Offline, accepted length rises 26–31% over Eagle3 and 16–18% over DFlash.
- In production on DeepSeek-V4, per-user generation runs 60–85% faster than the MTP-1 baseline.
- Output stays lossless, and the checkpoints plus DeepSpec training code are open-source.

Wh...
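
To make the TL;DR concrete, here is a minimal Python sketch of two ideas it mentions: lossless verification of drafted tokens, and a verification budget that shrinks under load. This is not DSpark's or DeepSpec's code. The function names `accept_greedy` and `verify_budget`, the `min_confidence` threshold, and the linear load scaling are all hypothetical, and the sketch assumes greedy decoding rather than whatever sampling scheme the real system uses.

```python
from typing import List, Sequence


def accept_greedy(draft: Sequence[int], target: Sequence[int]) -> List[int]:
    """Greedy speculative verification (illustrative, not DSpark's code).

    `draft` holds k proposed tokens. `target` holds the k + 1 tokens the
    target model would pick greedily, where target[i] is conditioned on
    the prefix plus draft[:i]. The returned tokens are exactly what the
    target model alone would have emitted, so the output is lossless.
    """
    if len(target) != len(draft) + 1:
        raise ValueError("target must hold one more token than draft")
    out: List[int] = []
    for d, t in zip(draft, target):
        if d != t:
            out.append(t)  # first mismatch: keep the target's token, stop
            return out
        out.append(d)  # draft token matches the target's choice
    out.append(target[-1])  # every draft token accepted: one bonus token
    return out


def verify_budget(
    confidences: Sequence[float],
    gpu_load: float,
    max_tokens: int,
    min_confidence: float = 0.5,  # hypothetical threshold
) -> int:
    """Pick how many draft tokens to verify (hypothetical policy).

    The cap shrinks linearly as `gpu_load` (0.0 idle, 1.0 saturated)
    grows, and the draft is cut at the first low-confidence token.
    """
    if not 0.0 <= gpu_load <= 1.0:
        raise ValueError("gpu_load must be in [0, 1]")
    cap = max(1, int(max_tokens * (1.0 - gpu_load)))
    n = 0
    for c in confidences[:cap]:
        if c < min_confidence:
            break
        n += 1
    return n
```

In this sketch the accepted length per step is `len(accept_greedy(...))`, the quantity the offline comparisons against Eagle3 and DFlash measure. A longer accepted run means fewer target-model passes per generated token.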
