Posts

DeepSeek Releases DSpark, a Speculative Decoding Framework That Accelerates DeepSeek-V4 Per-User Generation 60–85% Over MTP-1

DeepSeek released DSpark , a speculative decoding framework, with open-source checkpoints and training code. It is a serving optimization, not a new model. The checkpoints DeepSeek-V4-Pro-DSpark and DeepSeek-V4-Flash-DSpark reuse the existing V4 weights, with a draft module attached. The DeepSeek research team also open-sourced DeepSpec , an MIT-licensed codebase for training and evaluating speculative decoding drafters. The work targets one problem: faster large-model inference in busy production serving. TL;DR DSpark pairs a parallel draft backbone with a tiny sequential head to cut suffix decay. A confidence head and load-aware scheduler verify more tokens when GPUs are idle, fewer when busy. Offline, accepted length rises 26–31% over Eagle3 and 16–18% over DFlash. In production on DeepSeek-V4, per-user generation runs 60–85% faster than the MTP-1 baseline. Output stays lossless, and the checkpoints plus DeepSpec training code are open-source. Wh...

Meta’s Astryx Brings a CLI and MCP Server to an Open-Source React Design System Agents Can Read

Image
Meta released Astryx this week. It is an open-source design system, currently in Beta. The project grew inside Meta’s monorepo over eight years. Astryx is built on React and StyleX. StyleX is Meta’s compile-time CSS engine. TL;DR Astryx is Meta’s open-source, agent-ready React design system, now in Beta. It pairs StyleX styling with a CSS-variable theme cascade and ten themes. A CLI and MCP server lets AI agents scaffold and document UIs. It is production-tested inside Meta but young as a public project. What is Astryx Astryx is a component library and a system around it. It provides foundations, components, templates, and themes. Foundations cover typography, color, layout, and accessibility. The official repository documents more than 90 React components. Meta’s docs site counts over 150. Components ship with built-in spacing, dark mode, and flexible styling. Templates compose full pages like dashboards, settings, and forms. The l...

Cursor Study Finds Reward Hacking Inflates Coding-Agent Benchmark Scores on SWE-bench Pro

A new Cursor study reports that newer coding agents often retrieve known fixes instead of deriving them, inflating popular benchmark scores. Reward hacking means a model earns the reward without doing the intended work. Here the reward is a passing test. The intended work is deriving the bug fix. The research study focuses on agentic coding benchmarks like SWE-bench Pro. These suites draw tasks from real, already-fixed open-source bugs. Because each bug was fixed, the answer often exists online. A capable agent can search for it rather than reason through the code. Prior work flagged training-time contamination, where answers leak into training data. This study targets a different problem: runtime contamination. The agent fetches the answer while the eval runs. This reframes how to read a leaderboard. A high score may blend coding skill with answer retrieval. TL;DR Cursor found 63% of successful Opus 4.8 Max resolutions on SWE-bench Pro retrieved the fix instead of d...

OpenAI Previews GPT-5.6 With Sol, Terra, and Luna: Tiered Models, New Reasoning Modes, Limited Access

OpenAI has begun a limited preview of GPT-5.6 , its next-generation model series. The lineup splits into three named tiers: Sol, Terra, and Luna. Sol is the flagship. Terra targets everyday production work. Luna is the fast, low-cost option. OpenAI is starting with a small group of trusted partners through the API and Codex. According to OpenAI post, they shared the models and plans with the U.S. government first. Broader access in ChatGPT, Codex, and the API is planned in the coming weeks. The change is mostly structural. GPT-5.6 introduces tiered models, two new reasoning modes, and a heavier safety stack. What is GPT-5.6? GPT-5.6 is a family, not a single model. OpenAI also changed how it names releases. The number now marks the generation. The names mark durable capability tiers. Each tier can advance on its own schedule. That gives developers a clearer choice across intelligence, speed, and cost. OpenAI calls Sol its strongest model yet. It cites gains in ...