Posts

Google DeepMind Introduces Decoupled DiLoCo: An Asynchronous Training Architecture Achieving 88% Goodput Under High Hardware Failure Rates

Training frontier AI models is, at its core, a coordination problem. Thousands of chips must communicate with each other continuously, synchronizing every gradient update across the network. When one chip fails or even slows down, the entire training run can stall. As models scale toward hundreds of billions of parameters, that fragility becomes increasingly untenable. Google DeepMind is now proposing a different model entirely: its researchers introduced Decoupled DiLoCo (Distributed Low-Communication), a distributed training architecture that decouples compute into asynchronous, fault-isolated ‘islands,’ enabling large language model pre-training across geographically distant data centers without the tight synchronization that makes conventional approaches brittle at scale.

The Problem with Traditional Distributed Training

To understand why Decoupled DiLoCo is important, it helps to understand how distributed training typically works. Standard Data-Parallel ...
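The island idea can be sketched in a few lines. This is a toy reconstruction from the description above, not DeepMind's code: each island runs many local SGD steps on its own data with no cross-island traffic, and only the resulting parameter delta is averaged in an infrequent outer step. The function names, learning rates, and the scalar-parameter setting are all illustrative assumptions.

```python
# Toy sketch of a DiLoCo-style training round (reconstruction, not DeepMind's code).

def local_sgd(w, grad_fn, steps, lr):
    """Inner loop: plain SGD on one island, no cross-island communication."""
    for _ in range(steps):
        w = w - lr * grad_fn(w)
    return w

def diloco_round(w_global, islands, inner_steps=50, inner_lr=0.1, outer_lr=1.0):
    """One outer round: islands train independently, then only deltas are synced."""
    deltas = []
    for grad_fn in islands:                  # each island sees its own data shard
        w_local = local_sgd(w_global, grad_fn, inner_steps, inner_lr)
        deltas.append(w_local - w_global)    # only this delta crosses the network
    avg_delta = sum(deltas) / len(deltas)
    return w_global + outer_lr * avg_delta   # outer optimizer step

# Two islands with quadratic losses centered at 2.0 and 4.0; the synchronized
# model should settle near their average, 3.0.
islands = [lambda w: 2 * (w - 2.0), lambda w: 2 * (w - 4.0)]
w = 0.0
for _ in range(5):
    w = diloco_round(w, islands)
print(round(w, 3))
```

Because synchronization happens once per round instead of once per gradient step, a slow or failed island delays only its own delta, which is what makes the scheme fault-tolerant.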

OpenAI Releases GPT-5.5, a Fully Retrained Agentic Model That Scores 82.7% on Terminal-Bench 2.0 and 84.9% on GDPval

OpenAI has released GPT-5.5, its most capable model to date and the first fully retrained base model since GPT-4.5. GPT-5.5 is designed to complete complex, multi-step computer tasks with minimal human direction. Think of it as the difference between an assistant who needs a checklist and one who understands the underlying goal and figures out the steps themselves. The release is rolling out today to Plus, Pro, Business, and Enterprise subscribers across ChatGPT and Codex.

What ‘Agentic’ Actually Means Here

An agentic model doesn’t just respond to a single prompt — it takes a sequence of actions, uses tools (like browsing the web, writing code, running scripts, or operating software), checks its own work, and keeps going until the task is finished. Prior models often stalled at handoff points, requiring the user to re-prompt or correct course. GPT-5.5 is built to reduce those interruptions. OpenAI launched GPT-5.5 as a model targeted at agentic computer use — it writes and debugs co...
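The act-observe-continue loop described above can be sketched generically. This is not OpenAI's API; the tool names, action schema, and stopping check are illustrative assumptions meant only to show the control flow an agentic model drives.

```python
# Generic agent loop sketch (illustrative; not OpenAI's API or schema).

def run_agent(goal, plan_next, tools, max_steps=10):
    """Repeatedly pick an action, execute the tool, and feed back the result."""
    history = []
    for _ in range(max_steps):
        action = plan_next(goal, history)      # the model decides the next step
        if action["tool"] == "finish":         # the model judges the task complete
            return action["result"], history
        observation = tools[action["tool"]](action["args"])
        history.append((action, observation))  # self-checking reads this feedback

    return None, history                       # ran out of steps without finishing

# Toy run with a scripted "model" that searches once, then finishes.
def scripted_model(goal, history):
    if not history:
        return {"tool": "search", "args": "lexer tutorial"}
    return {"tool": "finish", "result": "done"}

result, trace = run_agent("write a lexer", scripted_model,
                          {"search": lambda q: f"results for {q}"})
print(result, len(trace))
```

The "handoff points" the article mentions correspond to the loop exiting early and returning control to the user; a more agentic model simply keeps iterating inside this loop for longer.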

A Coding Tutorial on OpenMythos: Recurrent-Depth Transformers with Depth Extrapolation, Adaptive Computation, and Mixture-of-Experts Routing

In this tutorial, we explore the implementation of OpenMythos, a theoretical reconstruction of the Claude Mythos architecture that enables deeper reasoning through iterative computation rather than increased parameter size. We build and analyze models using both GQA and MLA attention mechanisms, examine memory efficiency through KV-cache comparisons, and validate stability via the spectral properties of the recurrent update. We then train the model on a structured parity task and investigate how increasing loop depth at inference improves performance without retraining. Along the way, we also inspect adaptive computation via ACT halting and monitor expert utilization in the MoE layers, providing a comprehensive, hands-on understanding of this emerging architecture.

```python
import subprocess, sys

try:
    import open_mythos  # noqa: F401
except ImportError:
    subprocess.check_call([sys.executable, "-m", "pip", "instal...
```
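The core recurrent-depth idea can be illustrated with a scalar stand-in for the shared block. This sketch is my own, not the OpenMythos code: a single contractive update is applied `depth` times, so inference can run more loops than training saw without adding parameters, and a spectral-norm-style condition (here just a weight with magnitude below 1) keeps the iteration stable.

```python
# Scalar sketch of recurrent depth with extrapolation (illustrative, not OpenMythos).
import math

def shared_block(h, x, w=0.5):
    # Contractive update (|w| < 1), mirroring the spectral-stability property
    # the tutorial validates: extra loops converge to a fixed point.
    return math.tanh(w * h + x)

def recurrent_forward(x, depth):
    h = 0.0
    for _ in range(depth):      # the SAME block is reused, no new parameters
        h = shared_block(h, x)
    return h

shallow = recurrent_forward(0.8, depth=4)    # depth used at "training" time
deep = recurrent_forward(0.8, depth=32)      # depth extrapolation at inference
print(abs(deep - shallow) < 1e-2)
```

Because the update is a contraction, extrapolating depth refines the same fixed point rather than diverging, which is why loop count can be raised at inference without retraining.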

Google Cloud AI Research Introduces ReasoningBank: A Memory Framework that Distills Reasoning Strategies from Agent Successes and Failures

Most AI agents today have a fundamental amnesia problem. Deploy one to browse the web, resolve GitHub issues, or navigate a shopping platform, and it approaches every single task as if it has never seen anything like it before. No matter how many times it has stumbled on the same type of problem, it repeats the same mistakes. Valuable lessons evaporate the moment a task ends. A team of researchers from Google Cloud AI, the University of Illinois Urbana-Champaign and Yale University introduces ReasoningBank, a memory framework that doesn’t just record what an agent did — it distills why something worked or failed into reusable, generalizable reasoning strategies.

The Problem with Existing Agent Memory

To understand why ReasoningBank is important, you need to understand what existing agent memory actually does. Two popular approaches are trajectory memory (used in a system called Synapse) and workflow memory (used in Agent Workflow Memory, or AWM). Trajectory memory stores raw actio...
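The distill-and-retrieve loop can be sketched with a toy in-memory store. The field names, the keyword-overlap retrieval, and the class shape are my assumptions for illustration, not the paper's API; the point is that items hold generalizable lessons from both successes and failures, not raw traces.

```python
# Toy sketch of a ReasoningBank-style memory (illustrative; not the paper's API).

class ReasoningBank:
    def __init__(self):
        self.items = []   # each item: a distilled strategy, not a raw trajectory

    def distill(self, task, outcome, lesson):
        # Store *why* something worked or failed, stripped of task specifics.
        self.items.append({"task": task, "outcome": outcome, "lesson": lesson})

    def retrieve(self, new_task):
        # Naive relevance: word overlap between the new task and stored tasks
        # (a real system would use embeddings).
        words = set(new_task.lower().split())
        scored = [(len(words & set(i["task"].lower().split())), i)
                  for i in self.items]
        return [i for score, i in sorted(scored, key=lambda s: -s[0]) if score]

bank = ReasoningBank()
bank.distill("resolve GitHub issue", "failure",
             "reproduce the bug locally before patching")
bank.distill("browse shopping site", "success",
             "filter results before comparing prices")
hits = bank.retrieve("resolve a flaky GitHub issue")
print(hits[0]["lesson"])
```

Note that the failure-derived lesson is retrieved and reused on a *new* GitHub task, which is exactly the behavior trajectory memory cannot provide: raw traces are tied to one episode, while distilled strategies transfer.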

Xiaomi Releases MiMo-V2.5-Pro and MiMo-V2.5: Matching Frontier Model Benchmarks at Significantly Lower Token Cost

The Xiaomi MiMo team publicly released two new models: MiMo-V2.5-Pro and MiMo-V2.5. The benchmarks, combined with some genuinely striking real-world task demos, make a compelling case that open agentic AI is catching up to the frontier faster than most expected. Both models are available immediately via API, and priced competitively.

What is an Agentic Model, and Why Does It Matter?

Most LLM benchmarks test a model’s ability to answer a single, self-contained question. Agentic benchmarks test something much harder — whether a model can complete a multi-step goal autonomously, using tools (web search, code execution, file I/O, API calls) over many turns, without losing track of the original objective. Think of it as the difference between a model that can answer “how do I write a lexer?” versus one that can actually write a complete compiler, run tests against it, catch regressions, and fix them — all without a human in the loop. The latter is exactly what the Xiaomi MiMo team is demonst...

Photon Releases Spectrum: An Open-Source TypeScript Framework that Deploys AI Agents Directly to iMessage, WhatsApp, and Telegram

For all the progress made in AI agent development over the past few years, one fundamental problem has remained largely unsolved: most people never actually interact with agents. They live behind developer dashboards, inside specialized apps that users are asked to download, and within chat interfaces that the majority of the world’s population will never visit. The models are good. The reasoning capabilities are extraordinary. But the distribution is broken. Photon, an infrastructure company focused on reliable, low-latency agent execution and messaging infrastructure, is directly attacking this problem with the launch of Spectrum — an open-source SDK and cloud platform that connects AI agents to the messaging interfaces billions of people already use every day: iMessage, WhatsApp, Telegram, Slack, Discord, Instagram, Phone, and more. Instead of asking users to adopt a new interface to interact with your agent, Spectrum lets you deploy that agent where your users already spend thei...

OpenAI Open-Sources Euphony: A Browser-Based Visualization Tool for Harmony Chat Data and Codex Session Logs

Debugging an AI agent that runs for dozens of steps (reading files, calling APIs, writing code, and revising its own output) is not like debugging a regular function. There is no single stack trace to read. Instead, developers are left staring at hundreds of lines of raw JSON, trying to reconstruct what the model was actually thinking and doing at each step. The OpenAI team is taking a direct swing at that problem with the release of Euphony, a new open-source, browser-based visualization tool designed to turn structured chat data and Codex session logs into readable, interactive conversation views. Euphony is built specifically around two of OpenAI’s own data formats: Harmony conversations and Codex session JSONL files.

What is the Harmony Format?

To understand why Euphony exists, you need a quick primer on Harmony. OpenAI’s open-weight model series, gpt-oss, was trained on a specialized prompt format called the harmony response format. Unlike standard chat formats, Harmony suppor...
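The kind of work Euphony automates can be sketched directly: a session log in JSONL form (one JSON object per line) is parsed line by line and rendered as a readable step list. The field names below are illustrative, not the actual Codex log schema.

```python
# Sketch of rendering a JSONL session log (field names are illustrative,
# not the actual Codex schema).
import json

log_lines = [
    '{"step": 1, "role": "assistant", "action": "read_file", "detail": "main.py"}',
    '{"step": 2, "role": "tool", "action": "result", "detail": "42 lines"}',
    '{"step": 3, "role": "assistant", "action": "write_code", "detail": "fix bug"}',
]

def render(lines):
    # JSONL is parsed one line at a time; each line is a complete JSON object.
    events = [json.loads(line) for line in lines]
    return [f'{e["step"]}. [{e["role"]}] {e["action"]}: {e["detail"]}'
            for e in events]

for row in render(log_lines):
    print(row)
```

Even this tiny example shows why a viewer helps: the raw JSON interleaves model turns and tool results, and the readable view reconstructs the step-by-step narrative that debugging actually needs.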