Posts

Google Cloud AI Research Introduces ReasoningBank: A Memory Framework that Distills Reasoning Strategies from Agent Successes and Failures

Most AI agents today have a fundamental amnesia problem. Deploy one to browse the web, resolve GitHub issues, or navigate a shopping platform, and it approaches every single task as if it has never seen anything like it before. No matter how many times it has stumbled on the same type of problem, it repeats the same mistakes. Valuable lessons evaporate the moment a task ends. A team of researchers from Google Cloud AI, the University of Illinois Urbana-Champaign, and Yale University introduces ReasoningBank, a memory framework that doesn't just record what an agent did; it distills why something worked or failed into reusable, generalizable reasoning strategies.

The Problem with Existing Agent Memory

To understand why ReasoningBank is important, you need to understand what existing agent memory actually does. Two popular approaches are trajectory memory (used in a system called Synapse) and workflow memory (used in Agent Workflow Memory, or AWM). Trajectory memory stores raw actio...
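The distinction the article draws, raw trajectories versus distilled strategies, can be sketched as a small data structure. Everything below (the `MemoryItem` fields, the word-overlap retrieval) is an illustrative assumption, not the paper's actual API; a real system would use embedding-based retrieval.

```python
# Hypothetical sketch of ReasoningBank-style memory: instead of storing a
# raw action trajectory, we keep a short, reusable strategy record that can
# be retrieved for future tasks. All names here are illustrative.
from dataclasses import dataclass

@dataclass
class MemoryItem:
    title: str                  # one-line strategy name
    description: str            # when the strategy applies
    content: str                # the distilled lesson, phrased to generalize
    from_failure: bool = False  # lessons can come from failures too

class ReasoningMemory:
    def __init__(self):
        self.items: list[MemoryItem] = []

    def add(self, item: MemoryItem) -> None:
        self.items.append(item)

    def retrieve(self, task: str, k: int = 2) -> list[MemoryItem]:
        # Toy relevance score: word overlap between the task and each
        # item's title/description (a real system would embed both).
        words = set(task.lower().split())
        def score(it: MemoryItem) -> int:
            text = (it.title + " " + it.description).lower().split()
            return len(words & set(text))
        return sorted(self.items, key=score, reverse=True)[:k]

memory = ReasoningMemory()
memory.add(MemoryItem(
    title="Filter before sorting on shopping sites",
    description="web shopping tasks with price constraints",
    content="Apply the price filter first; sorting the full catalog wastes steps.",
))
memory.add(MemoryItem(
    title="Check issue labels before editing code",
    description="github issue triage and resolution",
    content="A 'duplicate' label means the fix may already exist elsewhere.",
    from_failure=True,
))

hits = memory.retrieve("find a laptop under $800 on a shopping site")
print(hits[0].title)  # the shopping strategy ranks first
```

The key design point the framework argues for is visible even in this toy: the stored unit is a generalization ("filter first"), not a replay of clicks, so it transfers to tasks the agent has never seen verbatim.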

Xiaomi Releases MiMo-V2.5-Pro and MiMo-V2.5: Matching Frontier Model Benchmarks at Significantly Lower Token Cost

The Xiaomi MiMo team has publicly released two new models: MiMo-V2.5-Pro and MiMo-V2.5. The benchmarks, combined with some genuinely striking real-world task demos, make a compelling case that open agentic AI is catching up to the frontier faster than most expected. Both models are available immediately via API and are priced competitively.

What is an Agentic Model, and Why Does It Matter?

Most LLM benchmarks test a model's ability to answer a single, self-contained question. Agentic benchmarks test something much harder: whether a model can complete a multi-step goal autonomously, using tools (web search, code execution, file I/O, API calls) over many turns, without losing track of the original objective. Think of it as the difference between a model that can answer "how do I write a lexer?" and one that can actually write a complete compiler, run tests against it, catch regressions, and fix them, all without a human in the loop. The latter is exactly what the Xiaomi MiMo team is demonst...
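The multi-turn tool loop that agentic benchmarks exercise can be sketched in a few lines. The tool names, the stub policy, and the loop shape below are illustrative assumptions, not MiMo's actual interface; in a real harness the `stub_model` call would be an LLM API request.

```python
# Minimal sketch of an agentic loop: the model repeatedly picks a tool,
# observes the result, and stops when it judges the goal met. The "model"
# here is a deterministic stub so the control flow is visible.
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda q: f"results for: {q}",
    "run_code": lambda src: "tests passed",
}

def stub_model(goal: str, history: list[str]) -> tuple[str, str]:
    """Stand-in policy: search once, run the tests once, then finish."""
    if not history:
        return "search", goal
    if len(history) == 1:
        return "run_code", "pytest -q"
    return "finish", "done"

def agent_loop(goal: str, max_turns: int = 10) -> list[str]:
    history: list[str] = []
    for _ in range(max_turns):
        action, arg = stub_model(goal, history)
        if action == "finish":
            break
        observation = TOOLS[action](arg)  # execute the chosen tool
        history.append(f"{action}({arg}) -> {observation}")
    return history

trace = agent_loop("fix the failing lexer test")
print(len(trace))  # two tool calls before the stub finishes
```

What the benchmarks actually measure is how well a real model plays the `stub_model` role over dozens of turns: choosing the right tool, parsing each observation, and not losing the goal.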

Photon Releases Spectrum: An Open-Source TypeScript Framework that Deploys AI Agents Directly to iMessage, WhatsApp, and Telegram

For all the progress made in AI agent development over the past few years, one fundamental problem has remained largely unsolved: most people never actually interact with agents. They live behind developer dashboards, inside specialized apps that users are asked to download, and within chat interfaces that the majority of the world’s population will never visit. The models are good. The reasoning capabilities are extraordinary. But the distribution is broken. Photon, an infrastructure company focused on reliable, low-latency agent execution and messaging infrastructure, is directly attacking this problem with the launch of Spectrum — an open-source SDK and cloud platform that connects AI agents to the messaging interfaces billions of people already use every day: iMessage, WhatsApp, Telegram, Slack, Discord, Instagram, Phone, and more. Instead of asking users to adopt a new interface to interact with your agent, Spectrum lets you deploy that agent where your users already spend thei...

OpenAI Open-Sources Euphony: A Browser-Based Visualization Tool for Harmony Chat Data and Codex Session Logs

Debugging an AI agent that runs for dozens of steps (reading files, calling APIs, writing code, and revising its own output) is not like debugging a regular function. There is no single stack trace to read. Instead, developers are left staring at hundreds of lines of raw JSON, trying to reconstruct what the model was actually thinking and doing at each step. The OpenAI team is taking a direct swing at that problem with the release of Euphony, a new open-source, browser-based visualization tool designed to turn structured chat data and Codex session logs into readable, interactive conversation views. Euphony is built specifically around two of OpenAI's own data formats: Harmony conversations and Codex session JSONL files.

What is the Harmony Format?

To understand why Euphony exists, you need a quick primer on Harmony. OpenAI's open-weight model series, gpt-oss, was trained on a specialized prompt format called the harmony response format. Unlike standard chat formats, Harmony suppor...
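The first step any such viewer takes is parsing the JSONL log (one JSON object per line) into an ordered conversation. The sketch below uses generic `role`/`content` fields as a stand-in; the real Codex session schema is an assumption here and may differ.

```python
# Hedged sketch: parse a JSONL session log into messages and render them as
# the kind of readable transcript a tool like Euphony displays. Field names
# ("role", "content") are illustrative, not the actual Codex schema.
import json
from io import StringIO

SAMPLE_LOG = """\
{"role": "user", "content": "fix the failing test"}
{"role": "assistant", "content": "Reading test_lexer.py..."}
{"role": "tool", "content": "1 failed, 12 passed"}
{"role": "assistant", "content": "Patched the tokenizer; rerunning."}
"""

def load_session(stream) -> list[dict]:
    """Parse a JSONL stream into message dicts, skipping blank lines."""
    return [json.loads(line) for line in stream if line.strip()]

def render(messages: list[dict]) -> str:
    """Flatten messages into a readable one-line-per-turn transcript."""
    return "\n".join(f"[{m['role']}] {m['content']}" for m in messages)

msgs = load_session(StringIO(SAMPLE_LOG))
print(render(msgs))
```

Even this toy transcript shows why a viewer helps: interleaved assistant reasoning and tool output are hard to follow as raw JSON but easy to follow as labeled turns.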

A Coding Implementation to Build a Conditional Bayesian Hyperparameter Optimization Pipeline with Hyperopt, TPE, and Early Stopping

In this tutorial, we implement an advanced Bayesian hyperparameter optimization workflow using Hyperopt and the Tree-structured Parzen Estimator (TPE) algorithm. We construct a conditional search space that dynamically switches between different model families, demonstrating how Hyperopt handles hierarchical and structured parameter graphs. We build a production-grade objective function using cross-validation inside a scikit-learn pipeline, enabling realistic model evaluation. We also incorporate early stopping based on stagnating loss improvements and fully inspect the Trials object to analyze optimization trajectories. By the end of this tutorial, we not only find the best model configuration but also understand how Hyperopt internally tracks, evaluates, and refines the search process. The result is a scalable and reproducible hyperparameter tuning framework that can be extended to deep learning or distributed settings.

!pip -q instal...
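The two structural ideas in the tutorial, a conditional search space (which Hyperopt expresses with `hp.choice`, where some parameters exist only for one model family) and early stopping on stagnating loss, can be illustrated without the library. The sketch below substitutes plain random sampling for TPE and a toy objective for cross-validation, so it shows the structure only, not the tutorial's actual pipeline.

```python
# Library-free sketch: a conditional search space plus early stopping on
# stagnating loss. Random sampling stands in for TPE; the loss is a toy
# function, not cross-validated model error.
import random

random.seed(0)

def sample_config() -> dict:
    """Sample a config whose parameter set depends on the chosen family."""
    family = random.choice(["random_forest", "svm"])
    if family == "random_forest":
        return {"family": family,
                "n_estimators": random.randint(50, 400),
                "max_depth": random.randint(2, 12)}
    return {"family": family,
            "C": 10 ** random.uniform(-3, 2),
            "gamma": 10 ** random.uniform(-4, 0)}

def toy_loss(cfg: dict) -> float:
    """Stand-in for CV loss; each family has its own optimum."""
    if cfg["family"] == "random_forest":
        return abs(cfg["max_depth"] - 8) / 10 + 100 / cfg["n_estimators"]
    return abs(cfg["C"] - 1.0) / 10 + cfg["gamma"]

def optimize(max_evals: int = 200, patience: int = 25):
    best, best_loss, since_improve = None, float("inf"), 0
    for _ in range(max_evals):
        cfg = sample_config()
        loss = toy_loss(cfg)
        if loss < best_loss - 1e-6:       # meaningful improvement
            best, best_loss, since_improve = cfg, loss, 0
        else:
            since_improve += 1
        if since_improve >= patience:     # early stop on stagnation
            break
    return best, best_loss

best_cfg, best_loss = optimize()
print(best_cfg["family"], round(best_loss, 3))
```

In the actual tutorial the `sample_config` branching is what `hp.choice` encodes declaratively, TPE replaces the uniform sampler with a model of promising regions, and the `Trials` object records every evaluation for later inspection.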

Google Introduces Simula: A Reasoning-First Framework for Generating Controllable, Scalable Synthetic Datasets Across Specialized AI Domains

Training powerful AI models depends on one resource that is quietly running out: specialized data. While the internet provided a seemingly infinite supply of text and images to train today's generalist models, the next wave of AI breakthroughs, in cybersecurity, legal reasoning, healthcare, and other niche domains, requires data that simply doesn't exist in sufficient volume or can't be accessed due to privacy concerns. A team of researchers from Google and EPFL introduces Simula, a reasoning-driven framework for synthetic data generation and evaluation that prioritizes transparency, fine-grained control, and scalability. Unlike conventional approaches, Simula doesn't rely on seed data from the target distribution, hand-crafted prompts, or evolutionary algorithms; it constructs each dataset from first principles, treating data generation as a problem of mechanism design.

Why Synthetic Data Generation is Harder Than It Looks

If you've worked with fine-tuning pipelines or domain-s...

A Coding Implementation on Qwen 3.6-35B-A3B Covering Multimodal Inference, Thinking Control, Tool Calling, MoE Routing, RAG, and Session Persistence

In this tutorial, we build an end-to-end implementation around Qwen 3.6-35B-A3B and explore how a modern multimodal MoE model can be used in practical workflows. We begin by setting up the environment, loading the model adaptively based on available GPU memory, and creating a reusable chat framework that supports both standard responses and explicit thinking traces. From there, we work through important capabilities such as thinking-budget control, streamed generation with separated reasoning and answers, vision input handling, tool calling, structured JSON generation, MoE routing inspection, benchmarking, retrieval-augmented generation, and session persistence. Through this process, we run the model for inference and also examine how to design a robust application layer on top of Qwen 3.6 for real experimentation and advanced prototyping.

import subprocess, sys

def _pip(*a): subprocess.check_call([sys.executable, "-m", ...
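The "load adaptively based on available GPU memory" step can be sketched as a small decision function. In a real notebook the free-memory figure would come from `torch.cuda.mem_get_info()`; here it is passed in so the logic is testable without a GPU, and the thresholds are illustrative assumptions, not the tutorial's exact values.

```python
# Hedged sketch of adaptive model loading: choose a dtype / device-map /
# quantization plan from the free VRAM figure (in GiB). Thresholds are
# illustrative; a real loader would also check model size and dtype support.
def loading_plan(free_gib: float) -> dict:
    if free_gib >= 80:
        # Plenty of VRAM: load full bf16 weights on one device.
        return {"dtype": "bfloat16", "device_map": "cuda", "quantize": False}
    if free_gib >= 24:
        # Fits only with 4-bit weight quantization on a single GPU.
        return {"dtype": "bfloat16", "device_map": "cuda", "quantize": True}
    # Not enough VRAM: offload layers between GPU and CPU (slow but works).
    return {"dtype": "float32", "device_map": "auto_offload", "quantize": True}

print(loading_plan(96))  # large card: unquantized single-GPU load
print(loading_plan(24))  # mid card: quantized single-GPU load
print(loading_plan(8))   # small card: CPU offload fallback
```

Keeping this decision in one function means the rest of the chat framework never has to branch on hardware; it just consumes whatever plan the loader picked.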