Posts

OpenAGI Foundation Launches Lux: A Foundation Computer Use Model that Tops Online Mind2Web with OSGym At Scale

How do you turn slow, manual click work across browsers and desktops into a reliable, automated system that can actually use a computer for you at scale? Lux is the latest example of computer-use agents moving from research demo to infrastructure. The OpenAGI Foundation team has released Lux, a foundation model that operates real desktops and browsers and reports a score of 83.6 on the Online Mind2Web benchmark, which covers more than 300 real-world computer-use tasks. That puts it ahead of Google Gemini CUA at 69.0, OpenAI Operator at 61.3, and Anthropic Claude Sonnet 4 at 61.0. https://ift.tt/4a9VUEL

What Lux Actually Does

Lux is a computer-use model, not a chat model with a browser plugin. It takes a natural-language goal, views the screen, and outputs low-level actions such as clicks, key presses, and scroll events. It can drive browsers, editors, spreadsheets, email clients, and other desktop applications because it works on rendered UI, not on application-specific APIs. From a deve...
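The perceive-decide-act loop described above can be sketched in a few lines of Python. The snippet below illustrates the pattern, not Lux's actual API: the query_model helper and the action schema are hypothetical, and it assumes pyautogui as the executor for low-level UI actions.

import pyautogui  # assumed here as the executor for clicks, keys, and scrolling

def query_model(goal, screenshot):
    # Hypothetical stand-in for a computer-use model call: a real client would
    # send the goal plus the rendered screen and get back one structured
    # action, e.g. {"type": "click", "x": 412, "y": 88}.
    raise NotImplementedError

def run_agent(goal, max_steps=50):
    for _ in range(max_steps):
        screen = pyautogui.screenshot()      # perceive the rendered UI, not an app API
        action = query_model(goal, screen)   # decide on one low-level action
        if action["type"] == "click":
            pyautogui.click(action["x"], action["y"])
        elif action["type"] == "key":
            pyautogui.press(action["key"])
        elif action["type"] == "scroll":
            pyautogui.scroll(action["amount"])
        elif action["type"] == "done":
            return action.get("result")
    raise RuntimeError("step budget exhausted before the task finished")

The important part is the contract: the model sees pixels and emits clicks and key presses, which is what lets it drive any application that renders to the screen.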

Apple Researchers Release CLaRa: A Continuous Latent Reasoning Framework for Compression‑Native RAG with 16x–128x Semantic Document Compression

How do you keep RAG systems accurate and efficient when every query tries to stuff thousands of tokens into the context window, and the retriever and generator are still optimized as two separate, disconnected systems? A team of researchers from Apple and the University of Edinburgh released CLaRa (Continuous Latent Reasoning), shipped as CLaRa-7B-Base, CLaRa-7B-Instruct, and CLaRa-7B-E2E, a retrieval-augmented generation framework that compresses documents into continuous memory tokens and then performs both retrieval and generation in that shared latent space. The goal is simple: shorten context, avoid double encoding, and let the generator teach the retriever what actually matters for downstream answers. https://ift.tt/p6qXosd

From raw documents to continuous memory tokens

CLaRa starts with a semantic compressor that attaches a small number of learned memory tokens to each document. During Salient Compressor Pretraining (SCP), the base model is a Mistral 7B style transformer with LoRA adapters...
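As a rough mental model of that compression step, the sketch below appends learned memory-token embeddings to a document's token embeddings and keeps only the final hidden states at those positions as the document's compressed representation. This is an illustrative sketch under assumed names, not the released CLaRa code; it presumes a Hugging Face style base transformer that accepts inputs_embeds.

import torch
import torch.nn as nn

class SemanticCompressor(nn.Module):
    # Sketch: compress a document into k continuous memory tokens by appending
    # learned embeddings and reading back their final hidden states.
    def __init__(self, lm, hidden_size, num_memory_tokens=32):
        super().__init__()
        self.lm = lm  # decoder-only transformer returning hidden states
        self.memory_embed = nn.Parameter(
            torch.randn(num_memory_tokens, hidden_size) * 0.02
        )

    def forward(self, doc_embeds):
        # doc_embeds: (batch, doc_len, hidden) token embeddings of one document
        batch = doc_embeds.size(0)
        mem = self.memory_embed.unsqueeze(0).expand(batch, -1, -1)
        inputs = torch.cat([doc_embeds, mem], dim=1)
        hidden = self.lm(inputs_embeds=inputs).last_hidden_state
        # keep only the memory positions; this (batch, k, hidden) tensor
        # stands in for the full document downstream
        return hidden[:, -self.memory_embed.size(0):, :]

Because the model is causal and the memory tokens sit at the end of the sequence, each memory token can attend over the entire document, which is what lets a handful of positions summarize thousands of tokens. Retrieval and generation can then both operate on these latents, so the retriever does not need a separate encoder.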

How to Build a Meta-Cognitive AI Agent That Dynamically Adjusts Its Own Reasoning Depth for Efficient Problem Solving

In this tutorial, we build an advanced meta-cognitive control agent that learns how to regulate its own depth of thinking. We treat reasoning as a spectrum, ranging from fast heuristics to deep chain-of-thought to precise tool-like solving, and we train a neural meta-controller to decide which mode to use for each task. By optimizing the trade-off between accuracy, computation cost, and a limited reasoning budget, we explore how an agent can monitor its internal state and adapt its reasoning strategy in real time. Through each snippet, we experiment, observe patterns, and understand how meta-cognition emerges when an agent learns to think about its own thinking. Check out the FULL CODE NOTEBOOK.

import random
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

OPS = ['+', '*']

def make_task():
    op = random.choice(OPS)
    if op == '+':
        a, b = random.randint(1, 99), ...
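Since the notebook code is truncated above, here is a compact sketch of the meta-controller piece the tutorial describes: a small network maps task features to one of three reasoning modes and is trained with REINFORCE on a reward that trades accuracy against compute cost. The feature size, mode costs, and reward weights are illustrative assumptions, not the notebook's exact values.

import torch
import torch.nn as nn
import torch.optim as optim

MODES = ['heuristic', 'chain_of_thought', 'tool']  # fast -> deep -> precise
MODE_COST = torch.tensor([0.1, 1.0, 0.5])          # assumed relative compute costs

class MetaController(nn.Module):
    def __init__(self, n_features=4, n_modes=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, n_modes)
        )

    def forward(self, feats):
        # distribution over reasoning modes, given task features
        return torch.distributions.Categorical(logits=self.net(feats))

controller = MetaController()
opt = optim.Adam(controller.parameters(), lr=1e-3)

def train_step(feats, solve):
    # solve(mode_name) -> 1.0 if that reasoning mode answers correctly, else 0.0
    dist = controller(feats)
    mode = dist.sample()
    reward = solve(MODES[mode]) - 0.3 * MODE_COST[mode]  # accuracy minus cost penalty
    loss = -dist.log_prob(mode) * reward                 # REINFORCE policy gradient
    opt.zero_grad()
    loss.backward()
    opt.step()
    return MODES[mode], float(reward)

Over training, tasks whose features signal easy arithmetic should drift toward the cheap heuristic mode, while harder tasks justify the chain-of-thought cost, which is exactly the budget-aware behavior the tutorial is after.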

NVIDIA and Mistral AI Bring 10x Faster Inference for the Mistral 3 Family on GB200 NVL72 GPU Systems

NVIDIA today announced a significant expansion of its strategic collaboration with Mistral AI. The partnership coincides with the release of the new Mistral 3 frontier open model family, marking a moment where hardware acceleration and open-source model architecture converge to redefine performance benchmarks. The headline number is a massive leap in inference speed: the new models run up to 10x faster on NVIDIA GB200 NVL72 systems than on the previous-generation H200 systems. That gain unlocks substantial efficiency for enterprise-grade AI, targeting the latency and cost bottlenecks that have historically plagued the large-scale deployment of reasoning models.

A Generational Leap: 10x Faster on Blackwell

As enterprise demand shifts from simple chatbots to high-reasoning, long-context agents, inference efficiency has become the critical bottleneck. The collaboration between NVIDIA and Mistral AI addresses this head-on by optimizing the Mi...

Google DeepMind Researchers Introduce Evo-Memory Benchmark and ReMem Framework for Experience Reuse in LLM Agents

Large language model agents are starting to store everything they see, but can they actually improve their policies at test time from those experiences, rather than just replaying context windows? Researchers from the University of Illinois Urbana-Champaign and Google DeepMind propose Evo-Memory, a streaming benchmark and agent framework that targets this exact gap. Evo-Memory evaluates test-time learning with self-evolving memory, asking whether agents can accumulate and reuse strategies from continuous task streams instead of relying only on static conversational logs. https://ift.tt/TUxZDs1

Conversational Recall vs Experience Reuse

Most current agents implement conversational recall. They store dialogue history, tool traces, and retrieved documents, which are then reintegrated into the context window for future queries. This type of memory serves as a passive buffer, capable of recovering facts or recalling previous steps, but it does not actively modify the agent’s approach ...
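The contrast with a passive buffer becomes concrete in a small sketch: instead of replaying raw dialogue, the memory below stores distilled (task, strategy, outcome) entries and surfaces previously successful strategies for similar new tasks. The design and names are hypothetical illustrations of experience reuse, not the ReMem implementation, and similarity is a plain word-overlap score to keep the sketch self-contained.

from dataclasses import dataclass

@dataclass
class Experience:
    task: str       # what was attempted
    strategy: str   # distilled approach, not the raw transcript
    success: bool   # outcome signal used to gate reuse

class ExperienceMemory:
    # Sketch of experience reuse: retrieve strategies, not conversation logs.
    def __init__(self):
        self.entries = []

    def add(self, task, strategy, success):
        self.entries.append(Experience(task, strategy, success))

    def suggest(self, new_task, k=3):
        words = set(new_task.lower().split())
        scored = [(len(words & set(e.task.lower().split())), e)
                  for e in self.entries if e.success]  # reuse only what worked
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e.strategy for score, e in scored[:k] if score > 0]

memory = ExperienceMemory()
memory.add("book a flight on united.com", "filter nonstop first, then sort by price", True)
print(memory.suggest("book a flight on delta.com"))

The point is the interface: what comes back is a strategy that can change the agent's next plan, not a transcript that merely pads the context window.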

DeepSeek Researchers Introduce DeepSeek-V3.2 and DeepSeek-V3.2-Speciale for Long Context Reasoning and Agentic Workloads

How do you get GPT-5-level reasoning on real long-context, tool-using workloads without paying the quadratic attention and GPU cost that usually makes those systems impractical? DeepSeek researchers introduce DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, reasoning-first models built for agents that target high-quality reasoning, long context, and agentic workflows, with open weights and production APIs. The models combine DeepSeek Sparse Attention (DSA), a scaled GRPO reinforcement learning stack, and an agent-native tool protocol, and report performance comparable to GPT-5, with DeepSeek-V3.2-Speciale reaching Gemini 3.0 Pro-level reasoning on public benchmarks and competitions. https://ift.tt/VtveRYD

Sparse Attention with Near-Linear Long-Context Cost

Both DeepSeek-V3.2 and DeepSeek-V3.2-Speciale use the DeepSeek-V3 Mixture of Experts transformer, with about 671B total parameters and 37B active parameters per token, inherited from V3.1 Terminus. The only structural change is ...
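To see why selecting a fixed number of keys per query changes the cost curve, here is a generic top-k sparse attention sketch: each query attends to only `keep` positions, so per-query work becomes O(keep) instead of O(seq). This illustrates the general idea behind sparse attention, not DeepSeek's DSA kernel; for clarity the sketch still materializes the full score matrix, which a real implementation avoids by scoring candidates with a lightweight indexer.

import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, keep=64):
    # q, k, v: (seq, dim). Each query keeps only `keep` keys.
    seq, dim = q.shape
    scores = q @ k.T / dim ** 0.5                      # toy: full scores for clarity
    idx = scores.topk(min(keep, seq), dim=-1).indices  # select top-k keys per query
    mask = torch.full_like(scores, float('-inf'))
    mask.scatter_(-1, idx, 0.0)                        # unmask only selected positions
    attn = F.softmax(scores + mask, dim=-1)            # attention over <= keep keys
    return attn @ v

q = k = v = torch.randn(1024, 64)
out = topk_sparse_attention(q, k, v)                   # (1024, 64)

Once key selection itself is cheap, total attention cost grows roughly as sequence length times `keep` rather than sequence length squared, which is the property that keeps long-context cost near linear.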