Posts

Microsoft Research Releases Webwright: A Terminal-Native Web Agent Framework That Scores 60.1% on Odysseys, Up from Base GPT-5.4’s 33.5%

Most web agents today drive a browser one action at a time. The model receives the current page state, as a screenshot or DOM text, and predicts the next click, keypress, or scroll. This action-at-a-time design made sense when language models had limited reasoning ability. As models have become more capable at writing and debugging code, that rigid loop has become a constraint rather than a helpful structure. Microsoft Research’s AI Frontiers lab built a different approach. Their new open-source framework, Webwright, gives the agent a terminal instead of a stateful browser session. The agent writes Playwright code to control browsers, runs bash commands, inspects logs, and iteratively refines scripts. Playwright is an open-source browser automation library, also from Microsoft, that supports programmatic control of Chromium, Firefox, and WebKit browsers.

What Webwright Does Differently

Webwright separates the agent from the browser and treats the browser as som...
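The write, run, inspect, refine loop described above can be sketched with the standard library alone. This is a minimal illustration, not Webwright's actual implementation: `run_script`, `refine_until_passing`, and the `revise` callback (standing in for the model that rewrites a failing script) are hypothetical names, and a real agent would generate Playwright code rather than the toy scripts used here.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_script(source: str, timeout: float = 30.0) -> tuple[int, str]:
    """Write `source` to a temp file, run it, and return (exit code, combined log)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "task.py"
        path.write_text(source, encoding="utf-8")
        proc = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    return proc.returncode, proc.stdout + proc.stderr


def refine_until_passing(draft, revise, max_rounds=3):
    """Run a script, hand the log to `revise` on failure, and retry.

    `revise(source, log)` is a hypothetical stand-in for the model call
    that rewrites the script after reading the log.
    Returns (succeeded, last source executed, its log).
    """
    source = draft
    log = ""
    for round_index in range(max_rounds):
        code, log = run_script(source)
        if code == 0:
            return True, source, log
        if round_index < max_rounds - 1:
            source = revise(source, log)
    return False, source, log
```

The point of the design is that the agent sees whole-script outcomes and logs instead of predicting one click at a time.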

NVIDIA AI Releases Gated DeltaNet-2: A Linear Attention Layer That Decouples Erase and Write in the Delta Rule

Linear attention replaces the unbounded KV cache of softmax attention with a fixed-size recurrent state. This cuts sequence mixing to linear time and decoding to constant memory. The hard part is not what to forget. It is how to edit a compressed memory without scrambling existing associations. NVIDIA has released Gated DeltaNet-2, a linear attention layer that targets that bottleneck. The model decouples the active memory edit into two channel-wise gates. It is trained at 1.3B parameters on 100B FineWeb-Edu tokens. It outperforms Mamba-2, Gated DeltaNet, KDA, and Mamba-3 across the researchers' benchmark suite.

The scalar gate problem in delta-rule models

A recurrent linear attention layer stores a matrix state S_t and reads it with the query. DeltaNet adds an active edit by subtracting the value currently associated with the current key. It uses a scalar step size β_t to control how much to overwrite. Mamba-2 adds a data-dependent scalar decay α_t for global forgetting....
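To make the scalar-gate setup concrete, here is a small pure-Python sketch of one recurrent step that combines the two scalars described above: a decay α_t applied to the whole state, followed by a delta-rule edit with step size β_t. The combined form S_t = α_t S_{t-1} - β_t (α_t S_{t-1} k_t - v_t) k_tᵀ is an assumption based on the usual scalar-gated delta-rule formulation. It is not Gated DeltaNet-2's channel-wise update, which the excerpt does not spell out, and the function names are illustrative.

```python
def matvec(state, vector):
    """Multiply a (d_v x d_k) state matrix by a length-d_k vector."""
    return [sum(s * x for s, x in zip(row, vector)) for row in state]


def gated_delta_step(state, key, value, alpha, beta):
    """One scalar-gated delta-rule update (assumed form, see lead-in).

    1. Decay the whole state by alpha (global forgetting).
    2. Read the value currently stored under `key`.
    3. Move that stored value toward `value` by a fraction beta.
    """
    decayed = [[alpha * s for s in row] for row in state]
    stored = matvec(decayed, key)
    error = [s - v for s, v in zip(stored, value)]
    return [
        [s - beta * e * k for s, k in zip(row, key)]
        for row, e in zip(decayed, error)
    ]


def read(state, query):
    """Read the memory with a query vector."""
    return matvec(state, query)
```

With `alpha = 1.0` and `beta = 1.0`, writing a second value under the same unit-norm key replaces the first one outright, while a write under an orthogonal key leaves it untouched. Because both gates are scalars, every channel of the state is erased and written at the same rate.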

Build a SuperClaude Framework Workflow with Commands, Agents, Modes, and Session Memory

In this tutorial, we build an advanced workflow using the SuperClaude Framework as a structured layer on top of the Anthropic API. We clone the framework, discover its commands, agents, and modes, and create a Python bridge that dynamically loads the relevant Markdown behavior files into the system prompt before each model call. Through practical examples, we explore brainstorming, frontend implementation, security analysis, business strategy, deep research planning, token-efficient responses, and a chained multi-step development workflow with session save and load support. We also learn how these reusable framework assets make our prompts more consistent, role-aware, and suitable for complex AI-assisted software development tasks.

import subprocess, sys, os, json, textwrap, getpass, time
from pathlib import Path

def _pip(pkg):
    subprocess.run([sys.executable, "-m", "pip", "install", "-q", pkg], c...
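The bridge idea, loading Markdown behavior files into the system prompt before each call, can be sketched as follows. This is an illustration under stated assumptions rather than the tutorial's code: the flat `<name>.md` layout, the function names `load_behaviors` and `build_system_prompt`, and the default base prompt are all hypothetical, and the actual model call is left out.

```python
from pathlib import Path


def load_behaviors(root: Path, names: list[str]) -> str:
    """Concatenate the Markdown files `<root>/<name>.md` that exist.

    The flat one-file-per-name layout is an assumption for this sketch;
    the real framework may organize its assets differently.
    """
    sections = []
    for name in names:
        path = root / f"{name}.md"
        if path.is_file():
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {name}\n{body}")
    return "\n\n".join(sections)


def build_system_prompt(root: Path, names: list[str],
                        base: str = "You are a helpful assistant.") -> str:
    """Prepend a base instruction to whichever behavior files were found."""
    behaviors = load_behaviors(root, names)
    return f"{base}\n\n{behaviors}" if behaviors else base
```

The returned string would then be passed as the system prompt of each request, so switching agents or modes only changes which files are named.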

Nous Research Releases Contrastive Neuron Attribution (CNA): Sparse MLP Circuit Steering Without SAE Training or Weight Modification

Instruction-tuned language models refuse harmful requests. But which part of the model is actually responsible, and how does that mechanism get installed during training? New research from the Nous Research team takes a neuron-level look at this question. The team developed contrastive neuron attribution (CNA), a method that identifies the specific MLP neurons whose activations most distinguish harmful from benign prompts. By ablating just 0.1% of MLP activations, they reduced refusal rates by more than 50% in most instruct models tested, across Llama and Qwen architectures from 1B to 72B parameters, while keeping output quality above 0.97 at all steering strengths. A key finding: the late-layer structure that discriminates harmful from benign prompts exists in base models before any fine-tuning. Alignment fine-tuning does not create new structure. It transforms the function of neurons within that existing structure into a sparse, targetable ...
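A toy version of the contrastive idea can be written in a few lines. The sketch below is a simplification under assumptions, not the Nous team's procedure: it scores each neuron by the absolute difference between its mean activation on harmful and on benign prompts, keeps the top fraction, and ablates by zeroing. The scoring rule, the zero-ablation, and all function names are illustrative choices; the excerpt only says that the selected neurons are those whose activations most distinguish the two prompt sets.

```python
def column_means(rows):
    """Mean activation of each neuron across a batch of prompts."""
    count = len(rows)
    return [sum(column) / count for column in zip(*rows)]


def contrastive_scores(harmful, benign):
    """Score each neuron by |mean harmful activation - mean benign activation|."""
    return [
        abs(h - b)
        for h, b in zip(column_means(harmful), column_means(benign))
    ]


def top_fraction(scores, fraction):
    """Indices of the highest-scoring neurons (at least one), sorted."""
    keep = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


def ablate(activations, neurons):
    """Zero the selected neurons in one activation vector."""
    selected = set(neurons)
    return [0.0 if i in selected else a for i, a in enumerate(activations)]
```

At the 0.1% level quoted above, `top_fraction(scores, 0.001)` would keep roughly one neuron per thousand, which is what makes the intervention sparse.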