Posts

Qwen Team Open-Sources Qwen3.6-35B-A3B: A Sparse MoE Vision-Language Model with 3B Active Parameters and Agentic Coding Capabilities

The open-source AI landscape has a new entry worth paying attention to. The Qwen team at Alibaba has released Qwen3.6-35B-A3B, the first open-weight model of the Qwen3.6 generation, and it makes a compelling argument that parameter efficiency matters far more than raw model size. With 35 billion total parameters but only 3 billion activated during inference, the model delivers agentic coding performance competitive with dense models ten times its active size.

What is a Sparse MoE Model, and Why Does it Matter Here?

A Mixture of Experts (MoE) model does not run all of its parameters on every forward pass. Instead, it routes each input token through a small subset of specialized sub-networks called ‘experts,’ while the rest of the parameters sit idle. This means a model can have an enormous total parameter count while keeping inference compute, and therefore inference cost and latency, proportional only to the active parameter count. Qwen3.6-35B-A3B is a Causal Language...
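The routing idea described above can be sketched in a few lines of NumPy. This is an illustrative toy, not Qwen's actual router: the expert shapes, the softmax-over-top-k gating, and k=2 are assumptions made for the example.

```python
import numpy as np

def moe_forward(x, experts, gate_w, k=2):
    """Route one token through the top-k of n experts (illustrative sketch).

    x:       (d,) token embedding
    experts: list of n callables, each mapping (d,) -> (d,)
    gate_w:  (n, d) router weights
    k:       number of active experts per token
    """
    logits = gate_w @ x                       # one router score per expert
    top = np.argsort(logits)[-k:]             # indices of the k best-scoring experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                              # softmax over the selected experts only
    # Only k experts actually run; the remaining parameters stay idle.
    return sum(wi * experts[i](x) for wi, i in zip(w, top))

rng = np.random.default_rng(0)
d, n = 8, 16
experts = [(lambda W: (lambda v: W @ v))(rng.standard_normal((d, d))) for _ in range(n)]
gate_w = rng.standard_normal((n, d))
y = moe_forward(rng.standard_normal(d), experts, gate_w, k=2)
print(y.shape)  # (8,)
```

Per token, compute scales with k experts rather than all n, which is exactly why a 35B-total model can run at 3B-active cost.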

OpenAI Launches GPT-Rosalind: Its First Life Sciences AI Model Built to Accelerate Drug Discovery and Genomics Research

Drug discovery is one of the most expensive and time-consuming endeavors in human history. It takes roughly 10 to 15 years to go from target discovery to regulatory approval for a new drug in the United States. Most of that time is spent not in breakthrough moments, but in painstaking analytical work: sifting through mountains of literature, designing reagents, and interpreting complex biological data. OpenAI believes AI can help compress those timelines, and today it introduced its most specialized model yet to prove it. OpenAI introduces GPT-Rosalind, its first model in a new Life Sciences series, to deliver stronger foundational reasoning in fields like biochemistry and genomics. Unlike general-purpose language models that are trained broadly across all domains, GPT-Rosalind is fine-tuned specifically for the deep analytical demands of biological research. The model is not intended to replace scientists, but rather to help them move faster through some of the most t...

Building Transformer-Based NQS for Frustrated Spin Systems with NetKet

The intersection of many-body physics and deep learning has opened a new frontier: Neural Quantum States (NQS). While traditional methods struggle with high-dimensional frustrated systems, the global attention mechanism of Transformers provides a powerful tool for capturing complex quantum correlations. In this tutorial, we implement a research-grade Variational Monte Carlo (VMC) pipeline using NetKet and JAX to solve the frustrated J1–J2 Heisenberg spin chain. We will:

- Build a custom Transformer-based NQS architecture.
- Optimize the wavefunction using Stochastic Reconfiguration (natural gradient descent).
- Benchmark our results against exact diagonalization and analyze emergent quantum phases.

By the end of this guide, you will have a scalable, physically grounded simulation framework capable of exploring quantum magnetism beyond the reach of classical exact methods.

!pip -q install --upgrade pip
!pip -q install ...
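The Stochastic Reconfiguration step mentioned above preconditions the energy gradient with the quantum geometric tensor. A minimal NumPy sketch of one update, under illustrative assumptions (random sample data, a simple diagonal shift for regularization), not NetKet's actual implementation:

```python
import numpy as np

def sr_update(log_derivs, e_loc, lr=0.01, shift=0.01):
    """One stochastic-reconfiguration (natural-gradient) step, illustrative.

    log_derivs: (n_samples, n_params) values O_k(s) = d log psi(s) / d theta_k
    e_loc:      (n_samples,) local energies E_loc(s)
    Returns the parameter update delta_theta = -lr * S^{-1} F.
    """
    n = len(e_loc)
    O = log_derivs - log_derivs.mean(axis=0)          # center the log-derivatives
    S = (O.conj().T @ O) / n                          # covariance <O* O> - <O*><O>
    F = (O.conj().T @ (e_loc - e_loc.mean())) / n     # gradient <O* E_loc> - <O*><E_loc>
    S += shift * np.eye(S.shape[0])                   # diagonal shift for numerical stability
    return -lr * np.linalg.solve(S, F)

rng = np.random.default_rng(1)
delta = sr_update(rng.standard_normal((256, 10)), rng.standard_normal(256))
print(delta.shape)  # (10,)
```

Centering both the log-derivatives and the local energies before forming S and F removes the mean terms in one pass, which is the usual estimator used in VMC codes.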

UCSD and Together AI Research Introduces Parcae: A Stable Architecture for Looped Language Models That Achieves the Quality of a Transformer Twice the Size

The dominant recipe for building better language models has not changed much since the Chinchilla era: spend more FLOPs, add more parameters, train on more tokens. But as inference deployments consume an ever-growing share of compute and model deployments push toward the edge, researchers are increasingly asking a harder question: can you scale quality without scaling memory footprint? A team of researchers from UC San Diego and Together AI has introduced Parcae, a stable looped transformer architecture that outperforms prior looped models and beats fixed-depth Transformer baselines at every scale tested, all while using the same parameter count and the same training data budget (https://ift.tt/3sBfPHQ).

What is a Looped Language Model?

In a standard Transformer, activations flow through a fixed stack of layers exactly once. A looped architecture instead routes activations through a block of layers T times in a loop, multiplying effective compute without adding parameters....
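The weight-tied loop is simple to state in code. The sketch below is illustrative, not the Parcae architecture: a single residual "block" (here a toy tanh layer) stands in for a transformer block and is reused T times, so effective depth grows while the parameter count stays fixed.

```python
import numpy as np

def looped_forward(x, block, T):
    """Apply the same parameter block T times (weight-tied loop), illustrative.

    A fixed-depth model would stack T distinct blocks; the looped model reuses
    one block's parameters, multiplying effective compute without adding any.
    """
    for _ in range(T):
        x = block(x)
    return x

rng = np.random.default_rng(2)
W = rng.standard_normal((16, 16)) / 4.0          # the ONLY parameters, reused every iteration
block = lambda h: np.tanh(h @ W) + h             # toy residual layer standing in for a transformer block
h = looped_forward(rng.standard_normal((4, 16)), block, T=6)
print(h.shape)  # (4, 16)
```

Note the memory footprint is that of one block's weights regardless of T, which is the property that makes looping attractive for edge deployment.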

A Coding Implementation to Build Multi-Agent AI Systems with SmolAgents Using Code Execution, Tool Calling, and Dynamic Orchestration

In this tutorial, we build an advanced, production-ready agentic system using SmolAgents and demonstrate how modern, lightweight AI agents can reason, execute code, dynamically manage tools, and collaborate across multiple agents. We start by installing dependencies and configuring a powerful yet efficient LLM backend, and then progressively design custom tools, including mathematical utilities, memory storage, and web search capabilities. We explore both CodeAgent and ToolCallingAgent paradigms, understand how tools are managed dynamically through the agent.tools dictionary, and implement multi-agent orchestration.

import subprocess, sys

def pip(*args):
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", *args])

pip("smolagents[all]", "duckduckgo-search", "wikipedia", "rich")

import os, math, textwrap
from rich.console import Co...
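The idea behind the agent.tools dictionary, a mutable name-to-tool registry the agent consults at runtime, can be shown with a plain-Python skeleton. The class and method names below are hypothetical and are not the SmolAgents API; in a real agent the LLM, not the caller, chooses which registered tool to invoke.

```python
class ToolAgent:
    """Minimal tool-registry skeleton (hypothetical, not the smolagents API).

    Like SmolAgents' agent.tools dictionary, tools are stored by name and can
    be added or removed dynamically while the agent runs.
    """
    def __init__(self):
        self.tools = {}

    def add_tool(self, name, fn, description=""):
        self.tools[name] = {"fn": fn, "description": description}

    def remove_tool(self, name):
        self.tools.pop(name, None)

    def call(self, name, *args, **kwargs):
        if name not in self.tools:
            raise KeyError(f"unknown tool: {name}")
        return self.tools[name]["fn"](*args, **kwargs)

agent = ToolAgent()
agent.add_tool("add", lambda a, b: a + b, "Add two numbers.")
agent.add_tool("hypot", lambda a, b: (a**2 + b**2) ** 0.5, "Euclidean norm.")
print(agent.call("hypot", 3, 4))  # 5.0
```

Keeping tools in a plain dictionary is what makes dynamic orchestration possible: an orchestrating agent can grant or revoke capabilities from sub-agents mid-run simply by mutating the registry.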

Google AI Launches Gemini 3.1 Flash TTS: A New Benchmark in Expressive and Controllable AI Voice

Google has introduced Gemini 3.1 Flash TTS, a preview text-to-speech model focused on improving speech quality, expressive control, and multilingual generation. Unlike previous iterations that prioritized simple conversion, this release emphasizes natural-language audio tags, native support for more than 70 languages, and native multi-speaker dialogue. It signals a shift from ‘black-box’ audio generation toward a more granular, instruction-based workflow. The model is rolling out in preview through the Gemini API and Google AI Studio, on Vertex AI for enterprises, and via Google Vids for Workspace users.

Speech Quality, Control, and Developer Workflow

The standout technical achievement of Gemini 3.1 Flash TTS is its performance on industry benchmarks. The model currently reports an Artificial Analysis TTS leaderboard Elo score of 1,211, positioning it as Google’s most natural and expressive speech model to date. Beyond raw quality, the update introduces a more sophistic...

Google DeepMind Releases Gemini Robotics-ER 1.6: Bringing Enhanced Embodied Reasoning and Instrument Reading to Physical AI

The Google DeepMind research team introduced Gemini Robotics-ER 1.6, a significant upgrade to its embodied reasoning model designed to serve as the ‘cognitive brain’ of robots operating in real-world environments. The model specializes in reasoning capabilities critical for robotics, including visual and spatial understanding, task planning, and success detection, acting as the high-level reasoning model for a robot, capable of executing tasks by natively calling tools like Google Search, vision-language-action models (VLAs), or any other third-party user-defined functions. Here is the key architectural idea: Google DeepMind takes a dual-model approach to robotics AI. Gemini Robotics 1.5 is the vision-language-action (VLA) model: it processes visual inputs and user prompts and directly translates them into physical motor commands. Gemini Robotics-ER, on the other hand, is the embodied reasoning model: it specializes in understanding physical spaces, planning, and making lo...
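The dual-model split described above amounts to a planner that emits tool calls and an executor that carries them out. The toy dispatcher below illustrates that control flow only; the function names, the hard-coded plan, and the stub tools are all hypothetical, standing in for the ER model, Google Search, and the VLA model respectively.

```python
def embodied_reasoner(instruction):
    """Toy stand-in for the ER model: decompose an instruction into tool calls.

    A real ER model would reason over camera input and the instruction; here
    the plan is hard-coded purely to show the planner/executor hand-off.
    """
    return [
        ("search", "local recycling rules"),     # call an information tool first
        ("vla", "pick up the bottle"),           # then hand motor steps to the VLA
        ("vla", "place bottle in the blue bin"),
    ]

TOOLS = {
    "search": lambda q: f"results for '{q}'",             # stub for Google Search
    "vla":    lambda cmd: f"executed motor plan: {cmd}",  # stub for the VLA model
}

log = [TOOLS[tool](arg) for tool, arg in embodied_reasoner("sort the trash")]
for entry in log:
    print(entry)
```

The point of the split is that the reasoning model never emits motor commands directly; it only chooses which tool, including the VLA, handles each step.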