Posts

OpenAI Releases Three Realtime Audio Models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper in the Realtime API

OpenAI released three new audio models through its Realtime API, each targeting a distinct capability in live voice applications: GPT-Realtime-2 for voice agents with reasoning, GPT-Realtime-Translate for live speech translation, and GPT-Realtime-Whisper for streaming transcription. Alongside the model releases, the Realtime API officially exits beta and is now generally available, a meaningful signal for developers who held off building production systems on it. All three models are available immediately through the OpenAI API and can be tested in the Playground. Together, they push voice applications past the basic question-and-answer loop, toward systems that can listen, reason, translate, transcribe, and act within a single conversation.

GPT-Realtime-2: Voice Reasoning with a 128K Context Window

The flagship release is GPT-Realtime-2, which the OpenAI team describes as its first voice model with GPT-5-class reasoning. GPT-Realtime-2 can process harder requests, manage in...
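As a concrete sketch of what targeting one of these models could look like, the snippet below builds a `session.update` event payload of the general kind the Realtime API accepts over its WebSocket connection. The model name, field names, and values here are illustrative assumptions, not confirmed API details:

```python
import json

def build_session_update(model: str, instructions: str) -> dict:
    # Illustrative session.update event for a Realtime API session.
    # The field names follow the general shape of Realtime events;
    # treat the exact schema as an assumption, not a confirmed spec.
    return {
        "type": "session.update",
        "session": {
            "model": model,                   # e.g. the newly announced voice model
            "modalities": ["audio", "text"],  # accept and emit both audio and text
            "instructions": instructions,
        },
    }

event = build_session_update("gpt-realtime-2", "You are a concise voice agent.")
payload = json.dumps(event)  # what would be sent over the WebSocket
```

The same event shape would then be reused for mid-session changes (swapping instructions or modalities) without reconnecting.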

Build a CloakBrowser Automation Workflow with Stealth Chromium, Persistent Profiles, and Browser Signal Inspection

In this tutorial, we explore CloakBrowser, a Python-friendly browser automation tool that uses Playwright-style APIs within a stealth Chromium environment. We begin by setting up CloakBrowser, preparing the required browser binary, and resolving the common Colab asyncio loop issue by running the sync browser workflow in a separate worker thread. We then move through practical automation steps, including launching a browser, creating customized browser contexts, inspecting browser-visible signals, interacting with a local test page, saving session state, restoring localStorage, using persistent browser profiles, capturing screenshots, and extracting rendered page content for parsing.

import os
import sys
import json
import time
import shutil
import base64
import subprocess
import concurrent.futures
from pathlib import Path
from datetime import datetime
from textwrap import dedent

def run_cmd(cmd, check=True, capture=False):
    print(f...
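The worker-thread workaround mentioned above is a general Python pattern rather than anything CloakBrowser-specific: when an event loop is already running (as in Colab), a blocking synchronous workflow can be pushed onto a separate thread so it never touches the notebook's loop. A minimal sketch with a stand-in workflow function:

```python
import concurrent.futures

def sync_browser_workflow():
    # Stand-in for the blocking CloakBrowser/Playwright-style session;
    # the real workflow would launch the browser and drive pages here.
    return {"title": "Example Domain", "status": "ok"}

def run_in_worker_thread(fn):
    # Running the sync API inside a worker thread keeps it off the
    # notebook's already-running asyncio event loop.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(fn).result()

result = run_in_worker_thread(sync_browser_workflow)
```

`submit(...).result()` blocks the calling cell until the workflow finishes, which is exactly the synchronous behavior the sync API expects.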

LightSeek Foundation Releases TokenSpeed, an Open-Source LLM Inference Engine Targeting TensorRT-LLM-Level Performance for Agentic Workloads

Inference efficiency has quietly become one of the most consequential bottlenecks in AI deployment. As agentic coding systems such as Claude Code, Codex, and Cursor scale from developer tools to infrastructure powering software development at large, the underlying inference engines serving those requests are under increasing strain. LightSeek Foundation researchers have released TokenSpeed, an open-source LLM inference engine published under the MIT license and designed specifically for the demands of agentic workloads. The TokenSpeed engine is currently in preview status.

Why Agentic Inference is a Different Problem

To understand what makes TokenSpeed’s design choices meaningful, it helps to understand what makes agentic inference hard. Coding agents don’t behave like a typical chatbot turn. Contexts routinely exceed 50K tokens, and conversations often span dozens of turns. This creates simultaneous pressure on two metrics: per-GPU TPM (tokens per minute), wh...
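To make the per-GPU TPM metric concrete, here is a small helper (my own illustration, not part of TokenSpeed) that converts a raw token count into tokens per minute, normalized per GPU:

```python
def per_gpu_tpm(total_tokens: int, elapsed_seconds: float, num_gpus: int) -> float:
    # Tokens per minute, normalized per GPU: the serving-throughput
    # metric that long-context agentic workloads put under pressure.
    tokens_per_minute = total_tokens * 60.0 / elapsed_seconds
    return tokens_per_minute / num_gpus

# 480,000 tokens served in 2 minutes on 4 GPUs -> 60,000 TPM per GPU
tpm = per_gpu_tpm(total_tokens=480_000, elapsed_seconds=120.0, num_gpus=4)
```

Normalizing per GPU is what lets operators compare engines across cluster sizes: doubling GPUs while holding per-GPU TPM flat means throughput scaled linearly.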

Meta AI Releases NeuralBench: A Unified Open-Source Framework to Benchmark NeuroAI Models Across 36 EEG Tasks and 94 Datasets

Evaluating AI models trained on brain signals has long been a messy, inconsistent affair. Different research groups use different preprocessing pipelines, train models on different datasets, and report results on a narrow set of tasks, making it nearly impossible to know which model actually works best, or for what. A new framework from the Meta AI team is designed to fix that. Meta researchers have released NeuralBench, a unified, open-source framework for benchmarking AI models of brain activity. Its first release, NeuralBench-EEG v1.0, is the largest open benchmark of its kind: 36 downstream tasks, 94 datasets, 9,478 subjects, 13,603 hours of electroencephalography (EEG) data, and 14 deep learning architectures evaluated under a single standardized interface.

The Problem NeuralBench Solves

The broader field of NeuroAI, where deep learning meets neuroscience, has exploded in recent years. Self-supervised learning techniques originally developed for...
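The "single standardized interface" idea can be sketched in a few lines: tasks register themselves once, and every model is then scored through the same loop. This is a conceptual toy of the pattern, not NeuralBench's actual API:

```python
TASKS = {}

def register_task(name):
    # Decorator that adds an evaluation task to a shared registry,
    # so every model runs against the same task definitions.
    def wrap(fn):
        TASKS[name] = fn
        return fn
    return wrap

@register_task("sleep_stage")  # hypothetical task name for illustration
def eval_sleep_stage(model):
    # A real task would load a standardized EEG dataset and score
    # predictions; here we just call the model on a fixed toy input.
    return model([0.1, 0.2, 0.3])

def benchmark(model):
    # One standardized loop: every registered task, same model interface.
    return {name: task(model) for name, task in TASKS.items()}

scores = benchmark(lambda x: sum(x) / len(x))
```

The point of the registry is that adding a 37th task changes nothing for model authors: any model exposing the shared call signature is automatically evaluated on it.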

OpenAI Introduces MRC (Multipath Reliable Connection): A New Open Networking Protocol for Large-Scale AI Supercomputer Training Clusters

Training frontier AI models is not just a compute problem; it is increasingly a networking problem. And OpenAI just introduced its solution. OpenAI announced the release of MRC (Multipath Reliable Connection), a novel networking protocol developed over the past two years in partnership with AMD, Broadcom, Intel, Microsoft, and NVIDIA. The specification was published through the Open Compute Project (OCP), enabling the broader industry to use and build on it.

Why Networking is the Hidden Bottleneck in AI Training

To understand why MRC matters, you need to understand what happens inside a supercomputer during model training. When training large AI models, a single step can involve many millions of data transfers. One transfer arriving late can ripple through the entire job, potentially causing GPUs to sit idle. Network congestion, link failures, and device failures are the most common sources of delay and jitter in transfers, and these problems get more frequent, and harder to...
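The core multipath idea, spreading a transfer's packets across several network paths so that no single congested or failed link stalls the whole transfer, can be illustrated with a toy sender and receiver. This is a conceptual illustration of multipath striping and reassembly, not the MRC wire protocol:

```python
def spray_across_paths(data: bytes, num_paths: int, chunk: int = 4):
    # Split the payload into sequence-numbered chunks and assign them
    # round-robin across paths; sequence numbers let the receiver
    # reassemble the transfer regardless of per-path arrival order.
    chunks = [(i, data[i * chunk:(i + 1) * chunk])
              for i in range((len(data) + chunk - 1) // chunk)]
    paths = [[] for _ in range(num_paths)]
    for seq, payload in chunks:
        paths[seq % num_paths].append((seq, payload))
    return paths

def reassemble(paths):
    # Merge the per-path streams and restore order by sequence number.
    received = sorted(pkt for path in paths for pkt in path)
    return b"".join(payload for _, payload in received)

msg = b"gradient shard 0042 ready"
paths = spray_across_paths(msg, num_paths=3)
restored = reassemble(paths)
```

A real protocol additionally needs retransmission and path-health tracking, but the sketch shows why per-chunk sequencing is what makes spraying across unequal paths safe.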

Zyphra Releases ZAYA1-8B: A Reasoning MoE Trained on AMD Hardware That Punches Far Above Its Weight Class

Zyphra AI has released ZAYA1-8B, a small Mixture of Experts (MoE) language model with 760 million active parameters and 8.4 billion total parameters. Trained end-to-end on AMD hardware, the model outperforms open-weight models many times its size on math and coding benchmarks, and is now available under an Apache 2.0 license on Hugging Face and as a serverless endpoint on Zyphra Cloud. With under 1 billion active parameters, ZAYA1-8B achieves scores competitive with first-generation frontier reasoning models like DeepSeek-R1-0528, Gemini-2.5-Pro, and Claude 4.5 Sonnet on challenging mathematical reasoning tasks. With its novel test-time compute methodology called Markovian RSA, it surpasses Claude 4.5 Sonnet and GPT-5-High on HMMT’25 (89.6 vs 88.3) and closes in on frontier open-weight models like DeepSeek-V3.2 on mathematics benchmarks.

What is a Mixture of Experts Model and Why Does Active Parameter Count Matter?

The distinction between ‘active’ and ...
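The active-vs-total distinction follows directly from how a top-k MoE layer routes tokens: every expert's weights must be stored, but only the k routed experts execute per token. A toy calculation under stated assumptions (uniform expert size, a lump of shared attention/embedding parameters; these numbers are NOT ZAYA1's real configuration):

```python
def moe_param_counts(num_experts, params_per_expert, top_k, shared_params):
    # Total: every expert's weights exist in memory.
    total = shared_params + num_experts * params_per_expert
    # Active: per token, only the top-k routed experts execute.
    active = shared_params + top_k * params_per_expert
    return total, active

# Hypothetical configuration: 32 experts of 250M params each,
# top-2 routing, 260M shared parameters.
total, active = moe_param_counts(32, 250_000_000, 2, 260_000_000)
# -> 8.26B parameters stored, but only 0.76B executed per token
```

This is why an MoE with 8B+ total parameters can have the per-token compute cost of a sub-1B dense model, which is the sense in which ZAYA1-8B "punches above its weight class".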

A Groq-Powered Agentic Research Assistant with LangGraph, Tool Calling, Sub-Agents, and Agentic Memory: Let’s Build It

In this tutorial, we build a Groq-powered agentic research workflow that runs directly on Groq’s free OpenAI-compatible inference endpoint. We configure LangChain’s ChatOpenAI interface to work with Groq by setting the Groq API key and base URL, allowing us to use fast hosted models such as llama-3.3-70b-versatile for tool-based reasoning. We then connect the model with practical tools for web search, webpage fetching, file handling, Python execution, skill loading, sub-agent delegation, and long-term memory. By the end of the tutorial, we have a working Groq-based multi-step agent that can research a topic, delegate focused subtasks, generate structured outputs, and save useful information for later runs.

import subprocess, sys

def _pip(*a):
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", *a])

_pip("langgraph>=0.2.50", "langchain>=0.3.0", "langchain-...
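Because Groq exposes an OpenAI-compatible chat endpoint, the request body is an ordinary OpenAI-style chat payload pointed at Groq's base URL, which is why ChatOpenAI can be repointed with just a key and a `base_url`. The sketch below only constructs that payload (no network call); treat the base URL and model name as assumptions drawn from the tutorial's description:

```python
import json

GROQ_BASE_URL = "https://api.groq.com/openai/v1"  # OpenAI-compatible endpoint

def build_chat_request(model: str, user_message: str) -> dict:
    # Standard OpenAI-style chat.completions body; Groq accepts the
    # same shape, so OpenAI-client tooling works unchanged against it.
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a research assistant."},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0.2,
    }

req = build_chat_request("llama-3.3-70b-versatile",
                         "Summarize LangGraph in one line.")
body = json.dumps(req)  # would be POSTed to f"{GROQ_BASE_URL}/chat/completions"
```

The same compatibility is what lets the rest of the tutorial's tool-calling and sub-agent machinery stay provider-agnostic.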