Posts

Google Releases Gemini 3.5 Live Translate, a Streaming Speech-to-Speech Audio Model Covering 70+ Languages Across Meet, Translate, and the Live API

Google just announced Gemini 3.5 Live Translate. It is their latest audio model for live speech-to-speech translation. Speech-to-speech means spoken audio goes in, and translated spoken audio comes out. The model detects over 70 languages automatically and generates translated speech. It preserves the speaker’s intonation, pacing, and pitch in the output. Turn-by-turn systems wait for a speaker to finish before responding. Gemini 3.5 Live Translate generates speech continuously instead. It balances a trade-off between waiting for context and translating immediately. More context improves quality. Faster output keeps the translation in sync with the speaker. The result stays a few seconds behind the speaker throughout a session.

Gemini 3.5 Live Translate

Gemini 3.5 Live Translate is a single audio model (gemini-3.5-live-translate-preview), not a chat assistant. It processes speech as the audio streams in, rather than after a full sentence. It handles multilingual inputs...

NVIDIA cuTile Python Tutorial: Building Tiled GPU Kernels for Vector Addition, Matrix Addition, and Matrix Multiplication in Colab

In this tutorial, we implement an advanced hands-on workflow for NVIDIA cuTile Python, a tile-based GPU programming interface for writing efficient CUDA-style kernels directly in Python. We start by preparing a Colab-friendly environment, checking the available GPU, driver, CUDA, and cuTile installations before running any kernel code. We then build tiled examples for vector addition, matrix addition, and matrix multiplication, while keeping a PyTorch fallback so that the notebook remains executable even when Colab does not meet cuTile’s latest runtime requirements. Along the way, we see how tiled programming works, how tensors are loaded, processed, stored, and validated, and how custom GPU kernels can be compared against standard PyTorch operations.

Setting Up NVIDIA cuTile Python and Checking GPU, CUDA, and Driver Runtime in Colab

import os
import sys
import math
import time
import json
import shutil
import subprocess...
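To make the idea of tiled matrix multiplication concrete without depending on a GPU, the following is a minimal CPU-only sketch in plain Python. It does not use the cuTile API; the function names `tiled_matmul` and `naive_matmul` and the `tile` parameter are hypothetical, chosen only for illustration. It shows the same decomposition a tiled kernel relies on: the output is split into tiles, and each output tile accumulates partial products from matching tiles of the inputs.

```python
def tiled_matmul(a, b, tile=2):
    """Multiply a (m x k) by b (k x n), visiting the work one tile at a time.

    Illustrative only: plain Python lists on the CPU, not a cuTile kernel.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")

    c = [[0.0] * n for _ in range(m)]
    # Outer loops pick an output tile (i0, j0) and a slice of the shared
    # dimension (k0). On a GPU, each output tile would map to a block of work.
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                # Inner loops stay inside the current tile; min() handles
                # edge tiles when a dimension is not a multiple of `tile`.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


def naive_matmul(a, b):
    """Reference result used to validate the tiled version."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(tiled_matmul(a, b, tile=2) == naive_matmul(a, b))
```

Comparing against a straightforward reference mirrors the validation step the tutorial describes, where custom kernels are checked against standard PyTorch operations.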

A New Study from Harvard and Perplexity Finds AI Agents Perform 26 Minutes of Autonomous Work per Session vs 33 Seconds for Search

A new working paper from Perplexity and Harvard offers field evidence on what AI agents do to knowledge work. It draws on production data from two Perplexity products: Search and Computer. The setup is a natural comparison. Search is a conversational answer engine. Computer is an agent that plans and executes tasks end to end. The same users touch both products, so the team can hold the task roughly constant.

What the Study Actually Measures

The study covers a 90-day window, February 27 through May 27, 2026. Computer launched two days before that window opened. The core method matches near-identical query pairs across the two products. The research team found 10,000 session pairs with cosine similarity above 0.99. Each pair is effectively the same task attempted both ways. Computer pairs are gated to sessions that invoke an execution tool. These ‘do’ tools include code execution, browser actions, file writes, and connector calls. That gate en...
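The pairing step, matching sessions whose cosine similarity exceeds 0.99, can be sketched as follows. This is a minimal illustration, not the study's pipeline: the excerpt does not say how queries are embedded or how candidates are searched, so the vectors below are toy values and the names `cosine_similarity` and `match_pairs` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_u * norm_v)


def match_pairs(search_vecs, computer_vecs, threshold=0.99):
    """Return (search_index, computer_index, similarity) above the threshold.

    Brute-force comparison of every pair; a real system at production
    scale would likely need an approximate nearest-neighbor index.
    """
    pairs = []
    for i, s in enumerate(search_vecs):
        for j, c in enumerate(computer_vecs):
            sim = cosine_similarity(s, c)
            if sim > threshold:
                pairs.append((i, j, sim))
    return pairs


# Toy embeddings (assumed values, for illustration only).
search = [[1.0, 0.0], [0.0, 1.0]]
computer = [[2.0, 0.01], [1.0, 1.0]]
for i, j, sim in match_pairs(search, computer):
    print(f"search[{i}] ~ computer[{j}] (similarity {sim:.5f})")
```

Cosine similarity ignores vector length, so `[1.0, 0.0]` and `[2.0, 0.01]` count as near-identical even though their magnitudes differ.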

ClawHub Security Signals: A Coding Guide to End-to-End Security Signal Analysis and Verdict Classification on the AI Skills Dataset

In this tutorial, we use the ClawHub Security Signals dataset to examine how different security scanners assess AI skills and related files. We load the dataset directly from the Hugging Face Parquet conversion to avoid compatibility issues with newer dataset metadata, then inspect the main columns, verdict distribution, scanner outputs, and severity labels. After exploring scanner disagreement and overlap patterns, we build a practical machine learning pipeline that combines SKILL.md text with numerical scanner signals to predict the final ClawScan verdict. It gives us a complete workflow for loading, analyzing, visualizing, and modeling security signal data in a Colab-ready environment.

Setting Up the Colab Environment and Imports for Security Signal Analysis

!pip -q install -U "huggingface_hub>=0.23" pyarrow scikit-learn pandas numpy matplotlib seaborn
import warnings, numpy as np, pandas as pd
warnings.filterwarnings(...

Xiaomi MiMo and TileRT Push a 1-Trillion-Parameter Model Past 1000 Tokens Per Second on Commodity GPUs

Inference speed is becoming a competitive metric for large language models. Xiaomi’s MiMo team just released MiMo-V2.5-Pro-UltraSpeed, built in collaboration with the TileRT systems group. It decodes at more than 1000 tokens per second on a 1-trillion-parameter model. The Xiaomi team describes this as a first at trillion-parameter scale. Demos show generation peaking near 1200 tokens per second. The notable part is the hardware: it runs on commodity GPUs, not custom silicon.

What is MiMo-V2.5-Pro-UltraSpeed

UltraSpeed is a high-speed serving mode for the existing MiMo-V2.5-Pro model. The base model uses a Mixture-of-Experts (MoE) architecture at trillion-parameter scale. UltraSpeed targets generation speed rather than model capability. It changes how fast the model produces output tokens. The speedup comes from three coordinated techniques across the model and the serving system. Xiaomi calls this approach extreme model-system codesign. Crucially, the entire stack runs on a sing...