Posts

How to Use NVIDIA Canary-1B-v2 for ASR, Translation, and Automatic SRT Subtitle Export in Python

In this tutorial, we build a speech recognition and translation workflow using NVIDIA Canary-1B-v2. We begin by installing the required audio, NeMo, NumPy, and SciPy dependencies, then load the Canary model on a GPU-enabled runtime for efficient inference. From there, we convert audio to a clean 16 kHz mono format, perform English ASR, translate speech into multiple languages, generate word- and segment-level timestamps, export translated subtitles as an SRT file, test long-form transcription, run batch processing, and benchmark inference speed. By the end, we have a complete multilingual ASR and speech translation pipeline that we can adapt for real audio files, subtitle generation, and large-scale transcription experiments.

Installing NeMo, Audio Libraries, NumPy, and SciPy Dependencies

import os, subprocess, sys
SENTINEL = "/content/.canary_setup_done"
if not os.path.exists(SENTINEL):
    def sh(c):
        print("$",...
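Once Canary returns segment timestamps, exporting an SRT file is plain string formatting. The minimal sketch below assumes each segment is a dict with `start` and `end` times in seconds and a `segment` text field, the shape shown for segment timestamps in NeMo's ASR model-card examples (verify this against your NeMo version). `srt_timestamp` and `segments_to_srt` are hypothetical helper names, not part of NeMo.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from segment dicts with 'start', 'end', and 'segment' keys (assumed shape)."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        # Each SRT cue: 1-based index, time range, subtitle text.
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['segment'].strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

To save the subtitles, write the returned string with `open("subtitles.srt", "w", encoding="utf-8")`; UTF-8 matters here because translated text often contains non-ASCII characters.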

Prime Intellect Releases prime-rl 0.6.0 to Train Trillion-Parameter MoE Models on Agentic RL Workloads

Prime Intellect has released prime-rl version 0.6.0. The framework targets reinforcement learning on trillion-parameter Mixture-of-Experts (MoE) models. It focuses on heavy agentic workloads, such as long-horizon software-engineering (SWE) tasks. The research team trained GLM-5 on SWE tasks at sequence lengths of up to 131k. Step times stayed under five minutes. The batch size was 256 rollouts. The run used only 28 H200 nodes.

TL;DR
prime-rl 0.6.0 trains trillion-parameter MoE models on agentic RL workloads.
GLM-5 trained on SWE at 131k sequence length, with sub-5-minute steps, on 28 H200 nodes.
Asynchronous RL disaggregates the trainer and inference so each can be optimized independently.
Inference uses FP8, Wide EP, prefill/decode (P/D) disaggregation, KV offloading, and router replay.
Training uses 3-D parallelism (FSDP, EP, CP) plus block-scaled FP8.

What is prime-rl 0.6.0?
prime-rl is an open framework for asynchronous reinforcement learning. It post-trains large open-source models on agentic...
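Because asynchronous RL decouples the trainer from inference, rollouts can be generated with slightly stale weights. The toy sketch below, using only Python's standard library, shows one common way to bound that staleness: a bounded queue between an inference thread and a trainer loop. `PolicyStore`, `inference_worker`, `run_async_rl`, and the queue-based bound are illustrative assumptions, not prime-rl's API or implementation.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Rollout:
    policy_version: int  # version of the weights that generated this rollout
    reward: float        # placeholder reward


class PolicyStore:
    """Hypothetical stand-in for broadcasting weights from trainer to inference."""

    def __init__(self) -> None:
        self._version = 0
        self._lock = threading.Lock()

    def get(self) -> int:
        with self._lock:
            return self._version

    def bump(self) -> int:
        with self._lock:
            self._version += 1
            return self._version


def inference_worker(store: PolicyStore, q: queue.Queue, n_steps: int, batch_size: int) -> None:
    for _ in range(n_steps):
        v = store.get()  # generate with whatever weights are newest right now
        batch = [Rollout(v, float(i % 2)) for i in range(batch_size)]
        q.put(batch)  # blocks when the queue is full, capping how far ahead inference runs


def run_async_rl(n_steps: int = 20, batch_size: int = 4, queue_size: int = 2):
    # queue_size must be >= 1: queue.Queue(maxsize=0) would be unbounded.
    store = PolicyStore()
    q: queue.Queue = queue.Queue(maxsize=queue_size)
    producer = threading.Thread(target=inference_worker, args=(store, q, n_steps, batch_size))
    producer.start()
    lags = []
    for _ in range(n_steps):
        batch = q.get()
        # Off-policy lag: trainer steps taken since these rollouts' weights were current.
        lags.append(store.get() - batch[0].policy_version)
        # A real trainer would compute the loss and update the weights here.
        store.bump()
    producer.join()
    return lags, store.get()
```

Since `put` blocks when the queue is full, inference can run at most `queue_size + 1` trainer steps ahead, so the recorded lag never exceeds that bound while the trainer never waits for a full synchronous rollout phase.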

GLM-5.2 OpenAI-Compatible API: A Hands-On Guide to Reasoning Effort, Function Calling, and Long-Context Retrieval

In this tutorial, we work with GLM-5.2 through its hosted, OpenAI-compatible API instead of running the full model locally. We begin by setting up multiple provider options, securely loading the API key, and creating a reusable chat wrapper that supports normal chat, thinking mode, streaming, tool calling, and token tracking. We then move beyond a simple chatbot and test the model in more practical situations: reasoning-effort control, streamed reasoning and answers, function calling, a small tool-using agent, structured JSON output, long-context retrieval, and cost estimation.

Setting Up the GLM-5.2 OpenAI-Compatible Client and Reusable Chat Wrapper

import sys, subprocess
subprocess.run([sys.executable, "-m", "pip", "install", "-q", "-U", "openai"], check=False)
import os, re, json, time, getpass
from openai import OpenAI
PROVIDERS = {
    "zai":...
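With an OpenAI-compatible API, function calling means the model returns `tool_calls` whose `function.arguments` field is a JSON string; your code runs the matching function and sends the result back as a `role: "tool"` message carrying the same `tool_call_id`. The sketch below shows only that local dispatch step, assuming tool calls have been converted to plain dicts (for example with `tool_call.model_dump()` in the openai SDK). The tools `get_weather` and `add` are hypothetical.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical tool returning canned data."""
    return {"city": city, "forecast": "sunny", "temp_c": 24}


def add(a: float, b: float) -> dict:
    """Hypothetical tool."""
    return {"result": a + b}


TOOLS = {"get_weather": get_weather, "add": add}


def run_tool_calls(tool_calls: list) -> list:
    """Execute tool calls (as dicts) and build the tool-result messages to send back."""
    messages = []
    for call in tool_calls:
        name = call["function"]["name"]
        fn = TOOLS.get(name)
        try:
            # Arguments arrive as a JSON-encoded string, not a dict.
            args = json.loads(call["function"]["arguments"] or "{}")
            result = fn(**args) if fn else {"error": f"unknown tool: {name}"}
        except (json.JSONDecodeError, TypeError) as exc:
            # Malformed JSON or wrong argument names are reported back to the model.
            result = {"error": str(exc)}
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return messages
```

Returning errors as tool messages, instead of raising, lets the model see what went wrong and retry with corrected arguments on the next turn.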

Sakana AI Launches Sakana Fugu: An Orchestration Model That Routes Tasks Across a Swappable Pool of Frontier LLMs

Today, Sakana AI launched Sakana Fugu. It is a multi-agent orchestration system that behaves like a single model. You send a request to one endpoint. Fugu decides how to handle it internally. It solves a task directly when that is enough. It also assembles and coordinates a team of expert models when needed. The complexity of the multi-agent system never reaches your code.

TL;DR
Fugu delivers a multi-agent system behind one OpenAI-compatible API.
Fugu Ultra leads most published coding and reasoning benchmarks.
The orchestrator beats the individual models it coordinates.
Opt-out and provider-routing controls target compliance needs and single-vendor risk.
Routing is proprietary, so per-query model selection stays hidden.

What is Sakana Fugu?
Fugu is itself a language model. It is trained to call other LLMs in an agent pool. That pool includes instances of itself, called recursively. Fugu manages model selection, delegation, verification, and synthesis internally. I...
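Fugu's routing is a trained, proprietary model, so the code below is only a toy illustration of two ideas from the announcement: answering directly when one model suffices versus assembling a team, and excluding opted-out providers from a swappable pool. The `PoolModel` class, skill tags, and greedy team-building rule are hypothetical and are not Sakana's method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolModel:
    name: str
    provider: str
    skills: frozenset  # hypothetical capability tags


def route(task_skills: set, pool: list, opted_out_providers: frozenset = frozenset()) -> list:
    """Toy policy: use one model if it covers every skill, else greedily build a team."""
    # Opt-out: models from excluded providers are never considered.
    candidates = [m for m in pool if m.provider not in opted_out_providers]
    for m in candidates:
        if task_skills <= m.skills:
            return [m.name]  # solve directly with a single model
    team, remaining = [], set(task_skills)
    for m in candidates:
        if m.skills & remaining:
            team.append(m.name)
            remaining -= m.skills
    if remaining:
        raise ValueError(f"no eligible model covers: {sorted(remaining)}")
    return team
```

Because the pool is just a list, swapping a model in or out, or opting out of a provider, changes routing without touching the caller's code, which is the property the single-endpoint design is meant to provide.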