Posts

Design a Complete Multimodal RLVR Pipeline with Open-MM-RL, Vision-Language Prompting, Reward Scoring, and GRPO Export

In this tutorial, we explore the TuringEnterprises/Open-MM-RL dataset as a practical foundation for multimodal reasoning and reinforcement learning with verifiable rewards. We load the dataset, inspect its schema, analyze domains, formats, question lengths, answer types, and image distributions, and visualize representative examples from each domain. We also build a lightweight reward function that checks exact, numeric, fractional, LaTeX, and symbolic answers, giving us a useful way to evaluate model outputs. Finally, we format prompts for vision-language models, optionally test SmolVLM on sample examples, and export the dataset into a GRPO-style structure for future multimodal RL training.

import subprocess, sys
subprocess.run([sys.executable, "-m", "pip", "-q", "install", "datasets>=3.0", "huggingface_hub>=0.24", "transformers>=4.45", ...
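To make the reward-scoring idea concrete, here is a minimal sketch of a verifiable-reward check. It is not the tutorial's own function: the names `normalize`, `to_number`, and `reward` are hypothetical, it uses only the standard library, and it covers exact, numeric, fractional, and simple LaTeX `\frac` answers. Symbolic equivalence (which would likely need a library such as SymPy) is deliberately left out.

```python
import re
from fractions import Fraction

# Matches \frac{a}{b}, \dfrac{a}{b}, \tfrac{a}{b} with brace-free arguments.
_FRAC = re.compile(r"\\[dt]?frac\{([^{}]+)\}\{([^{}]+)\}")


def normalize(ans: str) -> str:
    """Strip math delimiters, unwrap \\boxed{...}, rewrite \\frac as a/b."""
    s = ans.strip().strip("$").strip()
    boxed = re.fullmatch(r"\\boxed\{(.*)\}", s)
    if boxed:
        s = boxed.group(1)
    s = _FRAC.sub(r"(\1)/(\2)", s)
    return s.replace(" ", "").lower()


def to_number(s: str):
    """Parse ints, decimals, and a/b fractions exactly; None if not numeric."""
    t = s.replace("(", "").replace(")", "").replace(",", "")
    try:
        return Fraction(t)
    except (ValueError, ZeroDivisionError):
        return None


def reward(pred: str, gold: str, tol: float = 1e-6) -> float:
    """Return 1.0 for a match (exact string or numeric within tol), else 0.0."""
    p, g = normalize(pred), normalize(gold)
    if p == g:
        return 1.0
    pn, gn = to_number(p), to_number(g)
    if pn is not None and gn is not None:
        # Relative tolerance, with a floor of 1 so tiny golds are not too strict.
        return 1.0 if abs(pn - gn) <= tol * max(1, abs(gn)) else 0.0
    return 0.0


if __name__ == "__main__":
    print(reward("0.75", r"\frac{3}{4}"))
    print(reward(r"$\boxed{3/4}$", "0.75"))
    print(reward("x+1", "1+x"))  # no symbolic check in this sketch
```

A binary 0/1 score keeps the reward easy to verify; a GRPO-style trainer could consume it directly, though partial credit would be a reasonable design alternative.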

Best Authentication Platforms for AI Agents and MCP Servers in 2026

The Model Context Protocol has moved from Anthropic’s internal experiment to a de facto industry standard at a speed few integration protocols have matched. Since its launch in November 2024, MCP has grown explosively: OpenAI adopted it in March 2025, Microsoft announced support in Copilot Studio in March 2025, and by late 2025 combined Python and TypeScript SDK downloads had crossed 97 million monthly. In December 2025, Anthropic donated MCP to the Agentic AI Foundation under the Linux Foundation. Gartner projects that up to 40% of enterprise applications will include integrated task-specific AI agents by the end of 2026, up from less than 5% today. That growth has made authentication the central unsolved problem of the agentic stack. When AI agents do nothing but answer questions, auth is a conversation-level concern. When they read emails, update CRMs, write to databases, and call external APIs autonomously, auth becomes infrastructure, and the blast radius of getting it...

WorkOS Releases auth.md: An Open Agent Registration Protocol Built on OAuth Standards

For years, authentication on the web followed one design assumption: a human sits behind a browser. Click a button. Fill out a form. Verify an email. Copy an API key and paste it somewhere else. That model does not work when the user is delegating work to an agent. Agents are already writing code, opening pull requests, triaging tickets, querying systems, and updating records. But most products still have no real way for an agent to register. The workaround, giving an agent a raw API key or session token, produces credentials that are unscoped, hard to audit per session, and impossible to revoke selectively. WorkOS is proposing a structured alternative: auth.md, an open protocol for agent registration. What is auth.md? auth.md is a small Markdown file an application publishes at a well-known location, typically https://service.com/auth.md. The file tells agents how to register with that service: which flows are supported, which scopes exist, and how credentials are i...

Build a Complete Langfuse Observability and Evaluation Pipeline for Tracing, Prompt Management, Scoring, and Experiments

In this tutorial, we implement a Langfuse pipeline for tracing, prompt management, scoring, datasets, and experiments. Langfuse is an open-source LLM engineering platform. We build a complete workflow that works with either a real OpenAI key or a deterministic mock LLM, so we can understand every major Langfuse feature without depending on paid model access. We start by setting up credentials and connecting to Langfuse. We then trace simple function calls, instrument a small RAG pipeline, manage prompts centrally, attach evaluation scores, and run dataset-based experiments. Along the way, we see how Langfuse helps us observe, evaluate, and improve LLM applications in a structured and production-ready way.

import subprocess, sys
def pip_install(*pkgs):
    subprocess.run([sys.executable, "-m", "pip", "install", "-qU", *pkgs], check=True)
pip_install("langfuse", "openai")
import os
from getpass import getpa...
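The "deterministic mock LLM" idea can be sketched without any Langfuse or OpenAI calls. The example below is an illustration under stated assumptions, not the tutorial's own code: `mock_llm`, `CANNED_ANSWERS`, and the returned dictionary shape are all hypothetical. It hashes the prompt to pick a canned reply, so the same prompt always yields the same output and traces or scores built on top of it are reproducible.

```python
import hashlib

# Hypothetical canned replies; a real tutorial would likely use richer text.
CANNED_ANSWERS = [
    "Here is a short summary of the provided context.",
    "Based on the retrieved documents, the answer is unclear.",
    "The context suggests a positive answer.",
]


def mock_llm(prompt: str, model: str = "mock-model") -> dict:
    """Return a reproducible fake completion for the given prompt.

    The reply is chosen by hashing the prompt, so repeated calls with the
    same prompt return identical text and token counts.
    """
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    index = int(digest, 16) % len(CANNED_ANSWERS)
    # Append a short hash tag so different prompts give distinguishable text.
    text = f"{CANNED_ANSWERS[index]} [mock:{digest[:8]}]"
    usage = {
        # Whitespace word counts stand in for real tokenizer counts.
        "input_tokens": len(prompt.split()),
        "output_tokens": len(text.split()),
    }
    return {"model": model, "text": text, "usage": usage}


if __name__ == "__main__":
    first = mock_llm("What is tracing?")
    second = mock_llm("What is tracing?")
    print(first["text"])
    print(first == second)
```

Because the function is pure, it can stand in wherever a real model call would go, and any tracing decorator or scoring step wrapped around it sees stable inputs and outputs across runs.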

StepFun Releases StepAudio 2.5 Realtime: An End-to-End Voice Model with Roleplay-Specific RLHF and Paralinguistic Comprehension

StepFun, the Shanghai-based AI lab, released StepAudio 2.5 Realtime, an end-to-end real-time speech large language model with fully customizable persona capabilities. Unlike pipeline-based systems that separate speech recognition, reasoning, and synthesis into sequential steps, this is an end-to-end model: audio goes in and audio comes out through a single unified system. The model supports Chinese and English. It connects via a WebSocket API. The endpoint is wss://api.stepfun.com/v1/realtime using the model string step-2.5-realtime. The Three Technical Pillars The StepFun research team describes three core architectural innovations behind the model: 1. Million-Scale Persona Data Augmentation Starting from 10,000+ high-quality natively authored personas, StepFun applied algorithmic augmentation to build a million-scale persona feature matrix. This was combined with millions of real-world co...