Posts

A Coding Implementation on Document Parsing Benchmarking with LlamaIndex ParseBench Using Python, Hugging Face, and Evaluation Metrics

In this tutorial, we explore how to use the ParseBench dataset to evaluate document parsing systems in a structured, practical way. We begin by loading the dataset directly from Hugging Face, inspecting its multiple dimensions, such as text, tables, charts, and layout, and transforming it into a unified dataframe for deeper analysis. As we progress, we identify key fields, detect linked PDFs, and build a lightweight baseline using PyMuPDF to extract and compare text. Throughout the process, we focus on creating a flexible pipeline that allows us to understand the dataset schema, evaluate parsing quality, and prepare inputs for more advanced OCR or vision-language models.

```python
!pip install -q -U datasets huggingface_hub pandas matplotlib rich pymupdf rapidfuzz tqdm

import json, re, textwrap, random, math
from pathlib import Path
from collections import Counter

import pandas as pd
import matplotlib.pyplot as plt
from tqdm.auto import tqdm
...
```
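The "extract and compare text" baseline described above boils down to a normalized similarity score between a parser's output and the reference text. The tutorial installs rapidfuzz for this; the sketch below uses the stdlib `difflib.SequenceMatcher` as a stand-in on the same 0–1 scale, and the `text_similarity` helper name is our own, not from the original code.

```python
import difflib

def text_similarity(extracted: str, reference: str) -> float:
    """Similarity between parser output and ground truth on a 0..1 scale.

    Hypothetical helper: the tutorial uses rapidfuzz, but
    difflib.SequenceMatcher from the stdlib gives a comparable ratio.
    """
    # Collapse whitespace and case so layout differences don't dominate
    def norm(s: str) -> str:
        return " ".join(s.split()).lower()
    return difflib.SequenceMatcher(None, norm(extracted), norm(reference)).ratio()

print(text_similarity("Hello,  World", "hello world"))  # close to 1.0
print(text_similarity("table of contents", "completely different"))
```

Scores near 1.0 indicate faithful extraction; aggregating this per document dimension (text, tables, charts) yields the kind of comparison table the pipeline builds.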

Poolside AI Introduces Laguna XS.2 and M.1: Agentic Coding Models Reaching 68.2% and 72.5% on SWE-bench Verified

Poolside AI released the first two models in its Laguna family: Laguna M.1 and Laguna XS.2. Alongside these, the company is releasing pool, a lightweight terminal-based coding agent and a dual Agent Client Protocol (ACP) client-server, the same environment Poolside uses internally for agent RL training and evaluation, now available as a research preview.

What are These Models, and Why Should You Care?

Both Laguna M.1 and Laguna XS.2 are Mixture-of-Experts (MoE) models. Instead of activating all parameters for every token, MoE models route each token through only a subset of specialized sub-networks called 'experts.' This gives a large total parameter count, and the capability headroom that comes with it, while paying only the compute cost of a much smaller 'activated' parameter count at inference time. Laguna M.1 is a 225B total parameter MoE model with 23B activated parameters, trained from scratch on 30T tokens using 6,144 interconnected NVIDIA Hopper GPUs. It completed pre-t...
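The top-k routing idea behind MoE can be sketched in a few lines: a gate scores every expert for the current token, but only the k best experts actually execute, which is why activated parameters (and inference compute) stay far below the total count. This is an illustrative toy in pure Python, not Poolside's architecture; all names and sizes are made up for the example.

```python
import math, random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(token, experts, gate_weights, k=2):
    """Toy top-k MoE layer (illustrative only, not Laguna's design).

    `experts` is a list of callables; `gate_weights[i]` scores expert i.
    Only the top-k experts run, so compute scales with k, not len(experts).
    """
    scores = [sum(w * x for w, x in zip(ws, token)) for ws in gate_weights]
    probs = softmax(scores)
    topk = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    renorm = sum(probs[i] for i in topk)
    out = [0.0] * len(token)
    for i in topk:                      # weighted sum over selected experts only
        y = experts[i](token)
        out = [o + (probs[i] / renorm) * yj for o, yj in zip(out, y)]
    return out, topk

# 8 experts, only 2 active per token (cf. 225B total vs 23B activated)
experts = [(lambda s: (lambda t: [s * v for v in t]))(i + 1) for i in range(8)]
gate = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(8)]
y, active = moe_forward([0.1, 0.2, 0.3, 0.4], experts, gate, k=2)
print(len(active))  # 2 experts executed out of 8
```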

How to Build Traceable and Evaluated LLM Workflows Using Promptflow, Prompty, and OpenAI

In this tutorial, we build a complete, production-style LLM workflow using Promptflow within a Colab environment. We begin by setting up a reliable keyring backend to avoid OS dependency issues and securely configure our OpenAI connection. From there, we establish a clean workspace and define a structured Prompty file that acts as the core LLM component of our pipeline. We then design a class-based flex flow that combines deterministic preprocessing with LLM reasoning, allowing us to inject computed hints into model responses. We also enable tracing to monitor each execution step, run both single and batch queries, and generate outputs in a structured format. Finally, we extend the system with an evaluation pipeline that leverages an LLM-as-a-judge to score responses against expected answers.

```python
!pip install -q keyrings.alt

import keyring
from keyrings.alt.file import PlaintextKeyring
keyring.set_keyring(PlaintextKeyring())
import ...
```
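The "deterministic preprocessing plus injected hints" pattern described above can be sketched without any framework: a class computes a hint from the input, splices it into the prompt, and delegates to an LLM passed in as a callable (so it can be traced or unit-tested without an API key). This is a framework-free sketch of the idea, with hypothetical names; Promptflow's actual flex-flow wiring is not reproduced here.

```python
from typing import Callable

class QAFlow:
    """Sketch of a class-based flow: deterministic preprocessing feeds a
    computed hint into the prompt before the LLM call (illustrative names)."""

    def __init__(self, llm: Callable[[str], str]):
        self.llm = llm  # the real flow would use the configured OpenAI connection

    def preprocess(self, question: str) -> str:
        # Deterministic step: e.g. flag numeric questions for the model
        if any(ch.isdigit() for ch in question):
            return "The question involves numbers; show the arithmetic."
        return "Answer concisely."

    def __call__(self, question: str) -> dict:
        hint = self.preprocess(question)
        prompt = f"Hint: {hint}\nQuestion: {question}"
        return {"question": question, "hint": hint, "answer": self.llm(prompt)}

# Stub LLM that echoes its prompt, standing in for the real model call
flow = QAFlow(llm=lambda p: f"[model saw] {p}")
result = flow("What is 2 + 2?")
print(result["hint"])  # the computed hint that was injected
```

Because the LLM is injected, the same class can be driven by a stub in tests and by the real connection in the traced batch runs the tutorial describes.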

OpenAI Releases Privacy Filter: A 1.5B-Parameter Open-Source PII Redaction Model with 50M Active Parameters

OpenAI just quietly dropped something worth paying close attention to. Released on Hugging Face under an Apache 2.0 license, Privacy Filter is an open, bidirectional token-classification model purpose-built for detecting and redacting personally identifiable information (PII) in text. It is small enough to run in a web browser or on a laptop and fast enough for high-throughput data sanitization pipelines.

What It Does

Privacy Filter is a Named Entity Recognition (NER) model, but one tuned specifically for the privacy use case. It detects eight categories of sensitive spans: account_number , private_address , private_email , private_person , private_phone , private_url , private_date , and secret . The secret category covers credential formats, project-specific token patterns, and high-entropy strings; the model card explicitly calls out missed detection of 'novel credential formats' and 'secrets split across surrounding syntax' as known failure modes, which signals what the categor...
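Downstream of such a model, the redaction step itself is simple: replace each detected span with its category label, applying spans right-to-left so earlier character offsets stay valid. The sketch below assumes `(start, end, category)` spans of the kind a token-classification model yields after aggregation; the sample spans are illustrative, not real Privacy Filter output.

```python
def redact(text: str, spans: list) -> str:
    """Replace detected PII spans with their category label.

    `spans` holds (start, end, category) tuples, as a NER model like
    Privacy Filter would produce after span aggregation (the sample below
    is made up for illustration). Applying spans right-to-left keeps
    earlier offsets valid after each substitution.
    """
    out = text
    for start, end, cat in sorted(spans, key=lambda s: s[0], reverse=True):
        out = out[:start] + f"[{cat}]" + out[end:]
    return out

text = "Contact Jane Doe at jane@example.com"
spans = [(8, 16, "private_person"), (20, 36, "private_email")]
print(redact(text, spans))
# Contact [private_person] at [private_email]
```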

Top 10 Physical AI Models Powering Real-World Robots in 2026

Top 10 Physical AI Models

1. NVIDIA Isaac GR00T N-Series (N1.5 / N1.6 / N1.7)
2. Google DeepMind Gemini Robotics 1.5
3. Physical Intelligence π0 / π0.5 / π0.7
4. Figure AI Helix
5. OpenVLA
6. Octo
7. AGIBOT BFM and GCFM
8. Gemini Robotics On-Device
9. NVIDIA Cosmos World Foundation Models
10. SmolVLA (HuggingFace LeRobot)

The gap between language model capabilities and robotic deployment has been narrowing considerably over the past 18 months. A new class of foundation models, purpose-built not for text generation but for physical action, is now running on real hardware across factories, warehouses, and research labs. These systems span deployed robot policies, private-preview VLAs, open-weight research models, and world models used to scale robot training data. Some are being evaluated or deployed with industrial partners; others are primarily research or developer-facing systems. Here is a breakdown of the ten that matter most in 2026.

NVIDIA Isaac GR00T N-Series (N1.5 / N1.6 / N1.7)

NVIDIA rele...

How to Build a Lightweight Vision-Language-Action-Inspired Embodied Agent with Latent World Modeling and Model Predictive Control

In this tutorial, we build an embodied simulation vision agent that learns to perceive, plan, predict, and replan directly from pixel observations. We create a fully NumPy-rendered grid world in which the agent observes RGB frames rather than symbolic state variables, enabling us to simulate a simplified Vision-Language-Action-style pipeline. We train a lightweight world model that encodes visual input into a latent representation, predicts future states conditioned on actions and goals, and reconstructs the next frame. Using model predictive control in latent space, we enable the agent to sample possible action sequences, evaluate predicted outcomes, and execute the best action in a closed loop.

```python
import random, numpy as np, torch, torch.nn as nn, torch.nn.functional as F
import matplotlib.pyplot as plt
from dataclasses import dataclass
from typing import Tuple, Dict, List
from torch.utils.data import Dataset, DataLoader
try: from...
```
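The plan-act-replan loop at the heart of model predictive control can be sketched framework-free: enumerate candidate action sequences, roll each through a dynamics model, score the predicted trajectory against the goal, then execute only the first action of the best sequence and replan. The toy below substitutes a hand-coded 1D dynamics function for the tutorial's learned latent world model; all names here are illustrative.

```python
from itertools import product

def mpc_step(state, goal, dynamics, horizon=3, actions=(-1, 0, 1)):
    """One MPC step by exhaustive shooting over short action sequences.

    Toy stand-in for the tutorial's latent-space planner: `dynamics` is a
    hand-coded transition function here, where the tutorial uses a learned
    world model (hypothetical example, not the original code).
    """
    best_cost, best_first = float("inf"), 0
    for seq in product(actions, repeat=horizon):   # every candidate sequence
        s, cost = state, 0
        for a in seq:                              # roll sequence through the model
            s = dynamics(s, a)
            cost += abs(s - goal)                  # accumulate distance to goal
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first                              # execute only the first action

dynamics = lambda s, a: s + a                      # trivial 1D "world"
state, goal = 0, 3
for _ in range(6):                                 # closed loop: plan, act, replan
    state = dynamics(state, mpc_step(state, goal, dynamics))
print(state)  # settles at the goal: 3
```

Sampling random sequences instead of enumerating them (random shooting) scales the same idea to the larger action spaces and learned latent dynamics the tutorial works with.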