Posts

Sakana AI Introduces Doc-to-LoRA and Text-to-LoRA: Hypernetworks that Instantly Internalize Long Contexts and Adapt LLMs via Zero-Shot Natural Language

Customizing Large Language Models (LLMs) currently presents a significant engineering trade-off between the flexibility of In-Context Learning (ICL) and the efficiency of Context Distillation (CD) or Supervised Fine-Tuning (SFT). Tokyo-based Sakana AI has proposed a new approach that bypasses these constraints through cost amortization. In two recent papers, they introduced Text-to-LoRA (T2L) and Doc-to-LoRA (D2L), lightweight hypernetworks that meta-learn to generate Low-Rank Adaptation (LoRA) matrices in a single forward pass.

The Engineering Bottleneck: Latency vs. Memory

For AI devs, the primary limitation of standard LLM adaptation is computational overhead:

In-Context Learning (ICL): While convenient, ICL suffers from quadratic attention costs and linear KV-cache growth, which increases latency and memory consumption as prompts lengthen.

Context Distillation (CD): CD transfers information into model parameters, but per-prompt distillation is often impractical d...
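The core idea of a hypernetwork that emits LoRA factors in one forward pass can be sketched in a few lines. This is a minimal illustration, not Sakana AI's architecture: the class name, layer sizes, and the use of untrained random weights are all this sketch's own assumptions (in T2L/D2L the hypernetwork is meta-learned across tasks).

```python
import numpy as np

rng = np.random.default_rng(0)

def lora_hypernetwork(task_emb, target_in=1024, target_out=1024, rank=8, hidden=256):
    """Hypothetical sketch: map a task/document embedding to LoRA factors A, B
    for one target linear layer, in a single forward pass. Weights are random
    here; in T2L/D2L they would be meta-learned."""
    d = task_emb.shape[0]
    W1 = rng.standard_normal((hidden, d)) / np.sqrt(d)
    h = np.maximum(W1 @ task_emb, 0.0)                       # small ReLU trunk
    W_A = rng.standard_normal((rank * target_in, hidden)) / np.sqrt(hidden)
    W_B = rng.standard_normal((target_out * rank, hidden)) / np.sqrt(hidden)
    A = (W_A @ h).reshape(rank, target_in)                   # r x d_in factor
    B = (W_B @ h).reshape(target_out, rank)                  # d_out x r factor
    return A, B

A, B = lora_hypernetwork(rng.standard_normal(768))   # e.g. a task-description embedding
delta_W = B @ A        # low-rank weight update, rank <= 8, applied as W + delta_W
print(delta_W.shape)   # (1024, 1024)
```

The payoff is amortization: once the hypernetwork is trained, producing an adapter for a new task costs one forward pass instead of a gradient-based fine-tuning run.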

Perplexity Just Released pplx-embed: New SOTA Qwen3 Bidirectional Embedding Models for Web-Scale Retrieval Tasks

Perplexity has released pplx-embed, a collection of multilingual embedding models optimized for large-scale retrieval tasks. These models are designed to handle the noise and complexity of web-scale data, providing a production-ready alternative to proprietary embedding APIs.

Architectural Innovations: Bidirectional Attention and Diffusion

Most Large Language Models (LLMs) use causal, decoder-only architectures. For embedding tasks, however, understanding the full context of a sentence is more critical than predicting the next token. The Perplexity research team addressed this by implementing bidirectional attention, which allows the model to process all tokens in a sequence simultaneously, producing a more comprehensive hidden-state representation. Furthermore, the models use diffusion-based pretraining. While diffusion is frequently used in generative media, applying it to text embeddings helps the model learn to reconstruct clean semantic signals from noisy or fragmented...
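The causal-vs-bidirectional distinction comes down to the attention mask. A toy single-head example (not pplx-embed's actual architecture; projections are omitted for clarity) shows how a causal mask prevents early tokens from seeing later context, while a bidirectional mask lets every position condition on the whole sequence before pooling:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, mask):
    """Single-head self-attention with identity projections, for illustration."""
    scores = X @ X.T / np.sqrt(X.shape[1])
    scores = np.where(mask, scores, -1e9)        # masked positions get ~ -inf
    return softmax(scores) @ X

T, d = 5, 8
X = np.random.default_rng(1).standard_normal((T, d))

causal = np.tril(np.ones((T, T), dtype=bool))    # token i attends only to j <= i
bidir  = np.ones((T, T), dtype=bool)             # token i attends to all tokens

h_causal = self_attention(X, causal)
h_bidir  = self_attention(X, bidir)

# Under the causal mask, token 0 only ever sees itself; bidirectionally,
# every hidden state reflects the full sequence before we pool it.
emb = h_bidir.mean(axis=0)                       # mean-pooled sentence embedding
emb /= np.linalg.norm(emb)                       # unit-normalize for cosine retrieval
print(emb.shape)   # (8,)
```

This is why encoder-style (bidirectional) attention is the common choice for retrieval embeddings even when the base model started life as a decoder.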

Microsoft Research Introduces CORPGEN to Manage Multi-Horizon Tasks for Autonomous AI Agents Using Hierarchical Planning and Memory

Microsoft researchers have introduced CORPGEN, an architecture-agnostic framework designed to manage the complexities of realistic organizational work through autonomous digital employees. While existing benchmarks evaluate AI agents on isolated, single tasks, real-world corporate environments require managing dozens of concurrent, interleaved tasks with complex dependencies. The research team identifies this distinct problem class as Multi-Horizon Task Environments (MHTEs).

The Performance Gap in MHTEs

Empirical testing reveals that baseline computer-using agents (CUAs) suffer significant performance degradation when moved from single-task scenarios to MHTEs. Across three independent CUA implementations, completion rates dropped from 16.7% at 25% load to 8.7% at 100% load. The research team identified four fundamental failure modes behind this decline:

Context Saturation: Context requirements grow O(N) with task count rather than O(1), rapidly exceeding the token window...
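The context-saturation failure mode is easy to see with simple token accounting. The sketch below is illustrative only and does not reproduce CORPGEN's actual mechanism; the token counts are invented. It contrasts a flat prompt that carries every concurrent task in full (O(N)) with a hierarchical scheme that keeps one active task in full plus a fixed-size summary per suspended task:

```python
# Illustrative token accounting: flat vs hierarchical task context.
# TASK_TOKENS and SUMMARY_TOKENS are made-up numbers for this sketch.
TASK_TOKENS = 1200      # full working context of one in-flight task
SUMMARY_TOKENS = 60     # fixed-size summary kept per suspended task

def flat_context(n_tasks):
    # Every concurrent task's full context stays in the prompt: O(N).
    return n_tasks * TASK_TOKENS

def hierarchical_context(n_tasks):
    # One active task in full, the rest as compact summaries.
    return TASK_TOKENS + (n_tasks - 1) * SUMMARY_TOKENS

for n in (1, 10, 40):
    print(n, flat_context(n), hierarchical_context(n))
# 1 1200 1200
# 10 12000 1740
# 40 48000 3540
```

At 40 concurrent tasks the flat prompt has already blown past a typical usable context budget, while the hierarchical variant grows only by the per-task summary cost.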

Nous Research Releases ‘Hermes Agent’ to Fix AI Forgetfulness with Multi-Level Memory and Dedicated Remote Terminal Access Support

In the current AI landscape, we’ve become accustomed to the ‘ephemeral agent’: a brilliant but forgetful assistant that restarts its cognitive clock with every new chat session. While LLMs have become master coders, they lack the persistent state required to function as true teammates. The Nous Research team has released Hermes Agent, an open-source autonomous system designed to solve the two biggest bottlenecks in agentic workflows: memory decay and environmental isolation. Built on the highly steerable Hermes-3 model family, Hermes Agent is billed as the assistant that ‘grows with you.’

The Memory Hierarchy: Learning via Skill Documents

For an agent to ‘grow,’ it needs more than just a large context window. Hermes Agent uses a multi-level memory system that mimics procedural learning. While it handles short-term tasks through standard inference, its long-term utility is driven by Skill Documents. When Hermes Agent completes a complex task—such as debugging a specific microserv...
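A skill-document memory can be sketched as a persistent store written after each completed task and queried before new ones. This is a hypothetical minimal version, assuming a JSON file and word-overlap retrieval; the class and method names are this sketch's own, and Hermes Agent's real mechanism may differ substantially:

```python
import json
from pathlib import Path

class SkillStore:
    """Hypothetical sketch of a 'Skill Document' store: after finishing a task,
    the agent saves a short procedural note; later tasks retrieve the most
    relevant notes by simple word overlap on the title."""
    def __init__(self, path="skills.json"):
        self.path = Path(path)
        self.skills = json.loads(self.path.read_text()) if self.path.exists() else []

    def save(self, title, steps):
        self.skills.append({"title": title, "steps": steps})
        self.path.write_text(json.dumps(self.skills, indent=2))  # persist across sessions

    def retrieve(self, query, k=1):
        q = set(query.lower().split())
        scored = sorted(self.skills,
                        key=lambda s: -len(q & set(s["title"].lower().split())))
        return scored[:k]

store = SkillStore()
store.save("debug a failing microservice healthcheck",
           ["inspect container logs", "curl the health endpoint", "check env vars"])
best = store.retrieve("microservice healthcheck keeps failing")[0]
print(best["title"])   # debug a failing microservice healthcheck
```

The essential property is that the store outlives any single session, so procedural knowledge accumulates instead of resetting with each new chat. A production system would swap the word-overlap scoring for embedding similarity.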

Tailscale and LM Studio Introduce ‘LM Link’ to Provide Encrypted Point-to-Point Access to Your Private GPU Hardware Assets

For the modern AI developer, productivity is often tied to a physical location. You likely have a ‘Big Rig’ at home or the office—a workstation humming with NVIDIA RTX cards—and a ‘Travel Rig,’ a sleek laptop that’s perfect for coffee shops but struggles to run even a quantized Llama-3 variant. Until now, bridging that gap meant venturing into the ‘networking dark arts’: wrestling with brittle SSH tunnels, exposing private APIs to the public internet, or paying for cloud GPUs while your own hardware sat idle. This week, LM Studio and Tailscale launched LM Link, a feature that treats your remote hardware as if it were plugged directly into your laptop.

The Problem: API Key Sprawl and Public Exposure

Running LLMs locally offers privacy and zero per-token costs, but mobility remains the bottleneck. Traditional remote access requires a public endpoint, which creates two massive headaches:

Security Risk: Opening ports to the internet invites constant scanning and potential e...

How to Build an Elastic Vector Database with Consistent Hashing, Sharding, and Live Ring Visualization for RAG Systems

In this tutorial, we build an elastic vector database simulator that mirrors how modern RAG systems shard embeddings across distributed storage nodes. We implement consistent hashing with virtual nodes to ensure balanced placement and minimal reshuffling as the system scales. We visualize the hashing ring in real time and interactively add or remove nodes to observe how only a small fraction of embeddings move. This setup connects infrastructure theory directly to practical behavior in distributed AI systems.

```python
!pip -q install networkx ipywidgets

import hashlib
import bisect
import random
from dataclasses import dataclass
from typing import Dict, List, Optional

import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
from IPython.display import display, clear_output
import ipywidgets as widgets
```

We set up the execution environment and install the required libraries needed for visualization and interactivi...
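The heart of the tutorial, the consistent-hash ring with virtual nodes, can be condensed into a standalone sketch. The class and parameter names here are this sketch's own, and the visualization layer is omitted; it demonstrates the key property that adding a node moves only a fraction of the keys:

```python
import hashlib
import bisect

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative sketch)."""
    def __init__(self, vnodes=100):
        self.vnodes = vnodes
        self.ring = []        # sorted list of (hash, node) points on the ring

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Each physical node owns `vnodes` points, smoothing the key distribution.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self.ring = [p for p in self.ring if p[1] != node]

    def get_node(self, key):
        # A key belongs to the first ring point clockwise from its hash.
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing()
for n in ("node-A", "node-B", "node-C"):
    ring.add_node(n)

keys = [f"embedding-{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("node-D")                         # scale out by one node
after = {k: ring.get_node(k) for k in keys}
moved = sum(before[k] != after[k] for k in keys)
print(f"{moved / len(keys):.0%} of embeddings moved")   # roughly 1/N, not 100%
```

With naive modulo sharding (`hash(key) % n_nodes`), adding a node would remap nearly every key; on the ring, only the keys falling into the new node's arcs relocate, which is exactly what the tutorial's live visualization shows.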

New ETH Zurich Study Proves Your AI Coding Agents are Failing Because Your AGENTS.md Files are too Detailed

In the high-stakes world of AI, ‘Context Engineering’ has emerged as the latest frontier for squeezing performance out of LLMs. Industry leaders have touted AGENTS.md (and its cousins like CLAUDE.md) as the ultimate configuration point for coding agents—a repository-level ‘North Star’ injected into every conversation to guide the AI through complex codebases. But a recent study from researchers at ETH Zurich just dropped a massive reality check. The findings are clear: if you aren’t deliberate with your context files, you are likely sabotaging your agent’s performance while paying a 20% premium for the privilege.

The Data: More Tokens, Less Success

The ETH Zurich research team analyzed coding agents like Sonnet-4.5, GPT-5.2, and Qwen3-30B across established benchmarks and a novel set of real-world tasks called AGENTBENCH. The results were surprisingly lopsided:

The Auto-Generated Tax: Automatically generated context files actually reduced ...