Posts

Baidu Releases Unlimited OCR, a 3B Model That Keeps the KV Cache Flat for Long-Document Parsing

Most end-to-end OCR models slow down as output grows. Each generated token adds to the KV cache. Memory rises and generation drags. Parsing dozens of pages becomes impractical. Baidu’s Unlimited OCR addresses this directly. It swaps the decoder’s attention for a design that keeps memory constant.

TL;DR: Unlimited OCR is a 3B-parameter Mixture-of-Experts model, with only 500M parameters active. It replaces decoder attention with Reference Sliding Window Attention (R-SWA), keeping the KV cache constant. The model parses dozens of pages in one forward pass under a 32K maximum length. It scores 93.23 on OmniDocBench v1.5, beating the DeepSeek OCR baseline by 6.22 points. It builds on DeepSeek OCR via continue-training, not a from-scratch run.

What is Unlimited OCR?

Unlimited OCR takes DeepSeek OCR as its baseline. It keeps the DeepEncoder and the Mixture-of-Experts decoder. The MoE design holds 3B total parameters but activates only 500M at infe...
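To make the memory argument concrete, here is a minimal sketch that contrasts an unbounded KV cache with a plain sliding-window cache. It is a generic illustration, not Baidu's R-SWA: the function name `cache_sizes`, the window size of 4, and the use of integers as stand-ins for key/value pairs are all assumptions made for this example.

```python
from collections import deque


def cache_sizes(num_tokens, window=None):
    """Return the cache length after each generated token.

    window=None models a full KV cache that keeps every token.
    window=N models a generic sliding window that keeps the last N tokens.
    (Hypothetical helper for illustration; not part of any OCR model's API.)
    """
    cache = deque(maxlen=window)  # maxlen=None means the deque is unbounded
    sizes = []
    for token_id in range(num_tokens):
        cache.append(token_id)  # stand-in for one token's key/value pair
        sizes.append(len(cache))
    return sizes


print(cache_sizes(8))            # [1, 2, 3, 4, 5, 6, 7, 8]  grows with output
print(cache_sizes(8, window=4))  # [1, 2, 3, 4, 4, 4, 4, 4]  flat after the window fills
```

The full cache grows linearly with output length, while the windowed cache stops growing once it reaches the window size, which is the property the article describes as keeping the KV cache flat.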

Nous Research Adds /learn to Hermes Agent’s Skills System, Capturing Workflows as Slash Commands Without Hand-Writing SKILL.md

Nous Research has expanded the Skills System inside Hermes Agent, its open-source self-improving agent. The new addition is /learn, a command that writes a reusable skill for you. Point it at a document page, a local SDK, a past conversation, or pasted notes. The live agent gathers the material, then authors a SKILL.md on your behalf.

As Nous Research (@NousResearch) posted on June 23, 2026: "Hermes Agent can now /learn from anything: feed it directories of any source material (code, API docs, manuals, PDFs, configs) and it distills a verifiable reusable skill."

Hermes Skills System

Skills are on-demand knowledge documents the agent loads when needed. Each one is a folder containing a SKILL.md file with instructions. They follow a progressive disclosure pattern to keep token usage low. The format is compatible with the agentskills.io open standard. All skills live in ~/.hermes/skills/, the single source of truth. On a fresh install, bundled sk...

16 Best Generative AI Coding Tools in 2026 Compared: Features and Best Fit

Generative AI has reshaped how software gets built. What began as line-by-line autocomplete now spans full application generation, multi-agent build pipelines, and natural-language interfaces to entire codebases. Large language models trained on code can read context, follow intent, and produce working frontends, backends, and infrastructure with little manual setup. For early-career AI engineers, software engineers, and data scientists, the practical question is no longer whether these tools help, but which ones fit a given task. Some accelerate writing and reviewing code inside an existing workflow. Others remove the editor entirely and build deployable products from a prompt. Here are the top generative AI tools in code generation and coding to know in 2026:

1. Atoms

Atoms (10% discount coupon: MARKTECHPOST10) is an AI platform that turns natural-language descriptions into fully deployable applications. It marks a clear step beyond standalone code generators by...

DFlash Speculative Decoding Drafts Whole Token Blocks in Parallel for Up to 15x Higher Throughput on NVIDIA Blackwell

Autoregressive large language models generate text one token at a time. Each token waits for the one before it. This serial loop leaves modern GPUs underused and keeps inference slow. The cost grows worse with long Chain-of-Thought reasoning models. Their lengthy outputs make latency the dominant part of generation. Speculative decoding is the standard fix. A small draft model proposes future tokens. The large target model verifies those tokens in parallel. Accepted tokens are kept, so the output stays lossless. But most methods, including the state-of-the-art EAGLE-3, still draft autoregressively. That serial drafting caps real-world speedups near 2–3×.

DFlash, introduced by a research team from UC San Diego (z-lab), takes a different route. It is a lightweight block diffusion model built for drafting. Instead of drafting tokens one at a time, it proposes a whole block in a single forward pass. The target model then verifies that block in parallel. The research team r...
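The draft-then-verify loop can be sketched with toy stand-ins. This is a simplified greedy variant written for illustration, not DFlash's implementation: `target_next`, `draft_next`, and `speculative_step` are hypothetical names, both "models" are deterministic arithmetic functions, and the verification loop runs serially here where a real system would score the whole block in one parallel target pass.

```python
def target_next(prefix):
    """Toy 'target model' (assumption): greedy next token from the prefix."""
    return sum(prefix) % 7


def draft_next(prefix):
    """Toy 'draft model' (assumption): agrees with the target except
    when the prefix length is a multiple of 4, where it guesses wrong."""
    guess = target_next(prefix)
    return (guess + 1) % 7 if len(prefix) % 4 == 0 else guess


def speculative_step(prefix, k):
    """Propose k draft tokens, then keep the longest prefix the target agrees with."""
    ctx, block = list(prefix), []
    for _ in range(k):  # a block drafter would emit these k tokens in one pass
        tok = draft_next(ctx)
        block.append(tok)
        ctx.append(tok)

    ctx, accepted = list(prefix), []
    for tok in block:  # done in parallel by a real target model
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)  # replace the first mismatch and stop
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target_next(ctx))  # every draft accepted: one bonus token
    return accepted


def generate(prefix, n, k=4):
    out = list(prefix)
    while len(out) - len(prefix) < n:
        out.extend(speculative_step(out, k))
    return out[: len(prefix) + n]


def baseline(prefix, n):
    out = list(prefix)
    for _ in range(n):
        out.append(target_next(out))
    return out


# The speculative output matches plain greedy decoding token for token.
print(generate([1, 2], 10) == baseline([1, 2], 10))  # True
```

Because every kept token is either confirmed or supplied by the target, the result is identical to decoding with the target alone. The speedup comes from how many draft tokens are accepted per target pass, which is why the cost of producing the draft block matters.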