Baidu Releases Unlimited OCR, a 3B Model That Keeps the KV Cache Flat for Long-Document Parsing
Most end-to-end OCR models slow down as output grows. Each generated token adds to the KV cache. Memory rises and generation drags. Parsing dozens of pages becomes impractical. Baidu’s Unlimited OCR addresses this directly. It swaps the decoder’s attention for a design that keeps memory constant. TL;DR Unlimited OCR is a 3B-parameter Mixture-of-Experts model, with only 500M parameters active. It replaces decoder attention with Reference Sliding Window Attention (R-SWA), keeping the KV cache constant. The model parses dozens of pages in one forward pass under a 32K maximum length. It scores 93.23 on OmniDocBench v1.5, beating the DeepSeek OCR baseline by 6.22 points. It builds on DeepSeek OCR via continue-training, not a from-scratch run. What is Unlimited OCR? Unlimited OCR takes DeepSeek OCR as its baseline. It keeps the DeepEncoder and the Mixture-of-Experts decoder. The MoE design holds 3B total parameters but activates only 500M at infe...
