Meet ‘Kani-TTS-2’: A 400M Param Open Source Text-to-Speech Model that Runs in 3GB VRAM with Voice Cloning Support
The landscape of generative audio is shifting toward efficiency. A new open-source contender, Kani-TTS-2, has been released by the team at nineninesix.ai. The model marks a departure from heavy, compute-expensive TTS systems: instead of relying on bulk, it treats audio as a language, delivering high-fidelity speech synthesis with a remarkably small footprint. Kani-TTS-2 offers a lean, high-performance alternative to closed-source APIs and is currently available on Hugging Face in both English (EN) and Portuguese (PT) versions.

The Architecture: LFM2 and NanoCodec

Kani-TTS-2 follows an ‘audio as language’ philosophy. The model does not use a traditional mel-spectrogram pipeline; instead, it converts raw audio into discrete tokens using a neural codec. The system relies on a two-stage process:

The Language Backbone: The model is built on LiquidAI’s LFM2 (350M) architecture. This backbone generates ‘audio intent’ by predicting the next audio tokens. Because LFM (Liquid Foundation Models) a...
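To make the ‘audio as language’ idea concrete, here is a minimal toy sketch of the two-stage process: a language model predicts discrete audio tokens one at a time, and a neural codec maps those tokens back to a waveform. All names here (`ToyNeuralCodec`, `toy_lm`, `synthesize`) are hypothetical illustrations, not Kani-TTS-2’s actual API; the codebook is random rather than learned, which a real codec such as NanoCodec would train from data.

```python
import numpy as np


class ToyNeuralCodec:
    """Stand-in for a neural audio codec: maps fixed-size waveform
    frames to discrete token ids via a codebook. A real codec learns
    the codebook; here it is random for illustration."""

    def __init__(self, codebook_size=256, frame_len=160, seed=0):
        rng = np.random.default_rng(seed)
        self.codebook = rng.standard_normal((codebook_size, frame_len))

    def encode(self, audio):
        # Split the waveform into frames and snap each frame to its
        # nearest codebook entry (L2 distance) -> one token id per frame.
        frame_len = self.codebook.shape[1]
        n = len(audio) // frame_len
        frames = audio[: n * frame_len].reshape(n, frame_len)
        dists = ((frames[:, None, :] - self.codebook[None, :, :]) ** 2).sum(-1)
        return dists.argmin(axis=1)

    def decode(self, tokens):
        # Map token ids back to waveform frames and concatenate.
        return self.codebook[np.asarray(tokens)].reshape(-1)


def synthesize(text, codec, lm_step, n_tokens=10):
    """Two-stage synthesis: the language backbone predicts the next
    audio token conditioned on text + token history, then the codec
    decodes the token sequence into a waveform."""
    tokens = []
    for _ in range(n_tokens):
        tokens.append(lm_step(text, tokens))
    return codec.decode(tokens)


def toy_lm(text, history):
    # Trivial deterministic stand-in for the LFM2 backbone.
    return (len(text) + len(history)) % 256


codec = ToyNeuralCodec()
wave = synthesize("hello", codec, toy_lm)
```

The point of the sketch is the division of labour: the backbone only ever deals in token ids (a classification problem, which is what keeps the parameter count and VRAM low), while audio fidelity is delegated entirely to the codec’s decoder.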
