Posts

Meet ‘Kani-TTS-2’: A 400M Param Open Source Text-to-Speech Model that Runs in 3GB VRAM with Voice Cloning Support

The landscape of generative audio is shifting toward efficiency. A new open-source contender, Kani-TTS-2, has been released by the team at nineninesix.ai. This model marks a departure from heavy, compute-expensive TTS systems. Instead, it treats audio as a language, delivering high-fidelity speech synthesis with a remarkably small footprint. Kani-TTS-2 offers a lean, high-performance alternative to closed-source APIs. It is currently available on Hugging Face in both English (EN) and Portuguese (PT) versions.

The Architecture: LFM2 and NanoCodec

Kani-TTS-2 follows the ‘Audio-as-Language’ philosophy. The model does not use traditional mel-spectrogram pipelines. Instead, it converts raw audio into discrete tokens using a neural codec. The system relies on a two-stage process:

The Language Backbone: The model is built on LiquidAI’s LFM2 (350M) architecture. This backbone generates ‘audio intent’ by predicting the next audio tokens. Because LFM (Liquid Foundation Models) a...
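To make the two-stage flow concrete, here is a minimal, purely illustrative sketch of the ‘audio-as-language’ pattern. The component names, codebook size, and frame length below are assumptions chosen for illustration; this is not Kani-TTS-2’s actual API.

import numpy as np

# Illustrative sketch of the 'audio-as-language' pattern (NOT Kani-TTS-2's real API).
# Stage 1: a language-model backbone autoregressively predicts discrete audio tokens.
# Stage 2: a neural codec decoder turns those tokens back into a waveform.

CODEBOOK_SIZE = 1024   # assumed codec vocabulary size (hypothetical)
FRAME_SAMPLES = 320    # assumed audio samples produced per codec token (hypothetical)

def backbone_next_token(prompt_text: str, history: list) -> int:
    """Stand-in for the LFM2 backbone: predicts the next discrete audio token id."""
    # A real model conditions on the text prompt and prior audio tokens; here we
    # derive a deterministic pseudo-token so the sketch runs end to end.
    return hash((prompt_text, len(history))) % CODEBOOK_SIZE

def codec_decode(tokens: list) -> np.ndarray:
    """Stand-in for the neural codec decoder: token ids -> waveform samples."""
    rng = np.random.default_rng(0)
    return np.concatenate([rng.standard_normal(FRAME_SAMPLES) * 0.01 for _ in tokens])

def synthesize(text: str, num_frames: int = 50) -> np.ndarray:
    tokens = []
    for _ in range(num_frames):                  # autoregressive token generation
        tokens.append(backbone_next_token(text, tokens))
    return codec_decode(tokens)                  # decode tokens to audio samples

audio = synthesize("Hello from a tiny TTS sketch.")
print(audio.shape)  # (16000,) -> 50 frames * 320 samples per frame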

Getting Started with OpenClaw and Connecting It with WhatsApp

OpenClaw is a self-hosted personal AI assistant that runs on your own devices and communicates through the apps you already use, such as WhatsApp, Telegram, Slack, Discord, and more. It can answer questions, automate tasks, interact with your files and services, and even speak or listen on supported devices, all while keeping you in control of your data. Rather than being just another chatbot, OpenClaw acts as a true personal assistant that fits into your daily workflow. In just a few months, this open-source project has surged in popularity, crossing 150,000+ stars on GitHub. In this article, we’ll walk through how to get started with OpenClaw and connect it to WhatsApp.

What can OpenClaw do?

OpenClaw is built to fit seamlessly into your existing digital life. It connects with 50+ integrations, letting you chat with your assistant from apps like WhatsApp, Telegram, Slack, or Discord, while controlling and automating tasks from your desktop. You can use cloud or local AI models of y...

Google AI Introduces the WebMCP to Enable Direct and Structured Website Interactions for New AI Agents

Google is officially turning Chrome into a playground for AI agents. For years, AI ‘browsers’ have relied on a messy process: taking screenshots of websites, running them through vision models, and guessing where to click. This method is slow, breaks easily, and consumes massive amounts of compute. Google has introduced a better way: the Web Model Context Protocol (WebMCP). Announced alongside the Early Preview Program (EPP), this protocol allows websites to communicate directly with AI models. Instead of the AI ‘guessing’ how to use a site, the site tells the AI exactly what tools are available.

The End of Screen Scraping

Current AI agents treat the web like a picture. They ‘look’ at the UI and try to find the ‘Submit’ button. If the button moves 5 pixels, the agent might fail. WebMCP replaces this guesswork with structured data. It turns a website into a set of capabilities. For developers, this means you no longer have to worry about an AI breaking your frontend. You simply def...
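The excerpt does not show WebMCP’s actual interfaces, so purely as an illustration of the ‘declared capabilities instead of screen scraping’ idea, here is a small sketch. The descriptor fields, tool name, and handler wiring are assumptions for illustration; they are not the WebMCP specification.

import json
from typing import Any, Dict

# Site side: a capability is declared with a name, description, and input schema,
# so an agent can call it directly instead of hunting for buttons in a screenshot.
def add_to_cart(product_id: str, quantity: int) -> Dict[str, Any]:
    return {"status": "ok", "product_id": product_id, "quantity": quantity}

TOOLS: Dict[str, Dict[str, Any]] = {
    "add_to_cart": {
        "description": "Add a product to the shopping cart.",
        "parameters": {"product_id": "string", "quantity": "integer"},
        "handler": add_to_cart,
    }
}

# Agent side: invoke a declared capability by name with structured arguments.
def agent_call(tool_name: str, arguments: Dict[str, Any]) -> str:
    tool = TOOLS[tool_name]
    result = tool["handler"](**arguments)
    return json.dumps(result)

print(agent_call("add_to_cart", {"product_id": "sku-123", "quantity": 2}))

Because the contract is the declared schema rather than the rendered pixels, the agent keeps working even if the page’s visual layout changes.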

How to Build a Self-Organizing Agent Memory System for Long-Term AI Reasoning 

In this tutorial, we build a self-organizing memory system for an agent that goes beyond storing raw conversation history and instead structures interactions into persistent, meaningful knowledge units. We design the system so that reasoning and memory management are clearly separated, allowing a dedicated component to extract, compress, and organize information. At the same time, the main agent focuses on responding to the user. We use structured storage with SQLite, scene-based grouping, and summary consolidation, and we show how an agent can maintain useful context over long horizons without relying on opaque vector-only retrieval.

import sqlite3
import json
import re
from datetime import datetime
from typing import List, Dict
from getpass import getpass
from openai import OpenAI

OPENAI_API_KEY = getpass("Enter your OpenAI API key: ").strip()
client = OpenAI(api_key=OPENAI_API_KEY)

def llm(prompt, temperature=0.1, max...
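The excerpt cuts off during setup. As a rough illustration of the kind of scene-based SQLite layout the tutorial describes, here is a minimal sketch; the table and column names are assumptions chosen for illustration, not the tutorial’s actual schema.

import sqlite3

# Hypothetical scene-based memory layout: each 'scene' groups related turns and
# carries a consolidated summary that the agent can reload as long-horizon context.
conn = sqlite3.connect("agent_memory.db")
conn.execute("""
CREATE TABLE IF NOT EXISTS scenes (
    scene_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    topic      TEXT,
    summary    TEXT,      -- consolidated summary of the whole scene
    created_at TEXT
)""")
conn.execute("""
CREATE TABLE IF NOT EXISTS memory_units (
    unit_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    scene_id   INTEGER REFERENCES scenes(scene_id),
    role       TEXT,      -- 'user' or 'assistant'
    content    TEXT,      -- compressed knowledge unit, not the raw turn
    created_at TEXT
)""")
conn.commit()

def recall(topic_keyword: str, limit: int = 3):
    """Fetch the most recent scene summaries matching a topic keyword."""
    cur = conn.execute(
        "SELECT summary FROM scenes WHERE topic LIKE ? ORDER BY scene_id DESC LIMIT ?",
        (f"%{topic_keyword}%", limit),
    )
    return [row[0] for row in cur.fetchall()]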

Google DeepMind Introduces Aletheia: The AI Agent Moving from Math Competitions to Fully Autonomous Professional Research Discoveries

The Google DeepMind team has introduced Aletheia, a specialized AI agent designed to bridge the gap between competition-level math and professional research. While models achieved gold-medal standards at the 2025 International Mathematical Olympiad (IMO), research requires navigating vast literature and constructing long-horizon proofs. Aletheia solves this by iteratively generating, verifying, and revising solutions in natural language.

The Architecture: Agentic Loop

Aletheia is powered by an advanced version of Gemini Deep Think. It utilizes a three-part ‘agentic harness’ to improve reliability:

Generator: Proposes a candidate solution for a research problem.
Verifier: An informal natural language mechanism that checks for flaws or hallucinations.
Reviser: Corrects errors identified by the Verifier until a final output is approved.

This separation of duties is critical; researchers observed that explicitly separating verification helps the model ...
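As a rough illustration of how a generate-verify-revise harness can be wired together, here is a minimal sketch. The function names, stopping rule, and toy stand-ins are assumptions for illustration, not Aletheia’s actual implementation.

from typing import Callable, Tuple

# Minimal generate -> verify -> revise loop (illustrative only).
def agentic_loop(
    problem: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], Tuple[bool, str]],
    revise: Callable[[str, str, str], str],
    max_rounds: int = 5,
) -> str:
    """Generator proposes, Verifier critiques, Reviser repairs, until approved."""
    candidate = generate(problem)
    for _ in range(max_rounds):
        ok, critique = verify(problem, candidate)          # natural-language flaw check
        if ok:
            return candidate                               # Verifier approved the solution
        candidate = revise(problem, candidate, critique)   # repair using the critique
    return candidate  # best effort after max_rounds

# Toy stand-ins so the sketch runs; a real harness would call an LLM at each step.
result = agentic_loop(
    "Compute 2 + 2.",
    generate=lambda p: "5",
    verify=lambda p, c: (c == "4", "Arithmetic is wrong; recheck the sum."),
    revise=lambda p, c, critique: "4",
)
print(result)  # -> "4"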

How to Align Large Language Models with Human Preferences Using Direct Preference Optimization, QLoRA, and UltraFeedback

In this tutorial, we implement an end-to-end Direct Preference Optimization workflow to align a large language model with human preferences without using a reward model. We combine TRL’s DPOTrainer with QLoRA and PEFT to make preference-based alignment feasible on a single Colab GPU. We train directly on the UltraFeedback binarized dataset, where each prompt has a chosen and a rejected response, allowing us to shape model behavior and style rather than just factual recall.

import os
import math
import random
import torch

!pip -q install -U "transformers>=4.45.0" "datasets>=2.19.0" "accelerate>=0.33.0" "trl>=0.27.0" "peft>=0.12.0" "bitsandbytes>=0.43.0" "sentencepiece" "evaluate"

SEED = 42
random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)

MODEL_NAME = os.environ.get("MODEL_NAME", "Qwen/Qwen2-0....
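The excerpt stops at the model name. As a hedged outline of how the pieces the tutorial names (4-bit QLoRA quantization, PEFT LoRA adapters, and TRL’s DPOTrainer on a binarized UltraFeedback split) typically fit together, here is a sketch; the model id, dataset repo, hyperparameters, and exact argument names are assumptions that vary across TRL versions, not the tutorial’s actual code.

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

MODEL_NAME = "Qwen/Qwen2-0.5B-Instruct"   # assumed small model; the article's exact choice is truncated

bnb = BitsAndBytesConfig(                  # 4-bit quantization so the policy fits on one GPU
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, quantization_config=bnb, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

peft_cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

# Preference pairs: each row carries a prompt plus a chosen and a rejected response.
train_ds = load_dataset("trl-lib/ultrafeedback_binarized", split="train[:1%]")  # assumed repo id

args = DPOConfig(
    output_dir="dpo-qlora-demo",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    beta=0.1,        # strength of the preference margin in the DPO loss
    max_steps=100,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    processing_class=tokenizer,
    peft_config=peft_cfg,   # only the LoRA adapters train; the 4-bit base stays frozen
)
trainer.train()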

OpenAI Releases a Research Preview of GPT‑5.3-Codex-Spark: A 15x Faster AI Coding Model Delivering Over 1000 Tokens Per Second on Cerebras Hardware

OpenAI just launched a new research preview called GPT-5.3 Codex-Spark. This model is built for one thing: extreme speed. While the standard GPT-5.3 Codex focuses on deep reasoning, Spark is designed for near-instant response times. It is the result of a deep hardware-software integration between OpenAI and Cerebras. The results are game-changing. Spark is 15x faster than the flagship GPT-5.3 Codex. It consistently delivers over 1000 tokens per second. This speed effectively removes the delay between a developer’s thought and the model’s code output.

The Hardware: Wafer-Scale Engineering

The massive performance jump is powered by the Cerebras Wafer-Scale Engine 3 (WSE-3). Traditional AI models run on clusters of small GPUs. These GPUs must communicate with each other over cables, which creates a ‘bottleneck.’ This bottleneck slows down the speed of the model. The WSE-3 is different. It is a single, giant chip the size of a whole silicon wafer. Because the entire model lives on 1 p...
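To put the quoted throughput in perspective, here is a small back-of-the-envelope calculation based only on the figures above (over 1000 tokens per second and the 15x speedup); the completion length is an arbitrary assumption for illustration.

# Back-of-the-envelope latency comparison using the article's stated figures.
spark_tokens_per_sec = 1000           # 'over 1000 tokens per second' (stated)
speedup = 15                          # Spark vs. flagship GPT-5.3 Codex (stated)
baseline_tokens_per_sec = spark_tokens_per_sec / speedup

completion_tokens = 400               # assumed size of a typical code completion

print(f"Spark:    {completion_tokens / spark_tokens_per_sec:.2f} s")     # ~0.40 s
print(f"Baseline: {completion_tokens / baseline_tokens_per_sec:.2f} s")  # ~6.00 s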