Neuphonic Open-Sources NeuTTS Air: A 748M-Parameter On-Device Speech Language Model with Instant Voice Cloning

Neuphonic has released NeuTTS Air, an open-source text-to-speech (TTS) speech language model designed to run locally in real time on CPUs. The Hugging Face model card lists 748M parameters (Qwen2…

Thinking Machines Launches Tinker: A Low-Level Training API that Abstracts Distributed LLM Fine-Tuning without Hiding the Knobs

Thinking Machines has released Tinker, a Python API that lets researchers and engineers write training loops locally while the platform executes them on managed distributed GPU clusters. The pitch is…

How to Build an Advanced Voice AI Pipeline with WhisperX for Transcription, Alignment, Analysis, and Export?

In this tutorial, we walk through an advanced implementation of WhisperX, where we explore transcription, alignment, and word-level timestamps in detail. We set up the environment, load and preprocess the…

IBM Released new Granite 4.0 Models with a Novel Hybrid Mamba-2/Transformer Architecture: Drastically Reducing Memory Use without Sacrificing Performance

IBM just released Granite 4.0, an open-source LLM family that swaps monolithic Transformers for a hybrid Mamba-2/Transformer stack to cut serving memory while keeping quality. Sizes span a 3B dense…

ServiceNow AI Releases Apriel-1.5-15B-Thinker: An Open-Weights Multimodal Reasoning Model that Hits Frontier-Level Performance on a Single-GPU Budget

ServiceNow AI Research Lab has released Apriel-1.5-15B-Thinker, a 15-billion-parameter open-weights multimodal reasoning model trained with a data-centric mid-training recipe—continual pretraining followed by supervised fine-tuning—without reinforcement learning or preference optimization. The…

Liquid AI Released LFM2-Audio-1.5B: An End-to-End Audio Foundation Model with Sub-100 ms Response Latency

Liquid AI has released LFM2-Audio-1.5B, a compact audio–language foundation model that both understands and generates speech and text through a single end-to-end stack. It positions itself for low-latency, real-time assistants…

The Role of Model Context Protocol (MCP) in Generative AI Security and Red Teaming

Overview Model Context Protocol (MCP) is an open, JSON-RPC–based standard that formalizes how AI clients (assistants, IDEs, web apps) connect to servers exposing three primitives—tools, resources, and prompts—over defined transports…

MLPerf Inference v5.1 (2025): Results Explained for GPUs, CPUs, and AI Accelerators

What MLPerf Inference Actually Measures? MLPerf Inference quantifies how fast a complete system (hardware + runtime + serving stack) executes fixed, pre-trained models under strict latency and accuracy constraints. Results…

Google AI Proposes ReasoningBank: A Strategy-Level I Agent Memory Framework that Makes LLM Agents Self-Evolve at Test Time

How do you make an LLM agent actually learn from its own runs—successes and failures—without retraining? Google Research proposes ReasoningBank, an AI agent memory framework that converts an agent’s own…

How to Build an Advanced Agentic Retrieval-Augmented Generation (RAG) System with Dynamic Strategy and Smart Retrieval?

In this tutorial, we walk through the implementation of an Agentic Retrieval-Augmented Generation (RAG) system. We design it so that the agent does more than just retrieve documents; it actively…