Voice AI Gazette - Issue #2: Key Updates in Voice AI from the Past 24 Hours

The past day has seen significant momentum in voice AI, driven by CES 2026 announcements and a major open-source release. Here’s a summary of the most notable developments:

NVIDIA Launches Nemotron Speech ASR

NVIDIA released a new open-source speech-to-text (STT/ASR) model optimized for low-latency applications like real-time voice agents.

Key Features: Cache-aware streaming architecture for stable sub-100ms latency (median 24ms time-to-first-token) and up to 3x higher throughput on GPUs.
Performance: Demos show voice-to-voice agents achieving under 500ms end-to-end latency using NVIDIA’s open models (including Nemotron 3 Nano LLM and a preview of Magpie TTS).
Availability: Fully open-source, including weights, data, and code, which lowers the barrier to building natural conversational AI.
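
To put the two latency figures above in proportion, here is a rough back-of-the-envelope sketch. It assumes the 24ms median ASR time-to-first-token can be set against the 500ms end-to-end ceiling from the demos, which is a simplification: the two numbers come from different measurements. The function names are illustrative, not part of any NVIDIA API.

```python
# Figures quoted in this issue; everything else is derived arithmetic.
END_TO_END_BUDGET_MS = 500   # upper bound from the voice-to-voice demos
ASR_MEDIAN_TTFT_MS = 24      # median ASR time-to-first-token


def remaining_budget_ms(total_ms: float, asr_ms: float) -> float:
    """Return the time left for every non-ASR stage of the pipeline."""
    if asr_ms > total_ms:
        raise ValueError("ASR latency exceeds the total budget")
    return total_ms - asr_ms


def asr_share(total_ms: float, asr_ms: float) -> float:
    """Return the fraction of the total budget consumed by ASR."""
    return asr_ms / total_ms


left = remaining_budget_ms(END_TO_END_BUDGET_MS, ASR_MEDIAN_TTFT_MS)
share = asr_share(END_TO_END_BUDGET_MS, ASR_MEDIAN_TTFT_MS)
print(f"Left for LLM, TTS, and transport: {left} ms")
print(f"ASR share of budget: {share:.1%}")
```

Under these assumptions, speech recognition uses under 5% of the budget, leaving roughly 476ms for the LLM, TTS, and network transport combined.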

Google TV Integrates Gemini Voice Controls

Google expanded Gemini on Google TV (starting with TCL models), enhancing the living room experience:

Generative Features: Voice-controlled image/video generation (using Nano Banana and Veo models).
Personalization: Photo remixing from Google Photos and richer AI responses for recommendations or general queries.

BMW Announces Alexa+ for 2026 iX3

Amazon’s generative AI-powered Alexa+ will debut in the 2026 BMW iX3, making it the first vehicle with the upgraded voice assistant. It promises more natural, context-aware in-car interactions than previous generations.

Industry Context

In the wider industry, voice continues to be promoted as the “interface of the future,” and OpenAI is reportedly preparing advanced audio models for early 2026. No major new announcements came from players such as OpenAI or ElevenLabs in the past 24 hours, but CES is amplifying voice tech trends.

Voice AI is advancing rapidly toward seamless, low-latency, multimodal experiences, and 2026 looks poised for widespread adoption in devices, cars, and agents.