Back to Projects

NVIDIA Jetson Voice Agent

Voice-activated local AI assistant built for NVIDIA Jetson Nano using Dual-Prompt orchestration to reduce tool hallucinations while powering bilingual home automation.

NVIDIA Jetson Voice Agent

A professional-grade, voice-activated AI companion designed to run locally on the NVIDIA Jetson Nano. This project implements a sophisticated Dual-Prompt architecture to solve common LLM tool-calling hallucination issues, enabling reliable home automation control and multilingual conversation through a seamless Speech-to-Text (STT) and Text-to-Speech (TTS) pipeline.

Features

  • Dual-Prompt Intent Classification: Uses a dedicated Tool Selector system prompt to eliminate tool-use hallucinations and ensure strict adherence to available functions.
  • Voice-First Interface: Hands-free interaction using a speaker-microphone setup.
  • Multilingual Support: Automatic language detection for English and Mandarin Chinese, with localized voice synthesis.
  • Local Inference: All processing (LLM, STT, TTS) happens on-device for maximum privacy and low latency.
  • Knowledge Base Injection: Dynamically appends real-time sensor/device data into the chat context.
  • Contextual Memory: Maintains a rolling conversation history for natural, chatty interactions.

System Architecture

The system operates via a continuous orchestration loop:

  1. Capture: PyAudio records high-fidelity audio via a USB interface.
  2. Transcription: Whisper.cpp converts audio to text.
  3. Phase 1 - Intent Selection: A strict system prompt determines if a tool (e.g., temperature, lights) is required.
  4. Tool Execution: If required, Python functions fetch real-world data and wrap it in a KNOWLEDGE BASE tag.
  5. Phase 2 - Response Generation: A friendly Companion prompt processes the input plus retrieved data to form a natural response.
  6. Synthesis: Piper TTS generates audio in the detected language.
  7. Playback: aplay outputs the response through the speaker.

Hardware Requirements

  • Compute: NVIDIA Jetson Nano (4GB / Developer Kit recommended)
  • Audio: USB speaker-microphone combo
  • Storage: High-speed microSD card (64GB+) or external SSD

Software Prerequisites

Ensure the following tools are installed and paths are correctly mapped:

  • Ollama for local model hosting
  • Whisper.cpp for transcription
  • Piper TTS for neural voice synthesis
  • FFmpeg for audio resampling
  • Python dependencies for orchestrating audio, prompts, and tool execution

Why This Project

I built this voice agent to explore how local AI companion systems can stay reliable and private without cloud dependencies. The dual-prompt design separates intent classification from response generation, eliminating the common tool hallucination failure mode and enabling the assistant to remain friendly while only using accurate, verified tool outputs.

Future Roadmap

  • Wake-word detection with Porcupine or Snowboy
  • Home Assistant integration for real IoT device control
  • Sliding window memory trimming to prevent context overflow
  • Expand support for additional local models and languages