Claude Code Plugins

Community-maintained marketplace

Feedback

Use when working with Quetrex's voice interface, OpenAI Realtime API, WebRTC, or echo cancellation. Knows Quetrex's specific voice architecture decisions and patterns. CRITICAL - prevents breaking working voice system.

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name voice-system-expert
description Use when working with Quetrex's voice interface, OpenAI Realtime API, WebRTC, or echo cancellation. Knows Quetrex's specific voice architecture decisions and patterns. CRITICAL - prevents breaking working voice system.
allowed-tools Read

Quetrex Voice System Expert

CRITICAL: Read This First

Quetrex's voice system architecture is extensively documented and battle-tested. Before making ANY changes to voice-related code, you MUST read:

  1. ADR-001-VOICE-ECHO-CANCELLATION.md (definitive architectural decision)
  2. docs/architecture/VOICE-SYSTEM.md (technical implementation)
  3. docs/features/voice-interface.md (user-facing features)

Location: src/lib/openai-realtime.ts

Core Architecture Decision

ALWAYS-ON MICROPHONE + BROWSER AEC

This is Decision 4 from VOICE-SYSTEM.md and the definitive approach documented in ADR-001.

// ✅ CORRECT: Always-on microphone
const mediaStream = await navigator.mediaDevices.getUserMedia({
  audio: {
    echoCancellation: true,  // CRITICAL - browser handles echo cancellation
    noiseSuppression: true,
    autoGainControl: true,
  },
})

// Microphone track stays ENABLED throughout conversation
// Browser's native AEC prevents feedback loops
// Server-side VAD (Voice Activity Detection) handles turn detection

How It Works

Audio Pipeline

User speaks
    ↓
Microphone (always enabled, echoCancellation: true)
    ↓
WebRTC → OpenAI Realtime API
    ↓
Server-side VAD detects speech
    ↓
OpenAI processes and responds
    ↓
Audio response via WebRTC
    ↓
HTMLAudioElement playback (stays in browser pipeline)
    ↓
Browser AEC compares mic input + speaker output
    ↓
Echo automatically canceled (no feedback loop)

Why This Works

Browser Echo Cancellation Requirements:

  1. Both microphone input AND speaker output must be in browser's audio graph
  2. Audio must flow through browser's WebRTC stack
  3. No manual intervention needed (browser handles it)

This is the industry standard:

  • ChatGPT voice mode
  • Google Meet
  • Zoom
  • Discord
  • Microsoft Teams

They ALL use always-on microphone + browser AEC.

DO NOT DO THESE THINGS

❌ DON'T: Toggle Microphone Track

// ❌ WRONG - This breaks echo cancellation
async function pauseRecording() {
  microphone.enabled = false // DON'T DO THIS
}

async function resumeRecording() {
  microphone.enabled = true // DON'T DO THIS
}

Why this fails:

  • Not industry standard
  • Causes audio glitches
  • Can break echo cancellation
  • Adds artificial delays
  • More complex state management
  • No real benefit

❌ DON'T: Route Audio Outside Browser

// ❌ WRONG - AudioWorklet bypass to native audio
const workletNode = new AudioWorkletNode(audioContext, 'bypass-processor')
workletNode.port.postMessage({ cmd: 'route-to-native' })

Why this fails:

  • Breaks browser AEC (audio leaves browser pipeline)
  • Causes echo/feedback loops
  • Requires complex native audio handling
  • Platform-specific implementations
  • Already tried and failed (see abandoned-approaches/)

❌ DON'T: Implement Custom Echo Cancellation

// ❌ WRONG - Custom AEC implementation
class CustomEchoCanceller {
  cancelEcho(input: AudioBuffer, output: AudioBuffer) {
    // Complex DSP code...
  }
}

Why this fails:

  • Reinventing the wheel
  • Browser AEC is battle-tested by billions of users
  • Extremely complex to implement correctly
  • Requires deep DSP knowledge
  • Performance issues
  • Platform-specific tuning needed

❌ DON'T: Add Artificial Delays

// ❌ WRONG - Delays for echo prevention
await new Promise(resolve => setTimeout(resolve, 500))
await playAudio()
await new Promise(resolve => setTimeout(resolve, 500))
resumeRecording()

Why this fails:

  • Not necessary (browser AEC handles it)
  • Degrades user experience
  • Adds latency
  • Still doesn't prevent echo if implementation is wrong

What You CAN Change

✅ DO: Adjust Audio Settings

// ✅ Can tweak these settings
const constraints = {
  audio: {
    echoCancellation: true,      // MUST be true
    noiseSuppression: true,       // Can adjust
    autoGainControl: true,        // Can adjust
    sampleRate: 24000,            // Can change for quality
    channelCount: 1,              // Mono is fine for voice
  },
}

✅ DO: Handle Connection States

// ✅ Can manage connection lifecycle
async function connectVoice() {
  // Setup WebRTC connection
  // Start streaming
  // Handle connection events
}

async function disconnectVoice() {
  // Clean up WebRTC connection
  // Stop streaming
  // Release microphone
}

✅ DO: Add UI Feedback

// ✅ Can add visual indicators
function onVoiceActivity(active: boolean) {
  if (active) {
    // Show "listening" indicator
    // Animate microphone icon
  } else {
    // Show "idle" indicator
  }
}

✅ DO: Handle Errors

// ✅ Can improve error handling
try {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
} catch (error) {
  if (error.name === 'NotAllowedError') {
    // Show permission request UI
  } else if (error.name === 'NotFoundError') {
    // Show "no microphone found" error
  }
}

Implementation Location

Primary file: src/lib/openai-realtime.ts

Key functions:

  • setupMediaStream() - Initializes microphone with correct constraints
  • connectToOpenAI() - Establishes WebRTC connection
  • handleAudioResponse() - Plays AI responses via HTMLAudioElement

DO NOT modify:

  • Microphone enable/disable logic (should stay always-on)
  • Echo cancellation settings (must be true)
  • Audio routing (must stay in browser pipeline)

CAN modify:

  • UI feedback and visual indicators
  • Error handling and user messages
  • Connection retry logic
  • Audio quality settings (sample rate, etc.)

Why This Architecture Was Chosen

From ADR-001:

Option A: Trust Industry Pattern (CHOSEN)

  • ✅ Used by ChatGPT, Google Meet, Zoom, Discord
  • ✅ Browser vendors optimize for this pattern
  • ✅ Works across all platforms (macOS, Windows, Linux, mobile)
  • ✅ No custom implementation needed
  • ✅ Proven reliability

Options Rejected:

  • ❌ Manual microphone toggling (not industry standard)
  • ❌ AudioWorklet native bypass (breaks AEC)
  • ❌ Custom echo cancellation (reinventing wheel)
  • ❌ Native Rust WebRTC (2-4 weeks work for no benefit)

Platform Context: Web Application

IMPORTANT: Quetrex is now a pure web application, not a Tauri desktop app.

Why this matters for voice:

  • Browser echo cancellation works perfectly in all browsers
  • No WKWebView limitations (that was the Tauri problem)
  • Universal compatibility (Chrome, Safari, Firefox, Edge)
  • No platform-specific audio handling needed
  • Just works™️

The WKWebView Problem (Historical): Quetrex was originally a Tauri desktop app. On macOS, Tauri uses WKWebView, which has a bug: WebRTC audio playback doesn't work. This forced us to try workarounds (AudioWorklet bypass, native audio routing), which all broke echo cancellation.

Solution: Convert to web application. Now echo cancellation works perfectly everywhere.

Testing Voice Changes

If you must modify voice code:

  1. Test on real browsers (not just dev tools)
  2. Test the echo scenario:
    • Turn up speaker volume
    • Start voice conversation
    • Verify no feedback loop/echo
  3. Test across platforms:
    • Chrome (most common)
    • Safari (macOS, iOS)
    • Firefox
    • Edge
  4. Test edge cases:
    • Poor network connection
    • Microphone permission denied
    • Mid-conversation disconnect

When to Consult Documentation

Before any voice changes, read:

  1. docs/decisions/ADR-001-VOICE-ECHO-CANCELLATION.md
  2. docs/architecture/VOICE-SYSTEM.md
  3. docs/development/abandoned-approaches/

If you see mentions of:

  • Tauri → Ignore (Quetrex is web app now)
  • WKWebView → Ignore (not relevant anymore)
  • AudioWorklet bypass → Don't implement (already tried, failed)
  • Manual mic toggling → Don't implement (not industry standard)

Summary

The Golden Rule: Trust browser echo cancellation. Always-on microphone + echoCancellation: true + audio in browser pipeline = Perfect echo cancellation.

DO:

  • Keep microphone enabled throughout conversation
  • Use echoCancellation: true
  • Keep audio in browser (HTMLAudioElement)
  • Let OpenAI Realtime API handle VAD
  • Trust the industry pattern

DON'T:

  • Toggle microphone track
  • Route audio outside browser
  • Implement custom echo cancellation
  • Add artificial delays
  • Try to "improve" what already works

If in doubt: Read ADR-001 and VOICE-SYSTEM.md. The architecture is thoroughly documented for a reason.