EchoLingua AI: Real-Time Interpretation Lab

Powered by Google Gemini 2.5 Flash & Gemini Live API. Bridging language barriers with low-latency audio processing.

[Screenshot: App Interface]

Overview

EchoLingua AI is a sophisticated web application engineered for real-time simultaneous interpretation. By leveraging the low-latency capabilities of Google's Gemini Live API, the application bridges language barriers instantly while offering a dedicated writing lab for granular text critique.

The user experience is built upon a "Thumb UI" philosophy, anchoring critical controls to the bottom of the viewport for optimal one-handed mobile interaction.

Core Features

Dual-Voice Interpreter

  • Simultaneous Live API Interpretation
  • Bi-Directional Flow (No toggle needed)
  • Raw PCM Audio Processing (16kHz/24kHz)
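Browsers typically capture audio at 44.1 or 48 kHz, so hitting the 16 kHz input rate listed above requires resampling. A minimal sketch of that step, using a hypothetical `downsample` helper with linear interpolation (adequate for speech; the app's actual resampler may differ):

```typescript
// Hypothetical helper: resample a Float32 block from the capture rate
// (typically 48 kHz) down to the 16 kHz the Live API input expects.
function downsample(
  input: Float32Array,
  inputRate: number,
  outputRate = 16000
): Float32Array {
  const ratio = inputRate / outputRate;
  const outLength = Math.floor(input.length / ratio);
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    // Linear interpolation between neighbouring samples
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}
```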

Writing & Pronunciation Lab

  • Schema-Enforced Granular Analysis
  • IPA Transcriptions & Error Logic
  • Neural Text-to-Speech Playback
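The README does not publish the critique schema itself, so the shape below is purely illustrative: a hypothetical TypeScript interface for one analysis item, plus a runtime guard for validating the model's JSON before rendering it.

```typescript
// Hypothetical result shape for the Writing Lab's schema-enforced
// analysis -- the real schema lives in the app; this is illustrative.
interface CritiqueItem {
  excerpt: string;      // the flagged span of the user's draft
  issue: string;        // what is wrong (grammar, register, idiom)
  suggestion: string;   // corrected wording
  ipa?: string;         // IPA transcription for pronunciation items
}

// Guard the model's parsed JSON before trusting it in the UI.
function isCritiqueItem(value: unknown): value is CritiqueItem {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.excerpt === "string" &&
    typeof v.issue === "string" &&
    typeof v.suggestion === "string" &&
    (v.ipa === undefined || typeof v.ipa === "string")
  );
}
```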

Technical Stack

Category   Technology      Details
Frontend   React 19        Built with TypeScript for type safety.
Styling    Tailwind CSS    Utility-first styling framework.
SDK        Google GenAI    @google/genai integration.
Audio      Web Audio API   AudioContext & ScriptProcessorNode for the PCM stream.

Architecture Pipeline

Input Processing

Microphone data is captured via getUserMedia, downsampled to 16 kHz, and converted to raw 16-bit integer PCM. The resulting stream is transmitted to the Live API over a WebSocket connection.
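The float-to-PCM conversion described above can be sketched as follows. Web Audio delivers samples as floats in [-1, 1]; this hypothetical helper clamps each sample and scales it to the signed 16-bit range, little-endian:

```typescript
// Convert Web Audio float samples ([-1, 1]) into raw little-endian
// 16-bit PCM suitable for streaming over the WebSocket.
function floatTo16BitPCM(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale to the signed 16-bit range
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```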

Output Rendering

The model returns base64-encoded PCM data. The frontend decodes this into a Float32Array and schedules playback via AudioBufferSourceNode for gapless audio.

audio-pipeline.ts
const processAudio = (stream: Float32Array, socket: WebSocket): void => {
  // Downsample to 16kHz and encode as 16-bit PCM
  const pcmData = convertToPCM(stream);

  // Stream raw PCM to the Live API via WebSocket
  socket.send(pcmData);
};

// Model responses arrive separately as base64-encoded PCM
const handleResponse = (response: { audio: string }): Float32Array =>
  decodeBase64(response.audio);
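The base64 decode step on the output path can be sketched like this. This is a minimal version of a `decodeBase64`-style helper, assuming the model returns little-endian 16-bit PCM (atob is global in browsers and Node 16+):

```typescript
// Decode a base64 16-bit PCM payload into Float32 samples in [-1, 1],
// ready to be copied into an AudioBuffer for playback.
function base64ToFloat32(b64: string): Float32Array {
  const raw = atob(b64);
  const bytes = new Uint8Array(raw.length);
  for (let i = 0; i < raw.length; i++) bytes[i] = raw.charCodeAt(i);
  // Reinterpret the byte pairs as signed 16-bit integers, then normalize
  const ints = new Int16Array(bytes.buffer);
  const floats = new Float32Array(ints.length);
  for (let i = 0; i < ints.length; i++) floats[i] = ints[i] / 32768;
  return floats;
}
```

For the gapless playback mentioned above, a common pattern is to keep a running `nextStartTime` and start each AudioBufferSourceNode at `Math.max(nextStartTime, ctx.currentTime)`, advancing `nextStartTime` by each buffer's duration so chunks butt up against one another.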

Installation

terminal
# Clone the repository
git clone https://github.com/dovvnloading/EchoLingua.git
cd EchoLingua

# Install dependencies
npm install

# Start development server
npm start

Environment Configuration

Create a .env file in the root directory:

API_KEY=your_google_genai_api_key

Usage Guide

Interpreter Mode

  1. Navigate to the Interpreter tab.
  2. Designate two active languages.
  3. Activate the Microphone to initialize the WebSocket connection.
  4. Speak freely; the system auto-detects the spoken language.
  5. Deactivate microphone to end session.

Writing Lab

  1. Navigate to the Writing Lab tab.
  2. Select target language from dropdown.
  3. Input text into the drafting area.
  4. Select Review to run the Gemini 2.5 Flash analysis.
  5. Click the Speaker icon for neural TTS playback.