EchoLingua AI: Real-Time Interpretation Lab

Powered by Google Gemini 2.5 Flash & Gemini Live API. Bridging language barriers with low-latency audio processing.

[Screenshot: App Interface]

Overview

EchoLingua AI is a sophisticated web application engineered for real-time simultaneous interpretation. By leveraging the low-latency capabilities of Google's Gemini Live API, the application bridges language barriers instantly while offering a dedicated writing lab for granular text critique.

The user experience is built upon a "Thumb UI" philosophy, anchoring critical controls to the bottom of the viewport for optimal one-handed mobile interaction.

Core Features

Dual-Voice Interpreter

  • Simultaneous Live API Interpretation
  • Bi-Directional Flow (No toggle needed)
  • Raw PCM Audio Processing (16kHz/24kHz)
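Browsers typically capture audio at 44.1 or 48 kHz, so hitting the 16 kHz input rate listed above requires resampling. A minimal sketch of that step, using a hypothetical `downsample` helper with linear interpolation (adequate for speech; the app's actual resampler may differ):

```typescript
// Hypothetical helper: resample a Float32 block from the capture rate
// (typically 48 kHz) down to the 16 kHz the Live API input expects.
function downsample(
  input: Float32Array,
  inputRate: number,
  outputRate = 16000
): Float32Array {
  const ratio = inputRate / outputRate;
  const outLength = Math.floor(input.length / ratio);
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    // Linear interpolation between neighbouring samples
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}
```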

Writing & Pronunciation Lab

  • Schema-Enforced Granular Analysis
  • IPA Transcriptions & Error Logic
  • Neural Text-to-Speech Playback
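The README does not publish the critique schema itself, so the shape below is purely illustrative: a hypothetical TypeScript interface for one analysis item, plus a runtime guard for validating the model's JSON before rendering it.

```typescript
// Hypothetical result shape for the Writing Lab's schema-enforced
// analysis -- the real schema lives in the app; this is illustrative.
interface CritiqueItem {
  excerpt: string;      // the flagged span of the user's draft
  issue: string;        // what is wrong (grammar, register, idiom)
  suggestion: string;   // corrected wording
  ipa?: string;         // IPA transcription for pronunciation items
}

// Guard the model's parsed JSON before trusting it in the UI.
function isCritiqueItem(value: unknown): value is CritiqueItem {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.excerpt === "string" &&
    typeof v.issue === "string" &&
    typeof v.suggestion === "string" &&
    (v.ipa === undefined || typeof v.ipa === "string")
  );
}
```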

Technical Stack

Category   Technology      Details
Frontend   React 19        Built with TypeScript for type safety.
Styling    Tailwind CSS    Utility-first styling framework.
SDK        Google GenAI    @google/genai integration.
Audio      Web Audio API   AudioContext & ScriptProcessorNode for the PCM stream.

Architecture Pipeline

Input Processing

Microphone data is captured via getUserMedia, downsampled to 16 kHz, and converted to raw 16-bit integer PCM. The resulting stream is transmitted to the Live API over a WebSocket connection.
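The float-to-PCM conversion described above can be sketched as follows. Web Audio delivers samples as floats in [-1, 1]; this hypothetical helper clamps each sample and scales it to the signed 16-bit range, little-endian:

```typescript
// Convert Web Audio float samples ([-1, 1]) into raw little-endian
// 16-bit PCM suitable for streaming over the WebSocket.
function floatTo16BitPCM(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale to the signed 16-bit range
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```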

Output Rendering

The model returns base64-encoded PCM data. The frontend decodes this into a Float32Array and schedules playback via AudioBufferSourceNode for gapless audio.

audio-pipeline.ts
const processAudio = (stream: Float32Array, socket: WebSocket): void => {
  // Downsample to 16kHz and encode as 16-bit PCM
  const pcmData = convertToPCM(stream);

  // Stream raw PCM to the Live API via WebSocket
  socket.send(pcmData);
};

// Model responses arrive separately as base64-encoded PCM
const handleResponse = (response: { audio: string }): Float32Array =>
  decodeBase64(response.audio);
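The base64 decode step on the output path can be sketched like this. This is a minimal version of a `decodeBase64`-style helper, assuming the model returns little-endian 16-bit PCM (atob is global in browsers and Node 16+):

```typescript
// Decode a base64 16-bit PCM payload into Float32 samples in [-1, 1],
// ready to be copied into an AudioBuffer for playback.
function base64ToFloat32(b64: string): Float32Array {
  const raw = atob(b64);
  const bytes = new Uint8Array(raw.length);
  for (let i = 0; i < raw.length; i++) bytes[i] = raw.charCodeAt(i);
  // Reinterpret the byte pairs as signed 16-bit integers, then normalize
  const ints = new Int16Array(bytes.buffer);
  const floats = new Float32Array(ints.length);
  for (let i = 0; i < ints.length; i++) floats[i] = ints[i] / 32768;
  return floats;
}
```

For the gapless playback mentioned above, a common pattern is to keep a running `nextStartTime` and start each AudioBufferSourceNode at `Math.max(nextStartTime, ctx.currentTime)`, advancing `nextStartTime` by each buffer's duration so chunks butt up against one another.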

Installation

terminal
# Clone the repository
git clone https://github.com/dovvnloading/EchoLingua.git
cd EchoLingua

# Install dependencies
npm install

# Start development server
npm start

Environment Configuration

Create a .env file in the root directory:

API_KEY=your_google_genai_api_key

Usage Guide

Interpreter Mode

  1. Navigate to the Interpreter tab.
  2. Designate two active languages.
  3. Activate the Microphone to initialize the WebSocket connection.
  4. Speak freely; the system auto-detects the spoken language.
  5. Deactivate microphone to end session.

Writing Lab

  1. Navigate to the Writing Lab tab.
  2. Select target language from dropdown.
  3. Input text into the drafting area.
  4. Select Review to run the Gemini 2.5 Flash analysis.
  5. Click the Speaker icon for neural TTS playback.