Waris Quamer | Speech AI Researcher

Privacy-preserving and controllable speech technology

Building low-latency speech systems that preserve content while controlling identity, accent, and expression.

I work on speech AI at the intersection of real-time voice conversion, speaker anonymization, accent conversion, controllable speech synthesis, and perceptual evaluation. My recent work studies how to make speech systems usable under strict latency constraints while preserving intelligibility, naturalness, and privacy.

News

Recent updates from papers, demos, and research activity.

2026.02: TVTSyn is available on arXiv, introducing content-synchronous time-varying timbre for streaming voice conversion and anonymization.
2025: PromptDub explores controllable expressive dubbing with multimodal foundation models.
2024.10: The revised version of End-to-end Streaming Model for Low-Latency Speech Anonymization is available on arXiv.
2024.08: Disentangling Segmental and Prosodic Factors to Non-native Speech Comprehensibility is available on arXiv.

Selected Publications

arXiv 2026

TVTSyn: Content-Synchronous Time-Varying Timbre for Streaming Voice Conversion and Anonymization

Waris Quamer, Mu-Ruei Tseng, Ghady Nasrallah, Ricardo Gutierrez-Osuna

Introduces a streamable synthesizer that aligns speaker identity and content at frame level, reaching under 80 ms GPU latency while improving naturalness, transfer, and anonymization.

SLT 2024

End-to-end Streaming Model for Low-Latency Speech Anonymization

Waris Quamer, Ricardo Gutierrez-Osuna

A streaming speech anonymization model with full and lite versions, reducing latency to 230 ms and 66 ms, respectively, while preserving naturalness, intelligibility, and privacy.

arXiv 2024

Disentangling Segmental and Prosodic Factors to Non-native Speech Comprehensibility

Waris Quamer, Ricardo Gutierrez-Osuna

Studies independent control of segmental and prosodic accent cues and finds that segmental features have a larger impact on perceived comprehensibility than prosody.

Demo

Zero-Shot Foreign Accent Conversion without a Native Reference

Waris Quamer, Anurag Das, John Levis, Evgeny Chukharev-Hudilainen, Ricardo Gutierrez-Osuna

Reference-free accent conversion samples using L2-ARCTIC speech, comparing baseline and proposed systems for converting non-native speech to an American accent.

PromptDub: Controllable Expressive Speech Synthesis using Multimodal Foundation Models

Waris Quamer, Fanjie Kong, Abhinav Jain, Abhishek Yanamandra, Tuan Dinh, Zhu Liu, Vimal Bhat

A controllable dubbing pipeline that uses multimodal scene, audio, and language cues to generate editable expressive TTS prompts.

Study

RAFT-Edit Listening Studies

Trait-level evaluation pages for controlled edits of age presentation, gender presentation, pitch, and voice quality.

Interactive A/B listening studies for source-relative static-trait manipulation in speech privacy workflows.

Projects and Demos

Existing audio demos are preserved and linked here.

Streaming Speech Anonymization

Low-latency anonymization with full and lite model variants, including audio samples and latency-focused comparisons.

Privacy, Streaming, Speech synthesis

Prosody-Aware Accent Conversion

Independent manipulation of segmental and prosodic channels to study non-native speech comprehensibility.

Accent conversion, Prosody, Perception

DarkStream

A real-time speech anonymization prototype focused on causal waveform encoding and direct waveform generation.

Real-time, Anonymization, Low latency

[Image: visualization of privacy and quality tradeoffs in speaker anonymization]

Voice Attribute Editing

Audio demonstrations around controllable speaker-attribute edits and privacy-quality tradeoffs.

Voice editing, Privacy, Evaluation

Experience and Education

Selected academic and research context.

Education

Texas A&M University, Department of Computer Science and Engineering. Research focus: speech AI, privacy, voice conversion, and speech synthesis.
Additional degree timeline, advisor, thesis, and prior education details are available in the CV.

Research Experience

Speech privacy and anonymization: low-latency, end-to-end systems for preserving linguistic content while reducing speaker identity leakage.
Accent conversion and intelligibility: controllable segmental/prosodic modeling for non-native speech studies.
Expressive speech synthesis: multimodal prompting and controllable dubbing through PromptDub.

Service, Skills, and Keywords

Speech: Anonymization, voice conversion, accent conversion, TTS, expressive dubbing.

ML: Representation learning, disentanglement, vector quantization, streaming inference.

Eval: Perceptual listening tests, speaker verification privacy metrics, intelligibility and naturalness.