Privacy-preserving and controllable speech technology

Building low-latency speech systems that preserve content while controlling identity, accent, and expression.

I work on speech AI at the intersection of real-time voice conversion, speaker anonymization, accent conversion, controllable speech synthesis, and perceptual evaluation. My recent work studies how to make speech systems usable under strict latency constraints while preserving intelligibility, naturalness, and privacy.

News

Recent updates from papers, demos, and research activity.

Selected Publications

arXiv 2026

TVTSyn: Content-Synchronous Time-Varying Timbre for Streaming Voice Conversion and Anonymization

Waris Quamer, Mu-Ruei Tseng, Ghady Nasrallah, Ricardo Gutierrez-Osuna

Introduces a streamable synthesizer that aligns speaker identity and content at frame level, reaching under 80 ms GPU latency while improving naturalness, transfer, and anonymization.

SLT 2024

End-to-end Streaming Model for Low-Latency Speech Anonymization

Waris Quamer, Ricardo Gutierrez-Osuna

A streaming speech anonymization model with full and lite versions, reducing latency to 230 ms and 66 ms, respectively, while preserving naturalness, intelligibility, and privacy.

arXiv 2024

Disentangling Segmental and Prosodic Factors to Non-native Speech Comprehensibility

Waris Quamer, Ricardo Gutierrez-Osuna

Studies independent control of segmental and prosodic accent cues and finds that segmental features have a larger impact on perceived comprehensibility than prosody.

Demo

Zero-Shot Foreign Accent Conversion without a Native Reference

Waris Quamer, Anurag Das, John Levis, Evgeny Chukharev-Hudilainen, Ricardo Gutierrez-Osuna

Reference-free accent conversion samples using L2-ARCTIC speech, comparing baseline and proposed systems for non-native to American-accent conversion.

Demo

PromptDub: Controllable Expressive Speech Synthesis using Multimodal Foundation Models

Waris Quamer, Fanjie Kong, Abhinav Jain, Abhishek Yanamandra, Tuan Dinh, Zhu Liu, Vimal Bhat

A controllable dubbing pipeline that uses multimodal scene, audio, and language cues to generate editable expressive TTS prompts.

Study

RAFT-Edit Listening Studies

Trait-level evaluation pages for controlled edits of age presentation, gender presentation, pitch, and voice quality.

Interactive A/B listening studies for source-relative static-trait manipulation in speech privacy workflows.

Projects and Demos

Existing audio demos are preserved and linked here.

Figure: Block diagram for streaming speech anonymization

Streaming Speech Anonymization

Low-latency anonymization with full and lite model variants, including audio samples and latency-focused comparisons.

Privacy, Streaming, Speech synthesis

Figure: Block diagram for prosody-aware accent conversion

Prosody-Aware Accent Conversion

Independent manipulation of segmental and prosodic channels to study non-native speech comprehensibility.

Accent conversion, Prosody, Perception

Figure: Block diagram for DarkStream real-time speech anonymization

DarkStream

A real-time speech anonymization prototype focused on causal waveform encoding and direct waveform generation.

Real-time, Anonymization, Low latency

Figure: Visualization for privacy and quality tradeoffs in speaker anonymization

Voice Attribute Editing

Audio demonstrations around controllable speaker-attribute edits and privacy-quality tradeoffs.

Voice editing, Privacy, Evaluation

Experience and Education

Selected academic and research context.

Education

  • Texas A&M University, Department of Computer Science and Engineering. Research focus: speech AI, privacy, voice conversion, and speech synthesis.
  • Additional degree timeline, advisor, thesis, and prior education details are available in the CV.

Research Experience

  • Speech privacy and anonymization: low-latency, end-to-end systems for preserving linguistic content while reducing speaker identity leakage.
  • Accent conversion and intelligibility: controllable segmental/prosodic modeling for non-native speech studies.
  • Expressive speech synthesis: multimodal prompting and controllable dubbing through PromptDub.

Service, Skills, and Keywords

Useful for SEO and quick academic scanning.

Speech: Anonymization, voice conversion, accent conversion, TTS, expressive dubbing.
ML: Representation learning, disentanglement, vector quantization, streaming inference.
Eval: Perceptual listening tests, speaker verification privacy metrics, intelligibility and naturalness.
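
As a rough illustration of what a speaker verification privacy metric can look like, the sketch below computes an equal error rate (EER) from verification scores. The choice of EER, the function name `equal_error_rate`, and the toy scores are illustrative assumptions; they are not taken from the papers listed on this page. When the scores come from an attacker's verifier run on anonymized speech, a higher EER means the verifier has a harder time telling speakers apart, which is generally read as better privacy.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER and its threshold from two lists of scores.

    target_scores: similarity scores for same-speaker trials.
    nontarget_scores: similarity scores for different-speaker trials.
    A trial is accepted when its score is >= the threshold.
    """
    # Only observed scores can change the error rates, so use them as candidates.
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        # False rejection rate: same-speaker trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: different-speaker trials at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of FAR and FRR where they are closest.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores (hypothetical, for illustration only).
same_speaker = [0.9, 0.8, 0.7, 0.4]
different_speaker = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(same_speaker, different_speaker)
print(f"EER = {eer:.2f} at threshold {threshold}")  # EER = 0.25 at threshold 0.6
```

This scan over observed scores is quadratic in the number of trials and is meant for readability; larger evaluations would typically sort the scores once or rely on an ROC routine from an established library.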