DarkStream: real-time speech anonymization with low latency

Anonymous submission to ASRU 2025

Abstract

We propose DarkStream, a streaming speech synthesis architecture for real-time speaker anonymization. To improve content representation under strict latency constraints, DarkStream combines a causal waveform-based encoder, a short lookahead buffer, and transformer-based contextual layers. To further reduce inference time, the model generates waveforms directly via a neural vocoder, thus removing the need for intermediate mel-spectrogram conversions. Finally, DarkStream anonymizes speaker identity by injecting a GAN-generated pseudo-speaker embedding into linguistic features from the content encoder. Evaluations show our model achieves strong anonymization, yielding close to 50% speaker verification EER (near-chance performance) on the lazy-informed attack scenario, while maintaining acceptable linguistic intelligibility (WER within 9%). By balancing low-latency operation, robust privacy, and minimal intelligibility degradation, DarkStream provides a practical solution for privacy-preserving real-time speech communication.

Block Diagram

Block Diagram
Block diagram of the proposed system. (a) Training flow for content encoder, (b) content encoder architecture, (c) decoder training flow, (d) speaker/variance adapter and (e) anonymization inference.

Notes

Audio Samples

Speaker Text Input speech Wav Wav+CL Wav+CL+KM
BDL Author of the danger trail Philip Steels and etc.
Not at this particular case Tom apologized Whittemore.
For the twentieth time that evening the two men shook hands.
CLB Lord but I'm glad to see you again Phil.
Will we ever forget it.
God bless 'em I hope I will go on seeing them forever.
RMS And you always want to see it in the superlative degree.
Gad your letter came just in time.
He turned sharply and faced Gregson across the table.
SLT I'm playing a single hand in what looks like a losing game.
If I ever needed a fighter in my life I need one now.
Gregson shoved back his chair and rose his feet.