Abstract
We propose DarkStream, a streaming speech synthesis architecture for real-time speaker anonymization. To improve content representation under strict latency constraints, DarkStream combines a causal waveform-based encoder, a short lookahead buffer, and transformer-based contextual layers. To further reduce inference time, the model generates waveforms directly with a neural vocoder, removing the need for intermediate mel-spectrogram conversion. Finally, DarkStream anonymizes speaker identity by injecting a GAN-generated pseudo-speaker embedding into the linguistic features produced by the content encoder. Evaluations show that our model achieves strong anonymization, yielding a speaker verification EER close to 50% (near-chance performance) in the lazy-informed attack scenario, while maintaining acceptable linguistic intelligibility (WER within 9%). By balancing low-latency operation, robust privacy, and minimal intelligibility degradation, DarkStream provides a practical solution for privacy-preserving real-time speech communication.
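For readers less familiar with the metric, here is a generic sketch of how an equal error rate (EER) is computed from speaker-verification scores, and why a value near 50% means the verifier does no better than chance. It is an illustration under stated assumptions, not DarkStream's evaluation code: the function name `compute_eer`, the convention that a higher score means "same speaker", and the synthetic Gaussian scores are all hypothetical.

```python
import numpy as np


def compute_eer(genuine_scores, impostor_scores):
    """Equal error rate: the operating point where the false acceptance rate (FAR)
    equals the false rejection rate (FRR).

    Assumption (hypothetical convention): a higher score means the verifier
    believes the two utterances come from the same speaker.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    # Candidate decision thresholds: every observed score.
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    # FRR: fraction of same-speaker trials rejected (score below the threshold).
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # FAR: fraction of different-speaker trials accepted (score at or above it).
    far = np.array([(impostor >= t).mean() for t in thresholds])
    # Choose the threshold where the two error rates are closest; report their mean.
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2.0)


if __name__ == "__main__":
    # Well-separated scores: the verifier always tells the speakers apart.
    print(compute_eer([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))  # → 0.0
    # Identically distributed scores (synthetic): the verifier is at chance level.
    rng = np.random.default_rng(0)
    genuine = rng.normal(size=2000)
    impostor = rng.normal(size=2000)
    print(round(compute_eer(genuine, impostor), 3))  # close to 0.5
```

When anonymization succeeds, genuine and impostor scores become indistinguishable, so every threshold trades one error for the other equally and the EER approaches 0.5; an EER of 0 would mean the identity is fully recoverable.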
Block Diagram
Notes
- Dataset (CMU-ARCTIC corpus): http://www.festvox.org/cmu_arctic/
Audio Samples
- Input speech: original unmodified speech recordings
- Wav: input speech anonymized through the base version
- Wav+CL: input speech anonymized through the lite version
- Wav+CL+KM: input speech anonymized through the lite version with the additional KM component
| Speaker | Text | Input speech | Wav | Wav+CL | Wav+CL+KM |
|---|---|---|---|---|---|
| BDL | Author of the danger trail, Philip Steels, etc. | | | | |
| | Not at this particular case, Tom, apologized Whittemore. | | | | |
| | For the twentieth time that evening the two men shook hands. | | | | |
| CLB | Lord, but I'm glad to see you again, Phil. | | | | |
| | Will we ever forget it. | | | | |
| | God bless 'em, I hope I will go on seeing them forever. | | | | |
| RMS | And you always want to see it in the superlative degree. | | | | |
| | Gad, your letter came just in time. | | | | |
| | He turned sharply, and faced Gregson across the table. | | | | |
| SLT | I'm playing a single hand in what looks like a losing game. | | | | |
| | If I ever needed a fighter in my life I need one now. | | | | |
| | Gregson shoved back his chair and rose to his feet. | | | | |