Age manipulation
Age pseudo-label response from the external speech age predictor.
| Cohort | Source / reconstruction | -2.0 | -1.0 | 0.0 | +1.0 | +2.0 |
|---|---|---|---|---|---|---|
| 20s M | Source Reconstruction LibriTTS · 7640_111784_000011_000002 | -2.0 | -1.0 | +0.0 | +1.0 | +2.0 |
| 20s F | Source Reconstruction VoxCeleb1 · 00011 | -2.0 | -1.0 | +0.0 | +1.0 | +2.0 |
| 40s M | Source Reconstruction NaturalVoices · MSP-PODCAST_1632_104 | -2.0 | -1.0 | +0.0 | +1.0 | +2.0 |
| 40s F | Source Reconstruction GLOBE · 00105921_000004.v2.vad | -2.0 | -1.0 | +0.0 | +1.0 | +2.0 |
| 60s M | Source Reconstruction LibriTTS · 6804_79287_000013_000013 | -2.0 | -1.0 | +0.0 | +1.0 | +2.0 |
| 60s F | Source Reconstruction NaturalVoices · MSP-PODCAST_5606_43 | -2.0 | -1.0 | +0.0 | +1.0 | +2.0 |
Perceived gender presentation manipulation
Model-predicted male-presentation probability. This is a pseudo-label, not demographic ground truth.
| Cohort | Source / reconstruction | -2.0 | -1.0 | 0.0 | +1.0 | +2.0 |
|---|---|---|---|---|---|---|
| 20s M | Source Reconstruction GLOBE · 00266973_000004.v2.vad | -2.0 | -1.0 | +0.0 | +1.0 | +2.0 |
| 20s F | Source Reconstruction VoxCeleb1 · 00002 | -2.0 | -1.0 | +0.0 | +1.0 | +2.0 |
| 40s M | Source Reconstruction GLOBE · 00431096_000005.v2.vad | -2.0 | -1.0 | +0.0 | +1.0 | +2.0 |
| 40s F | Source Reconstruction NaturalVoices · MSP-PODCAST_0478_1 | -2.0 | -1.0 | +0.0 | +1.0 | +2.0 |
| 60s M | Source Reconstruction LibriTTS · 3955_181692_000011_000004 | -2.0 | -1.0 | +0.0 | +1.0 | +2.0 |
| 60s F | Source Reconstruction LibriTTS · 1752_16632_000036_000011 | -2.0 | -1.0 | +0.0 | +1.0 | +2.0 |
Habitual pitch manipulation
Median acoustic log-F0, measured from the generated waveform.
| Cohort | Source / reconstruction | -2.0 | -1.0 | 0.0 | +1.0 | +2.0 |
|---|---|---|---|---|---|---|
| 20s M | Source Reconstruction NaturalVoices · MSP-PODCAST_0658_440 | -2.0 | -1.0 | +0.0 | +1.0 | +2.0 |
| 20s F | Source Reconstruction GLOBE · 00022382_000004.v2.vad | -2.0 | -1.0 | +0.0 | +1.0 | +2.0 |
| 40s M | Source Reconstruction LibriTTS · 90_130566_000006_000001 | -2.0 | -1.0 | +0.0 | +1.0 | +2.0 |
| 40s F | Source Reconstruction LibriTTS · 7481_101276_000071_000000 | -2.0 | -1.0 | +0.0 | +1.0 | +2.0 |
| 60s M | Source Reconstruction VoxCeleb1 · 00038 | -2.0 | -1.0 | +0.0 | +1.0 | +2.0 |
| 60s F | Source Reconstruction LibriTTS · 8778_246974_000024_000009 | -2.0 | -1.0 | +0.0 | +1.0 | +2.0 |
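
For readers who want to reproduce the habitual pitch metric, here is a minimal sketch of a median log-F0 computation. It assumes an F0 track in Hz is already available from some pitch tracker, with unvoiced frames marked as 0. The function names `median_log_f0` and `semitone_shift` are hypothetical and are not taken from the evaluation code behind this page.

```python
import math
import statistics


def median_log_f0(f0_hz):
    """Median of natural-log F0 over voiced frames.

    Assumption: unvoiced frames are encoded as 0 (or any non-positive value).
    Returns None if the track has no voiced frames.
    """
    voiced = [f for f in f0_hz if f > 0.0]
    if not voiced:
        return None
    return statistics.median(math.log(f) for f in voiced)


def semitone_shift(log_f0_from, log_f0_to):
    """Convert a difference of natural-log F0 values into semitones."""
    return 12.0 * (log_f0_to - log_f0_from) / math.log(2.0)


# Toy F0 tracks (Hz); zeros stand for unvoiced frames.
source_track = [0.0, 98.0, 100.0, 102.0, 0.0]
edited_track = [0.0, 196.0, 200.0, 204.0, 0.0]

shift = semitone_shift(median_log_f0(source_track), median_log_f0(edited_track))
print(f"median pitch shift: {shift:.2f} semitones")
```

Taking the median in the log domain keeps the statistic robust to octave errors in a few frames, and differences between two medians map directly onto semitones.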
Voice-quality / HNR manipulation
HNR proxy, measured from the generated waveform. This metric is noisier than the others and can hit its floor on difficult utterances.
| Cohort | Source / reconstruction | -2.0 | -1.0 | 0.0 | +1.0 | +2.0 |
|---|---|---|---|---|---|---|
| 20s M | Source Reconstruction LibriTTS · 27_124992_000059_000001 | -2.0 | -1.0 | +0.0 | +1.0 | +2.0 |
| 20s F | Source Reconstruction VoxCeleb1 · 00002 | -2.0 | -1.0 | +0.0 | +1.0 | +2.0 |
| 40s M | Source Reconstruction GLOBE · 00057675_000001.v2.vad | -2.0 | -1.0 | +0.0 | +1.0 | +2.0 |
| 40s F | Source Reconstruction GLOBE · 00382646_000013.v2.vad | -2.0 | -1.0 | +0.0 | +1.0 | +2.0 |
| 60s M | Source Reconstruction GLOBE · 00219094_000014.v2.vad | -2.0 | -1.0 | +0.0 | +1.0 | +2.0 |
| 60s F | Source Reconstruction LibriTTS · 4234_187735_000005_000000 | -2.0 | -1.0 | +0.0 | +1.0 | +2.0 |
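
To illustrate why an HNR proxy can be noisy and can floor, here is a rough sketch of one common autocorrelation-based estimate for a single frame. It is not necessarily the proxy used for the tables above. The function name `hnr_proxy_db`, the pitch search range, and the `floor_db` / `ceil_db` clamp values are all assumptions chosen for illustration.

```python
import math
import random


def hnr_proxy_db(frame, sample_rate, f0_min=60.0, f0_max=400.0,
                 floor_db=-20.0, ceil_db=40.0):
    """Autocorrelation-based harmonics-to-noise estimate for one frame, in dB.

    r is the peak of the normalized autocorrelation within the pitch lag
    range; HNR = 10 * log10(r / (1 - r)), clamped to [floor_db, ceil_db].
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]
    energy = sum(s * s for s in x)
    if energy <= 0.0:
        return floor_db  # silent frame: nothing to measure
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(n - 1, int(sample_rate / f0_min))
    best = 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        best = max(best, r)
    if best <= 0.0:
        return floor_db  # no periodicity found: the estimate floors
    if best >= 1.0:
        return ceil_db
    hnr = 10.0 * math.log10(best / (1.0 - best))
    return min(ceil_db, max(floor_db, hnr))


sr = 8000
sine = [math.sin(2.0 * math.pi * 100.0 * i / sr) for i in range(800)]
random.seed(0)
noise = [random.uniform(-1.0, 1.0) for _ in range(800)]

print("periodic frame:", round(hnr_proxy_db(sine, sr), 2), "dB")
print("noise frame:   ", round(hnr_proxy_db(noise, sr), 2), "dB")
```

In this sketch, frames with weak or absent periodicity collapse toward the lower clamp, which is one way a floor effect can arise on breathy, noisy, or very short utterances.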