Post-HNR RAFT-Edit Ablation Listening Study

Matched audio comparisons from the HNR-corrected multi-attribute model evaluations. Every row uses the same source utterance and slider request, so audible differences can be attributed to the training ablation rather than to sample selection.

Five ablations have complete matched audio: full hybrid, no ordinal bins, no matching discriminator, no adversarial critics, and no sparse-axis curriculum. Age and perceived gender-presentation values are model estimates; pitch and HNR (harmonics-to-noise ratio) are acoustic proxies.

Single-axis comparison

Age

Matched source across all five ablations. The waveform metric is age-predictor shift.

Source reference: GLOBE · young 20s / male

Full hybrid: All continuous and ordinal conditioning with realism, matching, and interpolation critics.

Slider   Ranker Δ   Age-predictor shift   Source cosine
-2.0     -0.717     -0.298                0.686
-1.0     -0.717     -0.262                0.730
 0.0     +0.045     +0.000                0.911
+1.0     +1.123     +0.136                0.661
+2.0     +1.283     +0.232                0.572

Without ordinal bins: Continuous conditioning only; removes ordinal bin structure.

Slider   Ranker Δ   Age-predictor shift   Source cosine
-2.0     -0.717     -0.261                0.661
-1.0     -0.717     -0.222                0.738
 0.0     +0.093     +0.000                0.856
+1.0     +1.270     +0.173                0.670
+2.0     +1.283     +0.189                0.626

Without matching discriminator: Retains realism/interpolation critics but removes relative condition matching.

Slider   Ranker Δ   Age-predictor shift   Source cosine
-2.0     -0.717     -0.236                0.862
-1.0     -0.717     -0.146                0.924
 0.0     -0.053     +0.000                1.000
+1.0     +1.030     +0.156                0.887
+2.0     +1.283     +0.287                0.806

Without adversarial critics: Uses flow, target, ranker, cycle, self, and geometry objectives only.

Slider   Ranker Δ   Age-predictor shift   Source cosine
-2.0     -0.717     -0.228                0.858
-1.0     -0.717     -0.146                0.931
 0.0     +0.003     +0.000                1.000
+1.0     +1.041     +0.156                0.894
+2.0     +1.283     +0.300                0.796

Without sparse-axis curriculum: Removes auxiliary partial-control supervision.

Slider   Ranker Δ   Age-predictor shift   Source cosine
-2.0     -0.717     -0.124                0.745
-1.0     -0.717     -0.145                0.719
 0.0     +0.383     +0.000                0.776
+1.0     +1.283     +0.201                0.661
+2.0     +1.283     +0.265                0.666

Source ID: GLOBE::train::S_007523::00435966_000002.v2.vad
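
The page does not define how the "Source cosine" column is computed. One plausible reading, shown here purely as an assumption, is a cosine similarity between an embedding of the source utterance and an embedding of the edited output. The sketch below illustrates that calculation only; the function name `cosine_similarity` and the toy 4-dimensional vectors are hypothetical and not taken from the study.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, for illustration only (not study data).
source_embedding = [0.2, -0.5, 0.1, 0.8]
edited_embedding = [0.1, -0.4, 0.3, 0.6]

# An output identical to the source scores 1.0 under this definition.
print(round(cosine_similarity(source_embedding, source_embedding), 3))
print(round(cosine_similarity(source_embedding, edited_embedding), 3))
```

Under this reading, a value of 1.000 at slider 0.0 would mean the edited embedding is indistinguishable from the source embedding, and lower values would indicate a larger departure from the source.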

Single-axis comparison

Perceived gender presentation

Matched source across all five ablations. The waveform metric is male-probability shift.

Source reference: LibriTTS · senior 60s / female

Full hybrid: All continuous and ordinal conditioning with realism, matching, and interpolation critics.

Slider   Ranker Δ   Male-probability shift   Source cosine
-2.0     -0.425     -0.003                   0.664
-1.0     -0.425     -0.003                   0.634
 0.0     -0.008     +0.000                   0.998
+1.0     +1.361     +0.961                   0.670
+2.0     +1.575     +0.983                   0.681

Without ordinal bins: Continuous conditioning only; removes ordinal bin structure.

Slider   Ranker Δ   Male-probability shift   Source cosine
-2.0     -0.425     -0.000                   0.593
-1.0     -0.425     -0.000                   0.593
 0.0     +0.097     +0.000                   0.903
+1.0     +1.203     +0.968                   0.706
+2.0     +1.575     +0.990                   0.698

Without matching discriminator: Retains realism/interpolation critics but removes relative condition matching.

Slider   Ranker Δ   Male-probability shift   Source cosine
-2.0     -0.425     -0.003                   0.848
-1.0     -0.425     -0.003                   0.955
 0.0     +0.033     +0.000                   1.000
+1.0     +1.062     +0.920                   0.983
+2.0     +1.575     +0.990                   0.917

Without adversarial critics: Uses flow, target, ranker, cycle, self, and geometry objectives only.

Slider   Ranker Δ   Male-probability shift   Source cosine
-2.0     -0.425     -0.004                   0.852
-1.0     -0.425     -0.004                   0.948
 0.0     +0.030     +0.000                   1.000
+1.0     +1.000     +0.878                   0.971
+2.0     +1.575     +0.989                   0.910

Without sparse-axis curriculum: Removes auxiliary partial-control supervision.

Slider   Ranker Δ   Male-probability shift   Source cosine
-2.0     -0.425     -0.001                   0.640
-1.0     -0.425     -0.001                   0.640
 0.0     +0.097     +0.000                   0.719
+1.0     +1.344     +0.975                   0.671
+2.0     +1.575     +0.981                   0.702

Source ID: LibriTTS::4234::4234_187735_000005_000000

Single-axis comparison

Habitual pitch

Matched source across all five ablations. The waveform metric is median log-F0 shift.

Source reference: LibriTTS · senior 60s / male

Full hybrid: All continuous and ordinal conditioning with realism, matching, and interpolation critics.

Slider   Ranker Δ   Median log-F0 shift   Source cosine
-2.0     -0.851     -0.413                0.549
-1.0     -0.851     -0.429                0.611
 0.0     +0.024     +0.000                0.925
+1.0     +1.035     +0.473                0.556
+2.0     +1.149     +0.927                0.692

Without ordinal bins: Continuous conditioning only; removes ordinal bin structure.

Slider   Ranker Δ   Median log-F0 shift   Source cosine
-2.0     -0.851     -0.400                0.593
-1.0     -0.851     -0.338                0.585
 0.0     -0.090     +0.000                0.773
+1.0     +1.018     +0.541                0.591
+2.0     +1.149     +0.790                0.648

Without matching discriminator: Retains realism/interpolation critics but removes relative condition matching.

Slider   Ranker Δ   Median log-F0 shift   Source cosine
-2.0     -0.851     -0.491                0.806
-1.0     -0.851     -0.378                0.838
 0.0     -0.005     +0.000                1.000
+1.0     +0.806     +0.808                0.937
+2.0     +1.149     +0.947                0.843

Without adversarial critics: Uses flow, target, ranker, cycle, self, and geometry objectives only.

Slider   Ranker Δ   Median log-F0 shift   Source cosine
-2.0     -0.851     -0.506                0.783
-1.0     -0.851     -0.383                0.833
 0.0     -0.009     +0.000                1.000
+1.0     +0.892     +0.845                0.925
+2.0     +1.149     +0.964                0.811

Without sparse-axis curriculum: Removes auxiliary partial-control supervision.

Slider   Ranker Δ   Median log-F0 shift   Source cosine
-2.0     -0.851     -0.476                0.661
-1.0     -0.851     -0.442                0.622
 0.0     -0.144     +0.000                0.639
+1.0     +0.957     +0.547                0.585
+2.0     +1.149     +0.707                0.637

Source ID: LibriTTS::6804::6804_79287_000013_000013
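
The habitual-pitch metric above is a median log-F0 shift. As a minimal sketch of how such a value could be computed, assume per-frame F0 tracks in Hz are already available for the source and the edited audio, that unvoiced frames are marked with 0, and that the logarithm is natural. None of these details are stated on this page; the pitch tracker, the log base, and the function names `median_log_f0` and `median_log_f0_shift` are all assumptions for illustration.

```python
import math
from statistics import median


def median_log_f0(f0_hz):
    """Median of natural-log F0 over voiced frames (assumed: F0 > 0 means voiced)."""
    voiced = [math.log(f) for f in f0_hz if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in F0 track")
    return median(voiced)


def median_log_f0_shift(source_f0_hz, edited_f0_hz):
    """Edited-minus-source difference of median log-F0."""
    return median_log_f0(edited_f0_hz) - median_log_f0(source_f0_hz)


# Toy F0 tracks, for illustration only (not study data).
source = [0.0, 110.0, 112.0, 108.0, 0.0]
edited = [0.0, 220.0, 224.0, 216.0, 0.0]

shift = median_log_f0_shift(source, edited)
print(round(shift, 3))  # one octave up gives ln(2), about 0.693
```

Taking the median over voiced frames keeps the statistic robust to octave errors and isolated tracking glitches, which is one reason a median is a common choice for a habitual-pitch proxy.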

Single-axis comparison

Voice quality / HNR

Matched source across all five ablations. The waveform metric is HNR shift (dB).

Source reference: LibriTTS · young 20s / male

Full hybrid: All continuous and ordinal conditioning with realism, matching, and interpolation critics.

Slider   Ranker Δ   HNR shift (dB)   Source cosine
-2.0     -1.198     -1.786           0.639
-1.0     -1.198     -0.481           0.663
 0.0     -0.000     +0.000           0.971
+1.0     +0.802     +6.746           0.688
+2.0     +0.802     +11.453          0.481

Without ordinal bins: Continuous conditioning only; removes ordinal bin structure.

Slider   Ranker Δ   HNR shift (dB)   Source cosine
-2.0     -1.198     -5.162           0.601
-1.0     -1.198     -5.080           0.698
 0.0     -0.119     +0.000           0.865
+1.0     +0.802     +5.360           0.699
+2.0     +0.802     +12.203          0.550

Without matching discriminator: Retains realism/interpolation critics but removes relative condition matching.

Slider   Ranker Δ   HNR shift (dB)   Source cosine
-2.0     -1.198     -4.727           0.905
-1.0     -0.930     -1.476           0.939
 0.0     +0.009     +0.000           1.000
+1.0     +0.802     +4.309           0.961
+2.0     +0.802     +13.134          0.890

Without adversarial critics: Uses flow, target, ranker, cycle, self, and geometry objectives only.

Slider   Ranker Δ   HNR shift (dB)   Source cosine
-2.0     -1.198     -4.465           0.924
-1.0     -0.885     -1.630           0.932
 0.0     +0.009     +0.000           1.000
+1.0     +0.802     +4.957           0.947
+2.0     +0.802     +13.311          0.872

Without sparse-axis curriculum: Removes auxiliary partial-control supervision.

Slider   Ranker Δ   HNR shift (dB)   Source cosine
-2.0     -1.198     -5.233           0.696
-1.0     -1.070     -3.485           0.648
 0.0     -0.126     +0.000           0.784
+1.0     +0.802     +2.037           0.731
+2.0     +0.802     +11.679          0.699

Source ID: LibriTTS::8108::8108_274318_000012_000000
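
To get a feel for the size of the HNR shifts reported above, it can help to convert a dB difference into a multiplicative change in the harmonics-to-noise power ratio. The sketch below assumes the usual definition of HNR in dB as 10·log10 of a power ratio; the page does not spell out how HNR was estimated, and the helper name `db_to_power_ratio` is hypothetical.

```python
def db_to_power_ratio(delta_db):
    """Convert a dB difference into a multiplicative power-ratio factor."""
    return 10.0 ** (delta_db / 10.0)


# A few HNR shifts from the full-hybrid results, converted to ratio factors.
for delta_db in (-1.786, 6.746, 11.453):
    factor = db_to_power_ratio(delta_db)
    print(f"{delta_db:+.3f} dB -> harmonics-to-noise power ratio x{factor:.2f}")
```

Under this definition a +10 dB shift corresponds to a tenfold increase in the ratio of harmonic to noise power, so shifts above +10 dB at slider +2.0 represent a large change in measured voice quality.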