Attribute Editing in Embedding Space

Waris Quamer¹, Ricardo Gutierrez-Osuna¹

¹Department of Computer Science and Engineering, Texas A&M University, USA

We have developed a parametric approach to modify specific speaker attributes, such as age and gender, within latent speaker embeddings. First, we applied principal component analysis (PCA) to the speaker embeddings and then identified the principal component (PC) directions in the latent space that correlate with the desired attribute. To do so, we computed the Pearson correlation coefficient ρ between each PC and the speaker attribute (e.g., age).
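The PCA and correlation step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `pc_attribute_correlations`, the use of an SVD to obtain the PCs, and the synthetic toy data are all assumptions made for the example.

```python
import numpy as np

def pc_attribute_correlations(embeddings, attribute):
    """PCA via SVD, then Pearson correlation of each PC score with an attribute.

    embeddings: (n_speakers, dim) array; attribute: (n_speakers,) array.
    Returns (mean, components, rho); rows of `components` are PC directions.
    (Hypothetical helper, named here for illustration only.)
    """
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of `components` are unit-norm principal directions,
    # ordered by decreasing explained variance.
    _, _, components = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ components.T  # (n_speakers, n_components)
    a = attribute - attribute.mean()
    # Score columns are already zero-mean because the data were centered,
    # so Pearson's rho reduces to a normalized dot product.
    num = scores.T @ a
    den = np.linalg.norm(scores, axis=0) * np.linalg.norm(a)
    # Components with (near-)zero variance get rho = 0 instead of NaN.
    rho = np.divide(num, den, out=np.zeros_like(num), where=den > 1e-12)
    return mean, components, rho

# Synthetic toy data (illustration only): "age" depends on one latent axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
age = 40.0 + 10.0 * X[:, 0] + rng.normal(scale=2.0, size=200)
mean, components, rho = pc_attribute_correlations(X, age)
print(np.round(rho, 2))
```

Each entry of `rho` is the correlation of one PC with the attribute; PCs that carry little attribute information end up with values near zero.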

Then, we used these correlation coefficients as weights to construct a composite direction V_age that captures the primary variance associated with age:

V_age = w_1 PC_1 + w_2 PC_2 + ⋯

where w_i = ρ(PC_i, age). To modify the attribute in the embedding, we adjust the original embedding Z along this direction:

Z' = Z + λ V_age

where λ controls the extent of modification: positive λ increases the attribute (e.g., age), while negative λ decreases it. This method enables fine-grained control over age or gender within speaker embeddings by moving in attribute-correlated directions in latent space, as illustrated in the figure below. Note that, because the composite direction is built from the directions of highest variance in the data, the approach is robust to noise. Audio samples of the results of our approach to attribute editing are available in a footnote.
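The construction of the composite direction and the edit itself can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the function names `composite_direction` and `edit_embedding` are hypothetical, the PC directions are assumed to be stored as rows of a matrix, and the optional unit-norm scaling is an addition not described in the text (it is off by default).

```python
import numpy as np

def composite_direction(components, weights, normalize=False):
    """Weighted sum of PC directions: V = sum_i w_i * PC_i.

    components: (n_components, dim) array, one PC direction per row.
    weights: (n_components,) array, e.g. Pearson correlations with the attribute.
    normalize: optional unit-norm scaling (an assumption, not from the text).
    """
    v = weights @ components
    if normalize:
        norm = np.linalg.norm(v)
        if norm > 0:
            v = v / norm
    return v

def edit_embedding(z, direction, lam):
    """Shift embedding z along the attribute direction: Z' = Z + lam * V."""
    return z + lam * direction

# Toy example (values chosen for illustration only).
pcs = np.eye(3)                    # three orthonormal PC directions
w = np.array([0.8, -0.1, 0.0])     # correlation of each PC with the attribute
v_attr = composite_direction(pcs, w)
z = np.ones(3)
z_older = edit_embedding(z, v_attr, lam=2.0)     # positive lambda: more of the attribute
z_younger = edit_embedding(z, v_attr, lam=-2.0)  # negative lambda: less of it
print(z_older, z_younger)
```

Because the weights carry the sign of each correlation, a positive λ raises the projection onto positively correlated PCs and lowers it on negatively correlated ones, which is why one scalar is enough to steer the attribute in either direction.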

Figure: Attribute editing in embedding space via principal component projections

Notes



Audio Samples

Below are audio samples of the results of our approach to attribute editing in speaker embeddings.

Sex Editing

Speaker | Feminine -- | Original | Feminine ++
RMS (Male)
BDL (Male)
SLT (Female)
CLB (Female)



Age Editing

Speaker | Age -- | Original | Age ++
RMS (Male)
BDL (Male)
SLT (Female)
CLB (Female)



References

[1] W. Quamer et al., "End-to-end streaming model for low-latency speech anonymization," in IEEE SLT 2024.