A little question about wavlm and seed mask #23

RobinWitch · 2023-11-03T07:22:46Z

Hello, thanks for sharing the code!
DiffuseStyleGesture uses only one audio feature, WavLM.
However, when extracting WavLM features from the raw wav, the code doesn't normalize the wav first, whereas DiffuseStyleGesture++ does. I think DiffuseStyleGesture++ is right, yet DiffuseStyleGesture, which uses WavLM features extracted from the un-normalized wav, still generates good results in the samples. This confuses me a lot.

In DiffuseStyleGesture++, Figure 1 of the paper "The DiffuseStyleGesture+ entry to the GENEA Challenge 2023" shows that the seed gesture should use a mask, but the code does not apply one.

YoungSeng · 2023-11-04T03:08:50Z

Yeah, you're right. For the first question, the wav should preferably be normalized before WavLM feature extraction. For the second question, this RM is not quite the same as in the paper, but it should have little impact on the results, since stylization and controllability are not considered in the challenge. I have updated the README. Thanks for the correction.

RobinWitch · 2023-11-04T06:51:06Z

Maybe I didn't explain my first question well. I assumed that extracting WavLM features from an un-normalized wav would be a fatal mistake, so why does it work in the end? It's so strange. That said, I haven't tested DiffuseStyleGesture with WavLM features extracted from a normalized wav.

YoungSeng · 2023-11-05T03:44:35Z

I'm guessing that WavLM is just a feature extractor here, and the bias is the same as long as none of the inputs are normalized. This may only work for this particular speaker.

RobinWitch · 2023-11-05T07:13:37Z

Hmm... that still sounds strange :(
Maybe we need to run a simple test.

jiaqiAA · 2023-11-06T06:14:44Z

Hi, I think the wav is normalized before the WavLM features are extracted; the code is cmd = ['ffmpeg-normalize', wav_file, '-o', normalize_wav_path, '-ar', '16000'] in here. But it is different from the code, so I don't know if it works.
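For reference, the quoted command could be wrapped roughly as below. This is only a sketch, not the repository's actual code: the helper names `build_normalize_cmd` and `normalize_wav` are hypothetical, and it assumes the `ffmpeg-normalize` CLI is installed and on `PATH`. As far as I know, `ffmpeg-normalize` performs loudness normalization by default and `-ar` sets the output sample rate, so this step adjusts level and resamples rather than standardizing the samples.

```python
import shutil
import subprocess


def build_normalize_cmd(wav_file, normalize_wav_path, sample_rate=16000):
    """Build the ffmpeg-normalize command quoted in the comment above.

    Hypothetical helper: the argument list mirrors the quoted `cmd`.
    """
    return [
        'ffmpeg-normalize', wav_file,
        '-o', normalize_wav_path,   # output path
        '-ar', str(sample_rate),    # output sample rate in Hz
    ]


def normalize_wav(wav_file, normalize_wav_path, sample_rate=16000):
    """Run the command; raises if the CLI is missing or exits non-zero."""
    cmd = build_normalize_cmd(wav_file, normalize_wav_path, sample_rate)
    if shutil.which(cmd[0]) is None:
        raise RuntimeError('ffmpeg-normalize not found on PATH')
    subprocess.run(cmd, check=True)
    return normalize_wav_path


# Building the command needs no external tools; running it does.
example_cmd = build_normalize_cmd('in.wav', 'out.wav')
print(' '.join(example_cmd))
```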

RobinWitch · 2023-11-06T08:08:21Z

Well, I found that the mean and std of the wav are not 0 and 1. The ffmpeg-normalize command does not play the same role as layer_norm; you can refer to it.
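The difference can be illustrated with a small pure-Python sketch. It makes two simplifying assumptions: loudness normalization is modeled as a plain gain (real `ffmpeg-normalize` processing can be more involved), and `layer_norm` is modeled as standardization over the whole waveform with a small `eps`, in the style of `torch.nn.functional.layer_norm`. The toy sine signal and all function names are illustrative only.

```python
import math


def mean_std(x):
    """Mean and population standard deviation of a list of samples."""
    m = sum(x) / len(x)
    var = sum((v - m) ** 2 for v in x) / len(x)
    return m, math.sqrt(var)


def layer_norm(x, eps=1e-5):
    """Standardize the whole waveform to roughly zero mean, unit variance."""
    m, s = mean_std(x)
    denom = math.sqrt(s * s + eps)
    return [(v - m) / denom for v in x]


def apply_gain(x, gain):
    """Simplified stand-in for loudness normalization: scale every sample."""
    return [v * gain for v in x]


# Toy 1-second waveform at 16 kHz: a quiet 440 Hz sine with a DC offset.
sr = 16000
wav = [0.1 * math.sin(2 * math.pi * 440 * n / sr) + 0.02 for n in range(sr)]

gained = apply_gain(wav, 4.0)   # level changed, statistics not standardized
normed = layer_norm(wav)        # mean ~0, std ~1

g_mean, g_std = mean_std(gained)
n_mean, n_std = mean_std(normed)
print('gain only :', g_mean, g_std)
print('layer_norm:', n_mean, n_std)

# layer_norm removes the gain almost entirely, so applying it after a pure
# gain gives nearly the same samples as applying it to the original wav.
max_diff = max(abs(a - b) for a, b in zip(layer_norm(gained), normed))
print('max diff after layer_norm:', max_diff)
```

Under these assumptions, a gain-style normalization leaves the mean and std away from 0 and 1, which matches the observation above, while standardization makes the result nearly independent of the input level.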


YoungSeng added the "question" label (Further information is requested) on Nov 4, 2023
