A little question about wavlm and seed mask #23

Open
RobinWitch opened this issue Nov 3, 2023 · 6 comments
Labels
question Further information is requested

Comments

@RobinWitch

Hello, thanks for sharing the code!
In DiffuseStyleGesture, the model uses only one audio feature, WavLM.
However, when extracting WavLM features from the raw wav, the code does not normalize the wav first, while DiffuseStyleGesture++ does. I think DiffuseStyleGesture++ is right. Yet DiffuseStyleGesture, which uses WavLM features extracted from the un-normalized wav, still generates good results in the samples, which confuses me a lot.

In DiffuseStyleGesture++, Figure 1 of the paper "The DiffuseStyleGesture+ entry to the GENEA Challenge 2023" indicates that the seed gesture should be masked, while the code does not do this.

@YoungSeng
Owner

YoungSeng commented Nov 4, 2023

Yeah, you're right. For the first question, the wav should preferably be normalized before extracting WavLM features. For the second question, this RM is not quite the same as in the paper, but it should have little impact on the results, since stylization and controllability are not considered in the challenge. I have updated the README. Thanks for the correction.

YoungSeng added the question label on Nov 4, 2023
@RobinWitch
Author

Maybe I didn't explain my first question well. I would have expected that extracting WavLM features from an un-normalized wav is a fatal mistake, so it is strange that it works in the end. That said, I haven't tested extracting WavLM features from a normalized wav in DiffuseStyleGesture.

@YoungSeng
Owner

YoungSeng commented Nov 5, 2023

I'm guessing WavLM is just a feature extractor here, and the bias is the same as long as none of the wavs are normalized. This may only work for this particular speaker.

@RobinWitch
Author

Hmm... it still sounds strange :(
Maybe we need to run a simple test.

@jiaqiAA

jiaqiAA commented Nov 6, 2023

Hi, I think the wav is normalized before the WavLM features are extracted; the code is cmd = ['ffmpeg-normalize', wav_file, '-o', normalize_wav_path, '-ar', '16000'] in here. But it is different from the code referred to above, so I don't know whether it works.
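
For readers who want to try that step, here is a minimal Python sketch of how the quoted argument list could be built and run. The helper names `build_normalize_cmd` and `normalize_file` and the file names are hypothetical and not taken from the repository; only the argument list itself comes from the comment above. The sketch assumes the `ffmpeg-normalize` command-line tool is installed.

```python
import shutil
import subprocess


def build_normalize_cmd(wav_file, normalize_wav_path, sample_rate=16000):
    """Return the ffmpeg-normalize argument list quoted in the thread.

    -o sets the output file, -ar sets the output sample rate.
    """
    return ['ffmpeg-normalize', wav_file,
            '-o', normalize_wav_path,
            '-ar', str(sample_rate)]


def normalize_file(wav_file, normalize_wav_path):
    """Run ffmpeg-normalize on one file (hypothetical helper)."""
    cmd = build_normalize_cmd(wav_file, normalize_wav_path)
    if shutil.which(cmd[0]) is None:
        raise RuntimeError('ffmpeg-normalize is not installed or not on PATH')
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(cmd, check=True)


# Example file names are placeholders.
cmd = build_normalize_cmd('speech.wav', 'speech_normalized.wav')
print(' '.join(cmd))
```

Note that this adjusts the level and sample rate of the audio file on disk; it is a separate step from any normalization applied to the waveform tensor right before feature extraction.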

@RobinWitch
Author

Well, I found that the mean and std of the resulting wav are not 0 and 1. The ffmpeg-normalize command does not play the same role as layer_norm; you can refer to it.
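
To illustrate the distinction, here is a minimal Python sketch. It assumes that "layer_norm" here means zero-mean, unit-variance normalization over the whole waveform (with a small `eps`), and it stands in for level-based normalization with a constant gain, which is only a rough approximation of what ffmpeg-normalize does. The helper names and the toy waveform are hypothetical.

```python
import math


def mean_std(x):
    """Mean and (biased) standard deviation of a list of samples."""
    m = sum(x) / len(x)
    var = sum((v - m) ** 2 for v in x) / len(x)
    return m, math.sqrt(var)


def layer_norm_waveform(x, eps=1e-5):
    """Zero-mean, unit-variance normalization over the whole waveform."""
    m, s = mean_std(x)
    denom = math.sqrt(s * s + eps)
    return [(v - m) / denom for v in x]


def apply_gain(x, gain):
    """Assumed stand-in for level normalization: scale by a constant."""
    return [v * gain for v in x]


# Toy waveform: a sine with a small DC offset (not real speech).
wav = [0.5 * math.sin(2 * math.pi * 5 * n / 200) + 0.1 for n in range(200)]

normed = layer_norm_waveform(wav)
gained = apply_gain(wav, 0.25)

print('layer_norm style:', mean_std(normed))  # mean close to 0, std close to 1
print('gain only:       ', mean_std(gained))  # mean and std are merely rescaled
```

Under these assumptions, changing the level alone leaves the mean and std at whatever the gain produces, so a check for mean 0 and std 1 would fail exactly as described.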



3 participants