A little question about wavlm and seed mask #23
Yeah, you're right. For the first question, the WavLM input should preferably be normalized; for the second question, this RM is not quite the same as in the paper, but it should have little impact on the results, since stylization and controllability are not considered in the challenge. I have updated the README. Thanks for the correction.
Maybe I didn't explain my first question well. I assumed that using an un-normalized wav to extract WavLM features would be a fatal mistake, so it's strange that it works in the end... I haven't tested extracting WavLM features from normalized wav in DiffuseStyleGesture.
My guess is that it's just a feature extractor, and the bias is the same if none of the inputs are normalized. This may only work for this particular speaker.
Hmm... still sounds strange :(
Well, I find that the mean and std of the wav are not 0 and 1. The command
Hello, thanks for sharing the code!
In DiffuseStyleGesture, the model uses only one audio feature, WavLM.
But when extracting the WavLM feature from the raw wav, the code doesn't normalize the wav beforehand, while DiffuseStyleGesture++ does. I think DiffuseStyleGesture++ is right, yet DiffuseStyleGesture, which uses only WavLM features extracted from un-normalized wav, still generates good results in the samples. It confuses me a lot.
For DiffuseStyleGesture++, Figure 1 of the paper "The DiffuseStyleGesture+ entry to the GENEA Challenge 2023" indicates that the seed gesture should use a mask, while the code does not.