The test datasets used for evaluation metrics #19

Open · lovemino opened this issue Jul 8, 2024 · 10 comments
@lovemino commented Jul 8, 2024

Thank you very much for your contribution and for sharing it. I have always been curious about the evaluation metrics for co-speech gesture generation, and I would like to ask whether the test datasets used for the metrics in your paper are the same as the ones used for CaMN. I noticed that the test datasets in your code differ somewhat from CaMN's in terms of LMDB loading. If you could spare some time to answer this, I would be very grateful.

@JeremyCJM (Owner)

Hi lovemino, the test set should be the same as CaMN's, except that I converted the Euler rotations into axis-angle format. Therefore, the autoencoders used to compute Fréchet Distances were also retrained on axis-angle rotations. Note that CaMN's Fréchet Distance code did not turn on evaluation mode; the results in our paper were obtained after correcting this issue.
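For readers trying to replicate this conversion, here is a minimal sketch using SciPy; the "XYZ" rotation order and degree units are assumptions that must match BEAT's actual convention, not something confirmed in this thread:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def euler_to_axis_angle(euler_deg: np.ndarray) -> np.ndarray:
    """Convert [..., 3] Euler angles (degrees) to [..., 3] axis-angle vectors."""
    flat = euler_deg.reshape(-1, 3)
    rotvec = R.from_euler("XYZ", flat, degrees=True).as_rotvec()
    return rotvec.reshape(euler_deg.shape)

# e.g. a 34-frame clip with 47 joints (47 * 3 = 141 channels)
aa = euler_to_axis_angle(np.zeros((34, 47, 3)))  # -> (34, 47, 3)
```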

@lovemino (Author)

> Hi lovemino, the test set should be the same as CaMN's, except that I converted the Euler rotations into axis-angle format. Therefore, the autoencoders used to compute Fréchet Distances were also retrained on axis-angle rotations. Note that CaMN's Fréchet Distance code did not turn on evaluation mode; the results in our paper were obtained after correcting this issue.

Thank you very much for your reply. I found your paper to be meticulously written, and it is certainly a valuable read. However, as I attempted to reproduce the experimental results, I encountered some difficulties. Specifically, I found that the LMDB used in CaMN causes errors when used in your project.

Would it be possible for you to provide the processed LMDB file for the test datasets? I would greatly appreciate it.

Thank you for your contribution and response.

@JeremyCJM (Owner) commented Jul 10, 2024

Hi, this is because I am using the latest version of lmdb. You can try replacing the lmdb in BEAT with the latest version to regenerate the data cache. Regarding releasing the processed test dataset, I would need to check the license of the dataset. Let me know if upgrading lmdb helps :)
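As a quick sanity check after upgrading, something like the following can confirm that the regenerated cache opens cleanly; the cache path is a placeholder, and the key layout inside the cache is dataset-specific:

```python
import lmdb  # pip install -U lmdb

# "data/beat_cache/test" is a placeholder path for the regenerated cache.
env = lmdb.open("data/beat_cache/test", readonly=True, lock=False)
with env.begin() as txn:
    print("entries in cache:", txn.stat()["entries"])
env.close()
```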

@lovemino (Author)

> Hi, this is because I am using the latest version of lmdb. You can try replacing the lmdb in BEAT with the latest version to regenerate the data cache. Regarding releasing the processed test dataset, I would need to check the license of the dataset. Let me know if upgrading lmdb helps :)

I am very grateful for your prompt reply and look forward to you making the LMDB public. Once again, I want to express my appreciation for your contribution, and I look forward to your future papers. Thank you very much.

@lovemino (Author)

> Hi, this is because I am using the latest version of lmdb. You can try replacing the lmdb in BEAT with the latest version to regenerate the data cache. Regarding releasing the processed test dataset, I would need to check the license of the dataset. Let me know if upgrading lmdb helps :)

Based on your response, I used the build_cache function from your beat.py and the test set from CaMN to generate a new data.mdb file. Then, using the ges_axis_angle_300 weights and the CaMN test code, I calculated the axis-angle gestures of the 141 upper-body joint channels (before converting them to rotation matrices) during model inference. However, the resulting FGD was 4282.439, roughly ten times the FGD of 438.93 reported in your paper. Could you kindly provide the code and files you used to calculate these metrics? That would help me accurately reproduce the numbers in your paper.
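For reference, the "(before converting them to rotation matrices)" step could look like the sketch below; the 47-joint split of the 141 channels is inferred from the shapes quoted in this thread, not confirmed by the authors:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def axis_angle_to_matrices(pose: np.ndarray, num_joints: int = 47) -> np.ndarray:
    """pose: [T, 141] axis-angle (47 joints x 3) -> [T, 47, 3, 3] rotation matrices."""
    T = pose.shape[0]
    rotvecs = pose.reshape(T * num_joints, 3)
    return R.from_rotvec(rotvecs).as_matrix().reshape(T, num_joints, 3, 3)

mats = axis_angle_to_matrices(np.zeros((34, 141)))  # -> (34, 47, 3, 3)
```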

@JeremyCJM (Owner)

Hi lovemino, before sharing the code, here are some traps you can check (a sketch of the first two checks follows this list):

  • Have you turned on eval mode?
  • Have you normalized the motion before feeding it into the FGD autoencoder?
  • What is your FGD for CaMN with our autoencoder checkpoint (after converting to axis-angle)? Is it also very large?
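A minimal sketch of the first two checks; eval_model, outputs, and targets are stand-ins for the retrained FGD autoencoder and the generated/ground-truth clips, and the zero/one statistics stand in for the repo's mean.npy and std.npy files:

```python
import torch

eval_model = torch.nn.Identity()   # stand-in for the ges_axis_angle_300 autoencoder
outputs = torch.randn(1, 34, 141)  # stand-in for generated axis-angle motion
targets = torch.randn(1, 34, 141)  # stand-in for ground-truth motion
mean = torch.zeros(141)            # in practice: torch.from_numpy(np.load("mean.npy"))
std = torch.ones(141)              # in practice: torch.from_numpy(np.load("std.npy"))

eval_model.eval()                  # check 1: evaluation mode (affects dropout/batchnorm)
with torch.no_grad():
    latent_out = eval_model((outputs - mean) / (std + 1e-8))  # check 2: normalize first
    latent_ori = eval_model((targets - mean) / (std + 1e-8))
```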

@lovemino (Author) commented Jul 15, 2024

> Hi lovemino, before sharing the code, here are some traps you can check:
>
>   • Have you turned on eval mode?
>   • Have you normalized the motion before feeding it into the FGD autoencoder?
>   • What is your FGD for CaMN with our autoencoder checkpoint (after converting to axis-angle)? Is it also very large?

Hello,
Following your suggestion, I tested CaMN using your ges_axis_angle_300.bin. I converted the results and dataset of CaMN from Euler angles to axis-angle using the conversion scripts you provided. Additionally, I used the mean and std .npy files together with your ges_axis_angle_300.bin, but the resulting FGD is 800.22, which differs from the 1635.44 reported in your paper. This issue has troubled me for a long time. If you could kindly provide your test dataset via email, I would be immensely grateful. I appreciate your work and your response.
Here is my Google email: lbj1040702929@gmail.com
Thank you.
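For anyone comparing FGD numbers, the standard FID-style Fréchet distance between two sets of latent codes looks like the sketch below; this is the generic formula, not necessarily the exact script behind the paper's numbers:

```python
import numpy as np
from scipy import linalg

def frechet_distance(lat_a: np.ndarray, lat_b: np.ndarray) -> float:
    """lat_a, lat_b: [N, D] latent codes from real and generated motion."""
    mu_a, mu_b = lat_a.mean(axis=0), lat_b.mean(axis=0)
    cov_a = np.cov(lat_a, rowvar=False)
    cov_b = np.cov(lat_b, rowvar=False)
    covmean = linalg.sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):  # discard tiny imaginary parts from sqrtm
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))

print(frechet_distance(np.random.randn(200, 32), np.random.randn(200, 32)))
```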

@JeremyCJM (Owner) commented Jul 17, 2024

Hi liubeibei, here is the link for the processed test set of BEAT. Please comply with the original license and restrictions of the BEAT dataset. Cheers!

@lovemino (Author) commented Jul 18, 2024

> Hi liubeibei, here is the link for the processed test set of BEAT. Please comply with the original license and restrictions of the BEAT dataset. Cheers!

I am truly grateful for your willingness to provide the test set; you are a real lifesaver. If possible, could you also provide the relevant code for testing the FGD metric? Following the evaluation metrics written in your train function results in an error in motion_autoencoder, because CaMN uses a sliding window to generate multiple latents. With your method, a batch size of 1 yields only one latent of shape (1, 34, 192), whereas CaMN obtains around 83 latents through the sliding window, giving a shape of (83, 34, 192).

Yours:

```python
latent_out = self.eval_model(outputs[:, :34, :].float())
latent_ori = self.eval_model(motions[:, :34, :].float())
```

CaMN:

```python
for j in range(num_divs):
    if j == 0:
        cat_results = myoutputs[:, j*stride : j*stride+pose_length, :]  # [83, 34, 141]
        cat_targets = tar_pose2[:, j*stride : j*stride+pose_length, :]
    else:
        cat_results = torch.cat([cat_results, myoutputs[:, j*stride : j*stride+pose_length, :]], 0)
        cat_targets = torch.cat([cat_targets, tar_pose2[:, j*stride : j*stride+pose_length, :]], 0)
latent_out = self.eval_model(cat_results.float())
latent_ori = self.eval_model(cat_targets.float())
```

One guess is that, according to the data.mdb you provided, the shape of the pose/pose_axis_angle variable in the test_dataset I read is [855, 141], not [256, 34, 141] as you commented.
So, whether I follow your method in the train function or use a sliding window like CaMN, I still don't get the correct metric. If you could provide the testing code, I would be incredibly grateful. You are truly a kind person, an angel! Here is my Google email: lbj1040702929@gmail.com. Thank you.
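A sketch of the sliding-window evaluation being described; pose_length and stride are CaMN's variable names, and stride = 10 is an assumption, though with that value an 855-frame sequence yields exactly the 83 clips mentioned above:

```python
import torch

pose_length, stride = 34, 10  # the stride value here is an assumption

def window(seq: torch.Tensor) -> torch.Tensor:
    """Cut one [T, C] sequence into overlapping [N, pose_length, C] clips."""
    num_divs = (seq.shape[0] - pose_length) // stride + 1
    clips = [seq[j * stride : j * stride + pose_length] for j in range(num_divs)]
    return torch.stack(clips, dim=0)

clips = window(torch.randn(855, 141))  # -> torch.Size([83, 34, 141])
```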

@lovemino (Author)

Hello, I have a simple question. Did you use the BEAT 2, 4, 6, 8 dataset but with a different processing method, and is that why you retrained the autoencoder? When comparing with CaMN, did you convert the outputs of the CaMN model to axis-angle to calculate the metrics, or did you retrain the entire CaMN model on your processed dataset? Thank you very much! I would be very grateful if you could answer.
