Training on VOCASET produces static template output #5
Comments
🤔 My log shows that after initialization, the
Can you check if this still exists on the BIWI dataset?
Hi, I'm running into the same problem. When I debugged it, I came to think there is a bug in the code, something like a variable that needs training not being included in the optimizer. Could you train it from scratch using the master branch code, or give some tips? I'd appreciate your kindness.
@hqm0810 I've narrowed down the problem to an issue in the loss update function, specifically in `VOCALosses.update`.
Another issue I've identified is with the loss tensor itself. Can you see if this is the cause of your problem?
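One way to check this kind of problem (a minimal sketch assuming a standard PyTorch training loop; the helper below is hypothetical and not part of the repository) is to inspect whether the loss still carries a `grad_fn` and whether backpropagating it actually reaches the model parameters:

```python
import torch

# Hypothetical debugging helper (names are illustrative, not the project's API):
# check whether the training loss is still connected to the computation graph
# and whether a backward pass produces gradients on the model parameters.
def check_loss_connectivity(model: torch.nn.Module, loss: torch.Tensor) -> None:
    print("requires_grad:", loss.requires_grad)  # should be True during training
    print("grad_fn:", loss.grad_fn)              # should not be None

    if loss.grad_fn is None:
        print("loss is detached from the graph -- no gradients can flow")
        return

    loss.backward(retain_graph=True)
    grads = [p.grad.abs().sum().item() for p in model.parameters() if p.grad is not None]
    print("params with gradients:", len(grads), "| total grad magnitude:", sum(grads))
```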
Thank you, the cause of the problem was indeed the issue you described.
It seems like, for some reason, computing the loss term inside `VOCALosses.update` does not behave as expected, so I moved the loss computation out of it and into the training step. For the `DIFFUSION_BIAS` model file:

```diff
@@ -6,7 +6,7 @@ from transformers import Wav2Vec2Model
 from alm.config import instantiate_from_config
 from alm.models.modeltype.base import BaseModel
-from alm.models.losses.voca import VOCALosses
+from alm.models.losses.voca import VOCALosses, MaskedConsistency, MaskedVelocityConsistency
 from alm.utils.demo_utils import animate
 from .base import BaseModel

@@ -44,6 +42,8 @@ class DIFFUSION_BIAS(BaseModel):
             key: self._losses["losses_" + key]
             for key in ["train", "test", "val", ] # "train_val"
         }
+        self.reconstruct = MaskedConsistency()
+        self.reconstruct_v = MaskedVelocityConsistency()

         # set up model
         self.audio_encoder = Wav2Vec2Model.from_pretrained(cfg.audio_encoder.model_name_or_path)

@@ -114,7 +114,12 @@ class DIFFUSION_BIAS(BaseModel):
         batch['audio'][audio_mask] = 0

         rs_set = self._diffusion_forward(batch, batch_idx, phase="train")
-        loss = self.losses[split].update(rs_set)
+
+        mask = rs_set['vertice_attention'].unsqueeze(-1)
+        loss1 = self.reconstruct(rs_set['vertice'], rs_set['vertice_pred'], mask)
+        loss2 = self.reconstruct_v(rs_set['vertice'], rs_set['vertice_pred'], mask)
+        loss = loss1 + loss2
+        self.losses[split].update(loss1, loss2, loss)

         return loss
```

For `alm/models/losses/voca.py`:

```diff
@@ -118,31 +118,13 @@ class VOCALosses(Metric):
         # lip_vertice = vertice.view(shape[0], shape[1], -1, 3)[:, :, mouth_map, :].view(shape[0], shape[1], -1)
         # return lip_vertice

-    def update(self, rs_set):
-        # rs_set.keys() = dict_keys(['latent', 'latent_pred', 'vertice', 'vertice_recon', 'vertice_pred', 'vertice_attention'])
-
-        total: float = 0.0
-        # Compute the losses
-        # Compute instance loss
-
-        # padding mask
-        mask = rs_set['vertice_attention'].unsqueeze(-1)
-
+    def update(self, recon, recon_v, ttl):
         if self.split in ['losses_train', 'losses_val']:
-            # vertice loss
-            total += self._update_loss("vertice_enc", rs_set['vertice'], rs_set['vertice_pred'], mask = mask)
-            total += self._update_loss("vertice_encv", rs_set['vertice'], rs_set['vertice_pred'], mask = mask)
-
-            # lip loss
-            # lip_vertice = self.vert2lip(rs_set['vertice'])
-            # lip_vertice_pred = self.vert2lip(rs_set['vertice_pred'])
-            # total += self._update_loss("lip_enc", lip_vertice, lip_vertice_pred, mask = mask)
-            # total += self._update_loss("lip_encv", lip_vertice, lip_vertice_pred, mask = mask)
-
-            self.total += total.detach()
+            self.vertice_enc += recon.detach()
+            self.vertice_encv += recon_v.detach()
+            self.total += ttl.detach()
             self.count += 1
-
-            return total
+            return ttl

         if self.split in ['losses_test']:
             raise ValueError(f"split {self.split} not supported")
```

This allows the model to train with the correct losses.
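The two loss classes imported above, `MaskedConsistency` and `MaskedVelocityConsistency`, are not shown in the diff. Below is a rough sketch of what masked reconstruction and velocity losses of this kind usually look like; it is an assumption about their behavior, not the repository's actual implementation:

```python
import torch
import torch.nn as nn

class MaskedConsistency(nn.Module):
    """Sketch of a masked L2 reconstruction loss: only frames where the
    padding mask is 1 contribute to the average."""
    def forward(self, target, pred, mask):
        # target/pred: (batch, frames, vertices*3); mask: (batch, frames, 1)
        diff = (pred - target) ** 2 * mask
        return diff.sum() / (mask.sum() * target.shape[-1] + 1e-8)

class MaskedVelocityConsistency(nn.Module):
    """Sketch of a masked velocity loss: the same idea applied to
    frame-to-frame differences, which penalizes frozen predictions."""
    def forward(self, target, pred, mask):
        vel_t = target[:, 1:] - target[:, :-1]
        vel_p = pred[:, 1:] - pred[:, :-1]
        m = mask[:, 1:] * mask[:, :-1]   # a frame pair is valid only if both frames are valid
        diff = (vel_p - vel_t) ** 2 * m
        return diff.sum() / (m.sum() * target.shape[-1] + 1e-8)
```

The velocity term is the part that most directly penalizes static, template-like output, since a frozen prediction has zero frame-to-frame motion.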
Hi all, is the bug fixed in the master branch?
Thanks for your code! Have you trained on the dataset and verified the results?
Dear all, thanks for your effort. We have released the missing part for training. You can now train the model with decreasing losses.
Hi, thanks for your reply, but I didn't find your latest commit. How should I use the latest code?
Hi @aixiaodewugege, did you get any results? I rewrote the loss calculation according to the above and trained on VOCASET for around 3500 epochs, but when I tested, the results were still not good; the mouth did not even open.
Hi. I can train it on VOCASET and get good results after 9000 epochs.
Hi, have you trained it on multiple GPUs? I am only able to train it on a single GPU.
I haven't tried it on multiple GPUs yet, but a single GPU works for me too.
Thanks. If you manage to fix the multi-GPU problem, please let me know how~~~
Hi, I tried to modify the loss function a bit, but I still can't get the expected results from training. Can I ask how you made the modification? Thank you!
Symptoms
Training on VOCASET with the supplied wav2vec2 script produces a static output of the template for any audio input. Here's an example:
Here are the Tensorboard error graphs:
Notably, the loss seems to be weirdly small for both components.
Steps to reproduce
1. Into the `dataset/vocaset` folder, I copied over the `vertices_npy` and `wav` folders that are also used for FaceFormer training.
2. I ran `scripts/diffusion/vocaset_training/diffspeaker_wav2vec2_vocaset.sh`.

Troubleshooting steps tried
- Verified that `templates.pkl` and the self-supplied `.npy` files match.
- The self-supplied `.npy` files are 60 FPS, but I left the `[::2,:]` in the `load_data` function untouched.
- With `print(len(self.data_splits['train']))` in `alm/data/vocaset.py`, I can see that 314 training samples have been loaded.
- Apart from `scipy` (mine is 1.12.0 vs 1.9.1 in `requirements.txt`), all pip packages have the same versions as the supplied `requirements.txt`.
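A rough sketch of how the data checks above can be scripted; the paths, file naming, and pickle encoding are assumptions based on the FaceFormer-style data layout described here, not the project's loader code:

```python
import os
import pickle
import numpy as np

# Assumed layout: dataset/vocaset containing templates.pkl plus the
# vertices_npy/ and wav/ folders copied from the FaceFormer data.
root = "dataset/vocaset"

with open(os.path.join(root, "templates.pkl"), "rb") as f:
    templates = pickle.load(f, encoding="latin1")
print("template subjects:", len(templates))

npy_dir = os.path.join(root, "vertices_npy")
npy_files = sorted(f for f in os.listdir(npy_dir) if f.endswith(".npy"))
print("vertex sequences:", len(npy_files))

# The raw .npy sequences are 60 FPS; the loader keeps every second frame
# ([::2, :]), so the effective training frame rate is 30 FPS.
seq = np.load(os.path.join(npy_dir, npy_files[0]))
print("frames at 60 FPS:", seq.shape[0], "-> after [::2, :]:", seq[::2, :].shape[0])
```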
Logs
Here are the truncated logs of the training: