Training RuntimeError: numel: integer multiplication overflow #31

Open
ecoArcGaming opened this issue Feb 5, 2025 · 1 comment

ecoArcGaming commented Feb 5, 2025

Hi, I encountered the following error when training r2-gaussian on a custom dataset.

    vol_pred, radii = voxelizer(
  File ".../miniconda3/envs/r2_gaussian/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File ".../miniconda3/envs/r2_gaussian/lib/python3.9/site-packages/xray_gaussian_rasterization_voxelization/voxelization.py", line 259, in forward
    return voxelize_gaussians(
  File ".../miniconda3/envs/r2_gaussian/lib/python3.9/site-packages/xray_gaussian_rasterization_voxelization/voxelization.py", line 49, in voxelize_gaussians
    return _VoxelizeGaussians.apply(
  File ".../miniconda3/envs/r2_gaussian/lib/python3.9/site-packages/xray_gaussian_rasterization_voxelization/voxelization.py", line 123, in forward
    ) = _C.voxelize_gaussians(*args)
RuntimeError: numel: integer multiplication overflow

This occurred at iteration 5000:
Train: 25%|██▌ | 5000/20000 [4:04:20<13:13:43, 3.17s/it, loss=1.9e+00, pts=5.6e+05]
I set densify_until_iter to 1000 for this run, yet training kept getting slower even after iteration 1000. I am using an A6000 GPU with 48 GB of memory. My dataset has a volume of dimensions [500, 500, 500] and projections of dimensions [768, 972].

Setting densify_until_iter to 0 allowed the training to complete but didn't produce any sensible results. How should I debug this?
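For reference, this is the kind of pre-flight check I could add right before the voxelizer call to narrow things down. The tensor names (means3d, scales, density) and the threshold are assumptions based on the traceback, not r2-gaussian's actual variables:

import torch

def check_voxelizer_inputs(means3d, scales, density):
    # Hypothetical sanity check (names are illustrative, not r2-gaussian's API).
    # "numel: integer multiplication overflow" usually means a tensor with an
    # absurdly large requested element count; corrupted (NaN/inf or exploded)
    # Gaussian parameters can make the CUDA voxelizer request such a buffer.
    for name, t in {"means3d": means3d, "scales": scales, "density": density}.items():
        if not torch.isfinite(t).all():
            raise RuntimeError(f"{name} contains NaN/inf right before voxelization")
    # A scale far larger than the scene extent makes one Gaussian touch an enormous
    # number of voxels; the 1e3 threshold is a guess, tune it to your volume size.
    if scales.abs().max() > 1e3:
        print(f"warning: max |scale| = {scales.abs().max().item():.3e} looks suspicious")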

Ruyi-Zha (Owner) commented

Hi, I apologize for the late response as I was on vacation. I did not encounter this error during development, but it seems related to the voxelizer. I suggest disabling TV loss so that the voxelizer is not called during training; TV loss typically improves results slightly but slows training down. The number of Gaussians (around 5.6e+05) looks reasonable, so densification is probably not the cause.
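
For context, here is a minimal sketch of what a 3D TV regularizer computes; this is an illustration with assumed names, not r2-gaussian's exact implementation. Dropping this term from the objective (e.g. by setting its weight to zero) removes the training-time call to the voxelizer mentioned above:

import torch

def tv_loss_3d(vol):
    # Total variation over an [X, Y, Z] volume: mean absolute difference between
    # neighbouring voxels along each axis. The predicted volume `vol` would come
    # from the voxelizer, which is why disabling the TV term avoids that call.
    dx = (vol[1:, :, :] - vol[:-1, :, :]).abs().mean()
    dy = (vol[:, 1:, :] - vol[:, :-1, :]).abs().mean()
    dz = (vol[:, :, 1:] - vol[:, :, :-1]).abs().mean()
    return dx + dy + dz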
