new video reading API crash #5419
Comments
On kinetics dataset:
On kinetics dataset:
and
Documenting current debugging:
Note: Just a quick edit: |
@prabhat00155 do you still have this error? Could you share the version of |
I could reproduce the problem as follows:
Pytorch installs ffmpeg==4.2, while |
@v-iashin |
I think torch has changed the ffmpeg version that it installs with torchvision (4.2 -> 4.3). When I try to call I tried to install Reproduce:
|
It seems the error occurs at Suppose this makes sense as it is some sort of memory error. |
This is not consistent with my findings. Previously, when I compared ffmpeg=4.2 and 4.3.2, torch could load the same video into memory with one version and failed with the other. I think the same error has two different causes: a lack of RAM (your case) and a version mismatch (at least in my case).
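To help separate the version-mismatch case from the out-of-memory case, it can be useful to record which ffmpeg version is actually on the machine. The following is a minimal sketch, not part of the original report: `parse_ffmpeg_version` and `installed_ffmpeg_version` are hypothetical helper names, and the sketch assumes the `ffmpeg` executable found on `PATH` corresponds to the libraries torchvision was built against, which may not hold in every environment.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_ffmpeg_version(banner: str) -> Optional[Tuple[int, ...]]:
    """Extract the numeric version from the first line of `ffmpeg -version` output."""
    first_line = banner.splitlines()[0] if banner else ""
    # Matches e.g. "ffmpeg version 4.3.2 ..." or "ffmpeg version n4.3.1 ..."
    match = re.match(r"ffmpeg version n?(\d+(?:\.\d+)*)", first_line)
    if match is None:
        return None  # e.g. a snapshot build without a plain numeric version
    return tuple(int(part) for part in match.group(1).split("."))


def installed_ffmpeg_version() -> Optional[Tuple[int, ...]]:
    """Return the version of the ffmpeg CLI on PATH, or None if it is not found."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "-version"], capture_output=True, text=True, check=False
    )
    return parse_ffmpeg_version(result.stdout)
```

Comparing the returned tuple across environments (for example, one where the video loads and one where it crashes) gives a quick first check before looking into memory usage.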
Hi all, what I can't seem to figure out is why this is happening. I've tried hi-res videos and didn't have an issue, but a video from #6204 does trigger it. The codec looks the same as in some other videos, and the file passes ffprobe without an issue. I've been getting some help from colleagues at QS, so hopefully we will be able to get to the bottom of this.
We've had to disable ffmpeg support at conda-forge. We can reliably recreate a segfault that seems to occur during the video read tests. Curiously, it doesn't occur on Python 3.9. This occurs for CPU builds too, not just GPU. Build logs can be followed at conda-forge/torchvision-feedstock#60
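Because a segfault or a `malloc(): memory corruption` abort kills the whole interpreter, one way to probe many videos or builds is to run each read in a child process and inspect how it ended. This is a minimal sketch under stated assumptions: `run_isolated` is a hypothetical helper, and the negative return code convention (the child was killed by that signal number) applies to POSIX systems only.

```python
import signal
import subprocess
import sys


def run_isolated(snippet: str, timeout: float = 60.0) -> str:
    """Run `snippet` in a fresh interpreter and describe how it ended."""
    proc = subprocess.run(
        # -X faulthandler makes the child dump a Python traceback on fatal signals
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # On POSIX, a negative return code is the negated signal number
        return "killed by " + signal.Signals(-proc.returncode).name
    return "exit code %d" % proc.returncode
```

A snippet such as `"import torchvision; torchvision.io.VideoReader('clip.mp4', 'video')"` could be passed in, where `clip.mp4` is a placeholder path. A result of `killed by SIGSEGV` or `killed by SIGABRT` then identifies the crashing input without taking down the parent process.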
🐛 Describe the bug
I get
malloc(): memory corruption
when running the following code with a video file.
Video metadata:
On debugging, it points at this line as the culprit:
vision/torchvision/csrc/io/video/video.cpp
Line 314 in 0db67d8
Versions
Collecting environment information...
PyTorch version: 1.11.0.dev20220203+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.20.4
Libc version: glibc-2.27
Python version: 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-1051-aws-x86_64-with-glibc2.17
Is CUDA available: False
CUDA runtime version: 11.1.105
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.6.5
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.7.6.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.22.2
[pip3] torch==1.11.0.dev20220203+cu111
[pip3] torchvision==0.12.0a0+22f8dc4
[conda] numpy 1.22.2 pypi_0 pypi
[conda] torch 1.11.0.dev20220203+cu111 pypi_0 pypi
[conda] torchvision 0.12.0a0+22f8dc4 dev_0