Data pre-processing Soccernet, missing Cache files? #7

Closed
Liamsalass opened this issue Oct 7, 2024 · 9 comments · Fixed by #10

Comments

@Liamsalass
Contributor

I have pre-processed the SoccerNetV2 dataset using the extract frames pre-processing script.

Now I'm trying to pre-compute the labels but I'm missing the cache files and the num_timestamps_per_half.pt file.

Is there a certain place I can find this file on the Soccernet dataset website or is this something I should generate myself?

Thank you for the help.

@Liamsalass Liamsalass changed the title Data pre-processing missing Cache files? Data pre-processing Soccernet, missing Cache files? Oct 7, 2024
@juliendenize juliendenize added the question Further information is requested label Oct 9, 2024
@juliendenize juliendenize self-assigned this Oct 9, 2024
@juliendenize
Owner

Hi,

If possible, could you provide the commands you ran, the one that fails, and the error, please?

It's been a while since I worked on this code, so I need some help to refresh my memory ^^'

@Liamsalass
Contributor Author

Hi,

Thank you for the prompt response!

I've attached two scripts that I've run. The first was completed successfully and was for extracting the frames. The second is where I'm getting stuck, which is with the preprocessing labels.

Extract Frames:

#!/bin/bash

fps=2
input_folder="/SATA/SoccerNet/"
output_folder="/SATA/soccernet_as_extracted_${fps}fps/"
split=train

export PYTHONPATH=$PYTHONPATH:/home/liam/soccer/action_spotting/eztorch

python run/datasets/extract_soccernet.py \
    --input-folder $input_folder \
    --output-folder $output_folder \
    --fps $fps \
    --split $split

split=test

python run/datasets/extract_soccernet.py \
    --input-folder $input_folder \
    --output-folder $output_folder \
    --fps $fps \
    --split $split

Precompute Labels

#!/bin/bash

radius_label=0.5
dataset_json=/SATA/soccernet_as_extracted_2fps/test.json # Path to the JSON.
frame_dir=/SATA/soccernet_as_extracted_2fps/test/ # Path to the decoded videos.
fps=2
# not working, where do I get this file and what is it for?
# Is this something I should download from SoccerNet? If so, where do I get it?
cache_dir=/SATA/soccernet_as_extracted_2fps/cache/ 

export PYTHONPATH=$PYTHONPATH:/home/liam/soccer/action_spotting/eztorch

python run/datasets/precompute_soccernet_labels.py \
    --radius-label $radius_label \
    --data-path $dataset_json \
    --path-prefix $frame_dir \
    --fps $fps 
    #--cache-dir $cache_dir

And here is the output I get from running the precompute labels:

(action_spotting) liam@pat:~/soccer/action_spotting/eztorch$ ./precompute_labels.sh 
/home/liam/soccer/action_spotting/eztorch/eztorch/datasets/soccernet.py:420: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  self.num_timestamps_per_half = torch.load(
Traceback (most recent call last):
  File "/home/liam/soccer/action_spotting/eztorch/run/datasets/precompute_soccernet_labels.py", line 33, in <module>
    main()
  File "/home/liam/soccer/action_spotting/eztorch/run/datasets/precompute_soccernet_labels.py", line 18, in main
    soccernet_dataset(
  File "/home/liam/soccer/action_spotting/eztorch/eztorch/datasets/soccernet.py", line 667, in soccernet_dataset
    dataset = SoccerNet(
              ^^^^^^^^^^
  File "/home/liam/soccer/action_spotting/eztorch/eztorch/datasets/soccernet.py", line 72, in __init__
    self._precompute_labels()
  File "/home/liam/soccer/action_spotting/eztorch/eztorch/datasets/soccernet.py", line 420, in _precompute_labels
    self.num_timestamps_per_half = torch.load(
                                   ^^^^^^^^^^^
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/torch/serialization.py", line 1065, in load
    with _open_file_like(f, 'rb') as opened_file:
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/torch/serialization.py", line 468, in _open_file_like
    return _open_file(name_or_buffer, mode)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/torch/serialization.py", line 449, in __init__
    super().__init__(open(name, mode))
                     ^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'num_timestamps_per_half.pt'

Let me know if there is anything else I can do to help, and thank you so much for the help :)

@Liamsalass
Contributor Author

Hi there,

Looking through the code, I found that, if the argument is left empty, the cache directory is created. However, the check for whether the cache argument is None always fails because the argument defaults to an empty string, so it is never None. I added a check for an empty string, and now the code continues, but I've run into a new error related to how the labels are handled:

(action_spotting) liam@pat:~/soccer/action_spotting/eztorch$ ./precompute_labels.sh 
Traceback (most recent call last):
  File "/home/liam/soccer/action_spotting/eztorch/run/datasets/precompute_soccernet_labels.py", line 33, in <module>
    main()
  File "/home/liam/soccer/action_spotting/eztorch/run/datasets/precompute_soccernet_labels.py", line 18, in main
    soccernet_dataset(
  File "/home/liam/soccer/action_spotting/eztorch/eztorch/datasets/soccernet.py", line 668, in soccernet_dataset
    dataset = SoccerNet(
              ^^^^^^^^^^
  File "/home/liam/soccer/action_spotting/eztorch/eztorch/datasets/soccernet.py", line 72, in __init__
    self._precompute_labels()
  File "/home/liam/soccer/action_spotting/eztorch/eztorch/datasets/soccernet.py", line 492, in _precompute_labels
    (labels,) = _get_labels(
    ^^^^^^^^^
ValueError: too many values to unpack (expected 1)

I've made another change, moving the return labels.permute(1, 0) line in soccernet.py back one indentation level, to get the ./extract_frames script to run; that resolved the same error when I hit it earlier. But now the error occurs when I'm precomputing the labels. If you could elaborate on what exactly precompute_labels does, that would be very helpful.
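For anyone hitting the same ValueError: a single-element unpacking such as (labels,) = ... only succeeds when the right-hand side yields exactly one item. The sketch below uses a hypothetical stand-in, get_labels_stub, rather than the real _get_labels from soccernet.py, so it only illustrates the mechanism and not the actual cause in eztorch.

```python
def get_labels_stub(num_items):
    """Hypothetical stand-in for a label helper: returns `num_items` entries."""
    return tuple(f"labels_{i}" for i in range(num_items))


def unpack_single(result):
    """Mimic `(labels,) = ...`: valid only when `result` has exactly one item."""
    (labels,) = result
    return labels


# One item: the unpacking succeeds.
single = unpack_single(get_labels_stub(1))

# Two items: the same statement raises ValueError.
try:
    unpack_single(get_labels_stub(2))
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(single)
print(error_message)
```

If the helper returns a whole tensor instead of a one-element tuple, iterating over it during unpacking yields one item per row, which would produce the same error whenever there is more than one row.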

Thanks for the help!

@juliendenize
Owner

From my code, which is fairly poorly written, I understand that cache_dir should be provided AND should not exist; otherwise the code will try to load data from it. So it should be a path to a directory that does not exist yet, not a path to a file.

When you provided the cache_dir argument, had you already created the directory? If not, then there is an issue that I don't grasp. Maybe there is no write access, but that seems unlikely if you put the cache close to the data.
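The behaviour described here (compute and save when the cache directory is missing, load when it exists), together with the empty-string default reported earlier in the thread, can be sketched roughly as follows. This is an illustration and not the eztorch code: load_or_compute, compute_fn, and the labels.json file name are assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


def load_or_compute(cache_dir, compute_fn):
    """Return (labels, source), where source is "computed" or "cached".

    Hypothetical helper: treats both None and "" as "no cache requested".
    """
    if not cache_dir:  # covers None and the empty-string default
        return compute_fn(), "computed"

    cache_path = Path(cache_dir)
    cache_file = cache_path / "labels.json"  # assumed file name
    if cache_path.exists():
        # An existing directory is assumed to hold a complete cache.
        return json.loads(cache_file.read_text()), "cached"

    # The directory does not exist yet: compute, then write the cache.
    labels = compute_fn()
    cache_path.mkdir(parents=True)
    cache_file.write_text(json.dumps(labels))
    return labels, "computed"


def fake_labels():
    """Stand-in for the real label computation."""
    return [[0, 1], [1, 0]]


with tempfile.TemporaryDirectory() as tmp:
    target = str(Path(tmp) / "cache")  # must not exist before the first call
    first = load_or_compute(target, fake_labels)
    second = load_or_compute(target, fake_labels)
    no_cache = load_or_compute("", fake_labels)

print(first[1], second[1], no_cache[1])
```

With this pattern, pointing cache_dir at an existing but empty directory fails on the read with a FileNotFoundError, which is why the directory should not be created by hand beforehand.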

Precomputing the labels consists of creating, for each video, a tensor that indicates for every frame whether one or more actions occur, so the tensor for each video has shape:
(Number of frames, Number of actions)
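As a rough illustration of that shape, the sketch below builds a (number of frames, number of actions) multi-hot table in plain Python. The rule used here (a frame is positive when its timestamp lies within radius_label seconds of an annotated action) and the names build_labels and annotations are assumptions for the example; the actual eztorch implementation may differ.

```python
def build_labels(num_frames, num_actions, fps, annotations, radius_label):
    """Build a (num_frames, num_actions) table of 0/1 labels.

    annotations: list of (time_in_seconds, action_index) pairs (hypothetical format).
    """
    labels = [[0] * num_actions for _ in range(num_frames)]
    for frame_idx in range(num_frames):
        frame_time = frame_idx / fps  # timestamp of this frame in seconds
        for action_time, action_idx in annotations:
            if abs(frame_time - action_time) <= radius_label:
                labels[frame_idx][action_idx] = 1
    return labels


# 4 seconds of video at 2 fps, 3 action classes, two annotated actions.
labels = build_labels(
    num_frames=8,
    num_actions=3,
    fps=2,
    annotations=[(1.0, 0), (2.5, 2)],
    radius_label=0.5,
)
for row in labels:
    print(row)
```

Each row corresponds to one frame and each column to one action class, so a frame where several actions overlap simply has several ones in its row.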

@juliendenize
Owner

I made PR #10; could you try switching to that branch? I cannot test right now because I don't have the data on hand, and I hope I won't need to.

I've overridden some changes from your PR #9. I was quite concerned about why you changed the np types from string to bytes for the paths, and I think the local .gitignore changes should stay local 😄

@Liamsalass
Contributor Author

Hi there,

Yeah, sorry about that; I changed the types from bytes to strings because I was getting internal Python errors. The errors were caused by np.string_ having been removed in numpy >= 2.0. Downgrading numpy, however, resulted in more cascading errors. Looking at the documentation, I believe the functionality remained unchanged, so I included the change.
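For context, np.string_ was removed in NumPy 2.0, while np.bytes_ is available in both NumPy 1.x and 2.x. A minimal sketch of storing paths as a bytes array in a version-independent way follows; the example paths are made up and this is not the code from either PR.

```python
import numpy as np

paths = ["match_1/half_1", "match_1/half_2"]  # made-up example paths

# np.bytes_ works on NumPy 1.x and 2.x; np.string_ only exists on 1.x.
encoded = np.array(paths, dtype=np.bytes_)

# Decode back to Python strings when a path is needed.
decoded = [p.decode("utf-8") for p in encoded]

print(encoded.dtype.kind)
print(decoded)
```

Keeping the array as bytes and decoding on access avoids depending on the removed alias without changing what is stored.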

Thank you for making PR #10; I'll give it a try and get back to you.

@Liamsalass
Contributor Author

Hi there,

I'm running into issues with inference now. Mainly, I'm not sure which config files to use for the script:

output_dir=...
test_dir=...
frame_dir=...
labels_cache_dir_test=... # Where test model labels are cached
soccernet_labels_dir=... # Directory of ground truth labels.
checkpoint_path=...

srun --kill-on-bad-exit=1 python test.py -cp $config_path -cn $config_name \
    dir.data=$test_dir \
    dir.root=$output_dir \
    dir.exp="test/" \
    seed.seed=$seed \
    datamodule.train=null \
    datamodule.val=null \
    datamodule.test.dataset.task=action \
    datamodule.test.dataset.datadir=$test_dir \
    datamodule.test.dataset.video_path_prefix=$frame_dir \
    datamodule.test.dataset.label_args.cache_dir=$labels_cache_dir_test \
    datamodule.test.dataset.label_args.radius_label=0.5 \
    datamodule.test.loader.num_workers=4 \
    datamodule.test.global_batch_size=64 \
    model.optimizer.batch_size=2 \
    model.evaluation_args.SoccerNet_path=$soccernet_labels_dir \
    model.evaluation_args.split="test" \
    model.trunk.transformer.temporal_depth=6 \
    model.save_test_preds_path="test_preds/" \
    model.prediction_args.remove_inference_prediction_seconds=12 \
    model.prediction_args.merge_predictions_type="max" \
    model.NMS_args.nms_type=soft \
    model.NMS_args.window=20 \
    model.NMS_args.threshold=0.001 \
    model.train_transform=null \
    model.val_transform=null \
    model.pretrained_path=$checkpoint_path \
    ++test.ckpt_path=null

The default files are not provided in the repository, so I've tried a few different config entry points. The one I'm currently working with is the viswin_tiny_soccernet_uniform.yaml config in eztorch/eztorch/configs/run/finetuning/viswin/, but I get the following error:

/home/liam/soccer/action_spotting/eztorch/run/test.py:19: UserWarning: 
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
  @hydra.main(
/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
Run directory: /SATA/SoccerNet/COMEDIAN/output/test
Seed set to 42
Decoder default transform: None
Decoder test transform: None
Error executing job with overrides: ['dir.data=/SATA/SoccerNet/test/', 'dir.root=/SATA/SoccerNet/COMEDIAN/output/', 'dir.exp=test/', 'seed.seed=42', 'datamodule.train=null', 'datamodule.val=null', 'datamodule.test.dataset.task=action', 'datamodule.test.dataset.datadir=/SATA/SoccerNet/test/', 'datamodule.test.dataset.video_path_prefix=/SATA/soccernet_as_extracted_2fps/test/', 'datamodule.test.dataset.label_args.cache_dir=/SATA/soccernet_as_extracted_2fps/test/cache/', 'datamodule.test.dataset.label_args.radius_label=0.5', 'datamodule.test.loader.num_workers=4', 'datamodule.test.global_batch_size=12', 'model.optimizer.batch_size=2', 'model.evaluation_args.SoccerNet_path=/SATA/soccernet_as_extracted_2fps/test.json/', 'model.evaluation_args.split=test', 'model.trunk.transformer.temporal_depth=6', 'model.save_test_preds_path=test_preds/', 'model.prediction_args.remove_inference_prediction_seconds=12', 'model.prediction_args.merge_predictions_type=max', 'model.NMS_args.nms_type=soft', 'model.NMS_args.window=20', 'model.NMS_args.threshold=0.001', 'model.train_transform=null', 'model.val_transform=null', 'model.pretrained_path=/home/liam/soccer/action_spotting/ckpts/comedian_viswin_tiny_seed42.pth', '++test.ckpt_path=null']
Traceback (most recent call last):
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/hydra/_internal/utils.py", line 644, in _locate
    obj = getattr(obj, part)
          ^^^^^^^^^^^^^^^^^^
AttributeError: module 'eztorch.models.trunks' has no attribute 'create_viswin_tiny'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/hydra/_internal/utils.py", line 650, in _locate
    obj = import_module(mod)
          ^^^^^^^^^^^^^^^^^^
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1324, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'eztorch.models.trunks.create_viswin_tiny'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 134, in _resolve_target
    target = _locate(target)
             ^^^^^^^^^^^^^^^
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/hydra/_internal/utils.py", line 653, in _locate
    raise ImportError(
ImportError: Error loading 'eztorch.models.trunks.create_viswin_tiny':
ModuleNotFoundError("No module named 'eztorch.models.trunks.create_viswin_tiny'")
Are you sure that 'create_viswin_tiny' is importable from module 'eztorch.models.trunks'?

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 92, in _call_target
    return _target_(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/liam/soccer/action_spotting/eztorch/eztorch/models/trunks/transformer_token_handler.py", line 209, in create_vitransformer_token_handler_model
    transformer = hydra.utils.instantiate(transformer)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 226, in instantiate
    return instantiate_node(
           ^^^^^^^^^^^^^^^^^
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 333, in instantiate_node
    _target_ = _resolve_target(node.get(_Keys.TARGET), full_key)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 139, in _resolve_target
    raise InstantiationException(msg) from e
hydra.errors.InstantiationException: Error locating target 'eztorch.models.trunks.create_viswin_tiny', set env var HYDRA_FULL_ERROR=1 to see chained exception.
full_key: transformer

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 92, in _call_target
    return _target_(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/liam/soccer/action_spotting/eztorch/eztorch/models/soccernet_spotting.py", line 87, in __init__
    self.trunk: nn.Module = hydra.utils.instantiate(trunk)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 226, in instantiate
    return instantiate_node(
           ^^^^^^^^^^^^^^^^^
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 347, in instantiate_node
    return _call_target(_target_, partial, args, kwargs, full_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 97, in _call_target
    raise InstantiationException(msg) from e
hydra.errors.InstantiationException: Error in call to target 'eztorch.models.trunks.transformer_token_handler.create_vitransformer_token_handler_model':
InstantiationException("Error locating target 'eztorch.models.trunks.create_viswin_tiny', set env var HYDRA_FULL_ERROR=1 to see chained exception.\nfull_key: transformer")
full_key: trunk

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/liam/soccer/action_spotting/eztorch/run/test.py", line 91, in <module>
    main()
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
           ^^^^^^
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
            ^^^^^^^^^^
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
        ^^^^^^^^^^^^^^^^
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
                       ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/liam/soccer/action_spotting/eztorch/run/test.py", line 55, in main
    model: LightningModule = hydra.utils.instantiate(config.model)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 226, in instantiate
    return instantiate_node(
           ^^^^^^^^^^^^^^^^^
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 347, in instantiate_node
    return _call_target(_target_, partial, args, kwargs, full_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/liam/miniconda3/envs/action_spotting/lib/python3.12/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 97, in _call_target
    raise InstantiationException(msg) from e
hydra.errors.InstantiationException: Error in call to target 'eztorch.models.soccernet_spotting.SoccerNetSpottingModel':
InstantiationException('Error in call to target \'eztorch.models.trunks.transformer_token_handler.create_vitransformer_token_handler_model\':\nInstantiationException("Error locating target \'eztorch.models.trunks.create_viswin_tiny\', set env var HYDRA_FULL_ERROR=1 to see chained exception.\\nfull_key: transformer")\nfull_key: trunk')
full_key: model

This makes me think this isn't the correct file, either.

Sorry to keep bugging you about this project; I really appreciate all the help, though. I'm new to the Hydra library, so I might just be using it completely wrong with this setup.

@juliendenize
Owner

juliendenize commented Oct 10, 2024

With Hydra, the config path is relative to the script launched with Python.

I gave this example for vivit:

config_path="../eztorch/configs/run/finetuning/vivit"
config_name="vivit_tiny_soccernet_uniform"

I guess the config name and path you use are correct; otherwise the code wouldn't launch.
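To make the relative-path rule concrete, here is a small sketch that resolves a config path against the directory of the launched script, using paths that appear in this thread. It only mimics the resolution with posixpath and does not call Hydra; resolve_config_dir is a hypothetical helper.

```python
import posixpath


def resolve_config_dir(script_path, config_path):
    """Resolve config_path relative to the directory containing script_path."""
    if posixpath.isabs(config_path):
        return config_path
    script_dir = posixpath.dirname(script_path)
    return posixpath.normpath(posixpath.join(script_dir, config_path))


script = "/home/liam/soccer/action_spotting/eztorch/run/test.py"
resolved = resolve_config_dir(script, "../eztorch/configs/run/finetuning/vivit")
print(resolved)
```

Because the script lives in run/, a config path starting with ../eztorch/configs/ ends up inside the eztorch package directory, regardless of the directory the command is launched from.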

Did you install eztorch in your environment? If yes, could you verify that timm is installed?
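One quick way to check this from Python is the standard library's importlib.metadata, as in the sketch below. The helper name installed_version is made up for the example, and the list of packages is only a suggestion; the versions it prints should be compared by hand against the ones pinned by the repository.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("timm", "torch", "hydra-core"):
    version = installed_version(name)
    print(f"{name}: {version if version is not None else 'not installed'}")
```

Running this inside the same environment that launches test.py matters, since a package installed in another environment will show up as missing here.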

@Liamsalass
Contributor Author

Hi,

Thanks for the help; I got it working by downgrading the timm version. All of my libraries were at different versions from the ones the code was built against. I'm still working out how that happened, since I installed them using the requirement.txt file.

I think I ran into an issue when installing the dependencies and installed some of them manually, which resulted in the mismatches.

Again, thank you so much for the help; I greatly appreciate it.
