Merge pull request #41 from line/update_models
Update results link
awkrail authored Oct 21, 2024
2 parents 23a1c9f + 1a00463 commit 4650073
Showing 4 changed files with 57 additions and 190 deletions.
27 changes: 9 additions & 18 deletions README.md
@@ -15,14 +15,6 @@ Furthermore, Lighthouse supports [audio moment retrieval](https://h-munakata.git
- [2024/09/25] Our work ["Language-based audio moment retrieval"](https://arxiv.org/abs/2409.15672) has been released. Lighthouse supports AMR.
- [2024/08/22] Our demo paper is available on arXiv. Any comments are welcome: [Lighthouse: A User-Friendly Library for Reproducible Video Moment Retrieval and Highlight Detection](https://www.arxiv.org/abs/2408.02901).

-## Milestones
-We will release v1.0 until the end of September. Our plan includes:
-- [x] : Reduce the configuration files (issue #19)
-- [ ] : Update the trained weights and feature files on Google Drive and Zenodo
-- [x] : Introduce PyTest for inference API (issue #21)
-- [x] : Introduce Linter for inference API (issue #20)
-- [x] : Introduce [audio moment retrieval (AMR)](https://h-munakata.github.io/Language-based-Audio-Moment-Retrieval/)

## Installation
Install ffmpeg first. If you are an Ubuntu user, run:
```
@@ -49,7 +41,7 @@ device = "cuda" if torch.cuda.is_available() else "cpu"

# slowfast_path is necessary if you use clip_slowfast features
query = 'A man is speaking in front of the camera'
-model = CGDETRPredictor('results/clip_slowfast_cg_detr/qvhighlight/best.ckpt', device=device,
+model = CGDETRPredictor('results/cg_detr/qvhighlight/clip_slowfast/best.ckpt', device=device,
                         feature_name='clip_slowfast', slowfast_path='SLOWFAST_8x8_R50.pkl')

# encode video features
@@ -74,7 +66,7 @@ pred_saliency_scores: [score, ...]
"""
```
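As a sketch of how the example output above can be consumed: the snippet below picks the highest-confidence moment from a prediction dict. The `[start_sec, end_sec, confidence]` triple layout of `pred_relevant_windows` is assumed from the docstring-style output shown above, not confirmed by the library API.

```python
# Sketch only: each entry of `pred_relevant_windows` is assumed to be a
# [start_sec, end_sec, confidence] triple, matching the example output above.
def top_moment(prediction):
    """Return the highest-confidence [start, end, score] window, or None."""
    windows = prediction.get('pred_relevant_windows') or []
    return max(windows, key=lambda w: w[2]) if windows else None
```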
Run `python api_example/demo.py` to reproduce the results. It automatically downloads pre-trained weights for CG-DETR (CLIP backbone).
-If you want to use other models, download [pre-trained weights](https://drive.google.com/file/d/1ebQbhH1tjgTmRBmyOoW8J9DH7s80fqR9/view?usp=drive_link).
+If you want to use other models, download [pre-trained weights](https://drive.google.com/file/d/1jxs_bvwttXTF9Lk3aKLohkqfYOonLyrO/view?usp=sharing).
When using `clip_slowfast` features, it is necessary to download [slowfast pre-trained weights](https://dl.fbaipublicfiles.com/pyslowfast/model_zoo/kinetics400/SLOWFAST_8x8_R50.pkl).
When using `clip_slowfast_pann` features, in addition to the slowfast weight, download [panns weights](https://zenodo.org/record/3987831/files/Cnn14_mAP%3D0.431.pth).
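The two paragraphs above can be condensed into a small lookup. This is a hypothetical helper (not part of Lighthouse) that just restates which extra weight files the README says each feature setting needs:

```python
# Hypothetical helper (not in Lighthouse): auxiliary weight files required
# per feature_name, per the README: clip_slowfast needs the SlowFast weights,
# clip_slowfast_pann additionally needs the PANNs weights.
def required_auxiliary_weights(feature_name):
    aux = []
    if feature_name in ('clip_slowfast', 'clip_slowfast_pann'):
        aux.append('SLOWFAST_8x8_R50.pkl')   # SlowFast backbone weights
    if feature_name == 'clip_slowfast_pann':
        aux.append('Cnn14_mAP=0.431.pth')    # PANNs audio weights
    return aux
```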

@@ -113,7 +105,7 @@ Highlight detection
- [x] : [YouTube Highlights (Sun et al. ECCV14)](https://grail.cs.washington.edu/wp-content/uploads/2015/08/sun2014rdh.pdf)

Audio moment retrieval
-- [x] : [Clotho moment (Munakata et al. arXiv24)](https://h-munakata.github.io/Language-based-Audio-Moment-Retrieval/)
+- [x] : [Clotho Moment/TUT2017/UnAV100-subset (Munakata et al. arXiv24)](https://h-munakata.github.io/Language-based-Audio-Moment-Retrieval/)

### Features
- [x] : ResNet+GloVe
@@ -125,22 +117,24 @@ Audio moment retrieval
## Reproduce the experiments

### Pre-trained weights
-Pre-trained weights can be downloaded from [here](https://drive.google.com/file/d/1ebQbhH1tjgTmRBmyOoW8J9DH7s80fqR9/view?usp=drive_link).
-Download and unzip on the home directory. If you want individual weights, download from [reproduced results tables](#reproduced-results).
+Pre-trained weights can be downloaded from [here](https://drive.google.com/file/d/1jxs_bvwttXTF9Lk3aKLohkqfYOonLyrO/view?usp=sharing).
+Download and unzip it in the home directory.

### Datasets
Due to copyright issues, we distribute only the feature files here.
Download and place them under `./features` directory.
To extract features from videos, we use [HERO_Video_Feature_Extractor](https://github.com/linjieli222/HERO_Video_Feature_Extractor).
Note that Clotho-moment is used for [AMR](https://h-munakata.github.io/Language-based-Audio-Moment-Retrieval/).

- [QVHighlights](https://drive.google.com/file/d/1-ALnsXkA4csKh71sRndMwybxEDqa-dM4/view?usp=sharing)
- [Charades-STA](https://drive.google.com/file/d/1EOeP2A4IMYdotbTlTqDbv5VdvEAgQJl8/view?usp=sharing)
- [ActivityNet Captions](https://drive.google.com/file/d/1P2xS998XfbN5nSDeJLBF1m9AaVhipBva/view?usp=sharing)
- [TACoS](https://drive.google.com/file/d/1rYzme9JNAk3niH1K81wgT13pOMn005jb/view?usp=sharing)
- [TVSum](https://drive.google.com/file/d/1gSex1hpXLxHQu6zHyyQISKZjP7Ndt6U9/view?usp=sharing)
- [YouTube Highlight](https://drive.google.com/file/d/12swoymGwuN5TlDlWBTo6UUWVm2DqVBpn/view?usp=sharing)
- [Clotho Moment](https://zenodo.org/records/13806234)

For [AMR](https://h-munakata.github.io/Language-based-Audio-Moment-Retrieval/), download features from here.

- [Clotho Moment/TUT2017/UnAV100-subset](https://zenodo.org/records/13806234)

The whole directory should look like this:
```
@@ -243,9 +237,6 @@ Then zip `hl_val_submission.jsonl` and `hl_test_submission.jsonl`, and submit it
zip -r submission.zip val_submission.jsonl test_submission.jsonl
```
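For completeness, the `zip -r` command above can also be done from Python with the standard library. This is only an equivalent sketch; the file names are taken from the command, so adjust them to your own output paths:

```python
# Stdlib equivalent of: zip -r submission.zip val_submission.jsonl test_submission.jsonl
import zipfile

def make_submission(paths, out='submission.zip'):
    """Write the given files into a single deflate-compressed zip archive."""
    with zipfile.ZipFile(out, 'w', zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            zf.write(p)
    return out
```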

-## Reproduced results
-See [here](markdown/reproduced_results.md). You can download individual checkpoints.

## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

47 changes: 47 additions & 0 deletions api_example/demo.py
@@ -0,0 +1,47 @@
"""
Copyright $today.year LY Corporation
LY Corporation licenses this file to you under the Apache License,
version 2.0 (the "License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at:
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations
under the License.
"""
import os
import subprocess
import torch

from lighthouse.models import CGDETRPredictor
from typing import Dict, List, Optional

def load_weights(weight_dir: str) -> None:
    if not os.path.exists(os.path.join(weight_dir, 'clip_slowfast_cg_detr_qvhighlight.ckpt')):
        command = 'wget -P gradio_demo/weights/ https://zenodo.org/records/13960580/files/clip_slowfast_cg_detr_qvhighlight.ckpt'
        subprocess.run(command, shell=True)

    if not os.path.exists('SLOWFAST_8x8_R50.pkl'):
        subprocess.run('wget https://dl.fbaipublicfiles.com/pyslowfast/model_zoo/kinetics400/SLOWFAST_8x8_R50.pkl', shell=True)

    if not os.path.exists('Cnn14_mAP=0.431.pth'):
        subprocess.run('wget https://zenodo.org/record/3987831/files/Cnn14_mAP%3D0.431.pth', shell=True)

# use GPU if available
device: str = 'cuda' if torch.cuda.is_available() else 'cpu'
weight_dir: str = 'gradio_demo/weights'
weight_path: str = os.path.join(weight_dir, 'clip_slowfast_cg_detr_qvhighlight.ckpt')
load_weights(weight_dir)
model: CGDETRPredictor = CGDETRPredictor(weight_path, device=device, feature_name='clip_slowfast',
                                         slowfast_path='SLOWFAST_8x8_R50.pkl', pann_path=None)

# encode video features
model.encode_video('api_example/RoripwjYFp8_60.0_210.0.mp4')

# moment retrieval & highlight detection
query: str = 'A woman wearing glasses is speaking in front of the camera'
prediction: Optional[Dict[str, List[float]]] = model.predict(query)
print(prediction)
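The download-only-if-missing pattern that `load_weights` above applies to each checkpoint can be factored into one reusable helper. This is just a sketch; the `fetch` callable is injected so the idempotency logic can be exercised without wget or network access:

```python
# Sketch of the demo's "download only when the file is missing" pattern.
# `fetch(url, path)` is an injected stand-in for the actual wget call.
import os

def ensure_file(path, url, fetch):
    """Call fetch(url, path) only if path does not already exist."""
    if not os.path.exists(path):
        fetch(url, path)
    return path
```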
2 changes: 1 addition & 1 deletion gradio_demo/demo.py
@@ -37,7 +37,7 @@ def load_pretrained_weights():
    for model_name in MODEL_NAMES:
        for feature in FEATURES:
            file_urls.append(
-                "https://zenodo.org/records/13639198/files/{}_{}_qvhighlight.ckpt".format(feature, model_name)
+                "https://zenodo.org/records/13960580/files/{}_{}_qvhighlight.ckpt".format(feature, model_name)
            )
    for file_url in tqdm(file_urls):
        if not os.path.exists('gradio_demo/weights/' + os.path.basename(file_url)):
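The nested loop in the gradio demo expands a URL template over every model/feature combination. The sketch below shows the same expansion as a comprehension; `MODEL_NAMES` and `FEATURES` here are illustrative stand-ins, not the demo's actual lists:

```python
# Illustrative stand-ins; the real lists live in gradio_demo/demo.py.
MODEL_NAMES = ['cg_detr']
FEATURES = ['clip_slowfast']

# One checkpoint URL per (feature, model) pair, as in load_pretrained_weights.
file_urls = [
    'https://zenodo.org/records/13960580/files/{}_{}_qvhighlight.ckpt'.format(feature, model_name)
    for model_name in MODEL_NAMES
    for feature in FEATURES
]
```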
