Merge pull request #41 from line/update_models
Update results link
awkrail authored Oct 21, 2024
2 parents 23a1c9f + 1a00463 commit 4650073
Showing 4 changed files with 57 additions and 190 deletions.
27 changes: 9 additions & 18 deletions README.md
@@ -15,14 +15,6 @@ Furthermore, Lighthouse supports [audio moment retrieval](https://h-munakata.git
- [2024/09/25] Our work ["Language-based audio moment retrieval"](https://arxiv.org/abs/2409.15672) has been released. Lighthouse supports AMR.
- [2024/08/22] Our demo paper is available on arXiv. Any comments are welcome: [Lighthouse: A User-Friendly Library for Reproducible Video Moment Retrieval and Highlight Detection](https://www.arxiv.org/abs/2408.02901).

-## Milestones
-We will release v1.0 until the end of September. Our plan includes:
-- [x] : Reduce the configuration files (issue #19)
-- [ ] : Update the trained weights and feature files on Google Drive and Zenodo
-- [x] : Introduce PyTest for inference API (issue #21)
-- [x] : Introduce Linter for inference API (issue #20)
-- [x] : Introduce [audio moment retrieval (AMR)](https://h-munakata.github.io/Language-based-Audio-Moment-Retrieval/)

## Installation
Install ffmpeg first. If you are an Ubuntu user, run:
```
@@ -49,7 +41,7 @@ device = "cuda" if torch.cuda.is_available() else "cpu"

# slowfast_path is necessary if you use clip_slowfast features
query = 'A man is speaking in front of the camera'
-model = CGDETRPredictor('results/clip_slowfast_cg_detr/qvhighlight/best.ckpt', device=device,
+model = CGDETRPredictor('results/cg_detr/qvhighlight/clip_slowfast/best.ckpt', device=device,
                         feature_name='clip_slowfast', slowfast_path='SLOWFAST_8x8_R50.pkl')

# encode video features
@@ -74,7 +66,7 @@ pred_saliency_scores: [score, ...]
"""
```
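As a sketch of how the example output above can be consumed: the snippet below picks the highest-confidence moment from a prediction dict. The `[start_sec, end_sec, confidence]` triple layout of `pred_relevant_windows` is assumed from the docstring-style output shown above, not confirmed by the library API.

```python
# Sketch only: each entry of `pred_relevant_windows` is assumed to be a
# [start_sec, end_sec, confidence] triple, matching the example output above.
def top_moment(prediction):
    """Return the highest-confidence [start, end, score] window, or None."""
    windows = prediction.get('pred_relevant_windows') or []
    return max(windows, key=lambda w: w[2]) if windows else None
```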
Run `python api_example/demo.py` to reproduce the results. It automatically downloads pre-trained weights for CG-DETR (CLIP backbone).
-If you want to use other models, download [pre-trained weights](https://drive.google.com/file/d/1ebQbhH1tjgTmRBmyOoW8J9DH7s80fqR9/view?usp=drive_link).
+If you want to use other models, download [pre-trained weights](https://drive.google.com/file/d/1jxs_bvwttXTF9Lk3aKLohkqfYOonLyrO/view?usp=sharing).
When using `clip_slowfast` features, it is necessary to download [slowfast pre-trained weights](https://dl.fbaipublicfiles.com/pyslowfast/model_zoo/kinetics400/SLOWFAST_8x8_R50.pkl).
When using `clip_slowfast_pann` features, in addition to the slowfast weight, download [panns weights](https://zenodo.org/record/3987831/files/Cnn14_mAP%3D0.431.pth).
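The two paragraphs above can be condensed into a small lookup. This is a hypothetical helper (not part of Lighthouse) that just restates which extra weight files the README says each feature setting needs:

```python
# Hypothetical helper (not in Lighthouse): auxiliary weight files required
# per feature_name, per the README: clip_slowfast needs the SlowFast weights,
# clip_slowfast_pann additionally needs the PANNs weights.
def required_auxiliary_weights(feature_name):
    aux = []
    if feature_name in ('clip_slowfast', 'clip_slowfast_pann'):
        aux.append('SLOWFAST_8x8_R50.pkl')   # SlowFast backbone weights
    if feature_name == 'clip_slowfast_pann':
        aux.append('Cnn14_mAP=0.431.pth')    # PANNs audio weights
    return aux
```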

@@ -113,7 +105,7 @@ Highlight detection
- [x] : [YouTube Highlights (Sun et al. ECCV14)](https://grail.cs.washington.edu/wp-content/uploads/2015/08/sun2014rdh.pdf)

Audio moment retrieval
-- [x] : [Clotho moment (Munakata et al. arXiv24)](https://h-munakata.github.io/Language-based-Audio-Moment-Retrieval/)
+- [x] : [Clotho Moment/TUT2017/UnAV100-subset (Munakata et al. arXiv24)](https://h-munakata.github.io/Language-based-Audio-Moment-Retrieval/)

### Features
- [x] : ResNet+GloVe
@@ -125,22 +117,24 @@ Audio moment retrieval
## Reproduce the experiments

### Pre-trained weights
-Pre-trained weights can be downloaded from [here](https://drive.google.com/file/d/1ebQbhH1tjgTmRBmyOoW8J9DH7s80fqR9/view?usp=drive_link).
-Download and unzip on the home directory. If you want individual weights, download from [reproduced results tables](#reproduced-results).
+Pre-trained weights can be downloaded from [here](https://drive.google.com/file/d/1jxs_bvwttXTF9Lk3aKLohkqfYOonLyrO/view?usp=sharing).
+Download and unzip it in the home directory.

### Datasets
Due to copyright issues, we distribute only the feature files here.
Download and place them under `./features` directory.
To extract features from videos, we use [HERO_Video_Feature_Extractor](https://github.com/linjieli222/HERO_Video_Feature_Extractor).
Note that Clotho-moment is used for [AMR](https://h-munakata.github.io/Language-based-Audio-Moment-Retrieval/).

- [QVHighlights](https://drive.google.com/file/d/1-ALnsXkA4csKh71sRndMwybxEDqa-dM4/view?usp=sharing)
- [Charades-STA](https://drive.google.com/file/d/1EOeP2A4IMYdotbTlTqDbv5VdvEAgQJl8/view?usp=sharing)
- [ActivityNet Captions](https://drive.google.com/file/d/1P2xS998XfbN5nSDeJLBF1m9AaVhipBva/view?usp=sharing)
- [TACoS](https://drive.google.com/file/d/1rYzme9JNAk3niH1K81wgT13pOMn005jb/view?usp=sharing)
- [TVSum](https://drive.google.com/file/d/1gSex1hpXLxHQu6zHyyQISKZjP7Ndt6U9/view?usp=sharing)
- [YouTube Highlight](https://drive.google.com/file/d/12swoymGwuN5TlDlWBTo6UUWVm2DqVBpn/view?usp=sharing)
- [Clotho Moment](https://zenodo.org/records/13806234)

For [AMR](https://h-munakata.github.io/Language-based-Audio-Moment-Retrieval/), download features from here.

- [Clotho Moment/TUT2017/UnAV100-subset](https://zenodo.org/records/13806234)

The whole directory should look like this:
```
@@ -243,9 +237,6 @@ Then zip `hl_val_submission.jsonl` and `hl_test_submission.jsonl`, and submit it
zip -r submission.zip val_submission.jsonl test_submission.jsonl
```
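For completeness, the `zip -r` command above can also be done from Python with the standard library. This is only an equivalent sketch; the file names are taken from the command, so adjust them to your own output paths:

```python
# Stdlib equivalent of: zip -r submission.zip val_submission.jsonl test_submission.jsonl
import zipfile

def make_submission(paths, out='submission.zip'):
    """Write the given files into a single deflate-compressed zip archive."""
    with zipfile.ZipFile(out, 'w', zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            zf.write(p)
    return out
```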

-## Reproduced results
-See [here](markdown/reproduced_results.md). You can download individual checkpoints.

## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

47 changes: 47 additions & 0 deletions api_example/demo.py
@@ -0,0 +1,47 @@
"""
Copyright $today.year LY Corporation
LY Corporation licenses this file to you under the Apache License,
version 2.0 (the "License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at:
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations
under the License.
"""
import os
import subprocess
import torch

from lighthouse.models import CGDETRPredictor
from typing import Dict, List, Optional

def load_weights(weight_dir: str) -> None:
    if not os.path.exists(os.path.join(weight_dir, 'clip_slowfast_cg_detr_qvhighlight.ckpt')):
        command = 'wget -P gradio_demo/weights/ https://zenodo.org/records/13960580/files/clip_slowfast_cg_detr_qvhighlight.ckpt'
        subprocess.run(command, shell=True)

    if not os.path.exists('SLOWFAST_8x8_R50.pkl'):
        subprocess.run('wget https://dl.fbaipublicfiles.com/pyslowfast/model_zoo/kinetics400/SLOWFAST_8x8_R50.pkl', shell=True)

    if not os.path.exists('Cnn14_mAP=0.431.pth'):
        subprocess.run('wget https://zenodo.org/record/3987831/files/Cnn14_mAP%3D0.431.pth', shell=True)

# use GPU if available
device: str = 'cuda' if torch.cuda.is_available() else 'cpu'
weight_dir: str = 'gradio_demo/weights'
weight_path: str = os.path.join(weight_dir, 'clip_slowfast_cg_detr_qvhighlight.ckpt')
load_weights(weight_dir)
model: CGDETRPredictor = CGDETRPredictor(weight_path, device=device, feature_name='clip_slowfast',
                                         slowfast_path='SLOWFAST_8x8_R50.pkl', pann_path=None)

# encode video features
model.encode_video('api_example/RoripwjYFp8_60.0_210.0.mp4')

# moment retrieval & highlight detection
query: str = 'A woman wearing glasses is speaking in front of the camera'
prediction: Optional[Dict[str, List[float]]] = model.predict(query)
print(prediction)
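The download-only-if-missing pattern that `load_weights` above applies to each checkpoint can be factored into one reusable helper. This is just a sketch; the `fetch` callable is injected so the idempotency logic can be exercised without wget or network access:

```python
# Sketch of the demo's "download only when the file is missing" pattern.
# `fetch(url, path)` is an injected stand-in for the actual wget call.
import os

def ensure_file(path, url, fetch):
    """Call fetch(url, path) only if path does not already exist."""
    if not os.path.exists(path):
        fetch(url, path)
    return path
```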
2 changes: 1 addition & 1 deletion gradio_demo/demo.py
@@ -37,7 +37,7 @@ def load_pretrained_weights():
    for model_name in MODEL_NAMES:
        for feature in FEATURES:
            file_urls.append(
-                "https://zenodo.org/records/13639198/files/{}_{}_qvhighlight.ckpt".format(feature, model_name)
+                "https://zenodo.org/records/13960580/files/{}_{}_qvhighlight.ckpt".format(feature, model_name)
            )
    for file_url in tqdm(file_urls):
        if not os.path.exists('gradio_demo/weights/' + os.path.basename(file_url)):
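The nested loop in the gradio demo expands a URL template over every model/feature combination. The sketch below shows the same expansion as a comprehension; `MODEL_NAMES` and `FEATURES` here are illustrative stand-ins, not the demo's actual lists:

```python
# Illustrative stand-ins; the real lists live in gradio_demo/demo.py.
MODEL_NAMES = ['cg_detr']
FEATURES = ['clip_slowfast']

# One checkpoint URL per (feature, model) pair, as in load_pretrained_weights.
file_urls = [
    'https://zenodo.org/records/13960580/files/{}_{}_qvhighlight.ckpt'.format(feature, model_name)
    for model_name in MODEL_NAMES
    for feature in FEATURES
]
```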
