Welcome to the "Deep Video Extraction" repository! This repository is designed to extract deep feature representations from video inputs using pre-trained models, such as VGG19 and ResNet, among others. In addition to processing videos, this repository also enables the extraction of deep features from audio by converting them into spectrograms. Furthermore, it provides the capability to extract audio from videos and spectograms.
The core functionality of this repository lies in its ability to iterate through a directory containing videos, applying deep feature extraction using state-of-the-art pre-trained models. Specifically, the renowned VGG19 model is employed as the primary deep feature extractor. The codebase is built using the popular deep learning framework, PyTorch, ensuring robustness and ease of use.
By leveraging the power of pre-trained models, this repository enables you to extract high-level representations from video inputs. These deep feature representations capture meaningful visual patterns, aiding in various tasks such as object recognition, video summarization, and content-based retrieval.
Moreover, the repository offers an additional dimension of processing by extracting deep features from audio. By converting audio signals into spectrograms, you can analyze the temporal and frequency components, unlocking valuable insights for audio-related tasks such as speech recognition, music classification, and audio event detection.
The repository also provides functionalities to extract the audio track from videos and the corresponding spectrograms. This empowers users to explore the audio content independently and perform tasks such as audio synthesis, audio captioning, or combining audio and visual modalities for multimodal analysis.
Overall, "Deep Video Extraction" serves as a comprehensive toolbox for deep feature extraction, catering to both video and audio inputs. By harnessing pre-trained models and leveraging PyTorch's flexibility, this repository empowers researchers, developers, and practitioners to unlock the rich information embedded in videos and audio signals, facilitating a wide range of applications in computer vision, multimedia analysis, and beyond.
git clone https://github.com/theopsall/deep_video_extraction.git
cd deep_video_extraction
pip install -r requirements.txt
To extract only the visual deep video features run the following command:
python deep_feature_extraction extractVisual -i <input_directory> [-m <model>] [-l <layers>] [-f] [-s] [-o <output_directory>]
-i
, --input
: Input directory with videos (required)
-m
, --model
: The pretrained model (default: "vgg")
-l
, --layers
: Number of layers to exclude from the pretrained model (default: -1)
-f
, --flatten
: Flatten the last layer of the feature vector
-s
, --store
: Store feature vectors
-o
, --output
: Output directory
To extract only the aural deep video features run the following command:
python deep_feature_extraction extractAural -i <input_directory> [-m <model>] [-l <layers>] [-f] [-s] [-o <output_directory>]
-i
, --input
: Input directory with audios (required)
-m
, --model
: The pretrained model (default: "resnet")
-l
, --layers
: Number of layers to exclude from the pretrained model (default: -1)
-f
, --flatten
: Flatten the last layer of the feature vector
-s
, --store
: Store feature vectors
-o
, --output
: Output directory
python deep_feature_extraction soundIsolation [-i <input_directory>] [-o <output_directory>]
-i
, --input
: Input directory with videos
-o
, --output
: Output directory
python deep_feature_extraction spectro [-i <input_directory>] [-o <output_directory>]
-i
, --input
: Input directory with audio files
-o
, --output
: Output directory