```bash
conda create -n freemask python=3.11
conda activate freemask
pip install -r requirements.txt
```
Download DAVIS2016 from https://davischallenge.org/davis2016/code.html. From DAVIS2016, we select the videos that have only a single-category segmentation map, and choose 8 frames from each video for computation, as in the sketch below.
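A minimal sketch of the frame selection, assuming the downloaded frames live under `DAVIS/JPEGImages/480p/<video>` and the configs expect them under `dataset/frames/<video>` (both paths are assumptions; adapt them to your layout):

```python
# Sketch: pick 8 evenly spaced frames from one DAVIS2016 video.
# The source/destination paths are assumptions; adapt them to your layout.
import shutil
from pathlib import Path

src = Path("DAVIS/JPEGImages/480p/bear")   # downloaded DAVIS2016 frames (assumed path)
dst = Path("dataset/frames/bear")          # layout the configs below expect
dst.mkdir(parents=True, exist_ok=True)

frames = sorted(src.glob("*.jpg"))
step = max(len(frames) // 8, 1)
for i, frame in enumerate(frames[::step][:8]):
    shutil.copy(frame, dst / f"{i:05d}{frame.suffix}")
```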
Prepare a config for each of your selected videos, following config/mask_bear.yaml. You need to change these settings for different videos:
```yaml
dataset_config:
  path: "dataset/frames/bear"     # change to your video frame path
  prompt: "a bear is walking"     # change to your prompt
  ...
editing_config:
  cal_maps: True                  # True for cross-attention visualization
  dataname: "bear"                # change to your video name
  word: ["bear", "bear"]          # change to your edited object
  ...
  editing_prompts: [
    "a bear is walking"
  ]                               # change to your prompt
```
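As a quick sanity check, you can load and inspect a config before running anything. This is only a sketch assuming standard PyYAML; the repo's actual loader may differ:

```python
# Sketch: load and inspect a config with PyYAML (the repo's own loader may differ).
import yaml

with open("config/mask_bear.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["dataset_config"]["path"])      # dataset/frames/bear
print(cfg["editing_config"]["dataname"])  # bear
```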
Run the cross-attention visualization:

```bash
python cal_mask.py --config config/mask_bear.yaml
```
The cross-attention maps for the word given by dataname, across all layers and all timesteps, will then be saved to ./camap/dataname.
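The MIoU comparison below expects binary maps, so each saved cross-attention map must be binarized first. A minimal sketch, where the input file name and the mean-value threshold are both assumptions, not the paper's procedure:

```python
# Sketch: binarize one saved cross-attention map before MIoU comparison
# (the file name and mean-value threshold are assumptions).
import numpy as np
from PIL import Image

camap = np.array(Image.open("camap/bear/example_map.jpg").convert("L"))
binary = (camap > camap.mean()).astype(np.uint8) * 255
Image.fromarray(binary).save("dataset/miou_test/binarized_bear_camap.jpg")
```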
Calculate the MIoU of every cross-attention map against the ground-truth segmentation mask, then compute the TMMC and LMMC according to Eq. 2-6 in the paper.
We provide an example of the MIoU calculation for one cross-attention map against the ground-truth segmentation mask:
```bash
python calculate_miou.py "dataset/miou_test/bear_mask.jpg" "dataset/miou_test/binarized_bear_camap.jpg"
```
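For reference, the core of this computation is a plain intersection-over-union of the two binarized masks. A minimal re-implementation sketch (the 128 threshold is an assumption):

```python
# Sketch: IoU of two binarized masks, the quantity calculate_miou.py reports.
import numpy as np
from PIL import Image

def binary_iou(path_a: str, path_b: str) -> float:
    a = np.array(Image.open(path_a).convert("L")) > 128  # assumed threshold
    b = np.array(Image.open(path_b).convert("L")) > 128
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

print(binary_iou("dataset/miou_test/bear_mask.jpg",
                 "dataset/miou_test/binarized_bear_camap.jpg"))
```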
After averaging the MMC over all videos, you obtain a codebook of MMC values across timesteps and layers.
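A sketch of only this averaging step; the per-map MMC values themselves come from Eq. 2-6 in the paper, which this does not reproduce, and the array shapes below are placeholders:

```python
# Sketch: average per-video MMC scores into a timestep x layer codebook.
# mmc[v, t, l] = MMC of video v at denoising timestep t and UNet layer l
# (shapes below are placeholders, not the real counts).
import numpy as np

mmc = np.random.rand(10, 50, 16)  # 10 videos, 50 timesteps, 16 layers (placeholder)
codebook = mmc.mean(axis=0)       # average over videos -> (timesteps, layers)
print(codebook.shape)             # (50, 16)
```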
Prepare a config like config/giraffe_style.yaml, then run the style translation:

```bash
python run.py --config config/giraffe_style.yaml
```
Prepare a config like config/girl_jump_shape.yaml, then run the shape editing:

```bash
python run.py --config config/girl_jump_shape.yaml
```