Language-driven Object Fusion into Neural Radiance Fields with Pose-Conditioned Dataset Updates (CVPR 2024)
Official GitHub repository for the paper:
Language-driven Object Fusion into Neural Radiance Fields with Pose-Conditioned Dataset Updates
Ka Chun Shum1,
Jaeyeon Kim1,
Binh-Son Hua2,
Duc Thanh Nguyen3,
Sai-Kit Yeung1
1Hong Kong University of Science and Technology 2Trinity College Dublin 3Deakin University
Our dataset comprises 10 sets of multi-view object images and 8 sets of multi-view background images. All photos were taken with an iPhone and feature everyday backgrounds or objects.
Below is the visualization of some objects and backgrounds:
Our dataset can be downloaded here.
Unzip it and place the dataset folder at pose-conditioned-NeRF-object-fusion/dataset.
You may follow instant-ngp to prepare your own data.
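For reference, below is a hedged sketch of one possible data layout in the instant-ngp style, where each scene carries a transforms.json file with camera intrinsics and per-frame poses. Only dataset/background/wooden_table and the image name IMG_4853.png appear in the commands later in this README; the object folder path and all numbers shown are illustrative placeholders.

dataset/
    background/
        wooden_table/
            transforms.json      # instant-ngp-style camera file (assumed)
            IMG_4853.png         # one of the multi-view photos
            ...
    object/
        model_car/               # multi-view object photos (assumed layout)
            ...

A minimal transforms.json in the instant-ngp convention looks roughly like:

{
    "camera_angle_x": 0.69,
    "frames": [
        {
            "file_path": "IMG_4853.png",
            "transform_matrix": [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
        }
    ]
}

Here camera_angle_x is the horizontal field of view in radians and transform_matrix is the camera-to-world pose; the identity matrix above is only a placeholder.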
Clone the code and build a virtual environment for it:
git clone https://github.com/kcshum/pose-conditioned-NeRF-object-fusion.git
cd pose-conditioned-NeRF-object-fusion
conda create -n posefusion python=3.9
conda activate posefusion
We use the Paddle implementation for diffusion model fine-tuning:
conda install -c conda-forge cudatoolkit=11.6 cudnn=8.4.1.50 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
pip install paddlepaddle-gpu==2.4.2.post116 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
pip install paddlenlp==2.5.2 ppdiffusers==0.14.0
We recommend checking that Paddle is installed correctly by running the following in Python:
import paddle
paddle.utils.run_check()
If cuDNN cannot be detected, run the following and try again:
conda env config vars set LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/{change_to_your_dir}/anaconda3/envs/posefusion/include:/{change_to_your_dir}/anaconda3/envs/posefusion/lib
conda deactivate
conda activate posefusion
We use PyTorch for NeRF optimization:
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip install opencv-python kornia configargparse
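To quickly verify the PyTorch install (an optional sanity check, not part of the original setup steps), you may run:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"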
We aim to insert an object (represented by a set of multi-view object images) into a background NeRF (learned from a set of multi-view background images).
Below are the video visualization results of some edited NeRFs, where the red boxes refer to the target location:
We provide the configurations of various edits in configs/commands.txt for you to try and modify. One example is described below.
Fine-tune a diffusion model in an inpainting manner on both the object and background images with customized text prompts:
python -u train_inpainting_dreambooth.py \
--pretrained_model_name_or_path="stabilityai/stable-diffusion-2-inpainting" \
--object_data="model_car" --background_data="wooden_table" \
--object_prompt="sks white model car" --background_prompt="pqp wooden table" \
--max_train_steps_OBJ=4000 --max_train_steps_BG=400
The command is self-explanatory. You may check the available arguments in the code.
The fine-tuned diffusion model is saved by default in dream_outputs/{--object_data}_and_{--background_data}.
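As a hedged illustration of adapting this step to your own data: the folder names toy_robot and marble_counter below are hypothetical, and only the arguments already shown above are used. The rare tokens (e.g., sks and pqp) act as identifiers that bind the prompts to your object and background.

python -u train_inpainting_dreambooth.py \
--pretrained_model_name_or_path="stabilityai/stable-diffusion-2-inpainting" \
--object_data="toy_robot" --background_data="marble_counter" \
--object_prompt="sks toy robot" --background_prompt="pqp marble counter" \
--max_train_steps_OBJ=4000 --max_train_steps_BG=400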
Optimize a background NeRF and then insert the object:
python train_nerf_fusion.py \
--config configs/nerf_fusion.txt --datadir "dataset/background/wooden_table" \
--finetuned_model_path "dream_outputs/model_car_and_wooden_table" \
--prompt "sks white model car on pqp wooden table" \
--pivot_name "IMG_4853.png" --box_name wooden_table_02 \
--strength_lower_bound 35 --strength_higher_bound 35
The command is self-explanatory. You may check the available arguments in the code.
--pivot_name is the first view to train. --box_name is the bounding box to use.
--strength_lower_bound and --strength_higher_bound define the range of diffusion-model noise strengths used at inference for view refinement; here we keep it fixed at 35 (note: 0 = no noise; 100 = pure noise). You may try random noise by specifying --strength_lower_bound 10 --strength_higher_bound 90.
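For intuition, here is a minimal Python sketch, not the repository's actual code, of how a strength in [0, 100] could be sampled from this range and mapped to the fraction of diffusion steps used when refining a view:

import random

def sample_strength(lower: int, upper: int) -> float:
    # Uniformly sample a strength in [0, 100]; with lower == upper (e.g., 35)
    # the strength is fixed, while (10, 90) gives a random strength per view.
    return random.uniform(lower, upper) / 100.0  # 0.0 = no noise, 1.0 = pure noise

def num_refinement_steps(strength: float, total_steps: int = 50) -> int:
    # Map the normalized strength to a step count, mirroring how
    # img2img-style diffusion pipelines typically scale their schedules.
    return max(1, int(round(strength * total_steps)))

s = sample_strength(35, 35)
print(s, num_refinement_steps(s))  # e.g., 0.35 and 18 with a 50-step sampler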
Producing a good first object-blended view is a prerequisite before updating the nearby views.
The progressively updated dataset is visualized in logs/{your_experiment}/visualization.
The object bounding box is visualized in logs/{your_experiment}/boundingbox.
The NeRF renderings are periodically visualized in logs/{your_experiment}/{epoch}_{training stage}.
The NeRF model is periodically saved as logs/{your_experiment}/{epoch}.tar.
After training ends, you may reuse the previous command and specify extra arguments as follows.
You may render all training views by running:
python train_nerf_fusion.py \
--config configs/nerf_fusion.txt --datadir "dataset/background/wooden_table" \
--finetuned_model_path "dream_outputs/model_car_and_wooden_table" \
--prompt "sks white model car on pqp wooden table" \
--pivot_name "IMG_4853.png" --box_name wooden_table_02 \
--strength_lower_bound 35 --strength_higher_bound 35 \
--render_image --ckpt_epoch_to_load 40000
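If you want to inspect a saved checkpoint outside the training script, a minimal sketch (the checkpoint's internal keys depend on the repository's saving code and are not documented here) is:

import torch

# Checkpoints are saved as logs/{your_experiment}/{epoch}.tar (see above);
# replace "your_experiment" with your actual experiment folder name.
ckpt = torch.load("logs/your_experiment/40000.tar", map_location="cpu")
print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))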
You may render a video by running:
python train_nerf_fusion.py \
--config configs/nerf_fusion.txt --datadir "dataset/background/wooden_table" \
--finetuned_model_path "dream_outputs/model_car_and_wooden_table" \
--prompt "sks white model car on pqp wooden table" \
--pivot_name "IMG_4853.png" --box_name wooden_table_02 \
--strength_lower_bound 35 --strength_higher_bound 35 \
--render_video --ckpt_epoch_to_load 40000 --video_expname video_01 \
--video_frames 4842 4835 4854 4847 4871 4895 --num_Gaps 10
--video_frames defines a camera trajectory that passes smoothly through the specified views. --num_Gaps is the number of interpolated novel views between each pair of consecutive views.
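A plausible mechanism behind these two arguments, shown here as a hedged sketch rather than the repository's actual implementation, is to interpolate camera poses between consecutive keyframe views: spherical linear interpolation (slerp) for rotation plus linear interpolation for translation.

import numpy as np

def slerp(q0, q1, t):
    # Spherical linear interpolation between two unit quaternions (numpy arrays).
    dot = float(np.dot(q0, q1))
    if dot < 0.0:                      # take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:                   # nearly parallel: fall back to lerp
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(np.clip(dot, -1.0, 1.0))
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

def interpolate_poses(rot_quats, translations, num_gaps):
    # rot_quats:    unit quaternions, one per keyframe view (--video_frames)
    # translations: 3D camera positions, one per keyframe view
    # num_gaps:     interpolated novel views between consecutive keyframes (--num_Gaps)
    path = []
    for i in range(len(rot_quats) - 1):
        for k in range(num_gaps + 1):
            t = k / (num_gaps + 1)
            path.append((slerp(rot_quats[i], rot_quats[i + 1], t),
                         (1 - t) * translations[i] + t * translations[i + 1]))
    path.append((rot_quats[-1], translations[-1]))
    return path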
This code is built upon the HashNeRF-pytorch implementation of instant-ngp and the Paddle implementation of DreamBooth. We thank the authors for their nice implementations!