Language-driven Object Fusion into Neural Radiance Fields with Pose-Conditioned Dataset Updates (CVPR 2024)
Official GitHub repository for the paper:
Language-driven Object Fusion into Neural Radiance Fields with Pose-Conditioned Dataset Updates
Ka Chun Shum1,
Jaeyeon Kim1,
Binh-Son Hua2,
Duc Thanh Nguyen3,
Sai-Kit Yeung1
1Hong Kong University of Science and Technology 2Trinity College Dublin 3Deakin University
Our dataset comprises 10 sets of multi-view object images and 8 sets of multi-view background images. All photos were taken with an iPhone and feature everyday backgrounds or objects.
Below is the visualization of some objects and backgrounds:
Our dataset can be downloaded here.
Unzip it and place the dataset folder at pose-conditioned-NeRF-object-fusion/dataset.
You may follow instant-ngp to prepare your own data.
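For reference, below is a hedged sketch of one possible data layout in the instant-ngp style, where each scene carries a transforms.json file with camera intrinsics and per-frame poses. Only dataset/background/wooden_table and the image name IMG_4853.png appear in the commands later in this README; the object folder path and all numbers shown are illustrative placeholders.

dataset/
    background/
        wooden_table/
            transforms.json      # instant-ngp-style camera file (assumed)
            IMG_4853.png         # one of the multi-view photos
            ...
    object/
        model_car/               # multi-view object photos (assumed layout)
            ...

A minimal transforms.json in the instant-ngp convention looks roughly like:

{
    "camera_angle_x": 0.69,
    "frames": [
        {
            "file_path": "IMG_4853.png",
            "transform_matrix": [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
        }
    ]
}

Here camera_angle_x is the horizontal field of view in radians and transform_matrix is the camera-to-world pose; the identity matrix above is only a placeholder.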
Clone the code and build a virtual environment for it:
git clone https://github.com/kcshum/pose-conditioned-NeRF-object-fusion.git
cd pose-conditioned-NeRF-object-fusion
conda create -n posefusion python=3.9
conda activate posefusion
We use the Paddle implementation for diffusion model fine-tuning:
conda install -c conda-forge cudatoolkit=11.6 cudnn=8.4.1.50 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
pip install paddlepaddle-gpu==2.4.2.post116 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
pip install paddlenlp==2.5.2 ppdiffusers==0.14.0
We recommend checking that Paddle is installed correctly by running the following in Python:
import paddle
paddle.utils.run_check()
If cuDNN cannot be detected, run the following and try again:
conda env config vars set LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/{change_to_your_dir}/anaconda3/envs/posefusion/include:/{change_to_your_dir}/anaconda3/envs/posefusion/lib
conda deactivate
conda activate posefusion
We use PyTorch for NeRF optimization:
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip install opencv-python kornia configargparse
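To quickly verify the PyTorch install (an optional sanity check, not part of the original setup steps), you may run:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"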
We aim to insert an object (represented by a set of multi-view object images) into a background NeRF (learned from a set of multi-view background images).
Below are the video visualization results of some edited NeRFs, where the red boxes refer to the target location:
We provide the configurations of various edits in configs/commands.txt for you to try and modify. One example is described below.
Fine-tune a diffusion model in an inpainting manner on both the object and background images with customized text prompts:
python -u train_inpainting_dreambooth.py \
--pretrained_model_name_or_path="stabilityai/stable-diffusion-2-inpainting" \
--object_data="model_car" --background_data="wooden_table" \
--object_prompt="sks white model car" --background_prompt="pqp wooden table" \
--max_train_steps_OBJ=4000 --max_train_steps_BG=400
The command is self-explanatory. You may check the available arguments in the code.
The fine-tuned diffusion model is saved by default in dream_outputs/{--object_data}_and_{--background_data}.
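As a hedged illustration of adapting this step to your own data: the folder names toy_robot and marble_counter below are hypothetical, and only the arguments already shown above are used. The rare tokens (e.g., sks and pqp) act as identifiers that bind the prompts to your object and background.

python -u train_inpainting_dreambooth.py \
--pretrained_model_name_or_path="stabilityai/stable-diffusion-2-inpainting" \
--object_data="toy_robot" --background_data="marble_counter" \
--object_prompt="sks toy robot" --background_prompt="pqp marble counter" \
--max_train_steps_OBJ=4000 --max_train_steps_BG=400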
Optimize a background NeRF and then insert the object:
python train_nerf_fusion.py \
--config configs/nerf_fusion.txt --datadir "dataset/background/wooden_table" \
--finetuned_model_path "dream_outputs/model_car_and_wooden_table" \
--prompt "sks white model car on pqp wooden table" \
--pivot_name "IMG_4853.png" --box_name wooden_table_02 \
--strength_lower_bound 35 --strength_higher_bound 35
The command is self-explanatory. You may check the available arguments in the code.
--pivot_name is the first view to train. --box_name is the bounding box to use.
--strength_lower_bound and --strength_higher_bound define the range of diffusion-model noise strengths used at inference for view refinement; here we keep it fixed at 35 (note: 0 = no noise; 100 = pure noise). You may try random noise by specifying --strength_lower_bound 10 --strength_higher_bound 90.
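For intuition, here is a minimal Python sketch, not the repository's actual code, of how a strength in [0, 100] could be sampled from this range and mapped to the fraction of diffusion steps used when refining a view:

import random

def sample_strength(lower: int, upper: int) -> float:
    # Uniformly sample a strength in [0, 100]; with lower == upper (e.g., 35)
    # the strength is fixed, while (10, 90) gives a random strength per view.
    return random.uniform(lower, upper) / 100.0  # 0.0 = no noise, 1.0 = pure noise

def num_refinement_steps(strength: float, total_steps: int = 50) -> int:
    # Map the normalized strength to a step count, mirroring how
    # img2img-style diffusion pipelines typically scale their schedules.
    return max(1, int(round(strength * total_steps)))

s = sample_strength(35, 35)
print(s, num_refinement_steps(s))  # e.g., 0.35 and 18 with a 50-step sampler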
Producing a good first object-blended view is a prerequisite before updating the nearby views.
The progressively updated dataset is visualized in logs/{your_experiment}/visualization.
The object bounding box is visualized in logs/{your_experiment}/boundingbox.
The NeRF renderings are periodically visualized in logs/{your_experiment}/{epoch}_{training stage}.
The NeRF model is periodically saved as logs/{your_experiment}/{epoch}.tar.
After training ends, you may reuse the previous command and specify extra arguments as follows.
You may render all training views by running:
python train_nerf_fusion.py \
--config configs/nerf_fusion.txt --datadir "dataset/background/wooden_table" \
--finetuned_model_path "dream_outputs/model_car_and_wooden_table" \
--prompt "sks white model car on pqp wooden table" \
--pivot_name "IMG_4853.png" --box_name wooden_table_02 \
--strength_lower_bound 35 --strength_higher_bound 35 \
--render_image --ckpt_epoch_to_load 40000
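If you want to inspect a saved checkpoint outside the training script, a minimal sketch (the checkpoint's internal keys depend on the repository's saving code and are not documented here) is:

import torch

# Checkpoints are saved as logs/{your_experiment}/{epoch}.tar (see above);
# replace "your_experiment" with your actual experiment folder name.
ckpt = torch.load("logs/your_experiment/40000.tar", map_location="cpu")
print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))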
You may render a video by running:
python train_nerf_fusion.py \
--config configs/nerf_fusion.txt --datadir "dataset/background/wooden_table" \
--finetuned_model_path "dream_outputs/model_car_and_wooden_table" \
--prompt "sks white model car on pqp wooden table" \
--pivot_name "IMG_4853.png" --box_name wooden_table_02 \
--strength_lower_bound 35 --strength_higher_bound 35 \
--render_video --ckpt_epoch_to_load 40000 --video_expname video_01 \
--video_frames 4842 4835 4854 4847 4871 4895 --num_Gaps 10
--video_frames defines a camera trajectory that passes smoothly through the specified views. --num_Gaps is the number of interpolated novel views between each pair of consecutive views.
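A plausible mechanism behind these two arguments, shown here as a hedged sketch rather than the repository's actual implementation, is to interpolate camera poses between consecutive keyframe views: spherical linear interpolation (slerp) for rotation plus linear interpolation for translation.

import numpy as np

def slerp(q0, q1, t):
    # Spherical linear interpolation between two unit quaternions (numpy arrays).
    dot = float(np.dot(q0, q1))
    if dot < 0.0:                      # take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:                   # nearly parallel: fall back to lerp
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(np.clip(dot, -1.0, 1.0))
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

def interpolate_poses(rot_quats, translations, num_gaps):
    # rot_quats:    unit quaternions, one per keyframe view (--video_frames)
    # translations: 3D camera positions, one per keyframe view
    # num_gaps:     interpolated novel views between consecutive keyframes (--num_Gaps)
    path = []
    for i in range(len(rot_quats) - 1):
        for k in range(num_gaps + 1):
            t = k / (num_gaps + 1)
            path.append((slerp(rot_quats[i], rot_quats[i + 1], t),
                         (1 - t) * translations[i] + t * translations[i + 1]))
    path.append((rot_quats[-1], translations[-1]))
    return path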
This code is built upon the HashNeRF-pytorch implementation of instant-ngp and the Paddle implementation of DreamBooth. We thank the authors for their nice implementations!