
VIRT: Vision Instructed Robotic Transformer for Manipulation Learning

[Project Page] [Paper Page] [Code Link] [Data Link]

This is the official implementation of the paper "VIRT: Vision Instructed Robotic Transformer for Manipulation Learning".

Table of Contents:

  1. Installation
  2. Data
  3. Pre-trained Model
  4. Training
  5. Evaluation
  6. License
  7. Citing

Installation

To prepare the environment, please first create a conda environment with the following commands. We do not recommend other Python versions because Isaac Gym may not support them.

conda create -n VIRT python=3.8
conda activate VIRT

Then, please install Isaac Gym following the official Isaac Gym instructions. The version we use is 1.0.preview4. Isaac Gym must be installed on a machine with a display and at least one GPU, and CUDA should also be installed. After installing Isaac Gym, go to its installation path and run the following commands:

cd isaacgym/python
pip install -e .
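
As an optional sanity check (not part of the official instructions, just a quick import test), you can confirm that Isaac Gym is importable:

python -c "from isaacgym import gymapi; print('Isaac Gym imported successfully')"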

Now, please go to the root directory of this project and install the following dependencies:

pip install torch==2.3.1 torchvision==0.18.1
pip install pyquaternion
pip install pyyaml
pip install rospkg
pip install pexpect
pip install opencv-python
pip install matplotlib
pip install einops
pip install packaging
pip install h5py
pip install ipython
pip install xformers==0.0.27
pip install tensorboard
pip install tqdm
cd VIRT/detr && pip install -e . && cd ../..
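
Optionally, you can also check that PyTorch was installed with CUDA support (a quick sanity check we suggest, not an official step):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"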

Data

We provide our collected teleoperation data for the three simulated tasks: 'Move a Single Box', 'Transport the Specified Box', and 'Stack the Specified Boxes'. You can download the data from here. Please unzip the data and put it in the $ROOT/VIRT/datasets directory. The resulting directory structure should look like:

$ROOT/VIRT/datasets
├── isaac_singlebox
│   ├── h5py
│   ├── exterior_camera1
│   ├── exterior_camera2
│   ├── top_camera
│   ├── wrist_camera
├── isaac_singlecolorbox
├── isaac_multicolorbox

The folders isaac_singlebox, isaac_singlecolorbox, and isaac_multicolorbox correspond to the aforementioned three simulated tasks, respectively.
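
If you want to inspect an episode before training, a minimal sketch like the following lists the contents of one HDF5 file in the h5py folder. The file name episode_0.hdf5 is only an assumption for illustration; adapt the path to the files you actually find there.

import h5py

# Hypothetical example path; point this at a real file under the h5py folder.
path = "datasets/isaac_singlebox/h5py/episode_0.hdf5"

with h5py.File(path, "r") as f:
    # Recursively print every dataset name together with its shape and dtype.
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
    f.visititems(show)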

Pre-trained Model

We provide the policy weights of the VIRT model pre-trained with the proposed RIP paradigm here. Please download the checkpoint and save it as $ROOT/VIRT/pretrained/VIRT_droid_pretrain.ckpt.
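
As a quick check that the downloaded checkpoint is readable, you can load it with PyTorch and list its top-level keys. This is only an illustrative sketch; the exact structure of the checkpoint (a plain state dict or a wrapped dictionary) may differ.

import torch

# Load on CPU so no GPU is required just to inspect the file.
ckpt = torch.load("pretrained/VIRT_droid_pretrain.ckpt", map_location="cpu")

if isinstance(ckpt, dict):
    # Print a few top-level keys (parameter names or wrapper fields such as an inner state dict).
    for key in list(ckpt.keys())[:20]:
        print(key)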

Training

To train VIRT, you can run the following command template from $ROOT/VIRT:

torchrun --nnodes=$nnodes --nproc_per_node=$nproc_per_node --node_rank=$node_rank --master_addr=$master_addr --master_port $master_port \
    main.py \
    --config_name $config_name \
    --save_dir ./outputs/$exp_id \
    --data_dir $data_path \
    --num_nodes $nnodes \
    --load_pretrain pretrained/VIRT_droid_pretrain.ckpt

Specifically, we provide config files for the three simulated tasks in $ROOT/VIRT/configs, and $config_name is the path of the config file corresponding to the experiment you want to run. The variable $data_path is the path to the corresponding dataset, e.g., $ROOT/VIRT/datasets/isaac_singlebox for the task Move a Single Box. The training logs and checkpoint models are saved in ./outputs/$exp_id. If you want to train with a single GPU, set $nnodes, $nproc_per_node, and $node_rank to 1, 1, and 0, respectively. $master_addr is the IP address of your machine and can be set to 127.0.0.1. $master_port can be 29515, but it must differ between experiments if you run multiple experiments simultaneously. Our code also supports multi-node multi-GPU training.
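
For example, a single-GPU training run on the Move a Single Box task could look like the following. The config file name configs/isaac_singlebox.yaml and the experiment ID singlebox_exp are illustrative placeholders; substitute the actual config file from $ROOT/VIRT/configs and any experiment name you like.

torchrun --nnodes=1 --nproc_per_node=1 --node_rank=0 --master_addr=127.0.0.1 --master_port 29515 \
    main.py \
    --config_name configs/isaac_singlebox.yaml \
    --save_dir ./outputs/singlebox_exp \
    --data_dir datasets/isaac_singlebox \
    --num_nodes 1 \
    --load_pretrain pretrained/VIRT_droid_pretrain.ckpt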

Evaluation

To evaluate the trained policy, you can use the following command template:

torchrun --nnodes=$nnodes --nproc_per_node=$nproc_per_node --node_rank=$node_rank --master_addr=$master_addr --master_port $master_port \
    main.py \
    --config_name $config_name \
    --save_dir outputs/$exp_id \
    --load_dir outputs/$exp_id/policy_latest.ckpt \
    --num_nodes $nnodes \
    --eval

The variables are set in the same way as in the training script.
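
Continuing the illustrative single-GPU example from the training section (the config file name remains a placeholder):

torchrun --nnodes=1 --nproc_per_node=1 --node_rank=0 --master_addr=127.0.0.1 --master_port 29515 \
    main.py \
    --config_name configs/isaac_singlebox.yaml \
    --save_dir outputs/singlebox_exp \
    --load_dir outputs/singlebox_exp/policy_latest.ckpt \
    --num_nodes 1 \
    --eval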

License

This project is released under CC-BY-NC 4.0.

Citing

Please use the following BibTeX entry if you find our work useful:

@article{li2024virt,
  title={VIRT: Vision Instructed Robotic Transformer for Manipulation Learning},
  author={Li, Zhuoling and Ren, Liangliang and Yang, Jinrong and Zhao, Yong and others},
  journal={arXiv preprint arXiv:2410.07169},
  year={2024}
}
