We use synthetic datasets for training. We simulate events and video sequences with v2e. Using images from the MS-COCO dataset, we simulate videos with arbitrary motions: half of the sequences contain only a single background scene, and the other half contain 5-10 foreground objects moving randomly over the background. We simulate 1100 sequences, each 2 s long, and use 1000 sequences for training and the remaining 100 for testing. The image size is 240 × 180, which matches the DAVIS240C sensor. Examples of the data sequences can be downloaded from here.
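The generation script is not reproduced here, but the following is a minimal sketch of how a random-motion sequence could be rendered from a single MS-COCO image. The input file name, frame count, and the simple linear-translation motion model are illustrative assumptions, not the actual pipeline:

```python
# Hypothetical sketch: render a 240x180 sequence by sliding a crop window
# over an MS-COCO image with a random linear motion.
import cv2
import numpy as np

H, W = 180, 240          # DAVIS240C resolution
num_frames = 60          # e.g. 2 s at 30 fps before interpolation

img = cv2.imread("coco_image.jpg")          # hypothetical input path
img = cv2.resize(img, (2 * W, 2 * H))       # oversize canvas so the crop stays inside

vx, vy = np.random.uniform(-2, 2, size=2)   # random per-frame translation (pixels)
x0, y0 = W // 2, H // 2                     # initial crop origin

frames = []
for i in range(num_frames):
    x = int(np.clip(x0 + vx * i, 0, W))     # keep the crop inside the canvas
    y = int(np.clip(y0 + vy * i, 0, H))
    frames.append(img[y:y + H, x:x + W])    # moving 240x180 window
```

The rendered frames would then be upsampled in time and passed to v2e to synthesize events.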
The expected training data structure is as follows:
input_dir
├── train_e2v.txt
├── train_v2e2v.txt
├── sequence_0000000001
│   ├── frames
│   │   ├── timestamps.txt
│   │   ├── frame_0000000000.png
│   │   ├── frame_0000000001.png
│   │   ├── frame_0000000002.png
│   │   └── ....png
│   └── events
│       ├── events_0000000000.npz
│       ├── events_0000000001.npz
│       ├── events_0000000002.npz
│       └── ....npz
├── sequence_0000000002
├── sequence_0000000003
└── sequence_....
`train_e2v.txt` and `train_v2e2v.txt` are the text files that list the training sequences. The video sequences are high-frame-rate videos generated with Super-Slomo, and `events_0000000000.npz` contains the raw events between `frame_0000000000.png` and `frame_0000000001.png`.
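For reference, one training sample can be loaded roughly as follows. This is only a sketch: the `.npz` key names (`t`, `x`, `y`, `p`) and the `timestamps.txt` layout (one timestamp per line) are assumptions and should be checked against the actual files:

```python
# Sketch of loading one sample from the expected directory structure.
import cv2
import numpy as np

seq = "input_dir/sequence_0000000001"

# Assumed format: one frame timestamp per line.
timestamps = np.loadtxt(f"{seq}/frames/timestamps.txt")
frame0 = cv2.imread(f"{seq}/frames/frame_0000000000.png", cv2.IMREAD_GRAYSCALE)
frame1 = cv2.imread(f"{seq}/frames/frame_0000000001.png", cv2.IMREAD_GRAYSCALE)

# Raw events between frame 0 and frame 1; field names are assumed.
ev = np.load(f"{seq}/events/events_0000000000.npz")
t, x, y, p = ev["t"], ev["x"], ev["y"], ev["p"]

assert frame0.shape == (180, 240)   # DAVIS240C resolution
```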
To test on the HQF and ECD data sequences, we extract events and frames from the .bag files. The example sequences `slider_depth` and `still_life` can also be downloaded from here.
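A minimal sketch of the extraction is shown below, assuming the standard DVS topics `/dvs/events` (dvs_msgs/EventArray) and `/dvs/image_raw` (sensor_msgs/Image); the exact topic names may differ per bag:

```python
# Hedged sketch: extract events and frames from an ECD/HQF .bag file.
import rosbag
from cv_bridge import CvBridge

bridge = CvBridge()
events, frames, frame_ts = [], [], []

with rosbag.Bag("slider_depth.bag") as bag:
    for topic, msg, t in bag.read_messages(topics=["/dvs/events", "/dvs/image_raw"]):
        if topic == "/dvs/events":
            # Each dvs_msgs/Event carries x, y, ts, polarity.
            events.extend((e.ts.to_sec(), e.x, e.y, e.polarity) for e in msg.events)
        else:
            frames.append(bridge.imgmsg_to_cv2(msg))
            frame_ts.append(msg.header.stamp.to_sec())
```

The extracted events and frames can then be written out in the same per-sequence layout shown above.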