<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<title>RING-NeRF</title>
<meta name="description" content="">
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- <meta property="og:image" content="https://jonbarron.info/zipnerf/img/nottingham.jpg"> -->
<meta property="og:image:type" content="image/png">
<meta property="og:image:width" content="1296">
<meta property="og:image:height" content="840">
<meta property="og:type" content="website" />
<!-- <meta property="og:url" content="https://jonbarron.info/zipnerf/"/> -->
<meta property="og:title" content="RING-NeRF : Rethinking Inductive Biases for Versatile and Efficient Neural Fields" />
    <meta property="og:description" content="Recent advances in Neural Fields mostly rely on developing task-specific supervision which often complicates the models. Rather than developing hard-to-combine and specific modules, another approach generally overlooked is to directly inject generic priors on the scene representation (also called inductive biases) into the NeRF architecture. Based on this idea, we propose the RING-NeRF architecture which includes two inductive biases: a continuous multi-scale representation of the scene and an invariance of the decoder’s latent space over spatial and scale domains. We also design a single reconstruction process that takes advantage of those inductive biases and experimentally demonstrates on-par performance in terms of quality with dedicated architectures on multiple tasks (anti-aliasing, few-view reconstruction, SDF reconstruction without scene-specific initialization) while being more efficient. Moreover, RING-NeRF has the distinctive ability to dynamically increase the resolution of the model, opening the way to adaptive reconstruction." />
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:title" content="RING-NeRF : Rethinking Inductive Biases for Versatile and Efficient Neural Fields" />
    <meta name="twitter:description" content="Recent advances in Neural Fields mostly rely on developing task-specific supervision which often complicates the models. Rather than developing hard-to-combine and specific modules, another approach generally overlooked is to directly inject generic priors on the scene representation (also called inductive biases) into the NeRF architecture. Based on this idea, we propose the RING-NeRF architecture which includes two inductive biases: a continuous multi-scale representation of the scene and an invariance of the decoder’s latent space over spatial and scale domains. We also design a single reconstruction process that takes advantage of those inductive biases and experimentally demonstrates on-par performance in terms of quality with dedicated architectures on multiple tasks (anti-aliasing, few-view reconstruction, SDF reconstruction without scene-specific initialization) while being more efficient. Moreover, RING-NeRF has the distinctive ability to dynamically increase the resolution of the model, opening the way to adaptive reconstruction." />
<!-- <meta name="twitter:image" content="https://jonbarron.info/zipnerf/img/teaser.jpg" /> -->
<!-- <link rel="icon" href="data:image/svg+xml,<svg xmlns=%22http://www.w3.org/2000/svg%22 viewBox=%220 0 100 100%22><text y=%22.9em%22 font-size=%2290%22>⚡</text></svg>"> -->
<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro"
rel="stylesheet">
<link rel="stylesheet" href="./static/css/bulma.min.css">
<link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
<link rel="stylesheet" href="./static/css/bulma-slider.min.css">
<link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
<link rel="stylesheet"
href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
<link rel="stylesheet" href="./static/css/index.css">
<link rel="icon" href="./img/icon.ico">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.4.0/css/font-awesome.min.css">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/codemirror/5.8.0/codemirror.min.css">
<link rel="stylesheet" href="css/app.css">
<link rel="stylesheet" href="css/bootstrap.min.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/codemirror/5.8.0/codemirror.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/1.5.3/clipboard.min.js"></script>
<script src="js/app.js"></script>
<script src="js/video_comparison.js"></script>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script defer src="./static/js/fontawesome.all.min.js"></script>
<script src="./static/js/bulma-carousel.min.js"></script>
<script src="./static/js/bulma-slider.min.js"></script>
<script src="./static/js/index.js"></script>
<!-- <script src="./static/js/index2.js"></script> -->
</head>
<body>
<div class="container" id="main">
<div class="row">
<h2 class="col-md-12 text-center">
            <b>RING-NeRF</b>: Rethinking Inductive Biases for Versatile and Efficient Neural Fields<br>
</h2>
</div>
<div class="row">
<div class="col-md-12 text-center">
<ul class="list-inline">
<li>
<a href="https://doriandpetit.com/">
Doriand Petit¹²
</a>
</li>
<li>
<a href="https://scholar.google.com/citations?hl=fr&user=Ym-suFYAAAAJ">
Steve Bourgeois¹
</a>
</li>
<li>
<a href="https://cedric.cnam.fr/lab/en/author/paveld/">
Dumitru Pavel¹
</a>
</li>
<li>
<a href="https://scholar.google.com/citations?user=kUVG8pIAAAAJ&hl=fr">
Vincent Gay-Bellile¹
</a>
</li>
<li>
<a href="https://scholar.google.fr/citations?hl=fr&user=be4jSOIAAAAJ&view_op=list_works&sortby=title">
Florian Chabot¹
</a>
</li>
<li>
<a href="https://www.irit.fr/~Loic.Barthe/">
Loïc Barthe²
</a>
</li>
                    <br>¹Université Paris-Saclay, CEA List, F-91120, Palaiseau, France ²IRIT, Université Toulouse III, CNRS, France
</ul>
</div>
</div>
<div class="row">
<h3 class="col-md-12 text-center" style="color:red;">
            <b>🎉 RING-NeRF has been accepted to ECCV'24! See you in Milan! 🎉</b>
</h3>
</div>
<script src="js/confetti.js"></script>
<script>
        // Start the confetti after a 1-second delay.
        const start = () => {
            setTimeout(function() {
                confetti.start()
            }, 1000);
        };
        // Stop the confetti 5 seconds after the page loads.
        const stop = () => {
            setTimeout(function() {
                confetti.stop()
            }, 5000);
        };
        start();
        stop();
</script>
<div class="row">
<div class="col-md-4 col-md-offset-4 text-center">
<ul class="nav nav-pills nav-justified">
<li>
<a href="https://arxiv.org/abs/2312.03357">
                        <img src="img/paper_image.png" height="40px">
<h4><strong>Paper</strong></h4>
</a>
</li>
<li>
<a href="https://github.com/CEA-LIST">
                        <img src="img/github.png" height="40px">
<h4><strong>Code (To be released)</strong></h4>
</a>
</li>
</ul>
</div>
</div>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
Abstract
</h3>
            <p class="text-justify">
                Recent advances in Neural Fields mostly rely on developing task-specific supervision which often complicates the models. Rather than developing hard-to-combine and specific modules, another approach generally overlooked is to directly inject generic priors on the scene representation (also called inductive biases) into the NeRF architecture. Based on this idea, we propose the RING-NeRF architecture which includes two inductive biases: a continuous multi-scale representation of the scene and an invariance of the decoder’s latent space over spatial and scale domains. We also design a single reconstruction process that takes advantage of those inductive biases and experimentally demonstrates on-par performance in terms of quality with dedicated architectures on multiple tasks (anti-aliasing, few-view reconstruction, SDF reconstruction without scene-specific initialization) while being more efficient. Moreover, RING-NeRF has the distinctive ability to dynamically increase the resolution of the model, opening the way to adaptive reconstruction. </p>
</div>
</div>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
Global Scheme
</h3>
<div class="text-center">
<div style="position:relative;padding-top:30.25%;">
<img src="img/global.png" style="position:absolute;top:0;left:0;width:100%;height:100%;">
</div>
                    <p class="text-justify">
                        Overview of RING-NeRF: to render a pixel, the cast cone is sampled with
cubes. Depending on the cube volume, the corresponding LOD of the scene is selected
and the latent feature is computed as a weighted sum over the grid hierarchy. The
density (or SDF) and color of the cube are first decoded from the latent feature with
a tiny MLP and then integrated with other samples through volume rendering.</p>
</div>
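            <p class="text-justify">
                The feature lookup described above can be sketched as follows. This is a simplified 1-D illustration of the idea, not the paper's code: it assumes linear interpolation per level and a residual sum across levels, with the fractional part of the continuous LOD fading in the finest active grid.
            </p>

```python
import numpy as np

def interp(grid, x):
    # Linear interpolation in a 1-D feature grid; x lies in [0, 1].
    pos = x * (len(grid) - 1)
    i = int(np.floor(pos))
    t = pos - i
    j = min(i + 1, len(grid) - 1)
    return (1 - t) * grid[i] + t * grid[j]

def lod_feature(grids, x, lod):
    """Weighted sum over the grid hierarchy up to a continuous LOD.

    Coarser levels always contribute fully (residual connections); the
    finest active level is faded in with the fractional part of `lod`.
    """
    l0 = int(np.floor(lod))
    frac = lod - l0
    feat = sum(interp(g, x) for g in grids[: l0 + 1])
    if frac > 0 and l0 + 1 < len(grids):
        feat += frac * interp(grids[l0 + 1], x)
    return feat
```

            <p class="text-justify">
                In the actual architecture the grids are 3-D, and the blended feature is decoded by the tiny shared MLP rather than used directly.
            </p>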
</div>
</div>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
                A NeRF Architecture with an LOD Inductive Bias
</h3>
            <p class="text-justify">
                While a Neural Field based on a single 3D feature grid defines a single mapping function between the MLP-decoder feature space and the 3D scene space, our architecture, with its 3D hierarchical grid linked by residual connections, defines such a function continuously in scale space. By construction, the architecture induces a notion of level of detail in the representation, allowing the scene to be reconstructed at multiple precisions even when reconstruction is only supervised at the highest level of detail.
            </p>
<div class="columns is-vcentered interpolation-panel">
<div class="column is-3 has-text-centered">
<img src="./static/images/interpolate_start.jpg"
class="interpolation-image"
alt="Interpolate start reference image."/>
<p>First Grid Only </p>
</div>
<div class="column interpolation-video-column">
<div id="interpolation-image-wrapper">
Loading...
</div>
<input class="slider is-fullwidth is-large is-info"
id="interpolation-slider"
step="1" min="10" max="79" value="45" type="range">
</div>
<div class="column is-3 has-text-centered">
<img src="./static/images/interpolate_end.jpg"
class="interpolation-image"
alt="Interpolation end reference image."/>
<p class="is-bold">All 8 Grids</p>
</div>
</div>
<p class="text-justify">
The model illustrated here contains 8 grids of increasing resolution which have all been used jointly for the training at maximum precision. Even without specific supervision, any LOD in-between 1 and 8 can be chosen to reconstruct the scene with the corresponding resolution.
</p>
            <p>
                Moreover, unlike other architectures whose decoder latent space is not invariant to the position in the 3D scene space due to the use of positional encoding (e.g., NGLOD), nor invariant to the level of detail due to the use of feature concatenation (e.g., Instant-NGP), LOD-based feature modification (e.g., Zip-NeRF), or per-LOD MLP decoders (e.g., PyNeRF), our architecture relies on a latent space that is invariant to both. This property makes our architecture particularly suitable for incremental reconstruction in space and level of detail.
            </p>
</div>
</div>
<br>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
                Distance-Aware Forward Mapping for Fast Anti-Aliasing
</h3>
<h4>
Anti-Aliasing Mechanism
</h4>
            <p class="text-justify">
                We also use the properties of our architecture to propose an adapted distance-aware forward mapping that easily solves aliasing issues.
            </p> <p class="text-justify">
                The nerfacto baseline (right) cannot easily handle images of different resolutions and/or varying observation distances. This results in blurred high-resolution renderings and the appearance of aliasing artifacts at low resolution. In comparison, RING-NeRF (left) better adapts to these parameters and produces better renderings in both high and low resolution (respectively near and far observations).
            </p>
<div class="column">
<video class="video" width=10% id="xyalias" loop playsinline autoplay muted src="img/FullAA.mp4" onplay="resizeAndPlay(this)"></video>
<canvas height=0 class="videoMerge" id="xyaliasMerge"></canvas></div>
<h4>
Fast Training and Rendering
</h4>
            <p>
                Similarly to Zip-NeRF, our solution uses cone-casting to adjust the level of detail depending on the distance of the sample to the camera. However, unlike Zip-NeRF, the LOD is directly captured by our scene representation. Consequently, our solution is much faster than Zip-NeRF for both reconstruction and rendering, since it does not need to perform multi-sampling, while providing equivalent quality.
            </p>
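            <p class="text-justify">
                As a rough illustration of distance-driven LOD selection (our sketch under simplifying assumptions, not the paper's implementation): the world-space footprint of a cone sample grows linearly with its distance to the camera, and the continuous LOD is the level whose cell size matches that footprint. All names and parameters below are hypothetical.
            </p>

```python
import math

def select_lod(distance, pixel_width, focal, finest_res, num_levels, scene_size=1.0):
    """Map a cone sample's footprint to a continuous level of detail.

    The cone radius grows linearly with distance, so far samples fall back
    to coarser grids; near samples use the finest grid.
    """
    footprint = distance * pixel_width / focal   # world-space sample size
    finest_cell = scene_size / finest_res        # cell size of the finest grid
    # Each coarser level doubles the cell size, hence the log2.
    lod = (num_levels - 1) - math.log2(max(footprint / finest_cell, 1.0))
    return min(max(lod, 0.0), num_levels - 1)
```

            <p class="text-justify">
                Because the LOD is read directly from the representation, no extra samples are needed per ray, which is where the speed-up over multi-sampling approaches comes from.
            </p>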
<div class="column">
<video class="video" width=150% id="timezip" controls loop playsinline autoplay muted onplay="resizeAndPlay(this)">
<source src="img/ZipCompa.mp4" type="video/mp4" /></video>
</div>
<h4>
Supervision-Free Anti-Aliasing
</h4>
            <p>
                Thanks to its LOD inductive bias, RING-NeRF does not necessarily rely on supervision to produce different levels of detail.
                This implies that our architecture can generalize to novel observation distances without seeing them during training.
                This property, although important, is not common among other anti-aliasing methods. For instance, neither PyNeRF nor Zip-NeRF can
                render coherent aliasing-free images at novel observation distances. This is illustrated in the following video, where all
                three models were trained solely on the full-resolution images and renderings were done at 1/8th resolution. Note that the aliasing artifacts
                are especially visible on the central object.
            </p>
<div class="column">
                <video class="video" width=150% id="timezip2" controls loop playsinline autoplay muted onplay="resizeAndPlay(this)">
<source src="img/MoMux8LowRes.mp4" type="video/mp4" /></video>
</div>
</div>
</div>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
Continuous Coarse-To-Fine Optimization for more Robust NeRF models
</h3>
            <p class="text-justify">
                NeRF models often lack robustness in harder-than-usual setups, such as reconstruction from few viewpoints or surface-based reconstruction without scene-specific initialization. While we demonstrated that RING-NeRF is in itself more stable than the Instant-NGP-based nerfacto, we also propose an adapted coarse-to-fine training process that brings much more robustness. While some works already use similar progressive regularization with Instant-NGP-based models (e.g., Neuralangelo), we benefit from RING-NeRF's properties to obtain a more adapted, continuous process that progressively uses more degrees of freedom of the mapping function in the decoder latent space, rather than learning new dimensions of it from scratch as concatenation-based architectures do.
            </p> <p class="text-justify">
                We first demonstrate the increased stability on few-viewpoint setups. Combined with a simple density loss taken from FreeNeRF to reduce near-camera artifacts, this results in more 3D-coherent reconstructions from few views than the adapted nerfacto+ baseline (nerfacto with coarse-to-fine and the same density loss). Note that the unadapted nerfacto does not succeed in creating a coherent geometry, as it mostly overfits in front of the training cameras.
            </p>
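            <p class="text-justify">
                A continuous coarse-to-fine schedule of the kind described above can be sketched as a cap on the usable LOD that grows with training progress. This is a hypothetical minimal version, assuming a linear ramp; the function name and parameters are illustrative only.
            </p>

```python
def lod_cap(step, total_steps, num_levels, start_levels=1):
    """Continuous coarse-to-fine schedule.

    The maximum usable LOD grows linearly with training progress, so finer
    grid levels are faded in continuously rather than switched on abruptly.
    """
    progress = min(step / total_steps, 1.0)
    return (start_levels - 1) + progress * (num_levels - start_levels)
```

            <p class="text-justify">
                The effective LOD of each sample would then be the minimum of its distance-based LOD and this cap, so early training only exercises the coarse grids while the decoder latent space stays shared across all levels.
            </p>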
<div style="position:relative;padding-top:80.25%;">
<img src="img/few-web.jpg" style="position:absolute;top:0;left:0;width:100%;height:100%;">
</div>
</div>
</div>
<div class="row">
<div class="col-md-8 col-md-offset-2">
        <p class="text-justify">
            SDF reconstruction is known to be more unstable than density-based NeRF reconstruction, since it adds an Eikonal constraint on the model's output and therefore requires a scene-specific initialization to converge. This scene-specific initialization becomes an issue in complex environments and incremental setups, where several types of scenes can coexist and are not necessarily known beforehand. In line with the increased stability shown above, we also demonstrate that RING-NeRF makes it possible to forego initialization while keeping both a precision close to that of initialization-dependent models and the convergence speed of faster models (e.g., NeuS-facto) which fail without initialization.
        </p>
<table style="width: 100%; border-collapse: collapse;">
<tr>
<td style="text-align: center;">
<video class="video" width=100% id="sdf" loop playsinline autoplay muted src="img/final2.mp4" onplay="resizeAndPlay(this)"></video>
<canvas height=0 class="videoMerge" id="sdfMerge"></canvas>
</td>
<td style="text-align: center;vertical-align: middle;">
<div>
<video class="video" width=60% id="sdfgt" loop playsinline autoplay muted src="img/finalGT.mp4" onplay="resizeAndPlay(this)"></video>
</div>
</td>
</tr>
</table>
<div style="position:relative;padding-left:25%;">
</div>
</div>
</div>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
Resolution Extensibility Property
</h3>
<p class="text-justify">
One unique property of our scene representation is its capacity to be dynamically refined by
adding new grid levels without modifying the decoder’s weights or previously trained grids. This
resolution extensibility property opens the path to adaptive resolution models,
where the precision used to describe an area depends on the details needed, to
optimize efficiency both in memory consumption and training duration.
</p>
<!-- <div class="columns is-vcentered interpolation-panel">
<div class="column interpolation-video-column">
<div id="interpolation-image-wrapper2">
Loading...
</div>
<input class="slider is-fullwidth is-large is-info"
id="interpolation-slider"
step="1" min="3" max="6" value="3" type="range">
</div>
</div> -->
        <p class="text-justify">
            Here, we showcase this property of our model. We use a grid hierarchy of 6 levels, with resolutions ranging from 16 to a maximum of 512. We begin the reconstruction with only the first three levels, train both the grids and the decoder to convergence, and then freeze them. We then train one new grid level at a time to convergence, freezing each before adding the next.
        </p>
<div style="position:relative;padding-top:20.25%;">
<img src="img/garden.jpg" style="position:absolute;top:0;left:0">
</div>
<p>
This showcases both the capacity of our model to reconstruct finer details after the decoder's training and its ability to keep the coarser LOD valid.
</p>
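        <p class="text-justify">
            The extensibility mechanism can be sketched as follows (our illustration, not the released code): a new grid level initialized near zero initially leaves every coarser-LOD rendering unchanged through the residual sum, and only that level's parameters are then optimized while all previous grids and the decoder stay frozen.
        </p>

```python
import numpy as np

def extend_hierarchy(grids, new_res, rng=None):
    """Append a finer grid level without touching trained parameters.

    The new level starts near zero so that, summed residually with the
    frozen coarser levels, it does not perturb existing LOD renderings.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    new_grid = rng.normal(scale=1e-4, size=new_res)  # near-zero init
    return grids + [new_grid]  # previous grids are reused, not copied
```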
</div>
</div>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
Citation
</h3>
<div class="form-group col-md-10 col-md-offset-1">
<textarea id="bibtex" class="form-control" readonly>
@inproceedings{petit2024ring,
title={RING-NeRF: Rethinking Inductive Biases for Versatile and Efficient Neural Fields},
author={Petit, Doriand and Bourgeois, Steve and Pavel, Dumitru and Gay-Bellile, Vincent and Chabot, Florian and Barthe, Loic},
  booktitle={European Conference on Computer Vision (ECCV)},
year={2024}
}</textarea>
</div>
</div>
</div>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
Acknowledgements
</h3>
<p class="text-justify">
            This publication was made possible by the use of the CEA List FactoryIA supercomputer, financially supported by the Ile-de-France Regional Council.
<br><br>
The website template was borrowed from <a href="http://mgharbi.com/">Michaël Gharbi</a>, <a href="https://dorverbin.github.io/refnerf">Ref-NeRF</a> and <a href="https://nerfies.github.io/">nerfies</a>.
</p>
</div>
</div>
</div>
</body>
</html>