ground truth depth is actually distance #9
Comments
Hi @sniklaus, thanks for this great visualization. You're absolutely correct that our `depth_meters` images actually store the Euclidean distance from the camera center to each point, rather than planar depth. But this is a great reminder to be careful when interpreting our depth data. @sniklaus, if you have a self-contained code snippet to convert our distance images into planar depth images, please post it here.
Thanks for chiming in and for the clarifications! I used the following to convert the distance to depth; it expects the following variables to be defined: `intWidth` and `intHeight` (image dimensions in pixels), `fltFocal` (focal length in pixels), and `npyDistance` (the distance image in meters).
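A minimal sketch of that conversion, assuming a principal point at the image center and the variable names referenced later in this thread (`intWidth`, `intHeight`, `fltFocal`, `npyDistance`); the concrete values below are placeholders, not intrinsics taken from any particular scene:

```python
import numpy as np

# Placeholder intrinsics; use the values that ship with each scene.
intWidth = 1024    # image width in pixels
intHeight = 768    # image height in pixels
fltFocal = 886.81  # focal length in pixels

# npyDistance: HxW array of Euclidean distances in meters from the camera
# center to each surface point (dummy data here for illustration).
npyDistance = np.ones([intHeight, intWidth], np.float32)

# For every pixel, build the vector from the camera center to that pixel's
# location on the image plane, measured in pixels, with the principal point
# at the image center.
npyImageplaneX = np.linspace((-0.5 * intWidth) + 0.5, (0.5 * intWidth) - 0.5, intWidth) \
    .reshape(1, intWidth).repeat(intHeight, 0).astype(np.float32)[:, :, None]
npyImageplaneY = np.linspace((-0.5 * intHeight) + 0.5, (0.5 * intHeight) - 0.5, intHeight) \
    .reshape(intHeight, 1).repeat(intWidth, 1).astype(np.float32)[:, :, None]
npyImageplaneZ = np.full([intHeight, intWidth, 1], fltFocal, np.float32)
npyImageplane = np.concatenate([npyImageplaneX, npyImageplaneY, npyImageplaneZ], 2)

# Similar triangles: planar depth = distance * focal / ||(x, y, focal)||.
npyDepth = npyDistance / np.linalg.norm(npyImageplane, 2, 2) * fltFocal
```

In practice, `npyDistance` would be loaded from the released HDF5 files, and `fltFocal` set to the focal length actually used for the scene rather than hard-coded.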
Sweet! 😀
Hi @sniklaus, is the focal length the same for all images in this dataset?
@Tord-Zhang When computing planar depth images for typical downstream learning applications, it is a reasonable approximation to assume that all images have the same focal length. However, if you want exactly perfect planar depth data, you need to account for the fact that our camera intrinsics can vary in minor ways for each scene. More specifically, due to minor tilt-shift photography effects that can vary per-scene, the image plane is not guaranteed to be exactly orthogonal to the camera-space z-axis.

So what does it mean to compute a "planar" depth image in these cases? What is the exact quantity that you want to store at each pixel in your "planar" depth image? In these cases, the solution that makes the most sense to me is to warp the scene geometry in a way that exactly inverts the tilt-shift photography effects. If you do this correctly, the warped scene geometry viewed through a typical pinhole camera will produce an identical image to the non-warped scene geometry viewed through a tilt-shift camera. At this point, you can compute the planar depth image as usual using the warped scene geometry.
@sniklaus, how did you derive the formula? I understand that `intWidth`, `intHeight`, and the focal length `fltFocal` are all measured in pixels, and that `npyDistance` is the metric distance in meters from the camera center to the 3D point. What exactly is the depth that you compute here? If it is the distance from the image plane to the point, I would have expected a different formula.
@lholzherr The code snippet you're referring to does not attempt to compute the Euclidean distance to the image plane. The quantity our released `depth_meters` images already store is the Euclidean distance from the camera center to each point, and the snippet converts that distance image into a planar depth image, where each pixel stores the distance along the camera-space z-axis.

To illustrate the difference between these two representations, suppose you are 1 meter away from a flat wall, and you capture an image looking directly at the wall. If you capture a planar depth image, it will contain a value of 1 meter at every pixel. If you capture a Euclidean distance image, it will contain a value of 1 meter at the center pixel, but will have different (slightly larger) values at every other pixel.

As an aside, I suspect that planar depth images are better-behaved inputs to convolutional neural networks, as compared to distance images. This is because CNNs implicitly assume that the statistics of image patches are stationary as you move across an image, and the statistics of planar depth images are more stationary than those of distance images.

To convert a distance image to a planar depth image, we use the following reasoning: the camera center, a pixel's location on the image plane, and the 3D point it observes all lie on one ray, so by similar triangles the ratio of planar depth to Euclidean distance equals the ratio of the focal length to the length of that pixel's image-plane vector (x, y, focal).

Using this reasoning, you should try to derive @sniklaus's code snippet above, and either convince yourself it is correct, or post here if you think it is incorrect.
Thanks @mikeroberts3000, I was able to derive the formula. If anyone else is wondering, this is the derivation:
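A sketch of that derivation, following the similar-triangles reasoning above (here $d$ is the stored distance `npyDistance`, $f$ is the focal length `fltFocal` in pixels, $(x, y)$ are a pixel's offsets from the principal point in pixels, and $z$ is the planar depth we want):

$$
\frac{z}{d} = \frac{f}{\lVert (x,\; y,\; f) \rVert}
\qquad\Longrightarrow\qquad
z = d \cdot \frac{f}{\sqrt{x^{2} + y^{2} + f^{2}}}
$$

which matches the snippet above, `npyDepth = npyDistance / np.linalg.norm(npyImageplane, 2, 2) * fltFocal`, since each pixel of `npyImageplane` holds the vector $(x, y, f)$.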
The following line doesn't compute the depth, but the distance from a point to the camera. As such, the provided `depth_meters` files are actually `distance_meters` instead. Not a big deal as long as you are aware of it, since one can convert one to the other using the focal length. But if you aren't aware of it, you may get severely wrong results, as shown in the screenshots below.

ml-hypersim/code/python/tools/generate_hdf5_from_vrimg.py, line 302 (at commit 9c9be19)
If you use the provided depth (which is actually distance) to render the image as a point cloud, you will get distortions:

If you instead convert the provided depth to the actual depth and then render the image as a point cloud from that, the distortions disappear:
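For completeness, a minimal sketch (not the code used for the screenshots; all names are placeholders) of how one might back-project the released distance images directly into a camera-space point cloud, assuming a principal point at the image center and a camera frame with +z pointing into the scene:

```python
import numpy as np

def distance_to_pointcloud(npyDistance, fltFocal):
    """Back-project an HxW distance image (meters from the camera center)
    into camera-space 3D points, given the focal length in pixels."""
    intHeight, intWidth = npyDistance.shape

    # Pixel-center offsets from the principal point, in pixels.
    npyX, npyY = np.meshgrid(
        np.linspace((-0.5 * intWidth) + 0.5, (0.5 * intWidth) - 0.5, intWidth),
        np.linspace((-0.5 * intHeight) + 0.5, (0.5 * intHeight) - 0.5, intHeight))

    # Unit-length viewing ray through every pixel.
    npyRays = np.stack([npyX, npyY, np.full_like(npyX, fltFocal)], axis=2)
    npyRays = npyRays / np.linalg.norm(npyRays, axis=2, keepdims=True)

    # Each stored value is a Euclidean distance, so scaling the unit ray by it
    # gives the correct 3D point. Treating the stored value as planar depth
    # (i.e. as a camera-space z-coordinate) is what produces the distorted
    # point cloud shown above.
    return npyRays * npyDistance[:, :, None]
```

The z-channel of the returned points is exactly the planar depth produced by the conversion snippet earlier in the thread.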