Camera parameters #24
UPDATE (Jan 9 2022): This issue has been resolved. See the follow-up later in this thread.

Hi! Great question. It sounds like you're doing everything right. I have noticed this problem occasionally also. It appears to affect a small handful of our scenes, but I don't have a complete list. The problem occurs because there is occasionally an additional offset in the V-Ray scene file that gets applied to the camera parameters during photorealistic rendering. I believe the relevant parameters are the "Horizontal Shift" and "Vertical Shift" parameters described in the V-Ray documentation. If I had known about these unpleasant parameters prior to rendering our images, I certainly would have explicitly hard-coded them to 0 for each scene.

Perhaps you can help with this issue. (You are especially well-positioned to help because your independent rendering infrastructure with Blender is already set up.) I'm assuming that you have access to the source assets, and you have run the pre-processing phase of our pipeline. Otherwise, how could you do your own rendering in Blender? Anyway, there are several ways to proceed.

First, can you manually confirm that the scene you're experimenting with has non-zero values for these shift parameters in its exported vrscene file?

Second, given some non-zero shift parameters from a vrscene file, as well as our camera positions and camera orientations, can you compute the correct camera parameters such that your Blender rendering matches our ground truth rendering? I don't know exactly how to do this, because I don't know exactly what the shift parameters mean. Are they describing some kind of off-center perspective projection? What changes are required in your Blender setup to match our pre-rendered ground truth images pixel-perfectly?

Third, do you have a complete list of scenes that are affected? It should be straightforward to parse each vrscene file and search for any non-zero shift parameters. See this code for an example of how to programmatically access the cameras in a vrscene file. In addition to these shift parameters, there are also tilt parameters. Are any scenes affected by non-zero tilt parameters?

Haha sorry for hijacking your thread with all of these to-do items 😅 I wish I had more answers and fewer questions. But I figured this is the right place to document my incomplete understanding of the problem, and to highlight possible next steps.
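For anyone who wants to script the scene audit above without the V-Ray SDK, here is a minimal sketch that greps the text-based .vrscene files for non-zero shift/tilt-style parameters. The parameter names in `PARAM_NAMES` and the directory layout are assumptions based on the V-Ray CameraPhysical documentation, not values confirmed from the exported scenes.

```python
import glob
import os
import re

# Assumed parameter names; adjust after inspecting an actual exported .vrscene file.
PARAM_NAMES = ["horizontal_shift", "lens_shift", "horizontal_offset", "vertical_offset"]

def find_nonzero_camera_params(vrscene_file):
    """Return {parameter_name: value} for any listed parameter with a non-zero value."""
    with open(vrscene_file, "r", errors="ignore") as f:
        text = f.read()
    nonzero = {}
    for name in PARAM_NAMES:
        # Matches assignments like "horizontal_shift=0.35;"
        for match in re.finditer(rf"\b{name}\s*=\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)", text):
            value = float(match.group(1))
            if value != 0.0:
                nonzero[name] = value
    return nonzero

# Hypothetical directory layout; point this at wherever the exported scenes live.
for vrscene_file in sorted(glob.glob("/path/to/exported_scenes/*/*.vrscene")):
    affected = find_nonzero_camera_params(vrscene_file)
    if affected:
        print(os.path.basename(vrscene_file), affected)
```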
Hi @mikeroberts3000, I am also interested in multi-view applications based on Hypersim, and I stumbled upon the same issue as Ainaz99, Frank-Mic and rikba. It seems that a considerable number of scenes are affected by the shifted camera parameters and are thus unusable for multi-view applications. From the reprojection tests that I have run, I think that non-zero tilt offsets are also involved, in addition to the shift offsets. Unfortunately I do not have access to the source assets and cannot run the steps that you suggested to investigate this issue. Did you by chance have time to investigate it, and do you have any fresh news about it? If not, would it be possible to publicly expose the V-Ray tilt and shift parameters for each scene and camera? I think that based on this information, I would be able to retrieve the correct camera position and orientation. Overall I would be happy to contribute to the resolution of this problem, but I feel that some information coming from the source assets is necessary to achieve it, which I unfortunately do not have...
Hi @rpautrat, I agree with your assessment that it is necessary to expose some additional information from the source assets. Once this information is exposed, it should be possible to derive the correct camera intrinsics for the scenes that are affected by this issue. I'll follow up with you offline and maybe we can work on this together.
Hi @mikeroberts3000 @rpautrat! Are there any updates on the correct camera parameters?
Hi @Ainaz99, I'm happy to say that we're making solid progress. @rpautrat has been doing a bunch of great experiments to figure all of this out. We identified four V-Ray parameters that can affect the camera intrinsics, and we have an accurate mathematical model of what each of these parameters does in isolation. Roughly speaking, each parameter translates or rotates the image plane in camera space. But we don't have a model of how the parameters interact with each other, and there are many possible conventions (translation then rotation, rotation then translation, different Euler angle conventions, etc.). We reached out to Chaos Group, and we're waiting for them to tell us the exact order of operations.

Suppose we knew exactly how to transform the image plane as a function of our V-Ray parameters. Let's call this transformation T. We have sketched out a derivation for the non-standard projection matrix P, computed as a function of T, that correctly projects points from world space to image space. This projection matrix P can then be used as a drop-in replacement for the usual pinhole camera perspective projection matrix in graphics and vision applications.

To summarize, we think we have a solid understanding of this issue, but we're waiting for Chaos Group to tell us exactly how to compute T based on the V-Ray parameters. If you're super motivated to get this issue resolved, and you don't want to wait for Chaos Group to get back to us, I'd be happy to send you the Jupyter notebooks that we've been using in our experiments. You don't need to install V-Ray to run our notebooks, and you can try to compute T from the V-Ray parameters with a brute force guess-and-check strategy.
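To make the "translating the image plane" idea concrete, here is a minimal sketch of an off-center (asymmetric-frustum) OpenGL-style projection matrix. The `shift_x`/`shift_y` arguments are hypothetical stand-ins for whatever T turns out to be; they are not V-Ray's actual parameters or conventions.

```python
import numpy as np

def perspective_offcenter(fov_y, aspect, near, far, shift_x=0.0, shift_y=0.0):
    # Half-extents of the symmetric frustum window on the near plane.
    half_h = near * np.tan(0.5 * fov_y)
    half_w = half_h * aspect
    # Translating the image plane corresponds to sliding this window;
    # shift_x and shift_y are expressed as fractions of the half-extents.
    left, right = -half_w + shift_x * half_w, half_w + shift_x * half_w
    bottom, top = -half_h + shift_y * half_h, half_h + shift_y * half_h
    # Standard OpenGL asymmetric-frustum (glFrustum-style) projection matrix.
    return np.array([
        [2.0 * near / (right - left), 0.0, (right + left) / (right - left), 0.0],
        [0.0, 2.0 * near / (top - bottom), (top + bottom) / (top - bottom), 0.0],
        [0.0, 0.0, -(far + near) / (far - near), -2.0 * far * near / (far - near)],
        [0.0, 0.0, -1.0, 0.0]])

# Example: a 60-degree vertical FOV camera with the image plane shifted to the right.
P = perspective_offcenter(np.pi / 3.0, 1024.0 / 768.0, 0.1, 100.0, shift_x=0.2)
```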
Thanks @mikeroberts3000, happy to hear you made progress in solving the issue. Do you have any estimate on how long it will take Chaos Group to answer you?
@Ainaz99 no estimate from Chaos Group. I'll definitely post here when we hear back from them. In the meantime, the invitation is still open for you to experiment with our notebooks and attempt to compute the correct transformation 🤓 An alternative to the guess-and-check strategy would be to set up a simple scene, and infer the correct transformation from rendered observations. It would then be possible to collect a training set of (camera parameter, transformation) pairs, and fit a function to the training set.
How can I get the notebook for the camera operations? Could you send me a copy of the notebook you mentioned before? (My email is liuzhy71@gmail.com.)
Hi @liuzhy71, you're a total legend for having a look at this. I'll send you an email with all of our debugging notebooks.
In case anyone else is interested, here is a diagram explaining what we think is going on. We think the image plane is being transformed somehow, and there seem to be four camera parameters that control the transformation. So, our goal is to determine the function f that maps from the scalar camera parameters (p1,p2,p3,p4) to a transformation matrix T.

In our debugging notebook, we try to guess this function f based on a code snippet that we got from Chaos Group. This code snippet is correct for some combinations of parameters, but not others. So we must be getting something wrong. In order to test the correctness of f in our notebook, we compute a depth image using my own reference raycaster. In this test, we can control the outbound ray at each pixel. Using my reference raycaster, we want to obtain depth images that perfectly match the ones we obtain from V-Ray for all combinations of camera parameters. If we can do this, then we will know that we are implementing f correctly.

To make progress, I think a promising approach would be the following. First, compute the correct transformation T, given a depth image rendered by V-Ray with a known set of camera parameters (p1,p2,p3,p4). The transformation can be recovered from the depth image by solving a convex optimization problem. Anyway, once this is working, it is straightforward to render a large collection of depth images with randomly chosen camera parameters, solving for T for each rendered image. After doing so, we will have a large collection of (camera parameters, transformation) pairs. Finally, it is straightforward to fit some kind of simple function (e.g., a neural network) to all of the example pairs. This learned function can then be queried later to output the correct transformation T for a new set of parameters (p1,p2,p3,p4).

Once the transformation T is known, it is straightforward to derive a modified projection matrix P (in terms of T) that projects world-space points to the correct image-space coordinates. This projection matrix can be used as a drop-in replacement for the usual projection matrix in graphics and vision applications (e.g., multi-view stereo applications, rendering additional images that exactly match our pre-rendered Hypersim images, etc.).

Of course doing all of this is an unpleasant hack. But so far, Chaos Group has not mathematically characterized these camera parameters, so we must resort to reverse-engineering their meaning. If anyone else is interested in having a look at this issue, comment here and I will send you our debugging notebooks.
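As a rough illustration of the "solve for T" step, here is a minimal least-squares sketch. It assumes that T acts linearly on homogeneous pixel coordinates (u, v, 1), and that the per-pixel ray directions have already been recovered from a rendered depth image and rescaled so that their z-component is -1; both the linear model and that normalization are assumptions of this sketch, not the actual solution.

```python
import numpy as np

def fit_image_plane_transform(uv_h, rays_plane):
    """Least-squares fit of a 3x3 matrix T such that T @ [u, v, 1] reproduces
    the observed ray through each pixel.  uv_h is (N, 3) homogeneous pixel
    coordinates; rays_plane is (N, 3) ray directions scaled so z == -1."""
    uv_h = np.asarray(uv_h, dtype=np.float64)
    rays_plane = np.asarray(rays_plane, dtype=np.float64)
    # Solve uv_h @ T.T ~= rays_plane for T in the least-squares sense.
    T_transpose, *_ = np.linalg.lstsq(uv_h, rays_plane, rcond=None)
    return T_transpose.T

# Repeating this over many renders with random (p1, p2, p3, p4) would give the
# (camera parameters, transformation) training pairs described above, to which
# a simple regression model could then be fit.
```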
Great news. I obtained some very useful code from Chaos Group, and this has enabled me to make some exciting progress. I now have a working implementation that computes the transformation T from the parameters (p1,p2,p3,p4). Using my own reference raycaster, I can now generate images that match V-Ray images exactly, even in the presence of non-standard camera parameters. I have not yet derived the modified projection matrix P that projects world-space points to the correct pixel locations, but I believe this is relatively straightforward. I'll post any relevant updates here. Thanks again to everyone that is helping out with this issue. Post a comment here if you want to have a look at my latest notebook.
Wow, I was just able to compute the correct depth using the previous notebook. But I am still working on adjusting the projection matrix for OpenGL rendering.
@liuzhy71 that spreadsheet looks great! I'm thinking about how we can adjust it slightly to make it a bit cleaner, and more suitable for inclusion in the actual dataset. Can you update your spreadsheet with the following information?
The function for computing depth in the previous version of our notebook (i.e., the one I sent you) works for some combinations of parameters, but not others. An easy way to break it is to set horizontal_shift=1.0. I'll send you my latest notebook over email, which works correctly for all parameter combinations.
OK, I'll try to update the spreadsheet with as many parameters as possible. I'm not so familiar with the V-Ray SDK; all the data were parsed from the .vrscene files. And I do not have all the files for the ai_055_xxx scenes, so some data may be missing.
@liuzhy71 I think there is a 30-day free trial available for the V-Ray AppSDK. If you prepare the code, I'll run it on all of the scenes.
I have some more good news. I derived a modified projection matrix P (computed in terms of V-Ray's camera parameters) that correctly accounts for this issue. I have verified that my projection matrix P correctly projects world-space points to the correct screen-space locations, even when V-Ray's non-standard camera parameters have a drastic effect on the rendered image. My projection matrix can be used as a drop-in replacement for the usual OpenGL perspective projection matrix. So I think the main technical challenge here has been resolved. For example, here is a depth image for a scene that has been rendered with non-standard camera parameters.
The left image is generated by V-Ray, the middle image is generated by my own reference raycaster, and the right image is a difference image. We see here that the images are nearly identical. So we are generating the correct ray at each pixel. Here is the same depth image, but I have projected the sink mesh (i.e., the world-space vertex positions belonging to the sink) onto the image. The red dots are mesh vertices. We see that the projected vertices align very accurately with the sink in the image. So my modified projection matrix appears to be correct. I have also tried other combinations of camera parameters, and my solution works correctly for those parameters too. Here are the resulting images with different camera parameters.
So we're nearly finished. The only task that remains is to expose the relevant camera parameters for each scene. I will try to do this over the next couple of weeks, and I will post my progress here.
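For anyone who wants to run the same verification, here is a minimal sketch of projecting world-space points into pixel coordinates with a 4x4 clip-from-world matrix (e.g., the modified P composed with the usual world-to-camera transform). The matrix and function names are placeholders, and the OpenGL-style [-1, 1] NDC convention with a top-left pixel origin is an assumption.

```python
import numpy as np

def project_points(points_world, M_clip_from_world, width, height):
    """Project (N, 3) world-space points to (N, 2) pixel coordinates."""
    num_points = points_world.shape[0]
    p_h = np.hstack([points_world, np.ones((num_points, 1))])  # homogeneous (N, 4)
    p_clip = p_h @ M_clip_from_world.T                          # clip space
    p_ndc = p_clip[:, :3] / p_clip[:, 3:4]                      # perspective divide
    # NDC in [-1, 1] -> pixel coordinates, origin at the top-left corner.
    u = (p_ndc[:, 0] * 0.5 + 0.5) * width
    v = (1.0 - (p_ndc[:, 1] * 0.5 + 0.5)) * height
    return np.stack([u, v], axis=1)

# Usage sketch: overlay the projected mesh vertices on the rendered image and
# check that they land on the corresponding pixels (like the red dots above).
```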
I have manually tested all the camera tilt and shift parameters. Now scenes 009_003, 038_009, and 039_004 are incorrect.
@mikeroberts3000 @liuzhy71 @Ainaz99 @rpautrat thank you all for the hard work on getting accurate camera parameters for these scenes. Any update on when these will be released so we can use them? 😄
I'd also be curious if the fixes have been released somewhere :)
Hi @Ainaz99 @alexsax @jatentaki @liuzhy71 @rpautrat, I have some good news. I just checked in some data and example code that resolves this issue. I apologize that it took so long to address this issue. It was especially challenging to debug because V-Ray's behavior wasn't well-documented, and I've been busy with other projects and holiday travel.
Hi @mikeroberts3000, thanks for releasing the modified version of the camera parameters.
Hi @Ainaz99, this is possible, but there are some important technical details to be aware of. In particular, the modified "camera orientation" will have some extra transformations encoded in it, and it will not be a rotation matrix. As a result, any downstream code that intends to invert this matrix must take care to actually invert it, rather than merely transposing it. To derive our modified camera orientation, consider the following equation that transforms a point in world-space into a point in camera-space,

p_cam = inverse(R) * (p_world - t)

where R is the original camera orientation (a rotation matrix) and t is the camera position in world-space. The extra image-plane transformation can be folded into the orientation by defining a modified orientation,

R_modified = R * M

where M is the matrix that encodes the effect of the scene's non-standard camera parameters. Substituting R_modified for R gives

p_cam_modified = inverse(R_modified) * (p_world - t)

where p_cam_modified lives in the transformed camera-space that our modified projection matrix expects. Because R_modified is generally not orthonormal, inverse(R_modified) is not equal to transpose(R_modified), which is why the matrix must be inverted rather than transposed.
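Here is a minimal numpy sketch of the pitfall described above; the composition order R_modified = R @ M is an assumption for illustration, and the important point is the use of a true inverse.

```python
import numpy as np

def world_to_cam(points_world, R_modified, t_world):
    """Transform (N, 3) world-space points into (modified) camera space.
    Because R_modified is generally NOT a rotation matrix, we must use a
    true matrix inverse rather than a transpose."""
    R_inv = np.linalg.inv(R_modified)          # not R_modified.T
    return (points_world - t_world) @ R_inv.T  # applies R_inv to each row

# Sanity check of the pitfall with a made-up non-orthonormal matrix:
R_modified = np.array([[1.0, 0.1, 0.0],
                       [0.0, 1.0, 0.2],
                       [0.0, 0.0, 1.0]])
assert not np.allclose(np.linalg.inv(R_modified), R_modified.T)
```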
Thank you @mikeroberts3000 for your explanation!
Can I use the new camera transformation matrix like this?
@Ainaz99 that looks correct to me, assuming that you want
Thank you @mikeroberts3000. So I want to use the new camera transformation matrix in Blender.
I haven't spent much time with Blender, so I'm not sure what exactly it is doing with the matrices you're specifying. Is Blender rendering images via a rasterization approach or a raycasting approach? How exactly are you specifying these matrices to Blender? Are you specifying position and orientation in a combined 4x4 matrix?

In these notebooks, I show how to reproduce pre-rendered Hypersim images pixel-perfectly using a rasterization approach and a raycasting approach for this specific scene. It should be straightforward to figure out what Blender is doing that is different from these notebooks by digging into the Blender documentation or source code. For what it's worth, in my local repository, I computed
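In case a self-contained example of the raycasting side is helpful, here is a minimal sketch that generates one camera-space ray per pixel from a 3x3 matrix mapping homogeneous pixel coordinates (u, v, 1) into camera space. The matrix name `M_cam_from_uv`, and the [-1, 1] uv convention with pixel centers, are assumptions of this sketch rather than the exact conventions used in the notebooks.

```python
import numpy as np

def rays_cam_from_pixels(M_cam_from_uv, width, height):
    """Return an (H, W, 3) array of unit ray directions in camera space."""
    # Pixel centers mapped to u, v in [-1, 1]; v increases upward.
    u = (np.arange(width) + 0.5) / width * 2.0 - 1.0
    v = 1.0 - (np.arange(height) + 0.5) / height * 2.0
    uu, vv = np.meshgrid(u, v)
    uv1 = np.stack([uu, vv, np.ones_like(uu)], axis=-1)    # (H, W, 3)
    dirs = uv1 @ M_cam_from_uv.T                           # map into camera space
    return dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)

# World-space rays would then be R_world_from_cam @ dir for each pixel, with the
# camera position as the common ray origin.
```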
Hi @Ainaz99, I'm just double-checking if you ever got this rendering functionality figured out in Blender.
I am working with the provided camera pose data from the Hypersim dataset, specifically the files `camera_keyframe_positions.hdf5` and `camera_keyframe_orientations.hdf5`:

```python
import h5py

with h5py.File(camera_positions_hdf5_file, "r") as f:
    camera_positions = f["dataset"][:]   # datasets are stored under the "dataset" key
camera_position_world = camera_positions[frame_id]
```

I need to convert this data into Blender's transform_matrix (4x4), considering the differences in the coordinate systems between 3ds Max and Blender. Could you clarify the exact steps required to transform the provided position and rotation data into Blender's coordinate system? If possible, an example of the transformation process or the correct transformation matrix would be very helpful. Thank you for your assistance!
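For what it's worth, here is a rough sketch of one way to build a 4x4 matrix for Blender from these arrays. It assumes the stored orientation is a world-from-camera rotation, that Hypersim's camera convention (z-up world, camera looking down -z with y up) matches Blender's, and that positions should be scaled by the scene's meters_per_asset_unit; none of these assumptions are confirmed here, so verify them against the Hypersim documentation.

```python
import numpy as np

def blender_matrix_world(camera_position_world, R_world_from_cam, meters_per_asset_unit):
    """Build a 4x4 camera-to-world matrix from a Hypersim position/orientation pair."""
    M = np.eye(4)
    M[:3, :3] = np.asarray(R_world_from_cam)
    M[:3, 3] = np.asarray(camera_position_world) * meters_per_asset_unit
    return M

# Inside Blender, the result could then be assigned to the camera object, e.g.:
#   import bpy, mathutils
#   bpy.data.objects["Camera"].matrix_world = mathutils.Matrix(M.tolist())
```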
@huntorochi There is no need to duplicate this question here and in your other post.
Hi,

I'm trying to render surface normals from the meshes in Blender, using the camera parameters provided. I scale the mesh using `meters_per_asset_unit`. I use `camera_keyframe_positions.hdf5` for the camera location and get the camera Euler angles from `camera_keyframe_orientations.hdf5`, with `fov = pi/3 = 1.0471975511965976`. But my renderings do not match the color images from the dataset for only some specific scenes. Is there any other camera parameter I'm missing?

Here's an example for `ai_037_002`, `cam_00`, `frame.0000`:

Thanks for your help.