Question image frame to image frame conversion. #25

Closed
Frank-Michel opened this issue May 18, 2021 · 16 comments

@Frank-Michel

Frank-Michel commented May 18, 2021

Hi,

I am trying to create pixel correspondences between two images of the same scene similar to Georgia in this issue #10. But I am having trouble getting accurate results.

I am going to use a reference pixel (u = 8, v = 111) in the image 0002 for scene ai_037_004 to make it easier to compare the results.

For this pixel position, I am getting a world coordinate of point_world_0002 = [-213.125, 307.5, 239.125] reading from frame.0002.position.hdf5.

I want to find the corresponding image position in frame 0003. Transforming from world to camera coordinates and projecting into screen coordinates, I arrive at pixel position [u = 99, v = 170], which contains the world coordinate [-214.125, 320.75, 235.375], substantially different from point_world_0002. Searching image 0003 (frame.0003.position.hdf5) for world points close to point_world_0002, I found pixel location [u = 89, v = 155] with world coordinate [-213.125, 307., 239.125], which is well aligned and also matches visually in the color images.

These are the steps I took to arrive at [u = 99, v = 170]:
Reading the camera_to_world pose for key frame 0003 and applying the conversion to meters, I get this homogeneous matrix:

[[ 0.98844629,  0.13750606, -0.06376526, -0.47368558],
 [-0.05787106, -0.04644903, -0.99724291, -1.54865174],
 [-0.14008878,  0.98941122, -0.03795475,  0.83491088],
 [ 0.0,         0.0,         0.0,          1.0        ]],

and inverting it to get world_to_camera transformation yields the following transformation matrix:

[[ 0.98844629, -0.05787106, -0.14008878,  0.49555229],
 [ 0.13750606, -0.04644903,  0.98941122, -0.83286892],
 [-0.06376526, -0.99724291, -0.03795475, -1.54289783],
 [ 0.0,        0.0,         0.0,         1.0        ]].

I am getting the point in camera coordinates of key frame 0003 as follows:
point_camera = world_to_camera * point_world, resulting in the camera coordinate point_camera = [-261.46114373, 192.1710333 , -303.68105121].
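
The inversion and the transform described above can be checked numerically. The following is a minimal sketch, assuming NumPy; `invert_rigid` is a helper name introduced here for illustration. It only reproduces the arithmetic with the values exactly as posted, so it says nothing about whether the pose and the point are expressed in the same units.

```python
import numpy as np

# camera_to_world pose for key frame 0003, copied from the post above.
camera_to_world = np.array([
    [ 0.98844629,  0.13750606, -0.06376526, -0.47368558],
    [-0.05787106, -0.04644903, -0.99724291, -1.54865174],
    [-0.14008878,  0.98941122, -0.03795475,  0.83491088],
    [ 0.0,         0.0,         0.0,         1.0       ],
])

def invert_rigid(T):
    """Invert a 4x4 rigid transform [R | t] as [R^T | -R^T t]."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv

world_to_camera = invert_rigid(camera_to_world)

# World-space point read at (u = 8, v = 111) in frame 0002, in homogeneous form.
point_world = np.array([-213.125, 307.5, 239.125, 1.0])
point_camera = world_to_camera @ point_world
```

With these inputs, `world_to_camera` and `point_camera` agree with the matrix and camera-space point quoted in the post to the printed precision.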

Following the code in this link


I am using the parameters to project the camera_point into screen coordinates:

fov_x = 1.0471975511965976
fov_y = 0.8172757101951849
f_h = 0.004330127018922193
f_w = 0.005773502691896258
image_width = 1024
image_height = 768

resulting in point_clip = [-367.89013141, 253.38057796, 456.41706849, 456.42794004]
and point_ndc = [-0.80602018, 0.55513818, 0.99997618, 1.0]
leading to point_screen = [ [ 99.22067678, 170.60450627, 0.99998809]]
and finally to point_image = [99.22185845312404, 170.60653808660638].
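
The last steps of this chain (clip space to NDC to screen) can be reproduced from the posted numbers. This is a rough sketch, assuming NumPy; `clip_to_screen` is a name introduced here, and the mapping of NDC [-1, 1] onto [0, width - 1] and [0, height - 1] with a flipped y axis is an assumption inferred from the posted values, not taken from the linked code. It starts from the posted `point_clip` rather than from `point_camera`.

```python
import numpy as np

image_width, image_height = 1024, 768
fov_x = 1.0471975511965976  # pi / 3, from the post

# Vertical field of view implied by the aspect ratio.
fov_y = 2.0 * np.arctan(image_height / image_width * np.tan(fov_x / 2.0))

def clip_to_screen(point_clip, width, height):
    """Clip space -> NDC -> screen (assumed convention, see the text above)."""
    ndc = point_clip / point_clip[3]          # perspective divide
    x = 0.5 * (ndc[0] + 1.0) * (width - 1)
    y = 0.5 * (1.0 - ndc[1]) * (height - 1)   # image rows grow downwards
    z = 0.5 * (ndc[2] + 1.0)
    return np.array([x, y, z])

point_clip = np.array([-367.89013141, 253.38057796, 456.41706849, 456.42794004])
point_screen = clip_to_screen(point_clip, image_width, image_height)
```

Under that assumed convention, `fov_y` and `point_screen` match the posted `fov_y` and `point_screen` values.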

Any idea what I am missing or doing wrong?

@mikeroberts3000
Collaborator

mikeroberts3000 commented May 18, 2021

UPDATE (Jan 9 2022): This issue has been resolved. See contrib/mikeroberts3000 for details.

Hi! Great question. I'll dig into your question in more detail in a follow-up post, but one thing jumped out at me immediately. frame.0002.position.hdf5 already stores positions in asset coordinates. So it is an error to apply additional operations that attempt to convert into asset coordinates.

@Frank-Michel
Author

Frank-Michel commented May 18, 2021

Thanks for looking into the issue. I fixed the typo: "meters to asset unit conversion factor" now reads "meters from asset unit conversion factor". I am converting asset units into meters, which I think is not even necessary if done consistently for both the camera poses and the world-coordinate images.

@mikeroberts3000
Collaborator

Correct. Converting into meters is not necessary, and might introduce errors if you don't do it consistently. To keep things simple, I recommend performing your calculations in asset coordinates. That way, you can just use the raw values in frame.0002.position.hdf5, camera_keyframe_positions.hdf5, and frame.0003.position.hdf5 without needing to perform any conversions. Here is what we say about asset coordinates in our README:

We store positions in asset coordinates (and lengths in asset units) unless explicitly noted otherwise. By asset coordinates, we mean the world-space coordinate system defined by the artist when they originally created the assets. In general, asset units are not the same as meters. To convert a distance in asset units to a distance in meters, use the meters_per_asset_unit scale factor defined in ai_VVV_NNN/_detail/metadata_scene.csv.

In your original question, I think you are inconsistently converting from asset units into meters, and this is leading to downstream problems.
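
To illustrate why consistency matters, here is a small sketch, assuming NumPy and a simple pinhole camera looking down -z. The pose, the point, the `to_pixel` helper, and the `meters_per_asset_unit` value are all hypothetical; the real scale factor is per scene in metadata_scene.csv.

```python
import numpy as np

meters_per_asset_unit = 0.0254  # hypothetical value for illustration only

def to_pixel(point_world, world_to_camera, fov_x, width, height):
    """Project one world-space point with a pinhole camera looking down -z."""
    fov_y = 2.0 * np.arctan(height / width * np.tan(fov_x / 2.0))
    x, y, z, _ = world_to_camera @ np.append(point_world, 1.0)
    u = 0.5 * (1.0 + x / (-z * np.tan(fov_x / 2.0))) * (width - 1)
    v = 0.5 * (1.0 - y / (-z * np.tan(fov_y / 2.0))) * (height - 1)
    return np.array([u, v])

# Hypothetical pose and point, both in asset units.
world_to_camera = np.eye(4)
world_to_camera[:3, 3] = [10.0, -5.0, -200.0]
point_world = np.array([20.0, 40.0, -100.0])

# Consistent conversion: scale the point AND the pose translation.
world_to_camera_m = world_to_camera.copy()
world_to_camera_m[:3, 3] *= meters_per_asset_unit
point_world_m = point_world * meters_per_asset_unit

pixel_asset = to_pixel(point_world, world_to_camera, np.pi / 3, 1024, 768)
pixel_meters = to_pixel(point_world_m, world_to_camera_m, np.pi / 3, 1024, 768)

# Inconsistent conversion: pose in meters, point still in asset units.
pixel_mixed = to_pixel(point_world, world_to_camera_m, np.pi / 3, 1024, 768)
```

Here `pixel_asset` and `pixel_meters` coincide because a uniform scale cancels in the perspective divide, while `pixel_mixed` lands somewhere else.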

@Frank-Michel
Author

You are right, I had errors in the question and I have corrected them. However, the final numbers still show that something is wrong, since I am still arriving at [u=99, v=170] in image 0003.

@mikeroberts3000
Collaborator

Out of curiosity, can you try your code on a different scene? What happens when you try your code on ai_001_001?

@Frank-Michel
Author

Frank-Michel commented May 18, 2021

I am happy to share the code for ai_037_004.
I was using ai_001_001 and the results are better, still not perfect.
What I am doing is trying to get dense correspondences for all overlapping pixels in two images.
Firstly, I am projecting the points of image 1 into image 2 following the procedure that I am describing in the question and check whether the projected pixel positions are within the image bounds of image 2.
Secondly, I am checking the correspondences for world frame alignment, also as described in the question.
The number of accurate pixel correspondences is very small (if there are any) for ai_037_004 and it is larger for ai_001_001. I can't say if ai_001_001 is perfect or if there are also correspondences that should be aligned in the world frame while they are not.
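
The two checks described above (image bounds, then world-frame alignment) might look roughly like this. It is a sketch, assuming NumPy; `check_correspondences`, its argument names, and the tolerance are introduced here for illustration, and the toy arrays stand in for the real position images.

```python
import numpy as np

def check_correspondences(pixels_in_2, points_world_1, position_image_2, tol):
    """Keep projections that land inside image 2 and see the same world point.

    pixels_in_2      : (N, 2) projected (u, v) coordinates in image 2
    points_world_1   : (N, 3) world positions read from image 1
    position_image_2 : (H, W, 3) per-pixel world positions of image 2
    tol              : maximum world-space distance, in asset units
    """
    height, width = position_image_2.shape[:2]
    # Send non-finite projections out of bounds before rounding to integers.
    finite = np.isfinite(pixels_in_2).all(axis=1)
    uv = np.rint(np.where(finite[:, None], pixels_in_2, -1.0)).astype(int)
    in_bounds = (uv[:, 0] >= 0) & (uv[:, 0] < width) & (uv[:, 1] >= 0) & (uv[:, 1] < height)

    valid = np.zeros(len(uv), dtype=bool)
    u, v = uv[in_bounds, 0], uv[in_bounds, 1]
    # Rows are indexed by v, columns by u.
    distance = np.linalg.norm(position_image_2[v, u] - points_world_1[in_bounds], axis=1)
    valid[in_bounds] = distance < tol
    return in_bounds, valid

# Toy data: a 4 x 5 position image with one known world point at (u = 3, v = 2).
position_image_2 = np.zeros((4, 5, 3))
position_image_2[2, 3] = [1.0, 2.0, 3.0]
pixels_in_2 = np.array([[3.2, 1.9], [7.0, 1.0], [0.0, 0.0], [np.nan, np.nan]])
points_world_1 = np.tile([1.0, 2.0, 3.0], (4, 1))

in_bounds, valid = check_correspondences(pixels_in_2, points_world_1, position_image_2, tol=0.5)
```

Only the first toy projection is both in bounds and aligned in world space; the third is in bounds but sees a different world point, which is the case the second check is meant to reject.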

@mikeroberts3000
Collaborator

mikeroberts3000 commented May 18, 2021

I don't need to see your code.

Here is a silly test. Let's start with ai_001_001 and frame.0000.position.hdf5. Using your code, what happens when you try to project the world-space positions from frame.0000 back into frame.0000? In this simple test, the world-space position p at pixel coordinate x should project exactly to x, and there should not be any out-of-bounds pixel coordinates. What happens when you try this test?
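
A self-projection test of this kind can be sketched as follows, assuming NumPy and a pinhole camera looking down -z. The function names, the pixel convention (NDC mapped onto [0, width - 1] and [0, height - 1]), and the synthetic position image are assumptions made for this sketch; with real data, the position image would come from the HDF5 file and the pose from the camera files.

```python
import numpy as np

def project_position_image(position_image, world_to_camera, fov_x):
    """Project an (H, W, 3) image of world positions to an (H, W, 2) image of (u, v)."""
    height, width = position_image.shape[:2]
    fov_y = 2.0 * np.arctan(height / width * np.tan(fov_x / 2.0))
    p_cam = position_image @ world_to_camera[:3, :3].T + world_to_camera[:3, 3]
    depth = -p_cam[..., 2]  # camera looks down -z
    u = 0.5 * (1.0 + p_cam[..., 0] / (depth * np.tan(fov_x / 2.0))) * (width - 1)
    v = 0.5 * (1.0 - p_cam[..., 1] / (depth * np.tan(fov_y / 2.0))) * (height - 1)
    return np.stack([u, v], axis=-1)

def self_reprojection_error(position_image, world_to_camera, fov_x):
    """Per-pixel (du, dv) between each pixel and the projection of its own world position."""
    height, width = position_image.shape[:2]
    v_grid, u_grid = np.mgrid[0:height, 0:width]
    pixels = np.stack([u_grid, v_grid], axis=-1).astype(float)
    return project_position_image(position_image, world_to_camera, fov_x) - pixels

# Synthetic stand-in for a position image: unproject every pixel at a random depth.
rng = np.random.default_rng(0)
height, width, fov_x = 48, 64, np.pi / 3
fov_y = 2.0 * np.arctan(height / width * np.tan(fov_x / 2.0))
v_grid, u_grid = np.mgrid[0:height, 0:width]
depth = rng.uniform(100.0, 500.0, size=(height, width))
x_cam = (2.0 * u_grid / (width - 1) - 1.0) * np.tan(fov_x / 2.0) * depth
y_cam = (1.0 - 2.0 * v_grid / (height - 1)) * np.tan(fov_y / 2.0) * depth
p_cam = np.stack([x_cam, y_cam, -depth], axis=-1)

world_to_camera = np.eye(4)
world_to_camera[:3, 3] = [5.0, -2.0, 30.0]
position_image = p_cam - world_to_camera[:3, 3]  # inverse of a pure translation

error = self_reprojection_error(position_image, world_to_camera, fov_x)
```

For a consistent camera model the error is zero up to floating-point noise; a systematic non-zero pattern points at a mismatch between the assumed projection and the one used for rendering.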

@Frank-Michel
Author

The code works for that test. I am getting the image space points that match world point indices.

@mikeroberts3000
Collaborator

mikeroberts3000 commented May 19, 2021

Ah, that's great. Let's stick with ai_001_001 for one more test. Can you project successfully from frame.0000 to frame.0001? In this test, the results will be slightly harder to interpret. But you should find that the distinctive pixels in frame.0000 (e.g., the eye of the rubber duckie) project exactly onto the correct locations in frame.0001.

@Frank-Michel
Author

I was able to project points from frame 0001 to frame 0000 for sequence ai_001_001 and they are all accurate.
I tested the projection from world frame 0000 to image frame 0000 for sequence ai_037_004 and the results are not accurate. I am getting correspondences like this:
world position image index -> projected image index
[1023, 767] -> [1037.188874759605, 788.3654947068483]
[523, 740] -> [522.7958023326736, 751.8394992647837]
[19, 721] -> [ 27.39181906799229, 725.1304825881214].

@Frank-Michel
Author

A more comprehensive analysis of the projection:
Columns: pixel index x, pixel index y (where the world position was read), projected x, projected y
0 0 14.56485919480246 20.98604918179333
0 192 13.240304487250835 206.84740821647222
0 384 11.906418170874948 394.01817749159034
0 576 10.566502596434342 582.0349715412602
0 767 9.124479071073608 769.2866124996209
256 0 260.5756445901112 16.795947755719368
256 192 260.01931850317544 204.9651991369979
256 384 259.33894689397846 394.1164462644809
256 576 258.1955592421693 584.8202590214846
256 767 257.31233712304396 775.227246390304
512 0 511.9676611800353 12.617228723203791
512 192 511.9490786603901 203.01014514793664
512 384 511.9673457208907 394.24665428218583
512 576 511.8045828772254 586.5957349275202
512 767 512.0313023676448 779.5800695303938
768 0 769.1950385573618 8.278653723670041
768 192 769.9069015233977 200.90792430901453
768 384 770.6768769145435 394.38218261336056
768 576 771.3283505195662 588.9655761623519
768 767 773.2910781660439 784.0872464615671
1023 0 1031.4745608404019 4.048429610393255
1023 192 1032.858584534117 198.59835140844078
1023 384 1034.3193125725315 394.45928883848705
1023 576 1035.682633568062 591.3860462232979
1023 767 1037.188874759605 788.3654947068483

@mikeroberts3000
Collaborator

Now that your code is working for ai_001_001, but you are still encountering problems for some scenes, I believe you're encountering the same issue as in #24. Here is what I said in that thread.

It sounds like you're doing everything right. I have noticed this problem occasionally also. It appears to affect a small handful of our scenes, but I don't have a complete list. The problem occurs because there is occasionally an additional offset in the V-Ray scene file that gets applied to the camera parameters during photorealistic rendering. I believe the relevant parameters are the "Horizontal Shift" and "Vertical Shift" parameters described in the V-Ray documentation. If I had known about these unpleasant parameters prior to rendering our images, I certainly would have explicitly hard-coded them to 0 for each scene.

In that thread, I recommended some steps that could help to make progress on this issue, but some of those steps require access to the native scene assets. However, you can still help us to make progress on this issue, even if you don't have access to the native scene assets.

Can you compile a complete list of scenes that are affected? You could do this by repeating the test you've already done for every scene, i.e., project the world-space position at pixel coordinate x and verify that it projects to x. You're well-positioned to do this test because you already have working code for ai_001_001. Based on my (admittedly incomplete) understanding of this issue, I believe it will either affect all the images in a scene, or none of the images in a scene. So it would be sufficient to test frame.0000 for each scene, instead of testing every image in our dataset.

As I said in #24, I apologize for hijacking your thread with to-do items 😅 I wish I had more answers and fewer questions. But I figured this is the right place to document my incomplete understanding of the problem, and to highlight possible next steps.

@Frank-Michel
Author

Frank-Michel commented May 19, 2021

I saw that other thread and I was afraid that this could be the same issue. I only have a couple of datasets on disk, so I can start the investigation and continue as I find the time. Are you planning to correct the dataset once you know what is causing the issue? I am asking because I am not able to acquire the assets, but I would still like to use the dataset in its entirety at some point.

Working:
ai_001_001

  • cam_00 98/100 frames available, missing image indices [36, 61]

ai_001_002

  • cam_00 99/100 frames available [80]
  • cam_02 98/100 frames available [37, 71]
  • cam_03 96/100 frames available [8, 26, 60, 88]

ai_001_003
ai_001_004
ai_001_005
ai_001_006
ai_001_007
ai_001_008
ai_001_009 (Some textures in the background are broken (window))
ai_001_010
ai_002_001
ai_002_002
ai_002_003
ai_002_004
ai_002_005
ai_002_006
ai_002_007
ai_002_008 63/100 frames available
ai_002_009
ai_002_010
ai_003_001 - cam_01
ai_003_002
ai_055_002

  • cam_00 26/100 frames available
  • cam_01 94/100 frames available [2, 3, 5, 6, 7, 11]

Not Working:
ai_003_001 - cam_00

  • 0 0 -0.25033426616008037 -0.7695952573689163
  • 0 767 -13.007668475165447 770.5691867984004
  • 256 192 268.0968701231145 190.82502285877894
  • 256 576 268.51978063280666 572.8009367536639
  • 512 384 464.56021233256934 399.08972272136856
  • 768 192 792.9351397985437 181.44990767737553
  • 768 576 792.8467324067742 587.2305328820289
  • 1023 0 929.8788682597493 31.905387026595513
  • 1023 767 1020.6979693940957 785.2882261202712

ai_015_001

  • 0 0 -7.697688023427616 -5.68729583182421
  • 0 767 8.733209644965822 760.0745963193878
  • 256 192 254.12481508893748 190.786385603186
  • 256 576 258.38244393426083 574.0210006140402
  • 512 384 512.0132294728252 384.019788245501
  • 768 192 769.9415994638574 190.80113718459728
  • 768 576 765.7381647691732 574.1560502957917
  • 1023 0 1031.090919845037 -5.567419827609462
  • 1023 767 1014.3440201174686 760.013997968241

ai_037_004

  • 0 0 14.56485919480246 20.98604918179333
  • 0 767 9.124479071073608 769.2866124996209
  • 256 192 260.01931850317544 204.9651991369979
  • 256 576 258.1955592421693 584.8202590214846
  • 512 384 511.9673457208907 394.24665428218583
  • 768 192 769.9069015233977 200.90792430901453
  • 768 576 771.3283505195662 588.9655761623519
  • 1023 0 1031.4745608404019 4.048429610393255
  • 1023 767 1037.188874759605 788.3654947068483

ai_041_009

  • 0 0 nan nan
  • 0 767 -10.957568804960527 774.2810936027905
  • 256 192 258.98542401347515 194.61255485492538
  • 256 576 253.5004945479842 577.5174133789244
  • 512 384 512.0810904028659 383.7398506553075
  • 768 192 765.0487433053444 194.73895332061468
  • 768 576 770.4097608560535 577.4872857441686
  • 1023 0 1011.3912408975158 8.910515790848477
  • 1023 767 1034.458311044412 774.3794265718594

ai_041_010

  • 0 0 -33.42155600100292 -20.782527764596782
  • 0 767 30.55294502263839 740.1937162069532
  • 256 192 247.95259719987212 188.1104616248229
  • 256 576 263.98225594969364 567.9680709492656
  • 512 384 511.9874139344739 383.9658565985404
  • 768 192 775.9915266701353 188.16199566005963
  • 768 576 759.9260170051674 567.9353169198489
  • 1023 0 1056.6638617562078 -20.571452449967204
  • 1023 767 992.4506241730853 740.1939787504961

ai_052_002

  • 0 0 -33.90887060720905 -26.89600913764679
  • 0 767 -20.879578711713265 784.4478918534111
  • 256 192 248.34950094576283 185.36995528329933
  • 256 576 251.38868723924926 580.2439422163633
  • 512 384 511.9955010101221 384.0844063690713
  • 768 192 761.4391714137278 196.3471278027416
  • 768 576 758.8460804845913 569.7998452252814
  • 1023 0 999.9146355033657 16.3285943228292
  • 1023 767 989.3135239345924 743.4197417417599
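
The working / not working split above could be automated along these lines. This is a sketch, assuming NumPy; the rows are a subset copied from the lists above, while `max_abs_error` and the 1-pixel threshold are arbitrary choices made for illustration.

```python
import numpy as np

# (u, v, projected u, projected v); three of the nine sampled rows per scene.
samples = {
    "ai_015_001": [
        (0, 0, -7.697688023427616, -5.68729583182421),
        (512, 384, 512.0132294728252, 384.019788245501),
        (1023, 767, 1014.3440201174686, 760.013997968241),
    ],
    "ai_037_004": [
        (0, 0, 14.56485919480246, 20.98604918179333),
        (512, 384, 511.9673457208907, 394.24665428218583),
        (1023, 767, 1037.188874759605, 788.3654947068483),
    ],
    "ai_041_009": [
        (0, 0, float("nan"), float("nan")),
        (512, 384, 512.0810904028659, 383.7398506553075),
        (1023, 767, 1034.458311044412, 774.3794265718594),
    ],
}

def max_abs_error(rows):
    """Largest absolute pixel error over the sampled grid, ignoring NaN projections."""
    rows = np.asarray(rows, dtype=float)
    return np.nanmax(np.abs(rows[:, 2:] - rows[:, :2]))

threshold = 1.0  # pixels; an arbitrary cut-off chosen for this sketch
affected = {scene: bool(max_abs_error(rows) > threshold) for scene, rows in samples.items()}
```

All three scenes in this subset exceed the threshold. Using the maximum rather than the error at the image center matters for a scene like ai_015_001, where the center pixel reprojects almost exactly but the corners do not.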

@mikeroberts3000
Collaborator

mikeroberts3000 commented May 19, 2021

Ah, thank you for posting that list. Please post here if you find other scenes that are affected.

I think I know what the issue is (those shift and tilt parameters), and I believe it's worth finding a solution. The solution will probably involve exposing the shift and tilt parameters for each scene (e.g., similar to how we expose meters_per_asset_unit), and providing some example code to construct the appropriate camera matrices, given a scene's shift and tilt parameters.

But I'm not sure exactly when I'll get to this. So, there is an opportunity for users of the dataset to help me narrow down the problem, because that will help me resolve it faster.

@rikba

rikba commented Jun 14, 2021

We also came across this issue when we tried to compute frame-to-frame pixel correspondences in the Evermotion data set.
For example, in volume 44, scene 2, the geometry information is shifted by delta_h = 51 pixels and delta_w = 31 pixels.
[image: shift]

We ran a unit test over all frames in the data set trying to find a correlation between frame-to-frame reprojection error and camera parameter shift. When trying to backproject all pixels p of one frame into its own camera plane using the pixel world position r_WP, camera intrinsics K, and camera pose T_CW, i.e.,

p_projected = K * T_CW * r_WP(p_original)

we'd expect p_projected == p_original and thus the median of the reprojection error p_original - p_projected for all pixels of a frame to be around [0 0] pixel. However, for many scenes the median error is greater, e.g., [51 31] as in the example above. This indicates a shifting error for the scene.
We made the reprojection error results available; they can be filtered for abs(delta_h) and abs(delta_w) greater than 0 and potentially corrected.

We were able to correct at least 1300 frames where this delta occurs and thus improve the median reprojection error for those scenes. A detailed analysis of whether this fixes all the issues is still outstanding.
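
As an illustration of the median-based estimate, here is a sketch on synthetic data, assuming NumPy. The `median_shift` helper, the column order of the arrays, and the additive correction are assumptions made for this example and are not necessarily what was done for the published results.

```python
import numpy as np

def median_shift(pixels_original, pixels_projected):
    """Median of p_original - p_projected over all pixels, rounded to whole pixels."""
    error = pixels_original - pixels_projected
    return np.rint(np.nanmedian(error, axis=0)).astype(int)

# Synthetic frame: projections displaced by a constant shift plus small noise.
rng = np.random.default_rng(0)
pixels_original = rng.uniform(0.0, 1000.0, size=(5000, 2))
true_shift = np.array([51.0, 31.0])
pixels_projected = pixels_original - true_shift + rng.normal(0.0, 0.2, size=(5000, 2))

shift = median_shift(pixels_original, pixels_projected)

# One possible correction: add the estimated shift back to the projections.
pixels_corrected = pixels_projected + shift
residual = np.median(pixels_original - pixels_corrected, axis=0)
```

The median is robust to a minority of outlier pixels, so a constant shift stands out even when some projections are unreliable.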

Median reprojection error before correction: [image: original]

Median reprojection error after correction: [image: corrected]

@mikeroberts3000
Collaborator

Hi @Frank-Mic @rikba, I have some good news. I just checked in some data and example code that resolves this issue.

In the contrib/mikeroberts3000 directory, I provide a modified perspective projection matrix for each scene that can be used as a drop-in replacement for the usual OpenGL perspective projection matrix, as well as example code for projecting world-space points into Hypersim images.
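
To show what "drop-in replacement" means in practice, here is a generic sketch, assuming NumPy. It does not reproduce the data or code in contrib/mikeroberts3000: the function names, the near and far planes, the off-center entry used as the "modified" matrix, and the pixel mapping onto [0, width - 1] and [0, height - 1] are all hypothetical, so check the example code in that directory for the convention actually used.

```python
import numpy as np

def project_with_matrix(points_world, world_to_camera, M_proj, width, height):
    """Project (N, 3) world points to (N, 2) pixel coordinates with a 4x4 projection matrix."""
    ones = np.ones((len(points_world), 1))
    p_clip = np.hstack([points_world, ones]) @ (M_proj @ world_to_camera).T
    ndc = p_clip[:, :3] / p_clip[:, 3:4]  # perspective divide
    u = 0.5 * (ndc[:, 0] + 1.0) * (width - 1)
    v = 0.5 * (1.0 - ndc[:, 1]) * (height - 1)
    return np.stack([u, v], axis=1)

def opengl_perspective(fov_x, fov_y, near, far):
    """The usual symmetric OpenGL perspective matrix."""
    return np.array([
        [1.0 / np.tan(fov_x / 2.0), 0.0, 0.0, 0.0],
        [0.0, 1.0 / np.tan(fov_y / 2.0), 0.0, 0.0],
        [0.0, 0.0, -(far + near) / (far - near), -2.0 * far * near / (far - near)],
        [0.0, 0.0, -1.0, 0.0],
    ])

width, height, fov_x = 1024, 768, np.pi / 3
fov_y = 2.0 * np.arctan(height / width * np.tan(fov_x / 2.0))
M_standard = opengl_perspective(fov_x, fov_y, near=1.0, far=1000.0)

# Hypothetical modified matrix: same as the standard one plus an off-center term.
M_modified = M_standard.copy()
M_modified[0, 2] = 0.1

# A point on the optical axis, 100 units in front of an identity camera.
points_world = np.array([[0.0, 0.0, -100.0]])
world_to_camera = np.eye(4)

pixel_standard = project_with_matrix(points_world, world_to_camera, M_standard, width, height)
pixel_modified = project_with_matrix(points_world, world_to_camera, M_modified, width, height)
```

Only the matrix argument changes between the two calls; the rest of the pipeline stays the same, which is the sense in which a per-scene matrix can replace the standard one.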

I apologize that it took so long to address this issue. It was especially challenging to debug because V-Ray's behavior wasn't well-documented, and I've been busy with other projects and holiday travel.
