Question on coordinate systems #10
Hi Georgia! At first glance, your reasoning looks correct, and you are looking at all the right places in the code where these coordinate conventions are established and implemented. However, I'm arriving at a slightly different expression for xyz_cam1.
(See here for an example of where equation 1 is implemented in our code)
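(For anyone reading along: with the convention used later in this thread, xyz_world = R @ xyz_cam + T for each camera, the two world-space expressions can be equated,

R0 @ xyz_cam0 + T0 = R1 @ xyz_cam1 + T1

and solved for xyz_cam1, giving xyz_cam1 = R1.T @ (R0 @ xyz_cam0 + T0 - T1), assuming R1 is orthonormal so that its inverse is its transpose. This is my reading of the fix discussed below, not a formula from the repository itself.)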
You are right! I got T1 and T0 reversed. In my code I decompose the computation explicitly to go cam0 --> world --> cam1. Some more follow-up. Let's take a scene and look at the ranges:

xyz_cam0.min(0) = [-1.30169848, -1.11776386, -4.26171875]
xyz_cam0.max(0) = [ 1.49213205, 1.21655009, -1.91503906]

xyz_world.min(0) = [100.35482207, -116.07335733, 60.82745425]
xyz_world.max(0) = [102.7282101, -113.43747262, 62.99127211]

xyz_cam1.min(0) = [4.44592554, -13.27550555, 11.32567288]
xyz_cam1.max(0) = [7.42253576, -10.88670971, 13.72111644]

This means that in frame1, the points in camera coordinates are completely off. For example, z is positive, which would mean all points are behind the camera? I must be messing up somewhere.
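(For readers following along, here is a minimal sketch of the decomposed cam0 --> world --> cam1 chain under the convention xyz_world = R @ xyz_cam + T; the function and variable names are assumptions, not code from this thread:)

import numpy as np

def transform_cam0_to_cam1(xyz_cam0, R0, T0, R1, T1):
    # xyz_cam0: (N, 3) array of points in frame0's camera coordinates.
    # Convention: xyz_world = R @ xyz_cam + T for each camera.
    xyz_world = xyz_cam0 @ R0.T + T0   # cam0 --> world
    xyz_cam1 = (xyz_world - T1) @ R1   # world --> cam1 (R1 orthonormal, so R1^-1 == R1.T)
    return xyz_cam1

# Sanity check: points visible to camera 1 should sit in front of it,
# i.e., have negative z under this convention:
# assert (xyz_cam1[:, 2] < 0).all()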
I think we're narrowing down the issue. I'm looking at it now. Anyway, I just did a visual inspection. Are you using our camera data?
Aha! So, to better understand the data I am using:

I use R0 and T0 to convert xyz_cam0 to xyz_world. Does the per-pixel position data correspond to xyz_world?
OK, I found a good test case now that I understand the data properly. Let's consider the center pixel of frame0. I compute

z_cam0 = -depth[y, x]
x_cam0 = u / fx * (-z_cam0)
y_cam0 = v / fy * (-z_cam0)

which gives me xyz_cam0, and then

xyz_world = np.matmul(R0, xyz_cam0) + T0

From the position data, pos[y, x] = [14.875, 7.9726562, 41.75]. If I understand correctly, pos[y, x] should match xyz_world, but it doesn't. So I presume my mistake is that I am mixing metric values (coming from the depth data) with values in asset units.
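(Putting that test together, a hedged sketch of the comparison I believe is intended; the names depth, pos, fx, fy, R0, T0 follow the posts above, meters_per_asset_unit is the conversion factor discussed next, and the unit assignments are my reading of this thread, not the repository's documentation:)

import numpy as np

def center_pixel_world(depth, fx, fy, R0, T0, meters_per_asset_unit):
    # Unproject the center pixel of frame0 and map it to world space,
    # converting meters (depth) to asset units (camera poses) first.
    u, v = 0.0, 0.0                     # NDC coordinates of the image center
    y, x = depth.shape[0] // 2, depth.shape[1] // 2
    z_cam0 = -depth[y, x]               # camera looks down -z
    x_cam0 = u / fx * (-z_cam0)
    y_cam0 = v / fy * (-z_cam0)
    xyz_cam0 = np.array([x_cam0, y_cam0, z_cam0])
    xyz_cam0 = xyz_cam0 / meters_per_asset_unit   # meters --> asset units
    return np.matmul(R0, xyz_cam0) + T0

The return value should then agree with pos[y, x] (up to noise), since both are in asset units.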
Figured it all out! The conversion between asset units and meters is provided! Woop! Thanks Mike for dealing with my stupidity!
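(For reference, a hedged sketch of looking up that conversion factor; the path and column names below follow the per-scene metadata_scene.csv described in the repository's documentation, but treat them as assumptions:)

import pandas as pd

# Each scene ships a metadata CSV containing a meters_per_asset_unit parameter.
df = pd.read_csv("ai_001_001/_detail/metadata_scene.csv")
row = df[df["parameter_name"] == "meters_per_asset_unit"]
meters_per_asset_unit = float(row["parameter_value"].iloc[0])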
Yay! I'm glad you figured this out 😀 Thank you for highlighting these ambiguities in our documentation. I'll try my best to respond point-by-point to your posts for anyone else reading this, and I'll leave this issue open until I get a chance to clarify the documentation.
I just updated the documentation to make these coordinate convention nuances more clear. Thank you for posting this question and helping me to improve the documentation 😀
@gkioxari I believe you perform 3D warping from frame 0 to frame 1, right? Would you mind sharing the code you used to do that? I am curious how you did it.
@phongnhhn92 What do you mean by "3D warping"? What do you want to do exactly?
@mikeroberts3000 I guess I am using the wrong term here; it should be "homography warping". Since we know frame0's RGB image, R0, T0, depth0, R1, and T1, I would like to warp frame0 to frame1 and see how the warped image compares to frame1. So the point here is to have the correct transformation from the camera space of frame0 -> world space -> the camera space of frame1. I think you and @gkioxari have discussed how to do this in this issue.
I don't know exactly what "homography warping" is, and I don't know what you mean when you're saying you'd like to "warp frame0 to frame1". Anyway, I'm interpreting your question as follows. Most of the time, frame 0 and frame 1 will observe many of the same points, but they will be at different pixel locations (e.g., a coffee mug might be in the corner of frame 0 but closer to the center of frame 1). So, for all the points that are visible in frame 0, you would like to compute their pixel locations in frame 1.
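(Here is a minimal end-to-end sketch of that computation under the conventions established in this thread: an NDC grid with +u right and +v up, z negative in front of the camera, and xyz_world = R @ xyz_cam + T. All names are assumptions, and depth0, T0, T1 must already be in consistent units; see the asset-units discussion above:)

import numpy as np

def reproject_frame0_to_frame1(depth0, fx, fy, R0, T0, R1, T1):
    # Build the NDC pixel grid for frame0: u, v in [-1, 1], +v pointing up.
    h, w = depth0.shape
    uu, vv = np.meshgrid(np.linspace(-1.0, 1.0, w),
                         np.linspace(1.0, -1.0, h))

    # Unproject frame0's depth into camera-space points (camera looks down -z).
    z = -depth0
    xyz_cam0 = np.stack([uu / fx * (-z), vv / fy * (-z), z], axis=-1).reshape(-1, 3)

    # cam0 --> world --> cam1 (R orthonormal, so R^-1 == R.T).
    xyz_world = xyz_cam0 @ R0.T + T0
    xyz_cam1 = (xyz_world - T1) @ R1

    # Perspective-project into frame1's NDC coordinates.
    u1 = fx * xyz_cam1[:, 0] / -xyz_cam1[:, 2]
    v1 = fy * xyz_cam1[:, 1] / -xyz_cam1[:, 2]
    return u1, v1, xyz_cam1[:, 2]

Note this only yields target pixel locations; producing an actual warped image still requires discarding points with non-negative z_cam1 (behind camera 1) and resolving occlusions, e.g., by z-buffering against frame1's own depth.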
I want to relate two frames of the same scene in Hypersim via their camera rotations/translations. For example, assume we have frame0 and frame1 with (R0, T0) and (R1, T1), respectively, as the camera orientations and positions.
For frame0, using the depth map, I want to construct a 3D point cloud of (x, y, z) points in frame0's camera coordinate frame. For each pixel (u, v) with depth depth(v, u) in frame0, I compute

z = -depth(v, u)
x = u / fx * (-z)
y = v / fy * (-z)
Here (u, v) are the pixel coordinates in NDC space with +u pointing right and +v pointing up (here in your code), and (fx, fy) are the camera's focal lengths (e.g. here in your code).
So now (x, y, z) is in the camera coordinate system of frame0, aka xyz_cam0 = (x, y, z)
Now I want to transform this point cloud to frame1's camera coordinate system using (R0, T0) and (R1, T1).
Then, in turn, xyz_cam1 can be projected into screen space with the camera matrix M (here in your code). The projection of xyz_cam1 should give me an image close to frame1.
The above reasoning doesn't seem to work. So I am doing something wrong! Thanks Mike :)
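(For the last projection step with a full camera matrix M, a hedged sketch of the standard homogeneous projection; M here stands for whatever 4x4 matrix the repository's code constructs, and the divide assumes points in front of the camera:)

import numpy as np

def project_points(xyz_cam1, M):
    # xyz_cam1: (N, 3) camera-space points; M: 4x4 projection matrix.
    ones = np.ones((xyz_cam1.shape[0], 1))
    clip = np.concatenate([xyz_cam1, ones], axis=1) @ M.T  # to clip space
    ndc = clip[:, :3] / clip[:, 3:4]  # homogeneous divide --> NDC
    return ndc[:, 0], ndc[:, 1]       # u, v in [-1, 1]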