Transforming pixel from a depth image to world coordinates #1904
Comments
I'm not sure if I understand your problem completely, but isn't this what the extrinsics are for?
I know that the center of the front side of the camera is placed at a certain position with respect to my world frame, but I don't know what the transformation between this point and 'the center of the physical imager' (the reference point) is. Is this information available as 'extrinsics'? How can I retrieve this information in Python? I hope this time my question is more clear.
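For what it's worth, the sensor-to-sensor extrinsics (rotation and translation between the depth and color imagers) can be read with pyrealsense2; note that this only gives the depth-to-color transform, not the camera-to-world transform being asked about here. A minimal sketch:

import pyrealsense2 as rs

pipeline = rs.pipeline()
profile = pipeline.start()  # default depth + color streams

depth_profile = profile.get_stream(rs.stream.depth).as_video_stream_profile()
color_profile = profile.get_stream(rs.stream.color).as_video_stream_profile()

# 3x3 rotation (row-major list of 9 floats) and translation in metres
depth_to_color = depth_profile.get_extrinsics_to(color_profile)
print("rotation:", depth_to_color.rotation)
print("translation:", depth_to_color.translation)

pipeline.stop()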
I used this script and I found its result is not bad:

import pyrealsense2 as rs
import numpy as np
import cv2
import os

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

# Configure depth and color streams
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 1280, 720, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 1280, 720, rs.format.bgr8, 30)

# Start streaming
pipe_profile = pipeline.start(config)

curr_frame = 0
try:
    while True:
        # Wait for a coherent pair of frames: depth and color
        frames = pipeline.wait_for_frames()
        depth_frame = frames.get_depth_frame()
        color_frame = frames.get_color_frame()
        if not depth_frame or not color_frame:
            continue

        # Intrinsics & extrinsics
        depth_intrin = depth_frame.profile.as_video_stream_profile().intrinsics
        color_intrin = color_frame.profile.as_video_stream_profile().intrinsics
        depth_to_color_extrin = depth_frame.profile.get_extrinsics_to(
            color_frame.profile)
        # print(depth_intrin.ppx, depth_intrin.ppy)

        # Convert images to numpy arrays
        depth_image = np.asanyarray(depth_frame.get_data())
        color_image = np.asanyarray(color_frame.get_data())

        # Find the human face in the color image
        gray = cv2.cvtColor(color_image, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, 1.3, 5)
        for (x, y, w, h) in faces:
            if curr_frame > 100 and curr_frame % 40 == 10:
                roi_depth_image = depth_image[y:y+h, x:x+w]
                roi_color_image = color_image[y:y+h, x:x+w]

                os.system('mkdir -p ./3d_output/%d' % curr_frame)
                cv2.imwrite('./3d_output/%d/depth.jpg' % curr_frame,
                            roi_depth_image)
                cv2.imwrite('./3d_output/%d/color.jpg' % curr_frame,
                            roi_color_image)
                print("the mid position depth is:", depth_frame.get_distance(
                    int(x + w/2), int(y + h/2)))

                # Write the per-pixel 3D points of the face region to a CSV file
                with open('./3d_output/%d/depth.csv' % curr_frame, 'w') as f:
                    cols = list(range(x, x+w))
                    rows = list(range(y, y+h))
                    for i in rows:
                        for j in cols:
                            # Be careful to double-check the coordinate transformation
                            depth = depth_frame.get_distance(j, i)
                            depth_point = rs.rs2_deproject_pixel_to_point(
                                depth_intrin, [j, i], depth)
                            text = "%.5lf, %.5lf, %.5lf\n" % (
                                depth_point[0], depth_point[1], depth_point[2])
                            f.write(text)
                print("Finished writing the depth data")
            cv2.rectangle(color_image, (x, y), (x+w, y+h), (255, 0, 0), 2)

        # Apply colormap on depth image (image must be converted to 8-bit per pixel first)
        depth_colormap = cv2.applyColorMap(cv2.convertScaleAbs(
            depth_image, alpha=0.03), cv2.COLORMAP_JET)

        # Stack both images horizontally
        images = np.hstack((color_image, depth_colormap))

        # Show images
        cv2.namedWindow('RealSense', cv2.WINDOW_AUTOSIZE)
        cv2.imshow('RealSense', images)
        cv2.waitKey(1)
        curr_frame += 1
finally:
    # Stop streaming
    pipeline.stop()

I used OpenCV's face detector to capture the human face and then transferred the face region to 3D coordinates. Maybe the script is not elegant; I think the essential code you're interested in is the inner double loop, where I use rs.rs2_deproject_pixel_to_point to turn each pixel and its depth into a 3D point. Since I am a green hand in 3D reconstruction, I am not sure whether it can solve your problem, but I hope it can give you some inspiration.
Hi @longjj, For getting the 'depth_point' (x, y, z with respect to the reference point of the image) you can see I did something really similar to you. My problem starts right after I have this 3D 'depth_point': I don't know how to transform it to my world coordinates (the corner of my table, for example), because I don't know where this 'reference point of the image' is with respect to the center of the real camera, since the images are not aligned with the physical camera. I will try to explain my situation in a different way. This is my scene:
The offsets I am talking about are around 1-2 cm, and I am trying to get a precision of less than 1 mm, so you can understand that the center of the depth image is not where the real center of the physical camera is pointing. I am trying to understand this offset in order to be able to transform from a certain pixel in the image (not exactly in the middle) to a certain position with respect to my world coordinates. As I understand it, using the parameters from the 'extrinsics' only lets you align the depth and color images, but neither of them is aligned with the real center of the physical camera. I hope you'll have some suggestions. Thanks!
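If the pose of the depth imager's optical origin in the world (table) frame were known, the deprojected point could be mapped into world coordinates with a single rigid transform. A minimal sketch, assuming a hand-measured rotation and translation (the values below are placeholders, not real calibration data):

import numpy as np

# Placeholder pose of the depth imager in the world (table) frame:
# R_cam_to_world rotates camera axes into world axes,
# t_cam_in_world is the imager origin expressed in world coordinates (metres).
R_cam_to_world = np.array([[1.0,  0.0,  0.0],
                           [0.0, -1.0,  0.0],
                           [0.0,  0.0, -1.0]])  # camera looking straight down at the table
t_cam_in_world = np.array([0.30, 0.20, 0.52])

def camera_point_to_world(depth_point):
    """Map a 3D point from rs2_deproject_pixel_to_point into world coordinates."""
    p_cam = np.asarray(depth_point, dtype=float)
    return R_cam_to_world @ p_cam + t_cam_in_world

print(camera_point_to_world([0.0, 0.0, 0.5]))  # -> roughly [0.30, 0.20, 0.02]

The 1-2 cm offset described above would then show up as an error in t_cam_in_world, which is exactly why the position of the optical origin relative to the camera housing matters for sub-millimetre accuracy.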
Hi @Litvak1, Using my poor 3D knowledge, I don't know how to solve this problem in pyrealsense2 either at the moment. Maybe @0xLeon is right and you just need some additional transform. Maybe we need to wait for others' further replies.
The center is aligned to the LEFT imager (as seen from the camera, so its left eye).
I appreciate all your help, guys! So far I haven't found a written explanation answering my question, so I decided to perform a small test to see if I could find this 'reference point'. I placed a small cube in the center of my table and took depth images from different positions around the cube (always from 0.52 m above the cube, but with different x and y positions). From each image I manually found the pixel at the center of the cube, and then used rs2_deproject_pixel_to_point to get its 3D position. I attached my results. The values in columns J and K are the 'average offset' over all my positions. So as far as I understand, the reference point of my specific camera is 'hanging in the air', located -0.01931 m in the X axis and 0.01638 m in the Y axis with respect to the center of the front side of the physical camera. If that is true, one can assume that this offset information is coded into the camera, just like its intrinsic parameters. Did anyone hear about that before and can explain? I still need the exact parameters for my research; I'm trying to reach a pose estimation accuracy of less than 1 mm error. Thanks
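For the averaging step described above, a small sketch of how the per-position offsets could be combined (the arrays are made-up placeholders standing in for the spreadsheet values, not the actual measurements):

import numpy as np

# Placeholder data: cube-centre position predicted from each depth image vs. the
# hand-measured ground truth, both in world coordinates (metres), one row per camera position.
predicted = np.array([[0.101, 0.215, 0.000],
                      [0.252, 0.118, 0.001]])
measured = np.array([[0.120, 0.199, 0.000],
                     [0.271, 0.102, 0.000]])

# Average offset over all camera positions (columns J and K in the spreadsheet)
offset = (predicted - measured).mean(axis=0)
print("average offset (x, y, z):", offset)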
If your question is still where the origin of the imager is, I would refer you to this link:
I actually read the post from the link you sent two weeks ago :) From the calibration I did, it seems that the reference point of the camera (where [0,0,0] is located) is 'floating' in the air and not where the physical left imager is located. I think you can get the idea of what I'm trying to explain from the file I uploaded before: the values in columns J and K are the 'average offset' over all my positions. So as far as I understand, the reference point of my specific camera is 'hanging in the air', located -0.01931 m in the X axis and 0.01638 m in the Y axis with respect to the center of the front side of the physical camera.
My RealSense knowledge doesn't cover calibration, unfortunately, so one of the excellent team members on this forum can undoubtedly provide you with a better answer to this question than I can.
@longjj Hi, thanks for sharing your code. I have a question about the color and depth images: how can I get the world coordinates of a pixel in the color image? Another question: when you detect a face, you directly use the detection results on the depth image. I think you should first align the depth image to the color image, and then you can use the same detection results.
@Superlee506 Hi, I am sorry, but I haven't been deep into the Intel RealSense SDK recently. Maybe there is an API for converting between the color image and the depth image, or an API that returns the answer you want directly. Also, your suggestion about the alignment is right; I didn't consider it at that time, thanks.
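Regarding the alignment point: pyrealsense2 does provide an align processing block that maps the depth frame onto the color frame, after which a pixel detected in the color image can be deprojected with the color intrinsics. A rough sketch, assuming default streams and an arbitrary example pixel:

import pyrealsense2 as rs

pipeline = rs.pipeline()
pipeline.start()

# Align every depth frame to the color frame so that pixel (u, v) refers to the
# same scene point in both images.
align = rs.align(rs.stream.color)

frames = pipeline.wait_for_frames()
aligned = align.process(frames)
aligned_depth = aligned.get_depth_frame()
color_frame = aligned.get_color_frame()

# After alignment, deprojecting with the color intrinsics gives a 3D point
# (in the color camera's frame) for any pixel of the color image.
color_intrin = color_frame.profile.as_video_stream_profile().intrinsics
u, v = 640, 360  # example pixel
depth = aligned_depth.get_distance(u, v)
point = rs.rs2_deproject_pixel_to_point(color_intrin, [u, v], depth)
print(point)

pipeline.stop()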
@0xLeon Can I take the reference world coordinates to be the robot's base_link?
By using the 'ppx' and 'ppy' parameters of the intrinsics, we can get the principal point (the center of projection in pixel coordinates).
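For reference, the principal point can be read straight off the stream's intrinsics; a minimal sketch:

import pyrealsense2 as rs

pipeline = rs.pipeline()
profile = pipeline.start()

intrin = profile.get_stream(rs.stream.depth).as_video_stream_profile().intrinsics

# (ppx, ppy) is the principal point: the pixel where the optical axis crosses the image plane.
print("principal point:", intrin.ppx, intrin.ppy)
print("focal lengths:", intrin.fx, intrin.fy)

pipeline.stop()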
@Litvak1 Maybe this should give you some idea:
Did you find a solution for transforming image coordinates to real-world coordinates? I am currently facing the same difficulty, and it would be really helpful if you could share your experience. Thanks
Transforming pixel from a depth image to world coordinates
Hi,
As part of my research, I am using the D415 RealSense camera to capture depth images of an object randomly placed on a table (I am using only depth images). I used image detection to find the exact pixel in the depth image representing the center of the object, and I would like to know how to transform from that pixel position (row, column) in the depth image to the real Cartesian position (x, y) in the world, where the real depth (z) is known (the images are always taken from a certain known position above the table).
I used this code to retrieve the 3D point with respect to 'the center of the physical imager' (as explained here; I tested it with the ppx and ppy values):
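(The code block itself did not survive in this copy of the issue; a sketch of what such a test presumably looks like, deprojecting the principal point at the known 0.5 m table distance:)

import pyrealsense2 as rs

pipeline = rs.pipeline()
profile = pipeline.start()

depth_intrin = profile.get_stream(rs.stream.depth).as_video_stream_profile().intrinsics

# Deprojecting the principal point (ppx, ppy) at the known 0.5 m distance
# should give a point on the optical axis: [0.0, 0.0, 0.5].
point = rs.rs2_deproject_pixel_to_point(
    depth_intrin, [depth_intrin.ppx, depth_intrin.ppy], 0.5)
print(point)

pipeline.stop()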
The result is [0.0, 0.0, 0.5] as it should be.
I know where I placed the camera above the table, but I assumed that its reference point would be the center of the front side of the camera (where the glass is), and that assumption turned out to be wrong.
So after this introduction, my question is: where exactly is 'the center of the physical imager' located with respect to the center of the front side of the camera?
Or maybe I should do some alignment before using rs2_deproject_pixel_to_point?
Thank you for your time!