
Transforming pixel from a depth image to world coordinates #1904

Closed
Litvak1 opened this issue Jun 18, 2018 · 19 comments

Comments

@Litvak1

Litvak1 commented Jun 18, 2018

Required Info
Camera Model: D415
Firmware Version: 05.08.15.00
Operating System & Version: Linux (Ubuntu 16.04)
Kernel Version (Linux Only): 4.13.0-45-generic
Platform: PC
SDK Version: ?
Language: Python

Transforming pixel from a depth image to world coordinates

Hi,
As part of my research, I am using the D415 Realsense camera to capture depth images of an object randomly placed on a table (I am using only depth images). I used image detection to find the exact pixel in the depth image representing the center of the object, and I would like to know how to transform from that pixel position (row, column) in the depth image to the real Cartesian position (x,y) in the world, where the real depth and z are known (the images are always being taken from a certain known position above the table).
I used this code to retrieve the 3D point with respect to 'the center of the physical imager' (as explained here; I tested it with the ppx and ppy values):

import pyrealsense2 as rs

# Stream depth only and grab one frameset
pipeline = rs.pipeline()
config = rs.config()
config.disable_stream(rs.stream.color)
pipeline.start(config)
frames = pipeline.wait_for_frames()
depth_frame = frames.get_depth_frame()

# Deproject the principal point (ppx, ppy) at an assumed depth of 0.5 m
depth_intrin = depth_frame.profile.as_video_stream_profile().intrinsics
depth_value = 0.5
depth_pixel = [depth_intrin.ppx, depth_intrin.ppy]
depth_point = rs.rs2_deproject_pixel_to_point(depth_intrin, depth_pixel, depth_value)
print(depth_point)

The result is [0.0, 0.0, 0.5] as it should be.

I know where I placed the camera above the table. I assumed that the reference point would be the center of the front side of the camera (where the glass is), but this assumption turned out to be wrong.

So after this introduction, my question is: where exactly is 'the center of the physical imager' located with respect to the center of the front side of the camera?
Or maybe I should do some alignment before using rs2_deproject_pixel_to_point?

Thank you for your time!

@0xLeon

0xLeon commented Jun 19, 2018

I'm not sure if I understand your problem completely, but isn't this what
rs.depth_frame.get_distance(x, y) does? Of course, this will give you the 3D position in camera coordinates only. You have to define your reference world coordinate system with known extrinsics of the camera and apply that transformation to the camera coordinates of your measured 3D position to get the 3D position relative to your defined world coordinate system.
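
Something like this, for example (just a sketch: the pose matrix T_world_from_cam is a placeholder you have to measure for your own setup, it is not something the SDK can give you):

import numpy as np
import pyrealsense2 as rs

def pixel_to_world(depth_frame, depth_intrin, px, py, T_world_from_cam):
    """Deproject pixel (px, py) and express the 3D point in world coordinates."""
    depth = depth_frame.get_distance(px, py)                      # meters
    p_cam = rs.rs2_deproject_pixel_to_point(depth_intrin, [px, py], depth)
    p_cam_h = np.array([p_cam[0], p_cam[1], p_cam[2], 1.0])      # homogeneous point
    p_world = T_world_from_cam @ p_cam_h                          # rigid transform
    return p_world[:3]

# Hypothetical pose: camera 0.75 m above the world origin, looking straight down,
# with the camera x axis aligned to the world x axis (right-handed rotation).
T_world_from_cam = np.array([
    [1.0,  0.0,  0.0, 0.00],
    [0.0, -1.0,  0.0, 0.00],
    [0.0,  0.0, -1.0, 0.75],
    [0.0,  0.0,  0.0, 1.00],
])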

@Litvak1
Author

Litvak1 commented Jun 19, 2018

get_distance(x, y) will give me the distance to that specific pixel. That distance I already have, because I placed the camera at a certain altitude above the table.
rs2_deproject_pixel_to_point is used to transform a pixel in the image into a 3D point with respect to 'the center of the physical imager', which is somewhere in the camera but not exactly in the middle.

I know that the center of the front side of the camera is placed at a certain position with respect to my world frame, but I don't know what the transformation is between this point and 'the center of the physical imager' (the reference point).

Is this information available as 'extrinsics'? How can I retrieve this information in python?

I hope this time my question is more clear.

@jianjieluo

I used this script and found that its results are not bad.

import pyrealsense2 as rs
import numpy as np
import cv2
import os

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

# Configure depth and color streams
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 1280, 720, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 1280, 720, rs.format.bgr8, 30)

# Start streaming
pipe_profile = pipeline.start(config)

curr_frame = 0

try:
    while True:

        # Wait for a coherent pair of frames: depth and color
        frames = pipeline.wait_for_frames()
        depth_frame = frames.get_depth_frame()
        color_frame = frames.get_color_frame()
        if not depth_frame or not color_frame:
            continue

        # Intrinsics & Extrinsics
        depth_intrin = depth_frame.profile.as_video_stream_profile().intrinsics
        color_intrin = color_frame.profile.as_video_stream_profile().intrinsics
        depth_to_color_extrin = depth_frame.profile.get_extrinsics_to(
            color_frame.profile)

        # print(depth_intrin.ppx, depth_intrin.ppy)

        # Convert images to numpy arrays
        depth_image = np.asanyarray(depth_frame.get_data())
        color_image = np.asanyarray(color_frame.get_data())

        # find the human face in the color_image
        gray = cv2.cvtColor(color_image, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, 1.3, 5)
        for (x, y, w, h) in faces:
            if curr_frame > 100 and curr_frame % 40 == 10:
                roi_depth_image = depth_image[y:y+h, x:x+w]
                roi_color_image = color_image[y:y+h, x:x+w]
                os.system('mkdir -p ./3d_output/%d' % curr_frame)
                cv2.imwrite('./3d_output/%d/depth.jpg' %
                            curr_frame, roi_depth_image)
                cv2.imwrite('./3d_output/%d/color.jpg' %
                            curr_frame, roi_color_image)
                print("the mid position depth is:", depth_frame.get_distance(
                    int(x+w/2), int(y+h/2)))

                # write the depth data to a depth.csv file
                with open('./3d_output/%d/depth.csv' % curr_frame, 'w') as f:
                    cols = list(range(x, x+w))
                    rows = list(range(y, y+h))
                    for i in rows:
                        for j in cols:
                            # Be sure to double-check the coordinate transform
                            depth = depth_frame.get_distance(j, i)
                            depth_point = rs.rs2_deproject_pixel_to_point(
                                depth_intrin, [j, i], depth)
                            text = "%.5lf, %.5lf, %.5lf\n" % (
                                depth_point[0], depth_point[1], depth_point[2])
                            f.write(text)
                print("Finish writing the depth img")

            cv2.rectangle(color_image, (x, y), (x+w, y+h), (255, 0, 0), 2)

        # Apply colormap on depth image (image must be converted to 8-bit per pixel first)
        depth_colormap = cv2.applyColorMap(cv2.convertScaleAbs(
            depth_image, alpha=0.03), cv2.COLORMAP_JET)

        # Stack both images horizontally
        images = np.hstack((color_image, depth_colormap))

        # Show images
        cv2.namedWindow('RealSense', cv2.WINDOW_AUTOSIZE)
        cv2.imshow('RealSense', images)
        cv2.waitKey(1)

        curr_frame += 1
finally:

    # Stop streaming
    pipeline.stop()

I used OpenCV's face detector to capture the human face and then transferred the face region to 3D coordinates. Maybe the script is not elegant. I think the essential code you're interested in is:

with open('./3d_output/%d/depth.csv' % curr_frame, 'w') as f:
    cols = list(range(x, x+w))
    rows = list(range(y, y+h))
    for i in rows:
        for j in cols:
            # Be sure to double-check the coordinate transform
            depth = depth_frame.get_distance(j, i)
            depth_point = rs.rs2_deproject_pixel_to_point(
                depth_intrin, [j, i], depth)
            text = "%.5lf, %.5lf, %.5lf\n" % (
                depth_point[0], depth_point[1], depth_point[2])
            f.write(text)
print("Finish writing the depth img")

Here I use [j, i] to deproject the pixels one by one and get the 3D data in meters.

Since I am new to 3D reconstruction, I am not sure whether this can solve your problem, but I hope it can give you some inspiration.

@Litvak1
Author

Litvak1 commented Jun 19, 2018

Hi @longjj,

For getting the 'depth_point' (x, y, z with respect to the reference point of the image), you can see I did something very similar to what you did.

My problem starts right after I have this 3D 'depth_point'. I don't know how to transform it into my world coordinates (the corner of my table, for example), because I don't know where this 'reference point of the image' is with respect to the center of the real camera, since the images are not aligned with the physical camera.

I will try to explain my situation in a different way. This is my scene:

  • I have a surface (a table, but let's ignore the height of its legs and treat it as the floor). Its dimensions are 1 meter * 1 meter.
  • The 'base frame' in this world (the point [0,0,0]) is at a corner of the table (in 'world coordinates').
  • I placed the camera 0.75 meters above the center of the table, pointing down at the surface, in a position where the center of the front of the camera is at position [0.5, 0.5, 0.75] with respect to my world coordinates.
  • I placed a small cube at the center of the surface, at position [0.5, 0.5, 0], again with respect to my world coordinates.
  • I took a depth image, and the center of the cube is not at the center pixel of the image; instead it is located at some offset from the center pixel.

The offsets I am talking about are around 1-2 cm, and I am trying to get a precision of less than 1 mm.

So you can understand that the center of the depth image is not where the real center of the physical camera is pointing. I am trying to understand this offset in order to be able to transform from a certain pixel in the image (not exactly in the middle) to a certain position with respect to my world coordinates.
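
Just as a sketch of what I mean (reusing depth_intrin from the snippet in my first post; cube_px and cube_py are hypothetical names for the pixel where the cube center is detected):

import pyrealsense2 as rs

# Hypothetical detection result for the cube-center pixel
cube_px, cube_py = 655, 340

# Where the cube SHOULD appear if the depth origin were the front-center of the
# camera looking straight down at it from ~0.75 m (ignoring the cube height):
expected_cam_point = [0.0, 0.0, 0.75]
expected_pixel = rs.rs2_project_point_to_pixel(depth_intrin, expected_cam_point)

offset_px = [cube_px - expected_pixel[0], cube_py - expected_pixel[1]]
print("pixel offset (x, y):", offset_px)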

As I understand it, using parameters from 'extrinsics' only lets you align the depth and color images, but neither of them is aligned with the real center of the physical camera.

I hope you'll have some suggestions.

Thanks!

@jianjieluo

Hi @Litvak1,
I get your point. The infrared camera and the RGB color camera do have an offset, which leads to the situation you describe.

With my limited 3D knowledge, I don't currently know how to solve this problem in pyrealsense2 either. Maybe @0xLeon is right and you just need some transform with depth_to_color_extrin, although I haven't tried it before.

Maybe we need to wait for further replies from others.

@agrunnet
Contributor

The center is aligned to the LEFT imager (as seen from the camera, so its left eye).

@Litvak1
Author

Litvak1 commented Jun 22, 2018

I appreciate all your help guys!

So far I haven't found a written answer to my question.

I decided to perform a small test to see if I could find this 'reference point'. I placed a small cube in the center of my table, and took depth images from different positions around the cube (always from 0.52m above the cube, but with different x and y positions). From each image I manually found the pixel of the center of the cube, and then used rs2_deproject_pixel_to_point to find where the camera thinks the object is located with respect to its 'unknown' reference point.
From an altitude of 0.52 m above the object, the depth image covers a width of ~0.71 m and a height of ~0.4 m, and each pixel corresponds to a square of ~0.00055 m.

I attached my results.
realsense_calibration.xlsx
I hope the column headings are enough to understand the table. I'll just say that the letters in brackets indicate the columns used in each calculation.

The values in columns J and K are the 'average offset' of all my positions. So as far as I understand, the reference point of my specific camera is 'hanging in the air', located -0.01931m in the X axis and 0.01638m in the Y axis with respect to the center of the front side of the physical camera.
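
For reference, the averaging itself is nothing more than this (a sketch with placeholder numbers, just to show how the columns J and K averages are obtained):

import numpy as np

# Each row: the expected (x, y) of the cube relative to the camera front-center
# for one camera position, and the (x, y) returned by rs2_deproject_pixel_to_point,
# in meters. These values are placeholders, not my real measurements.
expected = np.array([[0.00, 0.00],
                     [0.10, 0.00],
                     [0.00, 0.10]])
measured = np.array([[-0.019, 0.016],
                     [ 0.081, 0.017],
                     [-0.020, 0.116]])

avg_offset = (measured - expected).mean(axis=0)   # columns J and K
print("average reference-point offset (x, y) in meters:", avg_offset)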

If that is true, one can assume that this offset information is coded into the camera, just like its intrinsic parameters.

Has anyone heard about this before and can explain it?

I still need the exact parameters for my research. I'm trying to reach a pose estimation accuracy of less than 1 mm error.

Thanks

@MartyG-RealSense
Collaborator

If your question is still where the origin of the imager is, I would refer you to this link:

https://communities.intel.com/message/543826#543826

@Litvak1
Author

Litvak1 commented Jul 12, 2018

I actually read the post from the link you sent two weeks ago :)
Unfortunately, it didn't answer my question.

From the calibration I did it seems that the reference point of the camera (where the [0,0,0] is located) is 'floating' in the air and not where the physical left imager is located.

I think you can get the idea of what I'm trying to explain from the file I uploaded before:
realsense_calibration.xlsx

The values in columns J and K are the 'average offset' of all my positions. So as far as I understand, the reference point of my specific camera is 'hanging in the air', located -0.01931m in the X axis and 0.01638m in the Y axis with respect to the center of the front side of the physical camera.

@MartyG-RealSense
Collaborator

My RealSense knowledge doesn't cover calibration, unfortunately, so one of the excellent team members on this forum can undoubtedly provide you with a better answer to this question than I can.

@RealSense-Customer-Engineering
Collaborator

[Realsense Customer Engineering Team Comment]
@Litvak1
Here is the physical origin documentation, which will be integrated into our data sheet; attached here for your reference.

@carypai
Contributor

carypai commented Jul 30, 2018

[Four attached images showing the physical origin documentation]

@RealSense-Customer-Engineering
Collaborator

[Realsense Customer Engineering Team Comment]
@litvak
Just found out it's already merged into the latest data sheet; please check the link below:
https://www.intel.com/content/dam/support/us/en/documents/emerging-technologies/intel-realsense-technology/Intel-RealSense-D400-Series-Datasheet.pdf

@Superlee506

@longjj Hi, thanks for sharing your code. I have a question about the color and depth images: how can I get the world coordinates of a pixel in the color image?

Another question is that, when you detect a face, you directly use the detection results on the depth image. I think you should first align the depth image to the color image, and then you can use the same detection results.
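
For example, something like this (just a sketch, assuming the same pipeline as in the script above):

import pyrealsense2 as rs

# Map every depth pixel onto the color camera's viewport so that (x, y)
# coordinates found by the face detector on the color image can be reused
# directly on the depth data.
align = rs.align(rs.stream.color)

frames = pipeline.wait_for_frames()
aligned = align.process(frames)
aligned_depth = aligned.get_depth_frame()
color = aligned.get_color_frame()

# e.g. the distance at a pixel (x, y) taken from the color-image detection:
# aligned_depth.get_distance(x, y)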

@jianjieluo

@Superlee506 Hi, I'm sorry, but I haven't worked much with the Intel RealSense SDK recently. Maybe there is an API for converting between the color image and the depth image, or an API that returns the answer you want directly.

Also, your suggestion about the alignment is right; I didn't consider it at the time, thanks.

@npd26

npd26 commented Jun 4, 2019

@0xLeon can I use the robot base_link as the reference world coordinate frame?

@Denial1990

By using the 'ppx' and 'ppy' parameters of the intrinsics, we can get the center of projection.


@ahasan016

> I actually read the post from the link you sent two weeks ago :)
> Unfortunately, it didn't answer my question.
>
> From the calibration I did it seems that the reference point of the camera (where the [0,0,0] is located) is 'floating' in the air and not where the physical left imager is located.
>
> I think you can get the idea of what I'm trying to explain from the file I uploaded before:
> realsense_calibration.xlsx
>
> The values in columns J and K are the 'average offset' of all my positions. So as far as I understand, the reference point of my specific camera is 'hanging in the air', located -0.01931m in the X axis and 0.01638m in the Y axis with respect to the center of the front side of the physical camera.

Did you find a solution for transforming image coordinates to real-world coordinates? I am currently facing the same difficulties, and it would be really helpful if you could share your experience.

Thanks
