
Difficulty obtaining an accurate 3D coordinate (x, y, z) of an ArUco tag #8961

Closed
yuan0623 opened this issue May 4, 2021 · 11 comments
yuan0623 commented May 4, 2021


Required Info
Camera Model D435
Firmware Version 05.12.12.100
Operating System & Version Ubuntu 18
Kernel Version (Linux Only) 5.4.0-72-generic
Platform PC
SDK Version 2.0
Language c++
Segment Robot

Issue Description

Hi. I'm trying to measure the 3D coordinate (x,y,z) of the ArUco tag. Here is what I have done:

  • detect the pixel coordinate (x_i,y_i) of the ArUco tag
  • use get_distance method to obtain the distance of that pixel
  • use rs2_deproject_pixel_to_point to obtain (x,y,z) of that pixel in 3D space

My code is here:

#include <iostream>
#include <opencv2/aruco.hpp>
#include <opencv2/opencv.hpp>
#include <librealsense2/rs.hpp>
#include <librealsense2/rsutil.h>
#include <numeric>
#include "example.hpp"
int main(int argc, char * argv[]) try
{
    cv::Ptr<cv::aruco::Dictionary> dicttionary = cv::aruco::getPredefinedDictionary(cv::aruco::DICT_6X6_250);

    rs2::log_to_console(RS2_LOG_SEVERITY_ERROR);

    // Declare depth colorizer for pretty visualization of depth data
    rs2::colorizer color_map;
    // Declare rates printer for showing streaming rates of the enabled streams.
    rs2::rates_printer printer;

    // Declare RealSense pipeline, encapsulating the actual device and sensors
    rs2::pipeline pipe;

    rs2::points points;
    rs2::pointcloud pc;
    // Start streaming with default recommended configuration
    // The default video configuration contains Depth and Color streams
    // If a device is capable to stream IMU data, both Gyro and Accelerometer are enabled by default
    rs2::pipeline_profile pipeProfile = pipe.start();
    rs2_intrinsics intrinsics = pipeProfile.get_stream(RS2_STREAM_DEPTH).as<rs2::video_stream_profile>().get_intrinsics();
    while (true) // Application still alive?
    {
        rs2::frameset data = pipe.wait_for_frames().    // Wait for next set of frames from the camera
        apply_filter(printer).     // Print each enabled stream frame rate
                apply_filter(color_map);   // Find and colorize the depth data
        rs2::video_frame video = data.get_color_frame();
        rs2::depth_frame depth = data.get_depth_frame();


        const int w = video.get_width();
        const int h = video.get_height();

        //cv::Mat image, imageCopy;
        cv::Mat image(cv::Size(w, h), CV_8UC3, (void*)video.get_data(), cv::Mat::AUTO_STEP);

        std::vector<int> ids;
        std::vector<std::vector<cv::Point2f>> corners;
        cv::aruco::detectMarkers(image,dicttionary,corners,ids);

        // if ArUco tag is detected
        if (ids.size()>0) {
            cv::aruco::drawDetectedMarkers(image, corners, ids);


            int id_index = 0;
            double dist_to_center;
            float tag_point[3];
            float pixel[2];
            for (auto corner = corners.begin(); corner != corners.end(); corner++){
                // collect this marker's four corner pixels (reset for each marker)
                std::vector<float> corner_point_x;
                std::vector<float> corner_point_y;
                for (auto point = corner->begin(); point != corner->end(); point++){
                    corner_point_x.push_back(point->x);
                    corner_point_y.push_back(point->y);
                }
                // compute the center pixel of the tag
                // (note the float init value 0.0f: an integer init such as 0LL
                //  would truncate the accumulated sum)
                pixel[0] = std::accumulate(corner_point_x.begin(), corner_point_x.end(), 0.0f) / corner_point_x.size();
                pixel[1] = std::accumulate(corner_point_y.begin(), corner_point_y.end(), 0.0f) / corner_point_y.size();

                // compute the distance of the center of the tag
                dist_to_center = depth.get_distance(pixel[0], pixel[1]);
                std::cout<<"id: "<< ids[id_index] <<" distance: "<<dist_to_center<<std::endl;

                // obtain the 3D coordinate (x,y,z) of the center of the tag
                rs2_deproject_pixel_to_point(tag_point, &intrinsics, pixel, dist_to_center);
                std::cout<<"id: "<< ids[id_index] <<" 3D point: "<<tag_point[0]<<","<<tag_point[1]<<","<<tag_point[2]<<std::endl;
                id_index++;
            }




        }


        cv::imshow("out",image);
        char key = (char) cv::waitKey(1);
        if (key == 27)
            break;
    }

    return EXIT_SUCCESS;
}
catch (const rs2::error & e)
{
    std::cerr << "RealSense error calling " << e.get_failed_function() << "(" << e.get_failed_args() << "):\n    " << e.what() << std::endl;
    return EXIT_FAILURE;
}
catch (const std::exception& e)
{
    std::cerr << e.what() << std::endl;
    return EXIT_FAILURE;
}

My program compiles and runs, and the ArUco tag detection is quite robust. However, I think the result is incorrect. My concerns and questions are listed below:

  1. The distance only returns one digit after the decimal point. Does this mean that the accuracy of this camera is only within 10 cm?
    [screenshot of the console output]

  2. When the camera is fixed and facing the static ArUco tag, I see the distance reading switching between some numbers (say 0.6 or 0.7) and 0. This makes the measurement really unreliable. Is there a way to fix it?

  3. I have measured the distance between the RGB-D camera and the tags with a tape measure; the result is not close to what the camera returns.

Do I need any additional steps to calibrate this depth sensor before I use it?
Any advice would be great.
Thank you!

Best
Yuan


MartyG-RealSense commented May 5, 2021

Hi @yuan0623

  1. Could you try adjusting the value of the Depth Unit Scale, as discussed in the comment linked below and the comments following it?

#6104 (comment)

  2. Do you have the camera's IR dot-pattern projection enabled? If the dot pattern is projected onto the scene that the camera is observing and the dots are overlaid on a fiducial marker, it may make the marker harder for the camera to read.

https://community.intel.com/t5/Items-with-no-label/Object-detection-in-IR-image-containing-projector-pattern/m-p/610593

  3. How far away from the camera is the ArUco marker? With the RealSense 400 Series cameras, error in depth measurement starts at near zero at the camera lenses and increases linearly as the observed object / surface gets further from the camera. This phenomenon is called RMS Error.

Because of RMS Error, the measurement returned by the camera will not be exactly the same as a tape-measured distance. The drift in measurement accuracy usually starts to become noticeable at 3 meters and beyond.

The RealSense D455 camera model has 2x the accuracy over distance of the D435 / D435i, meaning that at 6 meters the D455 has the same accuracy that the D435 models have at 3 meters.

More information about RMS Error can be found in the section of Intel's white-paper document about camera tuning that I have linked to below.

https://dev.intelrealsense.com/docs/tuning-depth-cameras-for-best-performance#section-verify-performance-regularly-on-a-flat-wall-or-target

@yuan0623
Copy link
Author

yuan0623 commented May 5, 2021


Hi Marty,

Thanks for your reply!

  1. I have adjusted the depth unit in RealSense Viewer. Within RealSense Viewer it returns a reasonable result, but in my C++ project the return value still has only one digit after the decimal point. Should I adjust the depth unit in my C++ project as well?
    [photo of the RealSense Viewer depth unit setting]

  2. As I mentioned, the ArUco detection is really robust, and there is no dot pattern on the tag. The problem is that after we have detected the pixel coordinate of the tag, get_distance returns 0.

  3. The tag is around 50 cm away from the RealSense camera, so I think it is within range.

Best
Yuan


MartyG-RealSense commented May 5, 2021

There's certainly no harm in changing the depth scale in scripting to see how it affects the output. There is a C++ reference for doing so in the link below.

#2385

50 cm sounds like a reasonable distance for detecting a tag. The size of the tag may be an important factor in its readability from a distance. I am reminded of a case I handled about trying to clearly read barcodes from 70 cm away. As you said though, your detection is robust so there does not seem to be a problem with the tag size in your particular case.

#7187

Is the tag image black or dark gray? If so, I can understand why it may be difficult to obtain depth information from it. It is a general physics principle (not specific to RealSense) that black and dark gray surfaces absorb light and can be difficult for a depth camera to read. Such an area can appear on a depth image as an empty, plain black region with no depth. For example, a black cable may look as though it has been captured successfully on the depth image, when in fact it is just a cable-shaped hole.

If the tag is such a color, using a lighter color may improve depth information readability.


yuan0623 commented May 5, 2021


Marty,

Thanks again for your reply!

  1. Based on your reference, I have checked the depth unit scale in my C++ project; the code is below:
    // get the device
    rs2::context ctx;
    auto list = ctx.query_devices();

    // Declare RealSense pipeline, encapsulating the actual device and sensors
    rs2::pipeline pipe;
    rs2::config cfg;
    cfg.enable_stream(RS2_STREAM_COLOR, 1280, 720, RS2_FORMAT_BGR8, 30);
    cfg.enable_stream(RS2_STREAM_DEPTH, 1280, 720, RS2_FORMAT_Z16, 30);

    rs2::points points;
    rs2::pointcloud pc;
    // Start streaming with default recommended configuration
    // The default video configuration contains Depth and Color streams
    // If a device is capable to stream IMU data, both Gyro and Accelerometer are enabled by default
    rs2::pipeline_profile pipeProfile = pipe.start(cfg);

    rs2::depth_sensor depth_sensor = pipeProfile.get_device().first<rs2::depth_sensor>();

    float depth_unit = depth_sensor.get_option(RS2_OPTION_DEPTH_UNITS);
    auto scale = depth_sensor.get_depth_scale();
    printf("[Before]Depth Unit Scale: %f\r\n", depth_unit);
    depth_sensor.set_option(RS2_OPTION_DEPTH_UNITS, 0.0001);
    depth_unit = depth_sensor.get_option(RS2_OPTION_DEPTH_UNITS);
    scale = depth_sensor.get_depth_scale();
    printf("[After]Depth Unit Scale: %f\r\n", depth_unit);

The output of the depth unit scale is reasonable. However, the distance and the 3D coordinate still have only one digit after the decimal point. (Please see the figure below.)
[screenshot of the console output]
Also, the distance value is not accurate: the distance between the camera and the tag is around 0.4 m, but get_distance returns 1.5 m.

  2. I also think 50 cm is a reasonable distance. Could you try running my code on your local PC? Maybe I have a bad sensor?
    My complete program is below:
#include <iostream>
#include <opencv2/aruco.hpp>
#include <opencv2/opencv.hpp>
#include <librealsense2/rs.hpp>
#include <librealsense2/rsutil.h>
#include <numeric>
#include "example.hpp"
int main(int argc, char * argv[]) try
{
    cv::Ptr<cv::aruco::Dictionary> dicttionary = cv::aruco::getPredefinedDictionary(cv::aruco::DICT_6X6_250);

    rs2::log_to_console(RS2_LOG_SEVERITY_ERROR);

    // Declare depth colorizer for pretty visualization of depth data
    rs2::colorizer color_map;
    // Declare rates printer for showing streaming rates of the enabled streams.
    rs2::rates_printer printer;

    // get the device
    rs2::context ctx;
    auto list = ctx.query_devices();

    // Declare RealSense pipeline, encapsulating the actual device and sensors
    rs2::pipeline pipe;
    rs2::config cfg;
    cfg.enable_stream(RS2_STREAM_COLOR, 1280, 720, RS2_FORMAT_BGR8, 30);
    cfg.enable_stream(RS2_STREAM_DEPTH, 1280, 720, RS2_FORMAT_Z16, 30);

    rs2::points points;
    rs2::pointcloud pc;
    // Start streaming with default recommended configuration
    // The default video configuration contains Depth and Color streams
    // If a device is capable to stream IMU data, both Gyro and Accelerometer are enabled by default
    rs2::pipeline_profile pipeProfile = pipe.start(cfg);

    rs2::depth_sensor depth_sensor = pipeProfile.get_device().first<rs2::depth_sensor>();

    float depth_unit = depth_sensor.get_option(RS2_OPTION_DEPTH_UNITS);
    auto scale = depth_sensor.get_depth_scale();
    printf("[Before]Depth Unit Scale: %f\r\n", depth_unit);
    depth_sensor.set_option(RS2_OPTION_DEPTH_UNITS, 0.0001);
    depth_unit = depth_sensor.get_option(RS2_OPTION_DEPTH_UNITS);
    scale = depth_sensor.get_depth_scale();
    printf("[After]Depth Unit Scale: %f\r\n", depth_unit);

    rs2_intrinsics intrinsics = pipeProfile.get_stream(RS2_STREAM_DEPTH).as<rs2::video_stream_profile>().get_intrinsics();

    while (true) // Application still alive?
    {
        rs2::frameset data = pipe.wait_for_frames().    // Wait for next set of frames from the camera
        apply_filter(printer).     // Print each enabled stream frame rate
                apply_filter(color_map);   // Find and colorize the depth data
        rs2::video_frame video = data.get_color_frame();
        rs2::depth_frame depth = data.get_depth_frame();


        const int w = video.get_width();
        const int h = video.get_height();

        //cv::Mat image, imageCopy;
        cv::Mat image(cv::Size(w, h), CV_8UC3, (void*)video.get_data(), cv::Mat::AUTO_STEP);

        std::vector<int> ids;
        std::vector<std::vector<cv::Point2f>> corners;
        cv::aruco::detectMarkers(image,dicttionary,corners,ids);

        // if ArUco tag is detected
        if (ids.size()>0) {
            cv::aruco::drawDetectedMarkers(image, corners, ids);


            int id_index = 0;
            double dist_to_center;
            float tag_point[3];
            float pixel[2];
            for (auto corner = corners.begin(); corner != corners.end(); corner++){
                // collect this marker's four corner pixels (reset for each marker)
                std::vector<float> corner_point_x;
                std::vector<float> corner_point_y;
                for (auto point = corner->begin(); point != corner->end(); point++){
                    corner_point_x.push_back(point->x);
                    corner_point_y.push_back(point->y);
                }
                // compute the center pixel of the tag
                // (note the float init value 0.0f: an integer init such as 0LL
                //  would truncate the accumulated sum)
                pixel[0] = std::accumulate(corner_point_x.begin(), corner_point_x.end(), 0.0f) / corner_point_x.size();
                pixel[1] = std::accumulate(corner_point_y.begin(), corner_point_y.end(), 0.0f) / corner_point_y.size();

                // (experiment) override the computed center with the first corner
                pixel[0] = corner_point_x[0];
                pixel[1] = corner_point_y[0];

                // compute the distance of the center of the tag
                dist_to_center = depth.get_distance(pixel[0], pixel[1]);

                std::cout<<"id: "<< ids[id_index] <<" distance: "<<dist_to_center<<std::endl;

                // obtain the 3D coordinate (x,y,z) of the center of the tag
                rs2_deproject_pixel_to_point(tag_point, &intrinsics, pixel, dist_to_center);
                std::cout<<"id: "<< ids[id_index] <<" 3D point: "<<tag_point[0]<<","<<tag_point[1]<<","<<tag_point[2]<<std::endl;
                id_index++;
            }




        }


        cv::imshow("out",image);
        char key = (char) cv::waitKey(1);
        if (key == 27)
            break;
    }

    return EXIT_SUCCESS;
}
catch (const rs2::error & e)
{
    std::cerr << "RealSense error calling " << e.get_failed_function() << "(" << e.get_failed_args() << "):\n    " << e.what() << std::endl;
    return EXIT_FAILURE;
}
catch (const std::exception& e)
{
    std::cerr << e.what() << std::endl;
    return EXIT_FAILURE;
}


yuan0623 commented May 5, 2021

I have figured out why there is only one digit after the decimal point.
Here is what I have done:

        //rs2::frameset data = pipe.wait_for_frames().    // Wait for next set of frames from the camera
        //apply_filter(printer).     // Print each enabled stream frame rate
        //        apply_filter(color_map);   // Find and colorize the depth data

        rs2::frameset frames = pipe.wait_for_frames();

Basically, I commented out the first three lines and replaced them with rs2::frameset frames = pipe.wait_for_frames();. Presumably the colorizer filter was replacing the raw depth data, so get_distance was no longer reading real depth values.

However, the distance value is still not correct. I'm working on that now.

@MartyG-RealSense

Thanks very much @yuan0623 for the update about your progress with the decimal point. I look forward to hearing about your tests with the distance value. Good luck!

@MartyG-RealSense

Hi @yuan0623 Do you have an update about this case that you can provide, please? Thanks!

@yuan0623


I think my issue is not resolved yet. If you look at my code, it is really straightforward: first I detect the pixel coordinate using the OpenCV library, then I use get_distance and rs2_deproject_pixel_to_point to obtain the coordinate of that pixel in 3D space. But the result just makes no sense. I wonder: is the RGB stream's coordinate frame aligned with the depth stream's? Or are OpenCV's pixel coordinates aligned with the depth frame?

@MartyG-RealSense

When using rs2_deproject_pixel_to_point to deproject 2D pixels to 3D world points, a typical approach would be to (1) perform depth to color alignment of the depth and color streams, and then (2) use rs2_deproject_pixel_to_point to obtain the 3D coordinates.

This process is described in a discussion in the link below, which is about a RealSense Python user who was using alignment and rs2_deproject_pixel_to_point to find the center of an ArUco marker.

#3688

Another example of finding the center of an ArUco tag with Python takes an approach more similar to your own, where get_distance is used to find depth information corresponding to the center point of the marker, and then rs2_deproject_pixel_to_point is used (in conjunction with the camera's color frame intrinsics).

https://www.programmersought.com/article/31073681999/

@yuan0623


Hi Marty,

Thanks for your answer! After aligning depth with RGB, my results are finally reasonable!

@MartyG-RealSense

Great news @yuan0623 about your improved results - thanks very much for the update!
