Multiple depth map fusion #9832

Closed
chinmay5 opened this issue Oct 6, 2021 · 21 comments


@chinmay5

chinmay5 commented Oct 6, 2021



Required Info
Camera Model: D435i
Firmware Version: (Open RealSense Viewer --> Click info)
Operating System & Version: Ubuntu 18.04
Kernel Version (Linux Only): (e.g. 4.14.13)
Platform: PC/Raspberry Pi/NVIDIA Jetson/etc.
SDK Version: { legacy / 2.. }
Language: python3
Segment: other

Issue Description

I want to perform 3D reconstruction using depth map fusion. For this, I would like to set up 4-8 D435i cameras at the corners of a box, with a toy object in the centre, something like this image:

[image]

However, I am fairly new to the field. Although I found that COLMAP can do the reconstruction from images, I imagine that multiple depth sensors with depth map fusion would give better results. I am open to switching to pure image-based reconstruction as well. Any help would be really welcome.

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Oct 6, 2021

Hi @chinmay5, if you have the budget for a commercial software package, the RealSense-compatible RecFusion Pro 3D scanning software supports multiple camera capture.

The standard RecFusion edition for single cameras can build up an image in real-time as the camera is moved around.

https://www.recfusion.net/index.php/en/features

Another RealSense-compatible commercial software product is Dot3D, which has a Lite edition suited to scanning small objects and a Pro edition for capturing larger scenes.

https://www.dotproduct3d.com/dot3dlite.html
https://www.dotproduct3d.com/dot3dpro.html

Another commercial option is ItSeez3D.

https://itseez3d.com/itseez3d-sdk-for-intel-realsense-d4xx.html


A free alternative to commercial products that can be used with a single camera is the RealSense SDK's rs-kinfu (KinectFusion), which builds up a point-cloud depth image through frame fusion as the camera is moved around.

https://github.com/IntelRealSense/librealsense/tree/master/wrappers/opencv/kinfu
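rs-kinfu itself is C++, but the fusion idea it relies on can be illustrated in a few lines of Python. The following is a simplified, one-dimensional sketch (not rs-kinfu's code) of the truncated signed distance function (TSDF) running average used by KinectFusion-style systems. Each voxel along a camera ray keeps a weighted average of its truncated distance to the observed surface, and the fused surface is read off where that average crosses zero. The function names, voxel size and truncation distance are hypothetical choices for illustration.

```python
# Simplified 1D TSDF fusion along a single camera ray (illustration only;
# function names and parameters are hypothetical).


def integrate(tsdf, weights, depth, voxel_size, trunc):
    """Fuse one depth measurement (metres) into a column of voxels along the ray."""
    for i in range(len(tsdf)):
        z = i * voxel_size            # distance of voxel i from the camera
        sdf = depth - z               # > 0 in front of the surface, < 0 behind it
        if sdf < -trunc:
            continue                  # far behind the surface: unobserved, leave as-is
        f = min(1.0, sdf / trunc)     # truncate and normalise to [-1, 1]
        # Running weighted average; every observation gets weight 1 here
        tsdf[i] = (weights[i] * tsdf[i] + f) / (weights[i] + 1)
        weights[i] += 1


def surface_depth(tsdf, weights, voxel_size):
    """Return the interpolated zero crossing of the fused TSDF, or None."""
    for i in range(len(tsdf) - 1):
        if weights[i] and weights[i + 1] and tsdf[i] > 0 >= tsdf[i + 1]:
            # Linear interpolation between voxel centres i and i + 1
            return voxel_size * (i + tsdf[i] / (tsdf[i] - tsdf[i + 1]))
    return None


if __name__ == "__main__":
    n, voxel, trunc = 100, 0.01, 0.05
    tsdf, weights = [0.0] * n, [0] * n
    for d in (0.50, 0.52, 0.48):      # three noisy readings of a surface near 0.5 m
        integrate(tsdf, weights, d, voxel, trunc)
    print(surface_depth(tsdf, weights, voxel))
```

Averaging the three readings places the surface near 0.50 m, which is how fusing many frames smooths out per-frame depth noise. A full system does the same per voxel in a 3D grid. It uses each frame's camera pose to decide which voxels each pixel's ray passes through.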

@chinmay5
Author

chinmay5 commented Oct 7, 2021

Dear @MartyG-RealSense, thank you for the quick response. I looked at the suggestions you provided and was wondering whether a Python implementation of the RealSense SDK's rs-kinfu is also available. As I said, in my use case I would rather have multiple fixed sensors than moving parts. Correct me if I am wrong, but can this be done with rs-kinfu, or are the commercial software packages the only viable solution?

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Oct 7, 2021

There is not an implementation of rs-kinfu for Python, unfortunately.

Another free option would be to use ROS and follow Intel's guides for using multiple cameras to capture individual point clouds from each camera and stitch them together into a single large cloud. There are guides for 2 cameras on the same computer and 3 cameras across 2 computers. You may be able to adapt the guides to add a greater number of cameras, but I cannot make guarantees about the results.

1 computer, 2 cameras
https://github.com/IntelRealSense/realsense-ros/wiki/Showcase-of-using-2-cameras

2 computers, 3 cameras
https://github.com/IntelRealSense/realsense-ros/wiki/showcase-of-using-3-cameras-in-2-machines


The CONIX Research Center at Carnegie Mellon developed a system for generating point clouds from up to 20 RealSense cameras and then stitching them together, though it is based on C++ rather than Python.

https://github.com/conix-center/pointcloud_stitching


If Python is your preference, it should be possible to stitch individual point clouds into a combined cloud using a process called an affine transform, which brings the point clouds into a common position and rotation. An SDK function that provides an affine transform is rs2_transform_point_to_point, as described for Python in #5583.
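As a rough illustration of what that transform does, the sketch below reproduces in plain Python the arithmetic of rs2_transform_point_to_point: a rotation stored column-major as 9 values, plus a 3-value translation. It then uses that arithmetic to move each camera's points into a shared reference frame before concatenating them. The helper names are hypothetical, and the rotation and translation for each camera are assumed inputs. As far as I know, the SDK's extrinsics (get_extrinsics_to) relate streams on the same device. The pose of one camera relative to another would therefore have to come from your own calibration, for example a calibration target or ICP registration.

```python
# Hypothetical helpers mirroring the arithmetic of rs2_transform_point_to_point:
# rotation is 9 floats in column-major order, translation is 3 floats.


def transform_point(rotation, translation, p):
    """Apply a rigid transform (column-major 3x3 rotation + translation) to one point."""
    x, y, z = p
    return [
        rotation[0] * x + rotation[3] * y + rotation[6] * z + translation[0],
        rotation[1] * x + rotation[4] * y + rotation[7] * z + translation[1],
        rotation[2] * x + rotation[5] * y + rotation[8] * z + translation[2],
    ]


def stitch(views):
    """views: list of (points, rotation, translation), one entry per camera.

    Each camera's points are moved into the common reference frame using that
    camera's (assumed, externally calibrated) pose, then all are concatenated.
    """
    merged = []
    for points, rotation, translation in views:
        merged.extend(transform_point(rotation, translation, p) for p in points)
    return merged


if __name__ == "__main__":
    identity = [1, 0, 0, 0, 1, 0, 0, 0, 1]
    rot_z_90 = [0, 1, 0, -1, 0, 0, 0, 0, 1]   # 90 degrees about z, column-major
    cloud_a = [[0.0, 0.0, 0.5]]                # reference camera, already in frame
    cloud_b = [[1.0, 0.0, 0.0]]                # second camera, rotated and offset
    print(stitch([(cloud_a, identity, [0, 0, 0]),
                  (cloud_b, rot_z_90, [0.0, 0.0, 0.5])]))
```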

@chinmay5
Author

chinmay5 commented Oct 8, 2021

Dear @MartyG-RealSense, I have been trying to get the Python code to work, but unfortunately I am still struggling to get a working example in which point cloud data is read from two different RealSense cameras and then merged. I tried the affine transform idea you suggested but have not managed to get it working so far. It would be great if you could point me to some example code.

@MartyG-RealSense
Collaborator

The subject of affine transforms in Python is discussed in detail in #8333, though that discussion does not include a pre-made Python script for stitching point clouds.

Below are some alternative approaches to using affine transform.


  1. Transformation Parameters' details in ICP Registration (#9590) describes how to use ICP registration with Python and Open3D to combine a pair of separate point cloud files.

  2. Another approach to using Python with RealSense and Open3D to combine point cloud files is here:

isl-org/Open3D#362

  3. A Python example script for python-pcl, the Python binding of the PCL (Point Cloud Library), demonstrates how to combine a pair of point clouds using a technique called Pairwise Incremental Registration.

https://github.com/strawlab/python-pcl
https://github.com/strawlab/python-pcl/blob/master/examples/official/Registration/pairwise_incremental_registration.txt
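ICP registration calls such as Open3D's o3d.pipelines.registration.registration_icp hide the underlying steps. To show roughly what they do internally, here is a minimal point-to-point ICP in NumPy. It is a sketch, not Open3D's implementation. Nearest neighbours are found by brute force, which is fine for a few hundred points but too slow for full depth frames. Each iteration solves for the best rigid transform with the SVD-based Kabsch method. ICP only converges to the right answer when the clouds start roughly aligned, so in a multi-camera rig it is typically used to refine an initial pose rather than to find one from scratch. The function names are hypothetical.

```python
import numpy as np


def best_fit_transform(src, dst):
    """Least-squares R, t with R @ src[i] + t ~= dst[i] (Kabsch method via SVD)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                   # reflection: flip the weakest axis
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(source, target, iterations=50, tol=1e-10):
    """Point-to-point ICP; returns (R, t, aligned source) mapping source onto target."""
    R_total, t_total = np.eye(3), np.zeros(3)
    current = source.copy()
    prev_err = np.inf
    for _ in range(iterations):
        # Brute-force nearest neighbour in target for every current point
        d2 = ((current[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
        nn = d2.argmin(axis=1)
        err = np.sqrt(d2[np.arange(len(current)), nn].mean())   # RMS before this step
        R, t = best_fit_transform(current, target[nn])
        current = current @ R.T + t
        # Compose with the running total: x -> R @ (R_total @ x + t_total) + t
        R_total, t_total = R @ R_total, R @ t_total + t
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R_total, t_total, current


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.random((200, 3)) - 0.5          # synthetic cloud around the origin
    a = np.radians(2.0)
    R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                       [np.sin(a), np.cos(a), 0.0],
                       [0.0, 0.0, 1.0]])
    t_true = np.array([0.01, -0.005, 0.005])
    source = (target - t_true) @ R_true          # same cloud, slightly misplaced
    R, t, _ = icp(source, target)
    print(np.round(R, 4), np.round(t, 4))
```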

@chinmay5
Author

Dear @MartyG-RealSense, I think I was able to get things to work with Open3D. The plan is to use multiple RealSense sensors and save the images and depth maps in the same folder. Then I can use the Open3D reconstruction system to try to reconstruct the object.

The thing I am struggling with is how to connect, say, 10 sensors and read data from them. At the moment I open realsense-viewer to get the ID of each device and then stream from them, but I do not think this is the best way.

Could you please point me to a clean Python code reference for reading images from up to 10 or so RealSense cameras? I would be really thankful.

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Oct 11, 2021

I'm very pleased to hear that you were able to make positive progress!

A RealSense user created a multiple camera Python project called multiple_realsense_cameras.py in the link below.

https://github.com/ivomarvan/samples_and_experiments/tree/master/Multiple_realsense_cameras
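Here is a minimal sketch of the same idea, assuming pyrealsense2 and NumPy are installed. It reads each connected camera's serial number from the SDK instead of copying it from realsense-viewer. It then starts one depth pipeline per camera and grabs one depth image from each. The function names and the 848x480 at 30 FPS stream settings are assumptions. With many cameras on one computer, USB bandwidth and CPU load may limit how many streams can run at once.

```python
import numpy as np


def start_all_cameras(width=848, height=480, fps=30):
    """Start one depth pipeline per connected RealSense device, keyed by serial number."""
    import pyrealsense2 as rs  # imported here so the module loads without the SDK

    ctx = rs.context()
    pipelines = {}
    for dev in ctx.query_devices():
        serial = dev.get_info(rs.camera_info.serial_number)
        cfg = rs.config()
        cfg.enable_device(serial)              # bind this config to one camera
        cfg.enable_stream(rs.stream.depth, width, height, rs.format.z16, fps)
        pipe = rs.pipeline(ctx)
        pipe.start(cfg)
        pipelines[serial] = pipe
    return pipelines


def grab_depth(pipelines):
    """Return {serial: uint16 depth image} with one frame from each running pipeline."""
    images = {}
    for serial, pipe in pipelines.items():
        frames = pipe.wait_for_frames()
        depth = frames.get_depth_frame()
        # Copy so the image outlives the SDK's frame buffer
        images[serial] = np.asanyarray(depth.get_data()).copy()
    return images


# Usage (needs connected cameras, not run here):
#   pipes = start_all_cameras()
#   try:
#       for serial, img in grab_depth(pipes).items():
#           print(serial, img.shape)
#   finally:
#       for pipe in pipes.values():
#           pipe.stop()
```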

@chinmay5
Author

Hi @MartyG-RealSense, thanks for the link. It works like a charm. One more question though: the object I am trying to scan is rather small, but when I tried the Open3D examples, the field of view was large and I could not get the model in the centre.

[image]

As you can see, although the reconstruction is not good (perhaps because I was not very careful when capturing the images), it covers a much larger field of view than just the green object in the centre. I was wondering whether limiting the depth map to, say, 30 cm would help. Can you help me figure out how to set the maximum depth? Any suggestions for improving the reconstruction are also really appreciated.

@chinmay5
Author

@MartyG-RealSense I managed to clip the depth map using:

```python
import pyrealsense2 as rs

ctx = rs.context()
for d in ctx.devices:
    serial_number = d.get_info(rs.camera_info.serial_number)
    name = d.get_info(rs.camera_info.name)
    # Depth table access requires the device to be in advanced mode
    advnc_mode = rs.rs400_advanced_mode(d)
    depth_table = advnc_mode.get_depth_table()
    # Clamp depth so we do not go much beyond the immediate vicinity.
    # depthClampMax is in depth units: 500 = 0.5 m at the default 0.001 m unit
    depth_table.depthClampMax = 500
    advnc_mode.set_depth_table(depth_table)
    print(f"{name} ({serial_number}): depthClampMax = {depth_table.depthClampMax}")
```

However, I am still struggling with the surface reconstruction. Any inputs from you would be highly appreciated.

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Oct 12, 2021

Thanks very much @chinmay5 for sharing your Python depth clamp script with the RealSense community!

If you would like to zoom in on a small object, then reducing the camera's depth unit scale from the default of 0.001 to 0.0001 may help. This principle is illustrated in #8228, where changing the depth unit scale enables zoom-in on a small toothpick and improves depth image quality.

#6230 (comment) has an example of Python code for setting the depth unit scale to 0.0001.

[image]
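The trade-off behind changing the depth unit can be worked through with a little arithmetic. Depth pixels are 16-bit (z16), so a raw value ranges from 0 to 65535. The depth unit is therefore both the smallest step the camera can report and the factor that sets the largest distance it can encode. The sketch below (hypothetical helper names) computes both for the two unit values discussed here. The commented pyrealsense2 lines show the SDK option used to set the unit on a live camera.

```python
# Depth frames are 16-bit, so the depth unit sets both the step size and the
# maximum encodable range. Helper names are hypothetical.
MAX_RAW = 65535  # largest value a z16 depth pixel can hold


def depth_range(depth_units):
    """Return (step, max_range) in metres for a depth unit given in metres per count."""
    return depth_units, MAX_RAW * depth_units


def raw_to_metres(raw, depth_units):
    """Convert a raw z16 depth value to metres."""
    return raw * depth_units


# On a live camera the unit is changed with (pyrealsense2, not run here):
#   depth_sensor = profile.get_device().first_depth_sensor()
#   depth_sensor.set_option(rs.option.depth_units, 0.0001)

if __name__ == "__main__":
    for units in (0.001, 0.0001):
        step, max_range = depth_range(units)
        print(f"units={units}: step={step * 1000:.1f} mm, max range={max_range:.3f} m")
```

With 0.0001 the step shrinks to 0.1 mm, while the encodable range drops to about 6.55 m. That is still far beyond a small tabletop object.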

@chinmay5
Author

Dear @MartyG-RealSense, thank you so much for the response. I tried making the changes you suggested. Now that I have reduced the maximum depth, I get somewhat worse reconstructions.

[image]

I am not sure whether this is due to decreasing the maximum depth value. Does the system become unstable at this particular resolution?

@MartyG-RealSense
Collaborator

An alternative way to clamp depth is to use a post-processing Threshold Filter, which can define both a minimum and a maximum distance. Depth values outside that min-max range are excluded from the depth image. An example of a Python script implementing a threshold filter is at #8170 (comment)
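Depth images that have already been saved to disk can be clamped the same way with NumPy. The sketch below (hypothetical function name) zeroes every pixel whose distance falls outside the chosen range, using 0 as the "no data" value as z16 depth images do. It mirrors the effect of the SDK's rs.threshold_filter rather than its exact implementation. The commented lines show the filter and its min_distance/max_distance options for live frames.

```python
import numpy as np


def threshold_depth(depth_raw, depth_scale, min_m, max_m):
    """Zero out depth pixels outside [min_m, max_m] metres (0 means 'no data' in z16)."""
    metres = depth_raw.astype(np.float64) * depth_scale
    keep = (metres >= min_m) & (metres <= max_m)
    return np.where(keep, depth_raw, 0).astype(depth_raw.dtype)


# Live-camera equivalent using the SDK's post-processing filter (not run here):
#   thr = rs.threshold_filter()
#   thr.set_option(rs.option.min_distance, 0.2)   # metres
#   thr.set_option(rs.option.max_distance, 0.6)   # metres
#   depth_frame = thr.process(depth_frame)

if __name__ == "__main__":
    depth = np.array([[100, 300, 700]], dtype=np.uint16)   # raw values, 1 mm units
    print(threshold_depth(depth, 0.001, 0.2, 0.6))
```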

@MartyG-RealSense
Collaborator

Hi @chinmay5, do you require further assistance with this case, please? Thanks!

@chinmay5
Author

Hi @MartyG-RealSense, sorry for not responding sooner. I was running some experiments with the Open3D library you recommended. The library is indeed helpful, but I am getting stuck on fusing the depth maps for the entire object. I can reconstruct one view of the object by fusing its depth maps:

[image]

However, as soon as I include all the views (up to 8 depth maps; the figure above fused 3), the reconstruction breaks down completely.

[image]

The object is not closed although I take depth maps from the 8 possible views. If you have any ideas, that would be really really helpful.

Thanks :)
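The thread does not pin down why the 8-view fusion fails. One common cause worth ruling out (an assumption here, not a diagnosis from the thread) is that the views are not expressed in a single coordinate frame. Each depth map has to be back-projected with its own camera's intrinsics and then moved into a shared world frame with that camera's pose before the results are fused. The NumPy sketch below shows that step using the pinhole arithmetic of rs2_deproject_pixel_to_point, ignoring lens distortion. Here fx, fy, ppx and ppy correspond to the fields of an rs.intrinsics object. The function names and the 4x4 camera-to-world poses are hypothetical inputs that would come from calibrating the rig.

```python
import numpy as np


def depth_to_world(depth_raw, depth_scale, fx, fy, ppx, ppy, cam_to_world):
    """Back-project a depth image into 3D points in a shared world frame.

    Pinhole model without lens distortion; cam_to_world is a 4x4 pose matrix
    (hypothetical, from your own rig calibration). Returns an (N, 3) array
    containing only the valid (non-zero) pixels.
    """
    h, w = depth_raw.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))     # pixel column / row indices
    z = depth_raw.astype(np.float64) * depth_scale
    valid = z > 0
    x = (u[valid] - ppx) / fx * z[valid]
    y = (v[valid] - ppy) / fy * z[valid]
    pts_cam = np.stack([x, y, z[valid], np.ones_like(x)], axis=1)   # homogeneous
    return (pts_cam @ cam_to_world.T)[:, :3]


def fuse_views(views):
    """views: list of dicts with each camera's depth, intrinsics and pose."""
    return np.vstack([depth_to_world(**v) for v in views])


if __name__ == "__main__":
    depth = np.full((2, 2), 1000, dtype=np.uint16)   # 1 m everywhere at 1 mm units
    pose = np.eye(4)
    pts = depth_to_world(depth, 0.001, 1.0, 1.0, 0.0, 0.0, pose)
    print(pts)
```

If the merged cloud from all views does not form a single closed object, the per-camera poses are the first thing to check before tuning the fusion itself.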

@MartyG-RealSense
Collaborator

If you have 8 live cameras that are being processed simultaneously on the same computer then that would be a considerable processing burden on the computer's CPU. In the past, Intel have recommended a computer with a minimum of an i7 CPU for simultaneous use of 4 cameras on the same machine.

If you are unable to use a more powerful computer or divide the cameras among multiple computers, would it be practical to capture by cycling through the cameras, turning them on and off in sequence until you have captured all of the data that you require?

@chinmay5
Author

> An alternative way to clamp depth is to use a post-processing Threshold Filter that can define both a minimum and maximum distance. Depth values outside of that defined min-max range are excluded from the depth image. An example of a Python script for implementing a threshold filter is at #8170 (comment)

I tried your suggestion. The major problem seems to be with the reconstruction, because not enough depth information is included. When I reduce the depth unit scale from the default of 0.001 to 0.0001, the depth map becomes essentially all zeros, i.e. no depth information.
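The thread does not settle why the depth map turned to zeros. One possibility worth checking (an assumption, not confirmed here) is a scale mismatch. After switching to 0.0001, the raw values are 10 times larger, so any downstream code that still assumes 1 mm per count reads them as 10 times farther away. Open3D's RGBDImage.create_from_color_and_depth, for example, defaults to depth_scale=1000.0 and depth_trunc=3.0. An object at 0.4 m would then appear to be at 4 m and be truncated to zero. The sketch below (hypothetical function name) mimics that interpretation for a single value.

```python
def interpret_depth(raw, depth_scale=1000.0, depth_trunc=3.0):
    """Convert a raw depth value the way a consumer with a fixed scale would.

    depth_scale is raw counts per metre; values beyond depth_trunc metres are
    discarded (returned as 0.0). The defaults mirror Open3D's
    RGBDImage.create_from_color_and_depth defaults.
    """
    metres = raw / depth_scale
    return 0.0 if metres > depth_trunc else metres


if __name__ == "__main__":
    depth_units = 0.0001                     # camera set to 0.1 mm per count
    raw = round(0.4 / depth_units)           # object at 0.4 m -> raw value 4000
    print(interpret_depth(raw))                               # scale left at 1000
    print(interpret_depth(raw, depth_scale=1 / depth_units))  # matching scale
```

If this is the cause, passing a depth scale that matches the camera's depth unit (10000 counts per metre for 0.0001) should bring the depth back.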

Also, what would be the effect of increasing the depth resolution to 1280x720? Would that help me capture this fine-structured information?

Thank you for your time and patience :)

@MartyG-RealSense
Collaborator

Increasing the resolution would likely increase the minimum distance of the camera, meaning that at close range the depth image would start breaking up sooner as the camera moves towards the observed object / surface (in other words, the camera could not get as close as before to an object).

The optimal depth accuracy resolution of the D435i model is 848x480 (it is 1280x720 on the D415 camera model), so depth measurement accuracy may reduce at 1280x720 on the D435i.
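The minimum-distance effect can be estimated with the formulas in Intel's D400 tuning guidance (Best-Known-Methods for Tuning Intel RealSense D400 Depth Cameras). Focal length in pixels = 0.5 × horizontal resolution / tan(HFOV / 2), and MinZ = focal length × baseline / 126, where 126 is the maximum disparity used in that formula at the default settings. Because the focal length in pixels grows with the horizontal resolution, MinZ grows in proportion to it. The sketch below assumes nominal D435i values (87° depth HFOV, 50 mm baseline). Actual values vary per unit and can be read from the camera's calibration. The function names are hypothetical.

```python
import math

# Nominal D435i values below are assumptions; read real ones from the device.


def focal_length_px(x_res, hfov_deg):
    """Horizontal focal length in pixels from image width and horizontal FOV."""
    return 0.5 * x_res / math.tan(math.radians(hfov_deg) / 2)


def min_z_mm(x_res, hfov_deg=87.0, baseline_mm=50.0, max_disparity=126):
    """Approximate minimum depth distance: MinZ = focal_px * baseline / max disparity."""
    return focal_length_px(x_res, hfov_deg) * baseline_mm / max_disparity


if __name__ == "__main__":
    for width in (848, 1280):
        print(f"{width} px wide: MinZ ~ {min_z_mm(width):.0f} mm")
```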

@chinmay5
Author

Hi @MartyG-RealSense. Maybe we lost track of things in this long thread. I have a small object, approximately 20 cm by 20 cm, placed about 40 cm from the cameras. What would be the best configuration for this?

Thanks
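A quick geometric estimate shows how much of the frame such an object occupies. At distance d, a camera with horizontal field of view HFOV sees a strip 2·d·tan(HFOV/2) wide. The object therefore covers roughly object_width / (2·d·tan(HFOV/2)) of the image columns. The sketch assumes the nominal 87° D435i depth HFOV and 848 columns; the function name is hypothetical.

```python
import math


def object_width_px(object_m, distance_m, x_res=848, hfov_deg=87.0):
    """Approximate number of image columns an object spans at a given distance.

    hfov_deg and x_res default to assumed nominal D435i depth values.
    """
    view_width_m = 2 * distance_m * math.tan(math.radians(hfov_deg) / 2)
    return object_m / view_width_m * x_res


if __name__ == "__main__":
    # 20 cm object at 40 cm, as described in the question
    print(f"{object_width_px(0.20, 0.40):.0f} of 848 columns")
```

For the 20 cm object at 40 cm, this comes to roughly a quarter of the image width. Most of each frame is therefore background, and clamping or thresholding the depth range helps isolate the object.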

@MartyG-RealSense
Collaborator

Setting the Medium Density camera preset should provide a good balance between accuracy and the amount of detail in the depth image (the 'fill rate').

To set the preset in your Python project, you could try the script provided by a RealSense team member in #2577 (comment)

In the line `if visulpreset == "High Accuracy":`, try changing High Accuracy in the quotation marks to Medium Density:

`if visulpreset == "Medium Density":`
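Here is a minimal sketch of the same approach, assuming pyrealsense2. It walks through the visual preset indices reported by the depth sensor, compares each index's description with the wanted name, and applies the match. The loop covers every index from the option's minimum to its maximum, inclusive. The function name is hypothetical, and it returns False if the device does not offer a preset with that name.

```python
def set_visual_preset(depth_sensor, wanted="Medium Density"):
    """Apply the D400 visual preset whose description matches `wanted`.

    Returns True if a matching preset was found and set, False otherwise.
    """
    import pyrealsense2 as rs  # imported here so the module loads without the SDK

    preset_range = depth_sensor.get_option_range(rs.option.visual_preset)
    # Check every preset index, from the option's minimum to its maximum inclusive
    for i in range(int(preset_range.min), int(preset_range.max) + 1):
        name = depth_sensor.get_option_value_description(rs.option.visual_preset, i)
        if name == wanted:
            depth_sensor.set_option(rs.option.visual_preset, i)
            return True
    return False


# Usage with a running pipeline (not run here):
#   profile = pipeline.start(config)
#   set_visual_preset(profile.get_device().first_depth_sensor(), "Medium Density")
```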

@MartyG-RealSense
Collaborator

Hi @chinmay5, do you require further assistance with this case, please? Thanks!

@MartyG-RealSense
Collaborator

Case closed due to no further comments received.
