Throwing an instance of 'librealsense::linux_backend_exception' #12140

Closed
shintaro-matsui opened this issue Aug 28, 2023 · 14 comments

Comments

@shintaro-matsui

Required Info
Camera Model: D415
Firmware Version: 5.15.0.2
Operating System & Version: Ubuntu 22.04
Kernel Version (Linux Only): 5.19.0-50-generic
Platform: PC
SDK Version: 2.54.1
Language: C++
Segment: Robot

Issue Description

I use a D415 to acquire an image once every 2 seconds and process it in my program.
When running, the program terminates with the following error about 1 time in 1000.

terminate called after throwing an instance of 'librealsense::linux_backend_exception'
  what():  lockf(...) failed Last Error: Bad file descriptor
Segmentation fault (core dumped) 

I analyzed the core dump with gdb; the result is as follows.

Core was generated by `MY_PROGRAM_NAME'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140378762098240) at ./nptl/pthread_kill.c:44
44	./nptl/pthread_kill.c: no such file or directory.

The gdb backtrace is as follows:

(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140378762098240)
    at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140378762098240)
    at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140378762098240, signo=signo@entry=6)
    at ./nptl/pthread_kill.c:89
#3  0x00007fac82842476 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/posix/raise.c:26
#4  0x00007fac828287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007fac82ca2b9e in  () at /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007fac82cae20c in  () at /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007fac82cad1e9 in  () at /lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007fac82cad959 in __gxx_personality_v0 ()
    at /lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007fac8de1a884 in  () at /lib/x86_64-linux-gnu/libgcc_s.so.1
#10 0x00007fac8de1b2dd in _Unwind_Resume ()
    at /lib/x86_64-linux-gnu/libgcc_s.so.1
#11 0x00007fac8d7c1db2 in __gthread_mutex_unlock (__mutex=<optimized out>)
    at /usr/include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:779
#12 std::mutex::unlock() (this=<optimized out>, this=<optimized out>)
    at /usr/include/c++/11/bits/std_mutex.h:118
#13 std::lock_guard<std::mutex>::~lock_guard()
    (this=<optimized out>, this=<optimized out>)
    at /usr/include/c++/11/bits/std_mutex.h:235
#14 librealsense::platform::named_mutex::unlock() (this=0x556268a34d30)
    at ./src/linux/backend-v4l2.cpp:191
#15 0x00007fac8d8c09c0 in librealsense::platform::multi_pins_uvc_device::unlock() const (this=<optimized out>) at ./src/backend.h:777
#16 0x00007fac8dc49e22 in std::lock_guard<librealsense::platform::uvc_device>::~lock_guard() (this=<synthetic pointer>, this=<optimized out>)
    at /usr/include/c++/11/bits/std_mutex.h:235
#17 librealsense::locked_transfer::send_receive(std::vector<unsigned char, std::allocator<unsigned char> > const&, int, bool)::{lambda(librealsense::platform::uvc_device&)#2}::operator()(librealsense::platform::uvc_device&) const
    (dev=..., __closure=<synthetic pointer>) at ./src/hw-monitor.h:219
#18 librealsense::uvc_sensor::invoke_powered<librealsense::locked_transfer::send_receive(std::vector<unsigned char, std::allocator<unsigned char> > const&, int, bool)::{lambda(librealsense::platform::uvc_device&)#2}>(librealsense::locked_transfer::send_receive(std::vector<unsigned char, std::allocator<unsigned char> > const&, int, bool)::{lambda(librealsense::platform::uvc_device&)#2})
    (action=..., this=0x556268b0a890) at ./src/sensor.h:361
#19 librealsense::locked_transfer::send_receive(std::vector<unsigned char, std::allocator<unsigned char> > const&, int, bool) [clone .constprop.0]
    (this=0x556268a34570, data=std::vector of length 24, capacity 24 = {...}, require_response=require_response@entry=true, timeout_ms=5000)
    at ./src/hw-monitor.h:219
#20 0x00007fac8db6763a in librealsense::hw_monitor::execute_usb_command(unsigned char*, unsigned long, unsigned int&, unsigned char*, unsigned long&, bool) const
    (this=0x556268a683a0, out=<optimized out>, outSize=<optimized out>, op=@0x7fac7a3f8de4: 0, in=0x7fac7a3f8df0 "\001", inSize=@0x7fac7a3f8de8: 1024, require_response=true) at /usr/include/c++/11/bits/shared_ptr_base.h:1295
#21 0x00007fac8db67840 in librealsense::hw_monitor::send_hw_monitor_command(librealsense::hw_monitor::hwmon_cmd_details&) const
    (this=this@entry=0x556268a683a0, details=...)
    at /usr/include/c++/11/array:257
#22 0x00007fac8db67c04 in librealsense::hw_monitor::send(librealsense::command, librealsense::hwmon_response*, bool) const
    (this=0x556268a683a0, cmd=..., p_response=0x0, locked_transfer=<optimized out>) at ./src/hw-monitor.cpp:177
#23 0x00007fac8d8e9a20 in librealsense::d400_device::get_device_time_ms()
    (this=<optimized out>) at ./src/ds/d400/d400-device.cpp:1179
#24 0x00007fac8db60c69 in librealsense::time_diff_keeper::update_diff_time()
    (this=0x556268a46ed0) at ./src/global_timestamp_reader.cpp:209
#25 0x00007fac8db60dda in librealsense::time_diff_keeper::polling(dispatcher::cancellable_timer) (this=0x556268a46ed0, cancellable_timer=...)
    at ./src/global_timestamp_reader.cpp:250
#26 0x00007fac8d9734a3 in std::function<void (dispatcher::cancellable_timer)>::operator()(dispatcher::cancellable_timer) const
    (__args#0=..., this=<optimized out>)
    at /usr/include/c++/11/bits/std_function.h:590
#27 active_object<std::function<void (dispatcher::cancellable_timer)> >::do_loop()::{lambda(dispatcher::cancellable_timer)#1}::operator()(dispatcher::cancellable_timer) const (ct=..., __closure=0x7fac7a3fa770)
    at ./third-party/rsutils/include/rsutils/concurrency/concurrency.h:444
#28 std::__invoke_impl<void, active_object<std::function<void (dispatcher::cancellable_timer)> >::do_loop()::{lambda(dispatcher::cancellable_timer)#1}&, dispatcher::cancellable_timer>(std::__invoke_other, active_object<std::function<void (dispatcher::cancellable_timer)> >::do_loop()::{lambda(dispatcher::cancellable_timer)#1}&, dispatcher::cancellable_timer&&) (__f=...)
    at /usr/include/c++/11/bits/invoke.h:61
#29 std::__invoke_r<void, active_object<std::function<void (dispatcher::cancellable_timer)> >::do_loop()::{lambda(dispatcher::cancellable_timer)#1}&, dispatcher::cancellable_timer>(active_object<std::function<void (dispatcher::cancellable_timer)> >::do_loop()::{lambda(dispatcher::cancellable_timer)#1}&, dispatcher::cancellable_timer&&) (__fn=...) at /usr/include/c++/11/bits/invoke.h:111
#30 std::_Function_handler<void (dispatcher::cancellable_timer), active_object<std::function<void (dispatcher::cancellable_timer)> >::do_loop()::{lambda(dispatcher::cancellable_timer)#1}>::_M_invoke(std::_Any_data const&, dispatcher::cancellable_timer&&) (__functor=..., __args#0=<optimized out>)
    at /usr/include/c++/11/bits/std_function.h:290
#31 0x00007fac8dbf16a5 in dispatcher::dispatcher(unsigned int, std::function<void (std::function<void (dispatcher::cancellable_timer const&)>)>)::{lambda()#1}::operator()() const () at /usr/include/c++/11/bits/std_function.h:590
#32 0x00007fac82cdc253 in  () at /lib/x86_64-linux-gnu/libstdc++.so.6
#33 0x00007fac82894b43 in start_thread (arg=<optimized out>)
    at ./nptl/pthread_create.c:442
#34 0x00007fac82926a00 in clone3 ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

The program that causes this phenomenon is too complex and large to post here.
Does anyone have any idea what the cause might be?

@dmipx
Contributor

dmipx commented Aug 28, 2023

Hi.
Can you share the kernel log from when the event happens?
dmesg > kernel.log
Do you disconnect the camera during idle periods to conserve power?

@shintaro-matsui
Author

shintaro-matsui commented Aug 28, 2023

Hi.
Here is the kernel log from when the event occurred:
kernel.log

The camera is not disconnected, even when idle.

Additional info:
I am using two D415s, with two processes running that each take an image every 2 seconds and process it.

@MartyG-RealSense
Collaborator

Hi @shintaro-matsui, I would also add that, at the time of writing, kernel 5.19 is not yet supported by librealsense; 5.15 is the most recent supported kernel. The SDK can be used with non-supported kernels, but there may be unpredictable consequences for stability.

If librealsense is built from source code with CMake then including the flag -DFORCE_RSUSB_BACKEND=TRUE in the CMake build instruction will cause the SDK to bypass the kernel and so avoid that instability, as an RSUSB build is not dependent on Linux versions or kernel versions and does not need to have a kernel patch script applied.

If you are using C++ then a script at #2219 (comment) for capturing an image simultaneously from all attached D415 cameras may be a helpful reference if you have not seen it already.

@dmipx I believe that your PR for 5.19 support at #11837 was merged into the in-progress development version of librealsense on May 25 2023 but may have just missed the release of librealsense 2.54.1 and so will be incorporated into the next SDK release after 2.54.1?

@shintaro-matsui
Author

Hi @MartyG-RealSense,
I am going to try the CMake option -DFORCE_RSUSB_BACKEND=true and will report the result.

@dmipx
Contributor

dmipx commented Aug 28, 2023

@MartyG-RealSense We will support 5.19 and 6.2 kernels in the upcoming release.

@shintaro-matsui I see no particular issues in the kernel log, only "uvcvideo 2-8.3:1.1: Failed to resubmit video URB (-1)", which can cause frame drops but not camera loss.
Can you try the development branch and see whether the same issue occurs? We have made many stability improvements since then.

@shintaro-matsui
Author

Thanks for all the info.

Currently I use librealsense installed with apt install librealsense2-dev.
I will try building the development branch from source, and also try the option -DFORCE_RSUSB_BACKEND=true.

If that does not work, I will switch to a machine with a different kernel and try again.
It will take some time before I can report the results.

@MartyG-RealSense
Collaborator

It's no problem at all. Please do update here when you have news to report. Good luck!

@shintaro-matsui
Author

shintaro-matsui commented Aug 31, 2023

Hi @MartyG-RealSense.
First, I removed the existing librealsense 2.54:
sudo apt remove --purge librealsense2*

Next, I built and installed the development branch from source.
The -DFORCE_RSUSB_BACKEND=TRUE flag was not used.

Then I deleted my program's build directory and tried to build it again, but I get the following linker warning:

/usr/bin/ld: warning: librealsense2.so.2.54, needed by /usr/local/lib/libpcl_io.so, not found (try using -rpath or -rpath-link)

Is this a bug in librealsense 2.55?
Or did I not remove librealsense 2.54 completely?

@MartyG-RealSense
Collaborator

The instruction below should ensure that all RealSense-related packages are removed.

dpkg -l | grep "realsense" | cut -d " " -f 3 | xargs sudo dpkg --purge

@shintaro-matsui
Author

Hi.
While I was trying different versions of librealsense, CUDA stopped working properly for some reason, and after a failed CUDA reinstall my PC stopped working.

I decided to reinstall Ubuntu, and since I had heard that the kernel version matters, I chose Ubuntu 20.04.
My current environment is as follows.

Required Info
Camera Model: D415
Firmware Version: 5.15.0.2
Operating System & Version: Ubuntu 20.04.06 LTS
Kernel Version (Linux Only): 5.15.0-82-generic
Platform: PC
SDK Version: 2.54.1
Language: C++
Segment: Robot

However, I still get the same error.
It occurs with the same frequency, about once in 1000 times.

terminate called after throwing an instance of 'librealsense::linux_backend_exception'
  what():  lockf(...) failed Last Error: Bad file descriptor
Segmentation fault (core dumped) 

librealsense2 was installed with sudo apt install librealsense2-dev librealsense2-utils librealsense2-dbg.
Any advice?

@MartyG-RealSense
Collaborator

Your apt install instruction does not include the most important package, librealsense2-dkms, which is the core of librealsense. librealsense2-utils is the second most important, as it installs the SDK's tools and examples.

librealsense2-dev and librealsense2-dbg are optional developer and debug packages that do not need to be installed in order for librealsense to work.

Please repeat the package uninstall instruction and then try this installation instruction:

sudo apt-get install librealsense2-dkms librealsense2-utils


CUDA requires an Nvidia graphics GPU and the librealsense SDK's support of it is intended for use with Nvidia Jetson single-board computers rather than a PC. For this reason, the packages on the distribution_linux.md installation instruction page do not contain CUDA support. Jetson boards have their own separate packages that contain CUDA support.

https://github.com/IntelRealSense/librealsense/blob/master/doc/installation_jetson.md

@shintaro-matsui
Author

I understand that librealsense2-dkms is important.
I will report back when there is more progress.

@shintaro-matsui
Author

Hi @dmipx, @MartyG-RealSense.
This error no longer occurs.

First, I built librealsense from source on the v2.54.1 branch with -DFORCE_RSUSB_BACKEND=true, in the environment I reported previously, but it did not solve the problem.
#12140 (comment)
So I reinstalled with -DFORCE_RSUSB_BACKEND=false.

Next, I changed the implementation of the TCP/IP communication, and inexplicably this solved the problem.

In my program, the process that uses RealSense is a client that communicates with other processes.
Previously, a new client instance was created on every iteration.
Here is simplified code to illustrate:

while (...) {

    // Create a client instance
    // Client connects to the server

    // Fetch the RealSense frame
    // Point cloud processing

    // Communicate with the server (send and receive)

    // Close the client
}

I changed it to keep a single client instance for the whole loop, and the error no longer occurs.

// Create a client instance
// Client connects to the server
while (...) {

    // Fetch the RealSense frame
    // Point cloud processing

    // Communicate with the server (send and receive)

}
// Close the client

I am reporting this in case it helps others.
Thank you for your help in discussing this.

@MartyG-RealSense
Collaborator

You are very welcome. It's great to hear that you achieved a solution. Thanks very much for the update and for sharing the details of your solution!

As you have solved your issue, I will close this case. Thanks again!
