-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
velox_functions_remote_client_test is failing in Linux release with adapters CI #11071
Comments
I suspect the fbthrift upgrade is causing the test to fail. |
This does look like the same issue we investigated for the C++20 standard update. This test started to SEGV consistently. We will have to see if this a consistent failure and if it is test only (like the issue from the C++20). Of note is that the folly upgrade PR did not run into the issue but a subsequent PR did. |
Because the new image build gets reflected on a subsequent build? The failure seems consistent on main. |
Stack
This is different than the C++20 stack that prompted us to update folly/fbthrift. |
If we remove the The new SEGV now occurs in the actual destructor of the created RemoteFunctionServiceHandler object in the test. It occurs after the std::string destructor for the
Something appears to be clobbering the object stack. |
Since this was working fine in the previous version of folly/fbthrift I investigated when the test ran into an issue. I determined that the last known good level of folly/fbthrift is
There might be subsequent levels that work fine again and a subsequent change leads to bad behavior. So we will be downgrading (but upgrading from the original level) from the currently used latest level to |
I suspect this is because with this build we link both Parquet and remote function support to the same binary, and one brings thrift as a dependency; the other brings fbthrift. I remember in the past having problems with this since they add symbols to the same namespace and end up confusing the linker and causing all sorts of weird crashes. I thought I had added a check somewhere, but we shouldn't allow customers to enable these options simultaneously. |
As a test, could you try to disable Parquet support in this build and run the remote function test again? |
@pedroerp One of the challenges now is that the Parquet->arrow->thrift dependency is installed as part of the build image. |
Ouch. Did you validate it isn't getting pulled in somehow using ldd and nm? |
Both Thrift and FBThrift were always linked statically. The order of linkage caused issues previously. The strange part is that it only happens with newer FBThrift versions as Chrisitan posted. |
It could be quite nondeterministic though. Maybe a symbol got renamed and that triggered a different behavior in the linker and exposed this issue. I remember when I was debugging this problem in Prestissimo, it was pretty hard to pinpoint what was happening. |
Right, we checked and saw that libthrift was linked last. This means it would only link in symbols from there if they hadn't been satisfied by the fbthrift libraries. So unless some symbol definition is missing from fbthrift no symbols from libthrift should be used. The interesting thing about this is that compiling with C++20 level and the latest fbthrift no problems occur. That is why we upgraded to the latest in the first place. Meanwhile, using C++20 on the previous old fbthrift level causes a SEGV too. So something gets tangled somehow (more details are in the C++20 PR). But I think there is one level that works for both (C++17 and C++20) which is one that is now targeted with PR #11104. This time we are testing with new container images that uses this level of FBOS. |
…#11018)" (facebookincubator#11079) Summary: This reverts commit 6d1fbf0. I validated locally that reverting this commit fixes the velox_functions_remote_client_test failure. Resolves facebookincubator#11071 Pull Request resolved: facebookincubator#11079 Reviewed By: kgpai Differential Revision: D63323399 Pulled By: pedroerp fbshipit-source-id: a239066f542e3b4867add87a5418c81b7fd3020f
Bug description
[==========] Running 6 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 6 tests from RemoteFunctionTestFixture/RemoteFunctionTest
[ RUN ] RemoteFunctionTestFixture/RemoteFunctionTest.simple/0
I0922 22:44:36.204250 39419 ThriftServer.cpp:836] Using thread manager (resource pools not enabled) on address/port /tmp/socketYuJayP: runtime: thriftFlagNotSet, , thrift flag: false, enable gflag: false, disable gflag: false
*** Aborted at 1727045076 (Unix time, try 'date -d @1727045076') ***
*** Signal 11 (SIGSEGV) (0x0) received by PID 39348 (pthread TID 0x7f2505a00640) (linux TID 39438) (code: 128), stack trace: ***
(error retrieving stack trace)
System information
CI: https://github.com/facebookincubator/velox/actions/runs/10984888387/job/30496015047
Relevant logs
No response
The text was updated successfully, but these errors were encountered: