Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix segfault in HiveDataSource::next() when preloading an empty split #8850

Conversation

majetideepak
Copy link
Collaborator

@majetideepak majetideepak commented Feb 26, 2024

We are seeing the segfault below in our benchmark cluster.
The segfault is due to the splitReader_ being accessed as a nullptr;
The splitReader_ is not initialized for an empty split in the preload path.
The previous splitReader_ is used and this is fixed.
We confirmed that this change fixes the segfault.

However, I do not have a test where the previous splitReader_ is a nullptr.
The addSplit() path, where splitReader_ is always created first, seems to be always taken first before
the preload path.
It is unclear how the splitReader_ can become a nullptr in practice.

*** SIGSEGV (@0x0) received by PID 55 (TID 0x7ef6a27fc700) from PID 0; stack trace: ***
@ 0x7f58305b776e google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f582cc1bd20 (unknown)
@ 0x6b0646 facebook::velox::connector::hive::HiveDataSource::next()
@ 0x3a6ebbc facebook::velox::exec::TableScan::getOutput()
@ 0x3953f3b facebook::velox::exec::Driver::runInternal()
@ 0x3956e2d facebook::velox::exec::Driver::run()
@ 0x3956fef _ZN5folly6detail8function14FunctionTraitsIFvvEE9callSmallIZN8facebook5velox4exec6Driver7enqueueESt10shared_ptrIS9_EEUlvE_EEvRNS1_4DataE
@ 0x7f582e83d4f6 folly::ThreadPoolExecutor::runTask()
@ 0x4087575 folly::CPUThreadPoolExecutor::threadRun()
@ 0x7f582e83f007 folly::detail::function::FunctionTraits<>::callSmall<>()
@ 0x7f582b875b23 (unknown)
@ 0x7f582cc111ca start_thread
@ 0x7f582ae7c8d3 __GI___clone
@ 0x0 (unknown)

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 26, 2024
Copy link

netlify bot commented Feb 26, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit b0be73f
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/66003e24af05c100087fa2f5

@majetideepak majetideepak changed the title Fix segfault in HiveDataSource when preloaded Fix segfault in HiveDataSource when preloading an empty split Feb 26, 2024
@majetideepak majetideepak force-pushed the fix-preload-empty-split branch 2 times, most recently from 29de8a1 to ccc431f Compare February 26, 2024 08:47
@majetideepak majetideepak changed the title Fix segfault in HiveDataSource when preloading an empty split Fix segfault in HiveDataSource::next() when preloading an empty split Mar 7, 2024
@majetideepak
Copy link
Collaborator Author

majetideepak commented Mar 19, 2024

I am unable to add a reproducer. Even though the splitReader_ is not initialized for an empty split in the preload path, the splitReader_ can never be a nullptr since the addSplit() path where splitReader_ is always created seems to be always taken first before a preload.

It is unclear how the splitReader_ can end up becoming a nullptr in practice.

@majetideepak majetideepak force-pushed the fix-preload-empty-split branch 3 times, most recently from be1bcc1 to 0498c29 Compare March 21, 2024 21:48
@majetideepak majetideepak marked this pull request as ready for review March 21, 2024 22:05
@majetideepak majetideepak requested a review from Yuhta March 21, 2024 22:13
@majetideepak
Copy link
Collaborator Author

https://prestodb.slack.com/archives/C03DYLFBN06/p1709772605962749
Where another user confirmed this fix.

@facebook-github-bot
Copy link
Contributor

@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@majetideepak left a question. thanks!

@@ -201,8 +201,9 @@ std::optional<RowVectorPtr> HiveDataSource::next(
uint64_t size,
velox::ContinueFuture& /*future*/) {
VELOX_CHECK(split_ != nullptr, "No split to process. Call addSplit first.");
VELOX_CHECK(splitReader_ != nullptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: VELOX_CHECK_NOT_NULL

@@ -327,11 +328,8 @@ void HiveDataSource::setFromDataSource(
VELOX_CHECK(source, "Bad DataSource type");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NYC: VELOX_CHECK_NOT_NULL

@majetideepak majetideepak force-pushed the fix-preload-empty-split branch from 0498c29 to b0be73f Compare March 24, 2024 14:52
Copy link
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@majetideepak LGTM. Thanks!

@facebook-github-bot
Copy link
Contributor

@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@Yuhta merged this pull request in 7a2996c.

Joe-Abraham pushed a commit to Joe-Abraham/velox that referenced this pull request Jun 7, 2024
…facebookincubator#8850)

Summary:
We are seeing the segfault below in our benchmark cluster.
The segfault is due to the `splitReader_` being accessed as a nullptr;
The `splitReader_` is not initialized for an empty split in the preload path.
The previous `splitReader_` is used and this is fixed.
We confirmed that this change fixes the segfault.

However, I do not have a test where the previous `splitReader_` is a nullptr.
The addSplit() path, where `splitReader_` is always created first, seems to be always taken first before
the preload path.
It is unclear how the `splitReader_` can become a nullptr in practice.

*** SIGSEGV (0x0) received by PID 55 (TID 0x7ef6a27fc700) from PID 0; stack trace: ***
    @     0x7f58305b776e google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f582cc1bd20 (unknown)
    @           0x6b0646 facebook::velox::connector::hive::HiveDataSource::next()
    @          0x3a6ebbc facebook::velox::exec::TableScan::getOutput()
    @          0x3953f3b facebook::velox::exec::Driver::runInternal()
    @          0x3956e2d facebook::velox::exec::Driver::run()
    @          0x3956fef _ZN5folly6detail8function14FunctionTraitsIFvvEE9callSmallIZN8facebook5velox4exec6Driver7enqueueESt10shared_ptrIS9_EEUlvE_EEvRNS1_4DataE
    @     0x7f582e83d4f6 folly::ThreadPoolExecutor::runTask()
    @          0x4087575 folly::CPUThreadPoolExecutor::threadRun()
    @     0x7f582e83f007 folly::detail::function::FunctionTraits<>::callSmall<>()
    @     0x7f582b875b23 (unknown)
    @     0x7f582cc111ca start_thread
    @     0x7f582ae7c8d3 __GI___clone
    @                0x0 (unknown)

Pull Request resolved: facebookincubator#8850

Reviewed By: xiaoxmeng

Differential Revision: D55255299

Pulled By: Yuhta

fbshipit-source-id: ba09971faf517195d8a8ded692a9d31a758bdd1c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants