Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moving context.WithoutCancel outside defer. #17260

Merged
merged 4 commits into from
Feb 29, 2024
Merged

Conversation

getvictor
Copy link
Member

@getvictor getvictor commented Feb 28, 2024

#17197

  • Changes file added for user-visible changes in changes/ or orbit/changes/.
    See Changes files for more information.
  • Documented any permissions changes (docs/Using Fleet/manage-access.md)
  • Input data is properly validated, SELECT * is avoided, SQL injection is prevented (using placeholders for values in statements)
  • Added support on fleet's osquery simulator cmd/osquery-perf for new osquery data ingestion features.
  • Added/updated tests
  • If database migrations are included, checked table schema to confirm autoupdate
  • For database migrations:
    • Checked schema for all modified table for columns that will auto-update timestamps during migration.
    • Confirmed that updating the timestamps is acceptable, and will not cause unwanted side effects.
  • Manual QA for all new/changed functionality
    • For Orbit and Fleet Desktop changes:
      • Manual QA must be performed in the three main OSs, macOS, Windows and Linux.
      • Auto-update manual QA, from released version of component to new version (see tools/tuf/test).

Copy link

codecov bot commented Feb 28, 2024

Codecov Report

Attention: Patch coverage is 53.57143% with 13 lines in your changes are missing coverage. Please review.

Project coverage is 65.38%. Comparing base (456bc3c) to head (c60310b).
Report is 5 commits behind head on main.

Files Patch % Lines
server/service/live_queries.go 58.33% 9 Missing and 1 partial ⚠️
server/service/campaigns.go 25.00% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #17260      +/-   ##
==========================================
- Coverage   65.39%   65.38%   -0.02%     
==========================================
  Files        1187     1187              
  Lines      106239   106250      +11     
  Branches     2546     2546              
==========================================
- Hits        69480    69475       -5     
- Misses      31477    31493      +16     
  Partials     5282     5282              
Flag Coverage Δ
backend 66.39% <53.57%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@lucasmrod
Copy link
Member

The change makes sense to me. Is the reasoning the following?

func (svc *Service) GetCampaignReader(ctx context.Context, campaign *fleet.DistributedQueryCampaign) (<-chan interface{}, context.CancelFunc, error) {
  [...]
  campaign.Status = fleet.QueryRunning
  if err := svc.ds.SaveDistributedQueryCampaign(ctx, campaign); err != nil {

SaveDistributedQueryCampaign could have failed with e.g. context canceled / timeout but the transaction might have succeeded in MySQL?

@lucasmrod
Copy link
Member

It seems this change fixes the issue.

I can reproduce the issue in patch-fleet-v4.46.1 (~18 oprhaned queries out of 1000 ran with the below script) but not in victor/live-query-context:

#!/bin/bash

for i in {1..1000}; do
        curl -k -X POST -H "Authorization: Bearer $TEST_TOKEN" https://localhost:8080/api/latest/fleet/hosts/identifier/lucass-macbook-pro.local/query -d '{"query": "SELECT * FROM os_version;"}' &
        PROC=$!
        sleep 0.09
        kill $PROC
done

@xpkoala @getvictor feel free to test.

@getvictor getvictor marked this pull request as ready for review February 29, 2024 13:39
@getvictor getvictor requested a review from a team as a code owner February 29, 2024 13:39
lucasmrod
lucasmrod previously approved these changes Feb 29, 2024
err = svc.liveQueryStore.RunQuery(strconv.Itoa(int(campaign.ID)), queryString, hostIDs)
if err != nil {
return nil, ctxerr.Wrap(ctx, err, "run query")
}
Copy link
Member

@lucasmrod lucasmrod Feb 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the bugs I reproduced was caused by context being canceled when executing svc.ds. CountHostsInTargets (which meant leaving a live query orphaned in Redis). So as a simple fix to avoid refactoring given the available time I'm moving this down.

PS: Such live query was removed from Redis after running cleanups_then_aggregation followed by frequent_cleanups. But it's probably not optimal as it can be left orphaned for 1 hour and 15 minutes or so.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lucasmrod Any predicted side effects to this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I can think of. (The swaped operations are independent.)

Copy link
Member Author

@getvictor getvictor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@sharon-fdm sharon-fdm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lucasmrod lucasmrod merged commit 1a679d0 into main Feb 29, 2024
21 checks passed
@lucasmrod lucasmrod deleted the victor/live-query-context branch February 29, 2024 16:39
sharon-fdm pushed a commit that referenced this pull request Feb 29, 2024
#17197

Fixing orphaned live queries when context is canceled

Co-authored-by: Lucas Rodriguez <lucas@fleetdm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants