Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: enable agent handling of stop requests #3539

Merged
merged 4 commits into from
Jan 19, 2024
Merged

Conversation

schoren
Copy link
Contributor

@schoren schoren commented Jan 18, 2024

This PR enables the agent to receive users "stop test" request so it can stop what it is doing and correctly flag the test run.

Architecture

The challenge for this feature is that the process that gets the users requests needs to stop a separated, independent process (for example, the trigger being executed). To solve this, this PR implements a map of context Cancel functions. Whenever a new process starts, it registers a new cancelFunc with a unique key based on the testID/runID. The stop listener notifies that channel when it gets a stop request, and the processor func can handle the cancelation.

When the control plane sends the agent TriggerResponse back to the controller, it checks if that response was cancelled by the user, if so it handles the case accordingly, emiting events, updating the run state, etc.

Workflow

sequenceDiagram
    participant U as User
    participant C as Controller(Core)
    participant TP as TestPipeline(Core)
    participant P as Publisher(NATS)
    participant CA as Cloud API
    participant A as Agent

    rect rgb(200, 200, 200)
    Note over A: Agent Startup Registration
    A->>+CA: Register Trigger/Poller/Etc stream
    A->>+CA: Register StopRequest handler stream
    end

    U->>C: Initiate StopRun
    C->>TP: Forward StopRun request
    TP->>P: Publish StopRun message
    P->>CA: Notify StopRun event
    CA->>A: Notify Agent via StopRequest stream
    A->>A: Stop running process (Internal)
    A->>CA: Send TriggerResponse with UserCancelled error
    CA->>C: Forward TriggerResponse to Controller
    C->>TP: If UserCancelled error, call UpdateStoppedTest
    TP->>TP: Set Run state, emit events (Internal)
Loading

Changes

Fixes

Checklist

  • tested locally
  • added new dependencies
  • updated the docs
  • added a test

Loom video

Working demo: https://www.loom.com/share/99f4bf306e284474bb5c37bf59feeda1?sid=18fea06a-de95-46fc-b3ab-bb605178997f
PR walkthrough: https://www.loom.com/share/30d16279d9c0491ebff17d252d5d2192?sid=b2a28755-3f21-4cef-803c-f1a281764c70

// TODO: get context from request
err = c.stopListener(context.Background(), &resp)
if err != nil {
fmt.Println(err.Error())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can we print this message with a logger?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all the workers need to implement the logger so I'll create a new PR for this

Comment on lines +132 to +134
func (o *verboseObserver) StartStopRequest(request *proto.StopRequest) {
o.ui.Infof("%s Stopping test run %d, test %s ...", consoleUI.Emoji_Sparkles, request.RunID, request.TestID)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Comment on lines 413 to 420
uReq := UserRequest{
TenantID: job.TenantID,
TestID: job.Test.ID,
RunID: job.Run.ID,
}

q.subscriptor.Subscribe(uReq.ResourceID(UserRequestTypeStop), sfn)
q.subscriptor.Subscribe(uReq.ResourceID(UserRequestSkipTraceCollection), spfn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can we improve the naming of the uReq, sfn and spfn variables? Looking at them at the Subscribe function they are a little bit confusing to understand.

Maybe we can rename to something like:

  • uReq -> userRequest
  • sfn -> stopFunctionCallback
  • spfn -> skipTraceCollectionCallback

Comment on lines +91 to +94
go func() {
err = w.trigger(subcontext, triggerRequest)
done <- true
}()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest a short comment explaining that we moved this execution to a go routine because the user can cancel the test, and in that case, we will ignore the outcome of this go routine.

I'm thinking about this because I'm sure that we will look at this code in 6 months and think: "why did we do that?" 😂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make another PR implementing the same feature for the trace poller, so I'll try to make a nicer implementation there. If I cannot make it and end up copy/pasting, I'll add a comment on both!

@schoren schoren merged commit eac8b6d into main Jan 19, 2024
37 checks passed
@schoren schoren deleted the add-tenantid-to-stop-events branch January 19, 2024 15:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants