Stop Heartbeat monitor jobs on cancelation #20570

Merged
merged 1 commit into from
Aug 13, 2020

Conversation

jsoriano
Member

If a monitor is stopped, for example when using autodiscover, its
scheduled tasks should be stopped too. Once started, the scheduler kept
rescheduling tasks forever, even though the tasks did no work after
cancelation because they are also aware of the context.

This change avoids executing and rescheduling tasks once their job's
context is done.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

  • Start heartbeat with autodiscover enabled and -d scheduler.
  • Start some container/pod.
  • Wait for the monitor to be configured and executed.
  • Stop the container/pod.
  • Messages about job execution, like the following one, should eventually stop appearing:
2020-08-12T13:13:43.705+0200	DEBUG	[scheduler]	scheduler/scheduler.go:201	Job 'auto-http-0X2C5537D51C1B9524' returned at 2020-08-12 13:13:43.70518639 +0200 CEST m=+66.010433861

Related issues

@jsoriano jsoriano added review needs_backport PR is waiting to be backported to other branches. Team:obs-ds-hosted-services Label for the Observability Hosted Services team v7.10.0 labels Aug 12, 2020
@jsoriano jsoriano requested a review from andrewvc August 12, 2020 11:26
@jsoriano jsoriano requested a review from a team as a code owner August 12, 2020 11:26
@jsoriano jsoriano self-assigned this Aug 12, 2020
@elasticmachine
Collaborator

Pinging @elastic/uptime (Team:Uptime)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Aug 12, 2020
@jsoriano jsoriano force-pushed the heartbeat-stop-job branch from 75b3e62 to cab5285 Compare August 12, 2020 11:28
@elasticmachine
Collaborator

💚 Build Succeeded

Build stats

  • Build Cause: [Pull request #20570 updated]

  • Start Time: 2020-08-12T11:29:50.990+0000

  • Duration: 49 min 28 sec

Test stats 🧪

Test Results
Failed 0
Passed 1200
Skipped 28
Total 1228

@andrewvc
Contributor

Really interesting and good find @jsoriano ! I tried to add a test here, but I couldn't, because there was no externally visible thing to test. We are however missing cancellation tests. Would you mind adding this one below to this patch? This should be added to scheduler_test.go.

I guess it's kind of weird to add a test that doesn't really test the patch, but given that there's no good test to write, I think this will have to do. I suppose we could add more insight into the scheduler internals, but that feels like a weird design choice.

func TestCancellingJobs(t *testing.T) {
	s := NewWithLocation(10, monitoring.NewRegistry(), tarawaTime())

	require.NoError(t, s.Start())

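	// removeFn is only assigned once s.Add returns, but the job may start
	// before that. Hold this mutex until then so the job blocks instead of
	// calling a nil removeFn.
	taskInitMtx := sync.Mutex{}
	taskInitMtx.Lock()
	var removeFn func()
	timesRan := batomic.MakeInt(0)
	// Let the job run once, then have it cancel itself immediately.
	removeFn, err := s.Add(testSchedule{delay: 0}, "testCancel", func(ctx context.Context) []TaskFunc {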
		timesRan.Inc()
		taskInitMtx.Lock()
		removeFn()
		taskInitMtx.Unlock()
		return nil
	})
	require.NoError(t, err)
	taskInitMtx.Unlock()

	// It's hard to tell if the job still exists since
	// we just recursively requeue them, but we should know after a second
	time.Sleep(time.Second)
	require.Equal(t, 1, timesRan.Load())

	require.NoError(t, s.Stop())
}

@jsoriano
Member Author

@andrewvc yeah, I also couldn't find a way to test this. As you said, we would need to expose scheduler or timer queue internals, and that could be weird. I thought one way to do it would be to expose the length of the timer queue, but this is not reliable because a task is not in the queue while it is being executed.

Regarding the test for cancelation, wdyt about discussing it in a separate PR? I think it is always complicated to automatically test for things that are not expected to happen. In this case I am concerned about the sleep; I would prefer not to add it.
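
To illustrate the queue-length point, here is a toy sketch. It is not the Heartbeat timer queue; `toyQueue` and `observe` are made-up names, and the only assumption is that a task is popped from the queue before it runs. While a task is executing, the length is 0 whether or not the job is still alive, so a check made at that moment cannot tell the two cases apart.

```go
package main

import (
	"fmt"
	"sync"
)

// toyQueue is a hypothetical stand-in for a timer queue. It only tracks
// tasks that are waiting, not the one currently running.
type toyQueue struct {
	mu      sync.Mutex
	pending []func()
}

func (q *toyQueue) push(f func()) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.pending = append(q.pending, f)
}

func (q *toyQueue) len() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return len(q.pending)
}

// runNext pops the first task and runs it outside the lock.
func (q *toyQueue) runNext() {
	q.mu.Lock()
	f := q.pending[0]
	q.pending = q.pending[1:]
	q.mu.Unlock()
	f()
}

// observe runs one task and reports the queue length seen while the task
// is running and after it has returned. A live job requeues itself; a
// cancelled one does not.
func observe(requeue bool) (during, after int) {
	q := &toyQueue{}
	var task func()
	task = func() {
		during = q.len() // the running task is no longer in the queue
		if requeue {
			q.push(task)
		}
	}
	q.push(task)
	q.runNext()
	return during, q.len()
}

func main() {
	d, a := observe(true)
	fmt.Println("live job:", d, a) // prints "live job: 0 1"
	d, a = observe(false)
	fmt.Println("cancelled job:", d, a) // prints "cancelled job: 0 0"
}
```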

Contributor

@andrewvc andrewvc left a comment

LGTM. We can discuss the added test in a separate PR

@jsoriano jsoriano merged commit a6d98d6 into elastic:master Aug 13, 2020
@jsoriano jsoriano deleted the heartbeat-stop-job branch August 13, 2020 13:58
jsoriano added a commit to jsoriano/beats that referenced this pull request Aug 13, 2020
(cherry picked from commit a6d98d6)
@jsoriano jsoriano removed the needs_backport PR is waiting to be backported to other branches. label Aug 13, 2020
jsoriano added a commit to jsoriano/beats that referenced this pull request Aug 13, 2020
(cherry picked from commit a6d98d6)
jsoriano added a commit that referenced this pull request Aug 14, 2020
(cherry picked from commit a6d98d6)
jsoriano added a commit that referenced this pull request Aug 14, 2020
(cherry picked from commit a6d98d6)
melchiormoulin pushed a commit to melchiormoulin/beats that referenced this pull request Oct 14, 2020
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…0588)

(cherry picked from commit 5921705)
Labels
review Team:obs-ds-hosted-services Label for the Observability Hosted Services team v7.9.0 v7.10.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Autodiscover based on services doesn't stop monitors when service is deleted
3 participants