
Delete expired data by job #57337

Merged · 14 commits into elastic:master · Jun 5, 2020

Conversation

@davidkyle (Member) commented May 29, 2020

Deleting expired data can take a long time, leading to timeouts if there are many jobs. Often the problem is due to a few large jobs which prevent the regular maintenance of the remaining jobs. This change adds a job_id parameter to the delete expired data endpoint to help clean up those problematic jobs.

This change only affects model snapshots and results. Forecasts cannot yet be removed by job_id; if desired, that could be implemented.

TODO HLRC
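
For reference, a minimal sketch of how the new endpoint could be called with the Java low-level REST client. The job name ("large-job") and the client setup are placeholders; the body parameters are the ones the endpoint already accepts.

import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class DeleteExpiredDataByJobExample {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // Target a single, hypothetical job instead of all jobs
            Request request = new Request("DELETE", "/_ml/_delete_expired_data/large-job");
            // timeout and requests_per_second are the existing optional parameters
            request.setJsonEntity("{ \"timeout\": \"1h\", \"requests_per_second\": 1000.0 }");
            Response response = client.performRequest(request);
            System.out.println(response.getStatusLine());
        }
    }
}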

@elasticmachine (Collaborator) commented:

Pinging @elastic/ml-core (:ml)

if (restRequest.hasContent()) {
request = DeleteExpiredDataAction.Request.PARSER.apply(restRequest.contentParser(), null);
} else {
request = new DeleteExpiredDataAction.Request();
davidkyle (Member Author) commented on this diff:

requests_per_second and timeout can now be query parameters
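
As a sketch of what that could look like after the snippet above, assuming the Request class exposes setRequestsPerSecond and setTimeout setters (the setter names are an assumption here, not taken from the diff):

// Query parameters can complement or replace the request body
if (restRequest.hasParam("requests_per_second")) {
    request.setRequestsPerSecond(restRequest.paramAsFloat("requests_per_second", -1.0f));
}
if (restRequest.hasParam("timeout")) {
    request.setTimeout(restRequest.paramAsTime("timeout", null));
}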

() -> deleteExpiredData(request, listener, isTimedOutSupplier)
);
jobConfigProvider.expandJobs(request.getJobId(), true, true, ActionListener.wrap(
jobBuilders -> {
davidkyle (Member Author) commented on this diff:

This is the most controversial change, I think. Previously each data remover would get all jobs using a BatchedJobsIterator, which fetches the jobs in batches of 10,000 using a scroll search, so even if there are more than 10,000 jobs the scroll search will return them all.

The config provider performs a normal search and cannot return more than 10,000 jobs. The 10,000-job limit is already known, as GET jobs would never return more than that number. Using the config provider hugely simplifies the code, but it is a change in behaviour, no matter how unlikely it is that there are > 10,000 jobs.
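
Roughly, the config-provider path has the shape sketched below; the Job.Builder.build() call and the list-taking deleteExpiredData overload are assumptions used for illustration, not the exact code in this PR.

jobConfigProvider.expandJobs(request.getJobId(), true, true, ActionListener.wrap(
    jobBuilders -> {
        // A single search-backed expansion, capped at 10,000 jobs
        List<Job> jobs = jobBuilders.stream()
            .map(Job.Builder::build)
            .collect(Collectors.toList());
        // Each remover now works from an in-memory list rather than a scroll-backed iterator
        deleteExpiredData(request, jobs, listener, isTimedOutSupplier);
    },
    listener::onFailure
));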

A reviewer (Member) replied:

IMO, if there are > 10,000 jobs, the cleanup is not likely to finish anyway. This seems like a simple throttle we get for free.

A reviewer (Member) replied:

One thing to think about is that the 10,001st job would NEVER have its data cleaned up. I suppose that is OK, but it should at least be documented. I agree that having more than 10k jobs is rare.

FWIW, the code using the iterator could be just as simple: you only have to update the iterator to restrict its search, then pass in the iterator instead of a list of jobs.
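
A hedged sketch of that alternative, with a hypothetical BatchedJobsIterator constructor that accepts a job expression to restrict its search:

// Hypothetical constructor argument restricting the scroll search to the requested job(s)
BatchedJobsIterator jobsIterator =
    new BatchedJobsIterator(client, AnomalyDetectorsIndex.configIndexName(), request.getJobId());
// Removers would consume the iterator, so > 10,000 matching jobs would still be scrolled through
deleteExpiredData(request, jobsIterator, listener, isTimedOutSupplier);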

@@ -22,7 +24,8 @@

@Override
public List<Route> routes() {
-        return Collections.emptyList();
+        return Collections.singletonList(
+            new Route(DELETE, MachineLearning.BASE_PATH + "_delete_expired_data/{" + Fields.JOB_ID.getPreferredName() + "}"));
A reviewer (Member) commented:

We need to make sure that we take the route currently in replacedRoutes and also put it here before it is removed.
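
As a sketch, keeping both paths registered might look like the following; whether the plain route belongs in routes() or stays in replacedRoutes() is exactly the open question in this comment.

@Override
public List<Route> routes() {
    return Collections.unmodifiableList(Arrays.asList(
        // Existing behaviour: clean up expired data for all jobs
        new Route(DELETE, MachineLearning.BASE_PATH + "_delete_expired_data"),
        // New behaviour: clean up expired data for a single job
        new Route(DELETE, MachineLearning.BASE_PATH + "_delete_expired_data/{" + Fields.JOB_ID.getPreferredName() + "}")));
}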

@@ -34,3 +56,96 @@ setup:
body: >
{ "timeout": "10h", "requests_per_second": 100000.0 }
- match: { deleted: true}

---
"Test delete expired data with job id":
A reviewer (Member) commented:

What happens if somebody calls _delete_expired_data with a job that does not exist?

@davidkyle (Member Author) commented:

run elasticsearch-ci/packaging-sample-matrix-windows

davidkyle merged commit bbeda64 into elastic:master on Jun 5, 2020
davidkyle deleted the expire-by-job-id branch on June 5, 2020 at 12:32
davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Jun 8, 2020
Deleting expired data can take a long time leading to timeouts if there
are many jobs. Often the problem is due to a few large jobs which 
prevent the regular maintenance of the remaining jobs. This change adds
a job_id parameter to the delete expired data endpoint to help clean up
those problematic jobs.
davidkyle added a commit that referenced this pull request Jun 8, 2020
davidkyle added a commit that referenced this pull request Jun 8, 2020
For the ml delete expired data request changes in #57337
davidkyle added a commit that referenced this pull request Jun 11, 2020
High level rest client changes for #57337
davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Jun 12, 2020
davidkyle added a commit that referenced this pull request Jun 12, 2020