Delete expired data by job #57337
Pinging @elastic/ml-core (:ml)
if (restRequest.hasContent()) {
    request = DeleteExpiredDataAction.Request.PARSER.apply(restRequest.contentParser(), null);
} else {
    request = new DeleteExpiredDataAction.Request();
requests_per_second and timeout can now be query parameters
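A minimal sketch of the fallback described here, assuming a body wins when present and the query string is consulted otherwise. `ExpiredDataParams` and `resolve` are hypothetical stand-ins for illustration, not the real `DeleteExpiredDataAction.Request` or REST handler code.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the request; not the real DeleteExpiredDataAction.Request.
final class ExpiredDataParams {
    final Float requestsPerSecond; // null means "not supplied by the caller"
    final String timeout;          // null means "not supplied by the caller"

    ExpiredDataParams(Float requestsPerSecond, String timeout) {
        this.requestsPerSecond = requestsPerSecond;
        this.timeout = timeout;
    }

    // Assumption: a parsed body takes precedence, mirroring the hasContent() branch;
    // without a body, the values are read from the query-string parameters.
    static ExpiredDataParams resolve(Optional<ExpiredDataParams> body, Map<String, String> query) {
        if (body.isPresent()) {
            return body.get();
        }
        String rps = query.get("requests_per_second");
        return new ExpiredDataParams(rps == null ? null : Float.valueOf(rps), query.get("timeout"));
    }
}
```

With no body, `resolve(Optional.empty(), Map.of("timeout", "10h"))` would yield a request with only the timeout set.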
    () -> deleteExpiredData(request, listener, isTimedOutSupplier)
);
jobConfigProvider.expandJobs(request.getJobId(), true, true, ActionListener.wrap(
    jobBuilders -> {
This is the most controversial change, I think. Previously each data remover would get all jobs using a BatchedJobsIterator, which fetches the jobs in batches of 10,000 using a scroll search, so if there are more than 10,000 jobs the scroll search will still return them all.
The config provider performs a normal search and cannot return more than 10,000 jobs. The 10,000 job limit is a known one, as GET jobs would never return more than that number. Using the config provider hugely simplifies the code, but it is a change in behaviour no matter how unlikely it is that there are > 10,000 jobs.
IMO, if there are > 10,000 jobs, the cleanup is not likely to finish. This seems like a simple throttle we get for free.
One thing to think about is that the 10,001st job would NEVER have its data cleaned up. I suppose that is OK, but this should at least be documented. I agree that having more than 10k jobs is rare.
FWIW, the code using the iterator could be just as simple, as you only have to update the iterator to restrict its search. Then instead of passing in a list of jobs, you pass in the iterator.
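To make the trade-off in this thread concrete, here is a rough sketch contrasting a single capped search with batched iteration. `JobPagingSketch`, `singleSearch` and `forEachBatched` are hypothetical names working on an in-memory list; they only imitate the behaviour discussed and are not the real `JobConfigProvider` or `BatchedJobsIterator`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; operates on an in-memory list of job IDs.
final class JobPagingSketch {
    static final int MAX_HITS = 10_000; // the search size cap discussed in this thread

    // Single search: at most MAX_HITS job IDs come back; any beyond that are never seen.
    static List<String> singleSearch(List<String> allJobs) {
        return new ArrayList<>(allJobs.subList(0, Math.min(MAX_HITS, allJobs.size())));
    }

    // Batched iteration: visit every job in pages of MAX_HITS, as a scroll would.
    // Returns the number of jobs visited.
    static int forEachBatched(List<String> allJobs, Consumer<String> action) {
        int visited = 0;
        for (int from = 0; from < allJobs.size(); from += MAX_HITS) {
            int to = Math.min(from + MAX_HITS, allJobs.size());
            for (String job : allJobs.subList(from, to)) {
                action.accept(job);
                visited++;
            }
        }
        return visited;
    }
}
```

With 10,001 jobs, `singleSearch` drops the last one while `forEachBatched` still reaches it, which is the behaviour change that would need documenting.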
2883863 to bacdb1b
@@ -22,7 +24,8 @@

 @Override
 public List<Route> routes() {
-    return Collections.emptyList();
+    return Collections.singletonList(
+        new Route(DELETE, MachineLearning.BASE_PATH + "_delete_expired_data/{" + Fields.JOB_ID.getPreferredName() + "}"));
We need to make sure that we take the empty route in replacedRoutes and put it here before it is removed.
@@ -34,3 +56,96 @@ setup:
 body: >
 { "timeout": "10h", "requests_per_second": 100000.0 }
 - match: { deleted: true}
+
+---
+"Test delete expired data with job id":
What happens if somebody calls _delete_expired_data with a job that does not exist?
bacdb1b to ecda5c1
run elasticsearch-ci/packaging-sample-matrix-windows
Deleting expired data can take a long time, leading to timeouts if there are many jobs. Often the problem is due to a few large jobs which prevent the regular maintenance of the remaining jobs. This change adds a job_id parameter to the delete expired data endpoint to help clean up those problematic jobs.
For the ml delete expired data request changes in #57337
High level rest client changes for #57337
High level rest client changes for elastic#57337
Deleting expired data can take a long time, leading to timeouts if there are many jobs. Often the problem is due to a few large jobs which prevent the regular maintenance of the remaining jobs. This change adds a job_id parameter to the delete expired data endpoint to help clean up those problematic jobs.
This change only affects model snapshots and results. Forecasts cannot be removed by job_id yet; if desired, that could be implemented.
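For illustration, a sketch of how a client might call the per-job endpoint with the JDK HTTP client. It assumes `MachineLearning.BASE_PATH` resolves to `/_ml/`, that the job ID is already URL-safe, and that `requests_per_second` and `timeout` are passed as query parameters; `DeleteExpiredDataCall` and `forJob` are hypothetical names, not part of this PR or the HLRC.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper; only builds the request, it does not send it.
final class DeleteExpiredDataCall {
    // Assumption: BASE_PATH is "/_ml/" and jobId needs no URL encoding.
    static HttpRequest forJob(String baseUrl, String jobId, float requestsPerSecond, String timeout) {
        URI uri = URI.create(baseUrl + "/_ml/_delete_expired_data/" + jobId
            + "?requests_per_second=" + requestsPerSecond
            + "&timeout=" + timeout);
        return HttpRequest.newBuilder(uri).DELETE().build();
    }
}
```

For example, `forJob("http://localhost:9200", "my-job", 100.0f, "10h")` builds a DELETE request against the path for the hypothetical job `my-job` only, leaving the other jobs to the regular maintenance run.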
TODO HLRC