Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workflow pruning task #16533

Merged
merged 13 commits into from
Aug 7, 2024
Merged

Workflow pruning task #16533

merged 13 commits into from
Aug 7, 2024

Conversation

deanmarcussen
Copy link
Member

Fixes #11609

One of our guys, Jose, added this feature to our project some time ago, but left before he was able to contribute it back, so doing so on his behalf.

It is similar to the audit trail settings, however you can select a range of workflow statuses to prune.

image

Currently its part of the main workflows feature, and by default will be on, but on a site where the statuses have not been set, it will set them to all at 90 days when the background task runs.

Will let the community decide what behaviour they want there.

Perhaps it would be better to make it a feature.

@deanmarcussen deanmarcussen requested a review from Piedone August 7, 2024 04:38
Copy link
Member

@MikeAlhayek MikeAlhayek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deanmarcussen I directly committed into your branch to ensure the code in the PR following the standard we have been doing in OC. Also, I used some new extensions to help cleanup the code.

Just few minor requests.

@MikeAlhayek MikeAlhayek enabled auto-merge (squash) August 7, 2024 18:55
@MikeAlhayek MikeAlhayek merged commit c347a81 into main Aug 7, 2024
11 checks passed
@MikeAlhayek MikeAlhayek deleted the deanmarcussen/workflow-pruning branch August 7, 2024 19:00
Copy link
Member

@Piedone Piedone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a bit too quick of a merge @MikeAlhayek. If there's some larger PR with multiple reviewers requested, please leave some time for the others too. I usually also @mention them to see if they have anything else to add. Just to make sure everyone's on the same page.

@deanmarcussen could you please open a second PR to address my remarks?

Comment on lines +24 to +25
@(Model.LastRunUtc.HasValue ? await DisplayAsync(await New.DateTime(Utc:
Model.LastRunUtc.Value, Format: "g")) : T["Never"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This middle of the argument line break is curious.

Suggested change
@(Model.LastRunUtc.HasValue ? await DisplayAsync(await New.DateTime(Utc:
Model.LastRunUtc.Value, Format: "g")) : T["Never"])
@(Model.LastRunUtc.HasValue
? await DisplayAsync(await New.DateTime(Utc: Model.LastRunUtc.Value, Format: "g"))
: T["Never"])

model.LastRunUtc = settings.LastRunUtc;
model.Disabled = settings.Disabled;
model.Statuses =
settings.Statuses ?? new WorkflowStatus[]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
settings.Statuses ?? new WorkflowStatus[]
settings.Statuses ??

Comment on lines +41 to +44
var prunedQty = await workflowCleanUpManager.PruneWorkflowInstancesAsync(
TimeSpan.FromDays(workflowPruningSettings.RetentionDays)
);
logger.LogDebug("Pruned {PrunedCount} workflow instances.", prunedQty);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var prunedQty = await workflowCleanUpManager.PruneWorkflowInstancesAsync(
TimeSpan.FromDays(workflowPruningSettings.RetentionDays)
);
logger.LogDebug("Pruned {PrunedCount} workflow instances.", prunedQty);
var prunedQuantity = await workflowCleanUpManager.PruneWorkflowInstancesAsync(
TimeSpan.FromDays(workflowPruningSettings.RetentionDays)
);
logger.LogDebug("Pruned {PrunedCount} workflow instances.", prunedQuantity);

);
logger.LogDebug("Pruned {PrunedCount} workflow instances.", prunedQty);

workflowPruningSettings.LastRunUtc = clock.UtcNow;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what in particular do you feel is unnecesary?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is writing to a read-only object while the actual update happens a few lines below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a note about the feature to the release 2.0 notes.


public async Task<int> PruneWorkflowInstancesAsync(TimeSpan retentionPeriod)
{
var dateThreshold = _clock.UtcNow.AddDays(1) - retentionPeriod;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why AddDays(1)? If the retention period is 3 days, and now is 8 AM on the 10th, then I'd expect those workflows to be deleted which are older or the same age as 8 AM on the 7th. This will only delete ones older than 8 AM on the 8th.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know our guy copied this from the audit trail module (because I cleaned up some left over audit trail comments when I pulled it out), so that logic comes directly from there, if its an issue, needs to be fixed in both places.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess so, but I don't understand why this logic would make sense.

Comment on lines +50 to +59
var workflowInstances = await _session
.Query<Workflow, WorkflowIndex>(x => x.WorkflowStatus.IsIn(statuses) && x.CreatedUtc <= dateThreshold)
.ListAsync();

var total = 0;
foreach (var item in workflowInstances)
{
_session.Delete(item);
total++;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about how this will fair when it's run the first time, with hundreds of thousands of items for one of our customer's site. I'll try it with a snapshot of the DB, but exporting it will take a day, so I'll only be able to report back tomorrow.

My guess is though that it'll time out and in such cases one would need to do manual pruning, gradually with smaller time frames, before letting the background task run here. This would be useful to document too, with a warning that with too many workflows your app may fail when this first runs.

I did a local test, with 260k generated simple dummy workflows with the below code in WorkflowController, using a local SQL Server, and it took less than 2s. That's really promising, though with a remote DB server this will surely be orders of magnitude slower.

        [Admin("Workflows/CreateTestWorkflows", "CreateTestWorkflows")]
        public async Task<ActionResult> CreateTestWorkflows()
        {
            var workflowType = await _session.GetAsync<WorkflowType>(61);

            for (int i = 0; i < 9999; i++)
            {
                var workflow = _workflowManager.NewWorkflow(workflowType);
                await _workflowStore.SaveAsync(workflow);
            }

            return Content("<html><body><script>location.reload();</script></body></html>", "text/html");
        }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, yeah we did batching on yessql some time ago, which kinda solved that.

It didn't cause us a problem when we released this code 6 months ago, I would hesitate a guess that we were killing of 100K+ workflows on that initial run, but we have reasonable performance on the boxes too.

May pay to limit it to say 10,000 per run, and they will clean up over a matter of days, rather than that original hit

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, we actually have batching! sebastienros/yessql#310 I totally forgot about this. The default page size for that is 500, so the 860k workflows I mentioned under #16447 should be less than 2000 queries. Even if these take like 5 ms each, which they shouldn't by far, this will only take 10s, which should be safe.

I'll still do a test with the real DB once the export is ready, to see if there are any surprises with some actual data. I'll report back.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, let me know what you find, and I'll get another pr going

Copy link
Member

@Piedone Piedone Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now checked, and it took 90s with a local SQL Server instance to delete ~870k Workflows. Apparently, my mock workflows were too simple to provide an accurate model. And it didn't succeed even locally, despite the connection timeout being larger in the connstring (I don't know actually what happens, because there's no exception, control just never goes to the next line after here).

So, the DB connection will definitely time out for this in prod (where it'll take at least 10x or 100x as long due to the DB server being not local but over the network).

Adding Take(50000) brings down the runtime of the deletion to ~4s, but it still gets stuck on LoadSiteSettingsAsync as mentioned above, 10k worked). So, limiting the number here can definitely be a workaround. I'd add an IOptions config and use something lower like 5k (which takes <1s) as the default. Then, we can advise people to increase this if their hosting environment allows, and to temporarily increase the background task frequency when they first deploy this to an existing site to get rid of the garbage faster (running this once every minute shouldn't be an issue for any proper server, really, the problem is just the timeout of doing too much at once)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW let it run with 10k to completion, it worked well.

@hishamco
Copy link
Member

hishamco commented Aug 7, 2024

I totally agree with Zoltan to have some time in PRs, but Mike don't want to make PRs take so long then they will be deactive or ignored over the time, which the same thing I am facing in some of my PRs


namespace OrchardCore.Workflows.WorkflowPruning.Services;

internal sealed class WorkflowStatusBuilder
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bad change, whoever added that.

Orchard is suposed to be an extensible framework, and I have far too much code we have to copy in from Orchard because someone decided to make something internal, and it needed tweaking


builder.Add(S["Configuration"], configuration => configuration
.Add(S["Settings"], settings => settings
.Add(S["Workflow Pruning"], S["Workflow Pruning"], pruning => pruning
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason to call this "pruning" vs "trimming" what we have for Audit Trail?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely just a comment from Seb on the original issue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think trimming would be better, not to have a new term for the same concept.

@Piedone
Copy link
Member

Piedone commented Aug 13, 2024

Will you open a new PR any time soon @deanmarcussen? Otherwise, I'll tackle the above feedback.

<div class="form-check">
<input type="checkbox" class="form-check-input workflow-status" id="@Html.IdFor(m => m.Statuses[i])"
name="@Html.NameFor(x => x.Statuses)" checked="@status[i].IsSelected" value="@status[i].Value">
<label class="form-check-label" for="@Html.IdFor(m => m.Statuses[i])">@T[status[i].Value]</label>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the meeting today: this doesn't allow the PO extractor to find all the strings, since it's not coming from a hard-coded string. Also, the WorkflowStatus values are technical identifiers, not necessarily suitable to be displayed to users directly.

So, better would be to list the statuses explicitly here.

@deanmarcussen
Copy link
Member Author

Will you open a new PR any time soon @deanmarcussen? Otherwise, I'll tackle the above feedback.

yeah where were we at with it @Piedone ?

is it just some formatting things, or where there concerns around functionality?

@Piedone
Copy link
Member

Piedone commented Aug 13, 2024

Well, yes, see the comments above.

@Piedone
Copy link
Member

Piedone commented Aug 15, 2024

Opened a PR for this: #16565.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Workflows need a trimming task
4 participants