Add TTL for agents and advertised queues #294

Merged: 2 commits, Jul 12, 2024

@plars (Collaborator) commented Jun 26, 2024

Description

When agents communicate with the server, we keep track of some information about them so that they can show up on the agents page: the name of the agent, the queues it is listening to, the state it's currently in, etc. However, if an agent is removed or renamed, we have no way to know that. All we know is that it hasn't been seen in a while. The same is true for "advertised queues". This PR sets a TTL in the database so that both types of entities expire if they haven't been seen for a while (currently set at 7 days).
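As a rough illustration of the TTL mechanism described above, here is a minimal sketch, assuming a pymongo-style database handle. The collection names (`agents`, `queues`) and the use of `updated_at` as the indexed field for agents are assumptions for illustration, not necessarily what this PR uses.

```python
from datetime import timedelta

# 7 days, per the description; MongoDB expects the TTL in seconds.
TTL_SECONDS = int(timedelta(days=7).total_seconds())


def ensure_ttl_indexes(db):
    """Create TTL indexes on a pymongo-style database handle.

    The collection names and the "updated_at" field name for agents
    are hypothetical and only serve the example.
    """
    for name in ("agents", "queues"):
        # MongoDB's TTL monitor deletes a document once the date stored in
        # the indexed field is older than expireAfterSeconds.
        db[name].create_index("updated_at", expireAfterSeconds=TTL_SECONDS)
```

With pymongo this would be called once at startup with the application's database object; creating an index that already exists with the same options is a no-op.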

This should only happen if the device agent is removed, has crashed (and systemd didn't restart it for some reason), or has lost the ability to communicate with the server. If it's just offline but the agent is still running, it will continue to check in and remind the server that it still exists.
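The check-in behaviour can be pictured with a small in-memory stand-in for the database. This is only a sketch: `record_checkin`, the `agents` dict, and the field names are hypothetical. The point is that every check-in refreshes the timestamp, which restarts the expiry clock regardless of the agent's reported state.

```python
from datetime import datetime, timezone


def record_checkin(agents, name, state, now=None):
    """Upsert an agent record and refresh its timestamp.

    `agents` is a plain dict standing in for the agents collection.
    """
    now = now or datetime.now(timezone.utc)
    record = agents.setdefault(name, {"name": name})
    # Refreshing updated_at on every check-in keeps the record alive,
    # even when the reported state is "offline".
    record.update(state=state, updated_at=now)
    return record
```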

The end goal is that when users look up the available agents on testflinger or in c3, they see ones that more accurately reflect what's really available.

For queues, we haven't always tracked the last time they were updated, and they were typically only updated when the agent first came online; that information won't change unless the agent is reconfigured and restarted. So I've added an updated_at field for the queues. Queues that were already there and still exist will get this updated_at field whenever the agent restarts. However, any advertised queues that no longer exist will never get that field and won't be automatically deleted by the database. We'll need to hunt these down and remove them manually after all the agents have been updated.
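For the manual cleanup mentioned above, something along these lines could help locate the leftover queues. It is a sketch that works on plain dicts; the `name` key is an assumption about the document shape. MongoDB's TTL monitor never expires a document that has no value in the indexed date field, which is why these entries have to be found separately.

```python
# MongoDB filter matching documents that lack the field entirely.
MISSING_TIMESTAMP_FILTER = {"updated_at": {"$exists": False}}


def queues_without_timestamp(queue_docs):
    """Return the names of queue documents that have no updated_at field."""
    return [doc["name"] for doc in queue_docs if "updated_at" not in doc]
```

With pymongo, the same filter could be passed to `find()` or `delete_many()` on the queues collection once all agents have been updated and restarted.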

Resolved issues

CERTTF-331

Documentation

I added some more general details to the existing concept pages for agents and queues, covering agents polling the server (rather than the server pushing jobs to the agents), advertised queues, and the expiry of agents and queues that go away for too long.

Web service API changes

N/A

Tests

Tested this locally by manually setting some dummy agents and queues (created by the create_sample_data script) to an older datestamp and waiting for the mongodb server to run its cleanup cycle. They were automatically removed within a few minutes of changing the datestamp.
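The backdating approach can be sketched as follows. The helper names are hypothetical, and the expiry check only mirrors the rule MongoDB applies (indexed date plus TTL has passed); actual deletion happens when the server's background TTL task next runs, which is why removal takes a little while rather than being instant.

```python
from datetime import datetime, timedelta, timezone

TTL = timedelta(days=7)


def backdated(now, margin=timedelta(hours=1)):
    """Return a timestamp old enough to be past the TTL at `now`."""
    return now - TTL - margin


def is_expired(updated_at, now):
    """Mirror the TTL rule: expired once updated_at + TTL has passed."""
    return now >= updated_at + TTL
```

Writing `backdated(datetime.now(timezone.utc))` into a sample document's timestamp field should make it eligible for removal on the next cleanup cycle.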

@plars force-pushed the ttl-agents-and-advertised-queues branch from dcfa755 to c5e27d2 on June 26, 2024 21:20
@plars requested a review from a team on June 26, 2024 21:24

@nancyc12 (Contributor) left a comment

Thanks for making this small but useful change! It looks good to me.
I'd suggest that we also use datetime.now(timezone.utc) in agents_post() so that the "Last updated" value on the Agent Detail page also carries timezone info.
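To illustrate the difference this suggestion makes, here is a small standalone example (`utc_now` is a hypothetical helper name, not a function from this PR). A timezone-aware value keeps its UTC offset when serialized, while a naive one does not.

```python
from datetime import datetime, timezone


def utc_now():
    """Return the current time as a timezone-aware UTC datetime."""
    return datetime.now(timezone.utc)


aware = utc_now()
naive = datetime.now()  # no tzinfo attached

# The aware value serializes with an explicit "+00:00" offset.
aware_text = aware.isoformat()
```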

@jocave (Collaborator) left a comment

Thank you, lgtm

@plars merged commit d6fd583 into main on Jul 12, 2024
6 checks passed
@plars deleted the ttl-agents-and-advertised-queues branch on July 12, 2024 20:19