Add TTL for agents and advertised queues #294
Merged
Description
When agents communicate with the server, we keep track of some information about them so that they can show up on the agents page: the name of the agent, the queues it is listening to, the state it's currently in, etc. However, if an agent is removed or renamed, we have no way to know that. All we know is that it hasn't been seen in a while. The same is true for "advertised queues". This PR sets a TTL in the database so that both types of entities expire if they haven't been seen for a while (currently 7 days).
This should only happen if the device agent has been removed, has crashed (and systemd didn't restart it for some reason), or has lost the ability to communicate with the server. If it's just offline but the agent is still running, it will continue to check in and remind the server that it still exists.
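A minimal sketch of how this kind of expiry can be set up with a MongoDB TTL index, assuming pymongo-style collection objects. The helper names, the `name` key, and the use of `updated_at` as the agent timestamp field are assumptions for illustration, not taken from the code in this PR.

```python
from datetime import datetime, timedelta, timezone

# 7 days, matching the expiry mentioned in the description
EXPIRY = timedelta(days=7)


def ensure_ttl_index(collection, field="updated_at"):
    """Ask MongoDB to delete documents once `field` is older than EXPIRY.

    `collection` is assumed to behave like a pymongo Collection;
    the field name is hypothetical.
    """
    return collection.create_index(
        field, expireAfterSeconds=int(EXPIRY.total_seconds())
    )


def record_check_in(collection, agent_name):
    """Refresh the timestamp when an agent checks in, restarting its TTL."""
    return collection.update_one(
        {"name": agent_name},  # hypothetical key identifying the agent
        {"$set": {"updated_at": datetime.now(timezone.utc)}},
        upsert=True,
    )
```

As long as an agent keeps calling something like `record_check_in`, its timestamp stays fresh and the TTL index never removes it.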
The end goal is that when users look up the available agents on testflinger or in c3, they see ones that more accurately reflect what's really available.
For queues, we didn't always track the last time they were updated, and they were typically only updated when the agent first comes online; an entry won't change unless the agent is reconfigured and restarted. So I've added an updated_at field for the queues. Queues that were already there and still exist will get this updated_at field whenever the agent restarts. However, any advertised queues that no longer exist will never get that field and won't be automatically deleted by the database. We'll need to hunt these down and remove them manually once all the agents have been updated.
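The consequence for legacy queues can be modelled in a few lines. This is only a sketch of the TTL rule (MongoDB skips documents that lack the indexed field), not the project's code; `is_expired` and `LEGACY_QUEUE_FILTER` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

EXPIRY = timedelta(days=7)

# Query filter matching queue documents that never received the new field.
LEGACY_QUEUE_FILTER = {"updated_at": {"$exists": False}}


def is_expired(doc, now):
    """Model of the TTL rule: only documents with a timestamp can expire."""
    updated_at = doc.get("updated_at")
    if updated_at is None:
        # No updated_at field: the TTL index ignores this document forever.
        return False
    return now - updated_at > EXPIRY
```

With pymongo, passing `LEGACY_QUEUE_FILTER` to `find()` on the queues collection would list the candidates for manual cleanup. Running it before every agent has restarted would also match queues that are still live.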
Resolved issues
CERTTF-331
Documentation
I added more general details to the existing concept pages for agents and queues, covering agents polling the server rather than the server pushing jobs to the agents, advertised queues, and the expiry of agents and queues that go away for too long.
Web service API changes
N/A
Tests
Tested this locally by manually setting the datestamp of some dummy agents and queues (created by the create_sample_data script) to an older date and waiting for the MongoDB server to run its cleanup cycle. They were automatically removed within a few minutes of changing the datestamp.
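The backdating step could look roughly like the following, assuming a pymongo-style collection. The `backdate` helper and the `updated_at` field name are illustrative assumptions; the actual test was done by hand.

```python
from datetime import datetime, timedelta, timezone


def backdate(collection, query, days=8, field="updated_at"):
    """Push the timestamp of matching documents past the 7-day expiry.

    Returns the stale timestamp that was written.
    """
    stale = datetime.now(timezone.utc) - timedelta(days=days)
    collection.update_many(query, {"$set": {field: stale}})
    return stale
```

MongoDB removes expired documents from a background task that runs periodically, so deletion is not instant, which is consistent with the few minutes of delay observed here.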