-
Notifications
You must be signed in to change notification settings - Fork 343
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ds Active flag blueprint #4654
Ds Active flag blueprint #4654
Conversation
This is related to issue #3746 |
94d30be
to
50a33b9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks very good to me. I guess the real question for the community is "is there a real need for 3 states?" If not, then:
-
active=true means the DS is actively routed thru TR AND the delivery service is accounted for in cache config files
-
active=false means the the DS is NOT actively routed thru TR AND the delivery service is NOT accounted for in cache config files
But assuming, the 3 states are desired, I see no problem with this design. The only suggestion I would have is potentially changing Active to Status (active, inactive, primed) as it makes a bit more sense to me but i suppose that would be messy.
I don't have a problem with changing the name of the field. But having discussed this with operators I do think three states are desired. |
I vote yes. We need to be able to send config to caches, before it's sent to the router (the existing "Inactive" option). Without it, there's no way to safely stop routing to a DS. So the existing behavior is absolutely necessary. But currently, there's no way to not send a DS to caches. This means potentially having clients, malicious or accidental, making requests that shouldn't be possible to make, by requesting directly to the cache. It's very common to not yet want to delete the information behind a DS, even when it's no longer active, or not yet active, or temporarily not active. The third "don't send to caches" state makes that possible. More than that, it's a Footgun. Because "inactive" sounds like it isn't being put anywhere. Unless the user has the TC docs memorized, it's a very easy mistake for new TC operators or tenants to make. Three unambiguous states eliminates that footgun. |
I'll also add, as above, a DS shouldn't be immediately removed from (or changed in) routers and caches, without some drain time period, as well as the reverse, making sure a DS is on all Caches before sending to Routers. In theory, this can be done by waiting long enough between Queue and Snap. But it's safer not to add risk to the already pernicious queue-snap order-of-operations. So, I think we should make it impossible to change a DS directly from inactive<->active. Both the Portal and the API should require being moved from Active to Primed, and then Primed to Inactive. Someone who needs to can always do it twice. But that makes it safer, and reduces the chance of operator misunderstanding. Ideally with good error messages, e.g. "Error: Inactive delivery services must be primed before they're set to active, to make sure they're on the caches before being routed to." |
So would that also mean you can never create an ACTIVE DS? |
Yes, +1 on requiring created DSes to be Inactive, or possibly Primed. But, I'd be in favor of removing the Active field from the DS creation form altogether. In theory, it's probably mostly ok to let someone make a DS as Active, because there are no successful clients anyway. But it's possible someone making a DS has clients already requesting, and it's possible they handle NXDOMAIN better than 404. For example, maybe they're configured to retry NXDOMAINs as errors, but 404's generally mean something's wrong with the URL so they give the client an error. Worse, maybe they cache that 404 locally. It's a bit of an edge case, I don't think it's a huge deal either way. But IMO making new DSes always start out Inactive adds an extra little bit of wrong-things-are-impossible safety. |
We should probably make it clear why this is necessary when it seems like we can already accomplish these 3 states today:
Granted, we're hoping to remove ds-server assignments in the future, so that seems to make this change necessary. |
That depends on what you mean by "accomplish". You can render Delivery Services un-service-able, but that's not exactly the same thing as having their configuration ready to deploy at the push of a button - one button. |
What does this PR (Pull Request) do?
Adds a blueprint for changing the "active" property of Delivery Services from a boolean to an enumerated string constant like:
For more details, refer to the blueprint.
Which Traffic Control components are affected by this PR?
None - blueprint.
What is the best way to verify this PR?
Read the blueprint.
The following criteria are ALL met by this PR