Snapshotting a CDN that has an HTTPS delivery service w/ no cert causes TR crconfig reload failure #5893
Comments
There's a chicken-and-egg problem: I believe you can't create the SSL cert until the DS is created.
The 3rd option is also kind of problematic because the CRConfig and SSL certificate retrieval aren't tied to one another. Certs are just auto-deployed without a safety check once they show up, even if they're broken.
True, but the API handler could do it just fine: on create, if protocol=https, call the generate handler.
That's basically advocating for option 2.
I vote for either [1] or [2] (I don't have a strong opinion which), plus [3]. Robust systems require multiple failsafes. I know we have limited resources, but config loading (on both TR and Caches) is a particularly dangerous part of our system. We should have validation in TO and/or TP, and TR and ATS should also detect and refuse to apply changes to a single DS that they recognize is invalid, but still apply changes for every other DS. This is especially necessary for Self-Service, to catch tenants who mash keys until something broken slips through one validation, so that the CDN keeps working and applying changes for every other tenant. On the UI usability side, we should also fix the chicken-and-egg problem, so that it's possible to fully create the DS at once, without having to create a temporary wrong config in order to get to the final config.
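To make the per-DS proposal above concrete, here is a minimal sketch, not TR's or ATS's actual loading code. It assumes a config is a plain dict keyed by DS name, and that each DS entry has hypothetical `protocol` and `cert` fields. An invalid DS keeps its previous entry (or is left out if it is new), while every other DS is updated.

```python
def is_valid(ds):
    """Hypothetical check: an HTTPS DS must carry a cert."""
    return ds.get("protocol") != "https" or bool(ds.get("cert"))


def apply_partial(active, candidate):
    """Merge a candidate config into the active one, DS by DS.

    Returns the merged config and the names of DSes whose changes
    were skipped because the candidate entry was invalid.
    """
    merged = {}
    skipped = []
    for name, ds in candidate.items():
        if is_valid(ds):
            merged[name] = ds
        else:
            skipped.append(name)
            if name in active:
                # Keep serving the previous, known-good entry.
                merged[name] = active[name]
    return merged, skipped
```

Note that the merged result is no longer identical to any single snapshot, which is the tradeoff debated in the comments that follow.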
I'm -1 on auto-generating a self-signed (unusable) certificate just to bypass this TR safety. I think we should just prevent enabling HTTPS on a delivery service unless it has a certificate. That means if we want HTTPS we still have to:
I don't think we should make TR skip over changes to DSes that it recognizes are invalid but still apply changes for every other DS. As a general principle, if TR gets an invalid CRConfig, I think it should continue to operate with the last known good CRConfig. We generally catch these issues very quickly anyways, and they are pretty much always harmless. If we allow TR to run with partially-valid CRConfigs, we will not be able to look at the CRConfig TR has on-disk and know exactly what a given DS's running config is. The CRConfig should be all or nothing, and that is generally how TR is designed under the hood. Changing that design would be costly and provide very little value IMO. I understand your point about multiple failsafes, but TR refusing to load an invalid snapshot is a failsafe. We just need the 2nd failsafe in TO to prevent the invalid snapshots in the first place.
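The all-or-nothing behavior described above can be sketched as follows. This is an illustration only, not TR's implementation: `ConfigHolder`, `missing_cert_errors`, and the dict-based config shape with `protocol` and `cert` fields are all assumptions made for the example.

```python
def missing_cert_errors(config):
    """Hypothetical validator: list HTTPS DSes that have no cert."""
    return [
        f"{name}: HTTPS enabled but no cert"
        for name, ds in config.items()
        if ds.get("protocol") == "https" and not ds.get("cert")
    ]


class ConfigHolder:
    """Holds the last known good config; loads are all or nothing."""

    def __init__(self, validate):
        self._validate = validate
        self.active = None

    def load(self, candidate):
        """Swap in the candidate only if it validates as a whole."""
        if self._validate(candidate):
            # Any error rejects the entire snapshot; keep the old one.
            return False
        self.active = candidate
        return True
```

With this shape, the active config always equals exactly one snapshot that was accepted in full, so the running state of any DS can be read directly from it.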
As an operator, I disagree. Fewer steps and ways to mess things up that don't involve being forced into longer explicit workflows are better. We're aiming for the same endgoal, but which is better:
versus
There's no downside to defaulting to a self-signed certificate, and that is how most beginner webserver instructions start. The worst-case scenario is that you forget to add the valid cert later, which is really just a tradeoff between HTTPS connectivity being refused altogether and clients being able to allow the insecure content.
I think the point is that you can have a more robust and scalable system by moving away from the all-or-nothing safety that's so pervasive in TR. If the goal someday is to get away from having a global CDN mutex (chicken), lots of people without communication/coordination with each other will need to do their own micro-snapshots.
TR's all-or-nothing safety has nothing to do with the ability for people to do their own "micro/DS snapshots" whenever that functionality gets implemented in the future. TO needs to have the validation in place to prevent invalid configs from even being possible in the first place. I don't really see reducing the number of steps involved as a necessary part of that solution, especially when it is hiding a necessary step in getting an actually secure certificate. We wouldn't want users to create a DS with HTTPS enabled or update their DS to enable HTTPS and think their DS is actually secure with nothing left to do. Providing a real cert is a necessary step in the process, so I don't think it matters that we continue to have a 4-step process with this solution.
The 4-step process is symptomatic of TP's tendency to say "1 button = 1 API request", anyway. When creating a DS it could easily ask for certificates (or whether you want them generated) at the same time, and do the 4 steps for you behind the scenes so that the user encounters only one actual step.
Relates to #2429
I'm submitting a ...
Traffic Control components affected ...
Current behavior:
TR will not load a snapshot that has an HTTPS delivery service with a missing cert. TR will therefore be stuck with an old snapshot until the problem is resolved: either a cert is created for the HTTPS DS or the HTTPS DS is switched to HTTP.
Expected behavior:
Prevent the creation of an invalid snapshot or TR should handle invalid snapshots more gracefully.
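As an illustration of the first expected behavior, a pre-snapshot check could look roughly like the sketch below. It is not the Traffic Ops API: the `DeliveryService` class, its fields, the protocol labels, and `create_snapshot` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical protocol labels for any mode that serves HTTPS.
HTTPS_PROTOCOLS = {"https", "http_and_https", "http_to_https"}


@dataclass
class DeliveryService:
    xml_id: str
    protocol: str
    cert: Optional[str] = None  # PEM text, or None if no cert exists yet


def find_missing_certs(services: List[DeliveryService]) -> List[str]:
    """Return the IDs of HTTPS-capable delivery services without a cert."""
    return [
        ds.xml_id
        for ds in services
        if ds.protocol in HTTPS_PROTOCOLS and not ds.cert
    ]


def create_snapshot(services: List[DeliveryService]) -> dict:
    """Refuse to build a snapshot that a router would reject."""
    missing = find_missing_certs(services)
    if missing:
        raise ValueError(
            "refusing to snapshot; HTTPS delivery services without a cert: "
            + ", ".join(missing)
        )
    return {ds.xml_id: {"protocol": ds.protocol} for ds in services}
```

Rejecting the snapshot at creation time surfaces the missing cert to the operator immediately, instead of leaving TR silently running an old snapshot.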
Minimal reproduction of the problem with instructions:
Anything else:
Possible solutions: