-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Control % of requests faults are applied to with HTTP headers #10724
Control % of requests faults are applied to with HTTP headers #10724
Conversation
Signed-off-by: Rafal Augustyniak <raugustyniak@lyft.com>
Signed-off-by: Rafal Augustyniak <raugustyniak@lyft.com>
Signed-off-by: Rafal Augustyniak <raugustyniak@lyft.com>
Signed-off-by: Rafal Augustyniak <raugustyniak@lyft.com>
Signed-off-by: Rafal Augustyniak <raugustyniak@lyft.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks really nice work (as usual!). Just a few small comments. Thank you!
/wait
docs/root/intro/version_history.rst
Outdated
@@ -28,6 +28,7 @@ Version history | |||
* ext_authz: disabled the use of lowercase string matcher for headers matching in HTTP-based `ext_authz`. | |||
Can be reverted temporarily by setting runtime feature `envoy.reloadable_features.ext_authz_http_service_enable_case_sensitive_string_matcher` to false. | |||
* fault: added support for controlling abort faults with :ref:`HTTP header fault configuration <config_http_filters_fault_injection_http_header>` to the HTTP fault filter. | |||
* fault: added support for controlling the percentage of requests that abort, delay and response rate limits faults are applied to using :ref:`HTTP headers <config_http_filters_fault_injection_http_header>` to the HTTP fault filter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a master merge and the move to the new "current" version history file.
@@ -98,7 +125,7 @@ fault.http.abort.abort_percent | |||
<envoy_api_field_config.filter.http.fault.v2.HTTPFault.abort>`. | |||
|
|||
fault.http.abort.http_status | |||
HTTP status code that will be used as the of requests that will be | |||
HTTP status code that will be used as the of requests that will be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not your bug but this sentence doesn't make sense. Can you fix?
const auto request_delay = fault_settings_->requestDelay(); | ||
if (request_delay == nullptr) { | ||
return false; | ||
} | ||
|
||
const auto percentage_header = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a perf perspective it would be optimal to not do this lookup unless we are actually using a header provider (similar elsewhere), especially because right now this is going to be an O(N) scan to find the header. Can you figure out a way to only do the lookup on demand? Perhaps by passing the entire map and initializing the header provider with the header to look for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, I updated all headers providers to work like this - we pass headers map and we do a specific header lookup only if a header provider is used. After my recent changes, there should be no performance impact related to header lookup at all if fault filter is not configured to use header faults.
One more question regarding the implementation of the percentage header providers. As it is now, we try to read a percentage header (such as x-envoy-fault-abort-request-percentage
) and if it's not present or its value is invalid we fallback to the value that's specified in proto config file. Do you think that falling back to 100%
would make more sense (as opposed to looking at proto config file)?
envoy/source/extensions/filters/common/fault/fault_config.cc
Lines 14 to 37 in ed6fa7f
envoy::type::v3::FractionalPercent | |
HeaderPercentageProvider::percentage(const Http::RequestHeaderMap* request_headers) const { | |
if (!request_headers) { | |
return percentage_; | |
} | |
const auto header = request_headers->get(header_name_); | |
if (header == nullptr) { | |
return percentage_; | |
} | |
uint64_t header_numerator; | |
if (!absl::SimpleAtoi(header->value().getStringView(), &header_numerator)) { | |
return percentage_; | |
} | |
if (header_numerator > 100) { | |
return percentage_; | |
} | |
envoy::type::v3::FractionalPercent header_percentage; | |
header_percentage.set_numerator(header_numerator); | |
return header_percentage; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think that falling back to 100% would make more sense (as opposed to looking at proto config file)?
I think falling back to the config is more flexible, as the operator can configure a default of 100%, 0%, etc. if they want?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my thought too 👍 Leaving it as it is then.
Signed-off-by: Rafal Augustyniak <raugustyniak@lyft.com>
…tp-headers Signed-off-by: Rafal Augustyniak <raugustyniak@lyft.com>
Signed-off-by: Rafal Augustyniak <raugustyniak@lyft.com>
Signed-off-by: Rafal Augustyniak <raugustyniak@lyft.com>
Signed-off-by: Rafal Augustyniak <raugustyniak@lyft.com>
Signed-off-by: Rafal Augustyniak <raugustyniak@lyft.com>
/azp run envoy-presubmit |
Azure Pipelines successfully started running 1 pipeline(s). |
|
CI is broken right now. We are trying to figure out how to fix it. |
Please merge master. /wait |
…tp-headers Signed-off-by: Rafal Augustyniak <raugustyniak@lyft.com>
I think that 44eedc7 broke some of the git pre-push checks.
|
Fixing spell checking in #10763 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, a few more comments.
/wait
return percentage_; | ||
} | ||
|
||
if (header_numerator > 100) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry just noticed this. Fractional percent supports 1/100, 1/10,000, etc. Do we want to support this somehow in a header? I'm wondering if this should somehow be scaled by the default percentage that is configured? WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could definitely use a denominator of a configured default percentage - that could work. If we do this we could also have a logic that would compute an output percentage's numerator as the minimum of two values - the value of HTTP percentage header and the value defined in default configured percentage.
Why would this additional complexity be helpful?
- That would be an additional safety mechanism that would allow Envoy users to define the maximum percentage of requests a given type of fault should be applied to.
- It would ensure that no change in the configuration of a default percentage results in too many faults being ingested via HTTP headers. Example when the lack of this mechanism could lead to unintended behavior is presented below.
Example
Step 1
default percentage configuration:
numerator: 100
denumerator: 1000
Effective fault injection rate is 10% since 100/1000 = 10.
HTTP headers used by clients injecting faults via HTTP headers:
`x-fault-abort-request-abort`: 404
`x-fault-abort-request-abort-percentage`: 100
Effective fault injection rate is 10% since 100/1000 = 10.
Step 2
Somebody on the server side decides to change a default percentage configuration to:
numerator: 10
denumerator: 100
From their perspective there should be no negative results of this change since the effective failure rate doesn't change (it's still equal to 10% since 100/1000 = 10/100 = 10)
The person responsible for a change doesn't update clients that use HTTP headers to ingest faults.
What happens with clients using HTTP headers to inject faults?
Without proposed min logic for the calculation of a numerator
`x-fault-abort-request-abort`: 404
`x-fault-abort-request-abort-percentage`: 100
Effective fault injection rate increases to 100% since 100/100 (new denominator of a default percentage) = 100.
With proposed min logic for the calculation of a numerator
`x-fault-abort-request-abort`: 404
`x-fault-abort-request-abort-percentage`: 100
Effective fault injection rate is still at 100% since min(100/100, 10 [Default Percentage Effective Rate]) = 100.
Let me know what you think about the proposed additional to your proposed change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the super detailed proposal! I agree the capping makes sense, but please make sure it's well documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your suggestion! Implemented here https://github.com/envoyproxy/envoy/pull/10724/files#diff-186e4ff5610d03e9dc6f8fcace2aa4a3R26-R29.
if (!request_headers) { | ||
return absl::nullopt; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this can happen? Same elsehwere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it will never happen. I added this a safety measure in case we start using this class in more places in the future. That being said, I can remove this check if this kind of defensive programming is not something we do in Envoy repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes in general we don't put in this type of defensive logic which we know we can't hit currently, so I would remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just so that we are on the same page. There is one place where we call percentage
method with empty headers (nullptr
) in Mongo related code but I don't think that header faults are supported there so we should be fine. The code I'm talking about can be found here https://github.com/envoyproxy/envoy/pull/10724/files#diff-eb7cf1a112138011ff766f2a9dc402f8R405
Signed-off-by: Rafal Augustyniak <raugustyniak@lyft.com>
Signed-off-by: Rafal Augustyniak <raugustyniak@lyft.com>
Signed-off-by: Rafal Augustyniak <raugustyniak@lyft.com>
Signed-off-by: Rafal Augustyniak <raugustyniak@lyft.com>
Signed-off-by: Rafal Augustyniak <raugustyniak@lyft.com>
Signed-off-by: Rafal Augustyniak <raugustyniak@lyft.com>
…tp-headers Signed-off-by: Rafal Augustyniak <raugustyniak@lyft.com>
Signed-off-by: Rafal Augustyniak <raugustyniak@lyft.com>
Thanks, can you check CI? You can build the docs locally with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thank you!
Description: Add an option to control the percentage of requests we apply fault to using HTTP headers. See #10648 for more details.
Risk Level: Low, new functionality.
Testing: Added unit tests
Docs Changes: Updated
Release Notes: Updated