
Expose fault.http.abort.http_status setting via HTTP header #10294

Merged: 43 commits merged on Mar 11, 2020


@Augustyniak (Contributor) commented Mar 6, 2020

Description: A partial implementation of #10254. Adds support for an HTTP header that controls fault injection: requests that carry the x-envoy-fault-abort-request HTTP header are aborted.
Risk Level: low, new feature.
Testing: Added
Docs Changes: Added
Release Notes: Added

Signed-off-by: Rafal Augustyniak <raugustyniak@lyft.com>
@repokitteh-read-only

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/.

🐱

Caused by: #10294 was opened by Augustyniak.


@Augustyniak (Contributor Author) commented

@mattklein123 @htuch I think that I've addressed all of your comments.

I have one question regarding tests. I've tried to add an integration test to the fault_filter_integration_test.cc file, but every time I inject an abort fault via the HTTP header, I get the following error while running the tests:

[2020-03-10 01:25:56.208][12906338][critical][assert] [test/integration/http_integration.cc:384] assert failure: result. Details: Timed out waiting for new connection.

It fails on waitForNextUpstreamRequest();, which makes sense since, as far as I know, there is no upstream request when an abort fault is injected, but I want to confirm with you that this is expected.

A git patch that can be used to test this:

diff --git a/test/extensions/filters/http/fault/fault_filter_integration_test.cc b/test/extensions/filters/http/fault/fault_filter_integration_test.cc
index 70cfb89e5..68b55bb23 100644
--- a/test/extensions/filters/http/fault/fault_filter_integration_test.cc
+++ b/test/extensions/filters/http/fault/fault_filter_integration_test.cc
@@ -107,6 +107,7 @@ TEST_P(FaultIntegrationTestAllProtocols, HeaderFaultConfig) {
                                                  {":scheme", "http"},
                                                  {":authority", "host"},
                                                  {"x-envoy-fault-delay-request", "200"},
+                                                 {"x-envoy-fault-abort-request", "503"},
                                                  {"x-envoy-fault-throughput-response", "1"}};
   const auto current_time = simTime().monotonicTime();
   IntegrationStreamDecoderPtr decoder = codec_client_->makeHeaderOnlyRequest(request_headers);
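Why waitForNextUpstreamRequest() times out can be pictured with a toy model of the decoder filter chain. This is a sketch under simplifying assumptions, not Envoy's API: HeadersStatus, Request, Result, runChain, faultFilter and routerFilter are all hypothetical stand-ins. It illustrates that when the fault filter answers locally and stops iteration, the router (the only filter that creates an upstream request) never runs, so a test waiting for an upstream request has nothing to wait for.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-in for a decoder filter's return value.
enum class HeadersStatus { Continue, StopIteration };

// Hypothetical request: only the parsed x-envoy-fault-abort-request value matters here.
struct Request {
  std::optional<uint32_t> abort_status;
};

// What the downstream client and the upstream observe.
struct Result {
  int upstream_requests = 0;
  uint32_t response_status = 0;
};

using Filter = std::function<HeadersStatus(const Request&, Result&)>;

// Calls each filter in order until one of them stops iteration.
Result runChain(const std::vector<Filter>& filters, const Request& request) {
  Result result;
  for (const Filter& filter : filters) {
    if (filter(request, result) == HeadersStatus::StopIteration) {
      break;
    }
  }
  return result;
}

// Toy fault filter: on abort, it answers locally and stops the chain.
HeadersStatus faultFilter(const Request& request, Result& result) {
  if (request.abort_status.has_value()) {
    result.response_status = *request.abort_status;
    return HeadersStatus::StopIteration;
  }
  return HeadersStatus::Continue;
}

// Toy router: the only filter that creates an upstream request.
HeadersStatus routerFilter(const Request&, Result& result) {
  ++result.upstream_requests;
  result.response_status = 200;
  return HeadersStatus::StopIteration;
}
```

With an abort status of 429, a chain of {faultFilter, routerFilter} yields a 429 response and zero upstream requests; without the abort, the toy router runs exactly once. In a real integration test this suggests asserting on the downstream response instead of waiting for an upstream request.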

@mattklein123 (Member) left a comment


Thanks, looks great with one small nit and a test request. Nice work! In terms of adding an integration test, here is an example of a test that doesn't use an upstream; it just makes a request and expects a response:

TEST_P(IntegrationTest, ConnectionClose) {


return ret;
}

if (code >= 200 and code < 600) {
@mattklein123 (Member) commented:

nit: s/and/&& (some C++ compilers don't support `and`, and it's not widely used). Also, please add some out-of-range tests.
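A minimal sketch of the validation this nit refers to, assuming the header value is parsed as an unsigned decimal integer. parseAbortStatus is a hypothetical helper, not the function merged in this PR. It uses && for the range check and rejects the kinds of input the requested out-of-range tests would cover: 199, 600, negative, non-numeric, empty, and overflowing values.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper (not the merged Envoy code): converts an x-envoy-fault-abort-request
// header value into an HTTP status, or returns nullopt if the value is unusable.
std::optional<uint32_t> parseAbortStatus(std::string_view value) {
  uint64_t code = 0;
  const char* const end = value.data() + value.size();
  const auto [ptr, ec] = std::from_chars(value.data(), end, code);
  // Reject empty or non-numeric input (including a leading '-'), overflow, and trailing junk.
  if (ec != std::errc() || ptr != end) {
    return std::nullopt;
  }
  // Same range as the API validation rule, [200, 600), written with && rather than `and`.
  if (code >= 200 && code < 600) {
    return static_cast<uint32_t>(code);
  }
  return std::nullopt;
}
```

Testing both edges of the half-open range (199 and 200, 599 and 600) is what catches an off-by-one in the comparison.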

@Augustyniak (Contributor Author) commented


Thank you! I tried something like this, but it just gets stuck on response->waitForEndStream():

// Request abort controlled via header configuration.
TEST_P(FaultIntegrationTestAllProtocols, HeaderFaultAbortConfig) {
  initializeFilter(header_fault_config_);
  codec_client_ = makeHttpConnection(lookupPort("http"));

  auto response =
      codec_client_->makeHeaderOnlyRequest(Http::TestRequestHeaderMapImpl{{":method", "GET"},
                                                                          {":path", "/test/long/url"},
                                                                          {":scheme", "http"},
                                                                          {":authority", "host"},
                                                                          {"connection", "close"},
                                                                          {"x-envoy-fault-abort-request", "429"}});
  response->waitForEndStream();
  codec_client_->waitForDisconnect();
  
  EXPECT_TRUE(response->complete());
  EXPECT_THAT(response->headers(), Envoy::Http::HttpStatusIs("429"));
}

Removing the {"x-envoy-fault-abort-request", "429"} header didn't help. Sorry for the noob question, but am I missing anything in my test?

@mattklein123 (Member) commented

I'm not sure why your test is failing, but it's probably an actual issue. I would recommend running your test locally with trace logging to debug it. Try the following Bazel flags:

--test_arg="-l trace" --test_arg="--gtest_filter=<your test>"

@Augustyniak (Contributor Author) commented Mar 10, 2020

  • I was able to add an integration test for the case where an abort fault is injected via the header, and it passes: deec559
  • I still have a problem when I try to write a test that covers both the header delay and abort faults. I've tried to play with time using simTime().sleep(std::chrono::milliseconds(400)); or simTime().setMonotonicTime(simTime().monotonicTime() + std::chrono::milliseconds(1000)); in my test, but it didn't help. More explanation below:

HTTP headers used in the integration test:

{{":method", "GET"},
{":path", "/test/long/url"},
{":scheme", "http"},
{":authority", "host"},
{"x-envoy-fault-delay-request", "200"},
{"x-envoy-fault-abort-request", "429"}}

I was able to debug the test using the following command:

bazel test //test/extensions/filters/http/fault:fault_filter_integration_test --test_arg="-l trace" --test_arg="--gtest_filter=Protocols/FaultIntegrationTestAllProtocols.HeaderFaultAbortAndDelayConfig/IPv4_HttpDownstream_HttpUpstream" --test_output=streamed

Debugging output:

[2020-03-10 21:21:38.168][1205163][info][main] [source/server/server.cc:551] starting main dispatch loop
[2020-03-10 21:21:38.172][1205084][debug][testing] [test/integration/integration.cc:416] registered 'http' as port 56411.
[2020-03-10 21:21:38.174][1205084][debug][upstream] [source/common/upstream/upstream_impl.cc:274] transport socket match, socket test selected for host with address 127.0.0.1:80
[2020-03-10 21:21:38.175][1205084][debug][client] [source/common/http/codec_client.cc:34] [C0] connecting
[2020-03-10 21:21:38.175][1205084][debug][connection] [source/common/network/connection_impl.cc:725] [C0] connecting to 127.0.0.1:56411
[2020-03-10 21:21:38.175][1205084][debug][connection] [source/common/network/connection_impl.cc:734] [C0] connection in progress
[2020-03-10 21:21:38.175][1205084][trace][connection] [source/common/network/connection_impl.cc:492] [C0] socket event: 2
[2020-03-10 21:21:38.175][1205084][trace][connection] [source/common/network/connection_impl.cc:580] [C0] write ready
[2020-03-10 21:21:38.175][1205084][debug][connection] [source/common/network/connection_impl.cc:591] [C0] connected
[2020-03-10 21:21:38.175][1205084][debug][client] [source/common/http/codec_client.cc:72] [C0] connected
[2020-03-10 21:21:38.175][1205084][trace][connection] [source/common/network/connection_impl.cc:428] [C0] writing 130 bytes, end_stream false
[2020-03-10 21:21:38.175][1205201][debug][conn_handler] [source/server/connection_handler_impl.cc:353] [C1] new connection
[2020-03-10 21:21:38.175][1205084][trace][connection] [source/common/network/connection_impl.cc:492] [C0] socket event: 2
[2020-03-10 21:21:38.175][1205084][trace][connection] [source/common/network/connection_impl.cc:580] [C0] write ready
[2020-03-10 21:21:38.175][1205201][trace][connection] [source/common/network/connection_impl.cc:492] [C1] socket event: 2
[2020-03-10 21:21:38.175][1205201][trace][connection] [source/common/network/connection_impl.cc:580] [C1] write ready
[2020-03-10 21:21:38.175][1205084][trace][connection] [source/common/network/raw_buffer_socket.cc:68] [C0] write returns: 130
[2020-03-10 21:21:38.175][1205084][trace][connection] [source/common/network/connection_impl.cc:492] [C0] socket event: 2
[2020-03-10 21:21:38.175][1205201][trace][connection] [source/common/network/connection_impl.cc:492] [C1] socket event: 1
[2020-03-10 21:21:38.175][1205084][trace][connection] [source/common/network/connection_impl.cc:580] [C0] write ready
[2020-03-10 21:21:38.175][1205201][trace][connection] [source/common/network/connection_impl.cc:530] [C1] read ready
[2020-03-10 21:21:38.175][1205201][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C1] read returns: 130
[2020-03-10 21:21:38.175][1205201][trace][connection] [source/common/network/raw_buffer_socket.cc:39] [C1] read error: Resource temporarily unavailable
[2020-03-10 21:21:38.175][1205201][trace][http] [source/common/http/http1/codec_impl.cc:467] [C1] parsing 130 bytes
[2020-03-10 21:21:38.175][1205201][trace][http] [source/common/http/http1/codec_impl.cc:649] [C1] message begin
[2020-03-10 21:21:38.175][1205201][debug][http] [source/common/http/conn_manager_impl.cc:264] [C1] new stream
[2020-03-10 21:21:38.176][1205201][trace][http] [source/common/http/http1/codec_impl.cc:424] [C1] completed header: key=host value=host
[2020-03-10 21:21:38.176][1205201][trace][http] [source/common/http/http1/codec_impl.cc:424] [C1] completed header: key=x-envoy-fault-delay-request value=200
[2020-03-10 21:21:38.176][1205201][trace][http] [source/common/http/http1/codec_impl.cc:424] [C1] completed header: key=x-envoy-fault-abort-request value=429
[2020-03-10 21:21:38.176][1205201][trace][http] [source/common/http/http1/codec_impl.cc:570] [C1] onHeadersCompleteBase
[2020-03-10 21:21:38.176][1205201][trace][http] [source/common/http/http1/codec_impl.cc:424] [C1] completed header: key=content-length value=0
[2020-03-10 21:21:38.176][1205201][trace][http] [source/common/http/http1/codec_impl.cc:746] [C1] Server: onHeadersComplete size=4
[2020-03-10 21:21:38.176][1205201][trace][http] [source/common/http/http1/codec_impl.cc:628] [C1] message complete
[2020-03-10 21:21:38.176][1205201][debug][http] [source/common/http/conn_manager_impl.cc:764] [C1][S11532646105675666467] request headers complete (end_stream=true):
':authority', 'host'
':path', '/test/long/url'
':method', 'GET'
'x-envoy-fault-delay-request', '200'
'x-envoy-fault-abort-request', '429'
'content-length', '0'

[2020-03-10 21:21:38.176][1205201][debug][http] [source/common/http/conn_manager_impl.cc:1316] [C1][S11532646105675666467] request end stream
[2020-03-10 21:21:38.176][1205201][debug][filter] [source/extensions/filters/http/fault/fault_filter.cc:160] fault: delaying request 200ms
[2020-03-10 21:21:38.176][1205201][trace][http] [source/common/http/conn_manager_impl.cc:1025] [C1][S11532646105675666467] decode headers called: filter=0xd0a15c0 status=1
[2020-03-10 21:21:38.176][1205201][trace][http] [source/common/http/http1/codec_impl.cc:488] [C1] parsed 130 bytes
[2020-03-10 21:21:38.176][1205201][trace][connection] [source/common/network/connection_impl.cc:314] [C1] readDisable: enabled=true disable=true state=0
[2020-03-10 21:21:38.176][1205201][trace][connection] [source/common/network/connection_impl.cc:492] [C1] socket event: 2
[2020-03-10 21:21:38.176][1205201][trace][connection] [source/common/network/connection_impl.cc:580] [C1] write ready

I confirmed that the postDelayInjection method is never called here (and it should be):

decoder_callbacks_->dispatcher().createTimer([this]() -> void { postDelayInjection(); });

@mattklein123 (Member) commented

Thanks for trying the delay integration test. I think we can probably give up on it now, but before we do, can you show a diff of your proposed test? Where are you putting the sim time advance? I think the issue is that you are likely advancing time before the actual timer is created, so it's racing, and you have no way of knowing when it's safe to advance time. If we kept a gauge of the faults that are currently delayed (?), you could probably wait for that gauge to reach 1 and that would work, but I'm not sure whether we have one.

cc @jmarantz.
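The race described above can be reproduced with a toy simulated clock. SimClock, Gauge, advanceThenArm and waitOnGaugeThenAdvance are hypothetical and far simpler than Envoy's simulated time system; the gauge stands in for the "currently delayed faults" gauge the review speculates about, which may not exist. Advancing time before the 200 ms delay timer is armed leaves the timer's deadline in the future, so it never fires; blocking on the gauge first guarantees the timer exists before time moves.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical toy clock (not Envoy's SimulatedTimeSystem): timers fire only on advance().
class SimClock {
 public:
  // Arms a one-shot timer that fires once simulated time reaches now + delay_ms.
  void createTimer(int64_t delay_ms, std::function<void()> callback) {
    std::lock_guard<std::mutex> lock(mutex_);
    timers_.push_back({now_ms_ + delay_ms, std::move(callback)});
  }

  // Moves time forward and runs every timer whose deadline has been reached.
  void advance(int64_t ms) {
    std::vector<std::function<void()>> due;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      now_ms_ += ms;
      for (auto it = timers_.begin(); it != timers_.end();) {
        if (it->deadline_ms <= now_ms_) {
          due.push_back(std::move(it->callback));
          it = timers_.erase(it);
        } else {
          ++it;
        }
      }
    }
    for (auto& callback : due) {
      callback();  // run callbacks outside the lock
    }
  }

 private:
  struct Timer {
    int64_t deadline_ms;
    std::function<void()> callback;
  };
  std::mutex mutex_;
  int64_t now_ms_ = 0;
  std::vector<Timer> timers_;
};

// Hypothetical gauge a test can block on (stand-in for a "currently delayed faults" gauge).
class Gauge {
 public:
  void set(int64_t value) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      value_ = value;
    }
    cv_.notify_all();
  }
  void waitForEq(int64_t expected) {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [&] { return value_ == expected; });
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  int64_t value_ = 0;
};

// Racy ordering: time moves before the 200 ms delay timer exists, so it never fires.
bool advanceThenArm() {
  SimClock clock;
  bool fired = false;
  clock.advance(1000);
  clock.createTimer(200, [&] { fired = true; });  // deadline is now 1200 ms
  return fired;
}

// Ordering suggested in review: wait until the timer is armed, then advance time.
bool waitOnGaugeThenAdvance() {
  SimClock clock;
  Gauge delayed_faults;
  bool fired = false;
  std::thread worker([&] {
    clock.createTimer(200, [&] { fired = true; });  // toy "fault filter" arms the delay
    delayed_faults.set(1);                          // ...then reports it
  });
  delayed_faults.waitForEq(1);  // test thread blocks until the timer exists
  clock.advance(1000);          // fires the timer on this thread
  worker.join();
  return fired;
}
```

The sketch needs a threads-enabled build (for example, -pthread with GCC or Clang). The design point is that the test synchronizes on an observable signal emitted after the timer is created, rather than guessing with a sleep.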

  reserved 1;

  oneof error_type {
    option (validate.required) = true;

    // HTTP status code to use to abort the HTTP request.
    uint32 http_status = 2 [(validate.rules).uint32 = {lt: 600 gte: 200}];

    // Fault aborts are controlled via an HTTP header (if applicable).
    HeaderAbort header_abort = 4;
Member commented:

I would call this header_error. The abort is still controlled by the normal mechanisms; it's just that the nature of the error is expressed in the header.

@Augustyniak (Contributor Author) commented:

I can change it, but I think that header_abort is a better name because:

  1. header_error could, in theory, be used to inject status codes such as 200 or 204 that are not errors.
  2. header_abort follows the naming scheme used by the header_limit and header_delay fields. header_error would have been a better choice if header_limit were named header_kbs and header_delay were named header_duration.

Thoughts?

Member commented:

I think I would be inclined to stick with header_abort for the reasons that @Augustyniak outlines. @htuch WDYT?

Member commented:

Fair enough.
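The error_type oneof can be modeled in C++ as a std::variant, which makes the "exactly one of http_status or header_abort" rule explicit in the type. This is a hypothetical model, not the generated protobuf code: FixedStatus, HeaderAbort, ErrorType and abortStatus are illustrative names. Header parsing and range validation are assumed to happen elsewhere, and the percentage that decides whether an abort is injected at all (the "normal mechanisms" mentioned above) is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <variant>

// Hypothetical C++ model of the `error_type` oneof (not generated protobuf code).
struct FixedStatus {
  uint32_t http_status;  // the API rule restricts this to [200, 600)
};
struct HeaderAbort {};  // the status comes from x-envoy-fault-abort-request

using ErrorType = std::variant<FixedStatus, HeaderAbort>;

// Picks the status to abort with. header_status is the already-parsed, range-checked
// header value (nullopt if the header was missing or invalid).
std::optional<uint32_t> abortStatus(const ErrorType& error,
                                    std::optional<uint32_t> header_status) {
  if (const auto* fixed = std::get_if<FixedStatus>(&error)) {
    return fixed->http_status;  // a fixed status ignores the header entirely
  }
  return header_status;  // header_abort: in this sketch, no usable header means no abort
}
```

Because the variant holds exactly one alternative, there is no state in which both a fixed status and a header-driven status are configured, mirroring what option (validate.required) enforces on the oneof.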

@@ -36,6 +36,9 @@ The fault filter has the capability to allow fault configuration to be specified
This is useful in certain scenarios in which it is desired to allow the client to specify its own
fault configuration. The currently supported header controls are:

* Request abort configuration via the *x-envoy-fault-abort-request* header. The header value
  should be an integer that specifies the HTTP status code to return in response to a request
  and must be in the range [200, 600).
Member commented:

It's worth pointing out that header_abort needs to be set (and including an RST link back to it).

@Augustyniak (Contributor Author) commented Mar 10, 2020

Diff for the test I'm having problems with:

diff --git a/test/extensions/filters/http/fault/fault_filter_integration_test.cc b/test/extensions/filters/http/fault/fault_filter_integration_test.cc
index 44da53d1e..4040219e4 100644
--- a/test/extensions/filters/http/fault/fault_filter_integration_test.cc
+++ b/test/extensions/filters/http/fault/fault_filter_integration_test.cc
@@ -154,6 +154,31 @@ TEST_P(FaultIntegrationTestAllProtocols, HeaderFaultAbortConfig) {
   EXPECT_EQ(0UL, test_server_->counter("http.config_test.fault.response_rl_injected")->value());
 }
 
+// Request abort controlled via header configuration.
+TEST_P(FaultIntegrationTestAllProtocols, HeaderFaultAbortAndDelayConfig) {
+  initializeFilter(header_fault_config_);
+  codec_client_ = makeHttpConnection(makeClientConnection(lookupPort("http")));
+
+  auto response = codec_client_->makeHeaderOnlyRequest(
+      Http::TestRequestHeaderMapImpl{{":method", "GET"},
+                                     {":path", "/test/long/url"},
+                                     {":scheme", "http"},
+                                     {":authority", "host"},
+                                     {"x-envoy-fault-delay-request", "200"},
+                                     {"x-envoy-fault-abort-request", "429"}});
+
+  simTime().setMonotonicTime(simTime().monotonicTime() + std::chrono::milliseconds(1000));
+
+  response->waitForEndStream();
+
+  EXPECT_TRUE(response->complete());
+  EXPECT_THAT(response->headers(), Envoy::Http::HttpStatusIs("429"));
+
+  EXPECT_EQ(1UL, test_server_->counter("http.config_test.fault.aborts_injected")->value());
+  EXPECT_EQ(0UL, test_server_->counter("http.config_test.fault.delays_injected")->value());
+  EXPECT_EQ(0UL, test_server_->counter("http.config_test.fault.response_rl_injected")->value());
+}
+
 // Header configuration with no headers, so no fault injection.
 TEST_P(FaultIntegrationTestAllProtocols, HeaderFaultConfigNoHeaders) {
   initializeFilter(header_fault_config_);

You were right: my time-modification logic races with the delay-injection logic, and I end up modifying the time before the timer is scheduled.

@mattklein123 (Member) left a comment


LGTM pending @htuch approval. Thank you!

@htuch (Member) left a comment


/lgtm api

@htuch htuch merged commit 55971b2 into envoyproxy:master Mar 11, 2020
@Augustyniak Augustyniak changed the title Expose fault.http.abort.http_status setting via HTTTP header Expose fault.http.abort.http_status setting via HTTP header Jul 14, 2020

3 participants