-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update routes within router #195
Conversation
ddf2256
to
593731d
Compare
Not sure how much logging we want to keep from what I've added: all the useful stuff that I used to debug I've made |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't fully reviewed but spotted a couple of things I thought worth mentioning - hopefully useful.
I think the logging is fine for now, but (just as an optional suggestion for later) you could get better value out of the log lines that you already have by adding specifics to them. For example I would consider either including the timestamps in the On a related note, it'd be great to add a Prometheus metric for the last-updated timestamp. Not for this PR though - just mentioning as a possibility for later. |
I did think about the specifics, actually timestamp of when the event happened, but then I figured that we already have this information recorded (it must be by the logging functions) in Kibana, as well as timestamps provided by Kibana itself (or at least visible therein). Saying that, I'd be happy to add some specifics in. The Prometheus metric sounds like a good thing. |
5047250
to
d06cebd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having a bit of crisis of conscience with this review if I'm honest. I'm used to a much higher bar for well-structured code and testing, and the lack of tests for this really worry me. On the other hand, the existing tests are in a poor state to begin with and this is an essentially unowned code base with a lot of pre-existing issues, and I sense that you're feeling pressure to get this out the door.
I'll therefore defer to your judgement as to where to set the bar. As an experienced GOV.UK developer you no doubt have a better sense of where that bar is than I do. So if you're happy to roll this out, then I don't want to block you. I appreciate you've done some manual testing and let the branch soak in Integration for a bit.
(In case you're wondering what I would do if it were my change: I'd probably split up and simplify pollAndReload to make it a more obvious "if shouldReload() then reload" loop, rip out those dubious pre-existing/copied recover
s and the anonymous functions that are only there because of them, and add some tests for all the stuff I'd touched - which would then be easier/possible to write once the nontrivial stuff is no longer buried in anon functions. Nothing stopping us doing that later, of course.)
Thanks for this Chris. I felt like we might have wanted to get this out of the door as it has been around for a little while now, but I hear what you say and I think you're right that it could benefit from some tests. I was initially happy to use the existing tests as they cover the behaviour of reloading routes and they caught quite a few issues during development, which gave me confidence that they had good coverage. Still, it probably is worth spending a bit more time adding some additional tests (there are currently no unit tests around |
c17d3b4
to
678ef63
Compare
This commit adds functionality to Router so that it takes charge of updating its own routes, polling MongoDB every few seconds (determined by the new environment variable `ROUTER_MONGO_POLL_INTERVAL` - defaults to `2s`) to check for changes and performing the swapping of route tables if updates are found to have occurred. This change will enable us to stop requests being sent by Router API to force updates to Router instances, allowing us to decouple these apps and make it far easier to run in ECS (as Router API no longer needs to know where to find Router instances). [For more information around these changes, view RFC 135](https://github.com/alphagov/govuk-rfcs/blob/main/rfc-135-updating-routes-in-router.md).
This test seems to reveal a somewhat nasty behaviour in collections when a taxon is being redirected, this appears to be caused in the timeframe when Router and Content Store are out of sync. I imagine this is exasperated by the recent change to router which polls for updates [1]. My understanding of the situation in this flaky test is as follows: - A publisher redirects a taxon which updates the Publishing API, Content Store and Router API - A request comes in for the redirected taxon, Router still thinks collection is rendering it - Collections doesn't know what to do when it loads a content item from the Content Store that it expects to be a taxon but is instead a redirect, so it raises a 500 - Some time later Router polls the Router API database and updates the path to be a redirect. This can be seen somewhat by the log output from a failed test ``` content-store_1 | 2021-04-15T16:21:13.418660237Z {"method":"GET","path":"/content/removed-taxon-631090f9-bdc4-464f-b7f0-ebc64b59ebb9","format":"json","controller":"content_items","action":"show","status":200,"duration":4.2,"view":0.72,"db":0.0,"ip":"172.18.0.11","route":"content_items#show","request_id":"1a1b7ea5-cc0f-413e-bbaf-1d3d407a4c27","govuk_dependency_resolution_source_content_id":null,"source":"172.18.0.28","tags":["request"],"@timestamp":"2021-04-15T16:21:13.418Z","@Version":"1"} nginx-proxy_1 | 2021-04-15T16:21:13.419322763Z nginx.1 | content-store.dev.gov.uk 172.18.0.45 - - [15/Apr/2021:16:21:13 +0000] "GET /content/removed-taxon-631090f9-bdc4-464f-b7f0-ebc64b59ebb9 HTTP/1.1" 200 723 "-" "gds-api-adapters/71.0.0 (collections)" content-store_1 | 2021-04-15T16:21:13.419390573Z 172.18.0.45 - - [15/Apr/2021:16:21:13 +0000] "GET /content/removed-taxon-631090f9-bdc4-464f-b7f0-ebc64b59ebb9 HTTP/1.1" 200 - 0.0062 collections_1 | 2021-04-15T16:21:13.420484859Z {"method":"HEAD","path":"/removed-taxon-631090f9-bdc4-464f-b7f0-ebc64b59ebb9","format":"*/*","controller":"taxons","action":"show","status":500,"duration":8.17,"error":"RuntimeError\nTried to render a taxon page for content item that is not a taxon\n/app/app/models/taxon.rb:26:in `find'\n/app/app/controllers/taxons_controller.rb:16:in `taxon'\n/app/app/controllers/taxons_controller.rb:5:in `show'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_controller/metal/basic_implicit_render.rb:6:in `send_action'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/abstract_controller/base.rb:228:in `process_action'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_controller/metal/rendering.rb:30:in `process_action'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/abstract_controller/callbacks.rb:42:in `block in process_action'\n/usr/local/bundle/gems/activesupport-6.1.3.1/lib/active_support/callbacks.rb:117:in `block in run_callbacks'\n/usr/local/bundle/gems/sentry-raven-3.1.2/lib/raven/integrations/rails/controller_transaction.rb:7:in `block in included'\n/usr/local/bundle/gems/activesupport-6.1.3.1/lib/active_support/callbacks.rb:126:in `instance_exec'\n/usr/local/bundle/gems/activesupport-6.1.3.1/lib/active_support/callbacks.rb:126:in `block in run_callbacks'\n/usr/local/bundle/gems/activesupport-6.1.3.1/lib/active_support/callbacks.rb:137:in `run_callbacks'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/abstract_controller/callbacks.rb:41:in `process_action'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_controller/metal/rescue.rb:22:in `process_action'\n/usr/local/bundle/gems/logstasher-2.1.5/lib/logstasher/rails_ext/action_controller/metal/instrumentation.rb:40:in `block in process_action'\n/usr/local/bundle/gems/activesupport-6.1.3.1/lib/active_support/notifications.rb:203:in `block in instrument'\n/usr/local/bundle/gems/activesupport-6.1.3.1/lib/active_support/notifications/instrumenter.rb:24:in `instrument'\n/usr/local/bundle/gems/activesupport-6.1.3.1/lib/active_support/notifications.rb:203:in `instrument'\n/usr/local/bundle/gems/logstasher-2.1.5/lib/logstasher/rails_ext/action_controller/metal/instrumentation.rb:27:in `process_action'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_controller/metal/params_wrapper.rb:249:in `process_action'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/abstract_controller/base.rb:165:in `process'\n/usr/local/bundle/gems/actionview-6.1.3.1/lib/action_view/rendering.rb:39:in `process'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_controller/metal.rb:190:in `dispatch'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_controller/metal.rb:254:in `dispatch'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/routing/route_set.rb:50:in `dispatch'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/routing/route_set.rb:33:in `serve'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/journey/router.rb:50:in `block in serve'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/journey/router.rb:32:in `each'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/journey/router.rb:32:in `serve'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/routing/route_set.rb:842:in `call'\n/usr/local/bundle/gems/slimmer-15.5.0/lib/slimmer/app.rb:33:in `call'\n/usr/local/bundle/gems/gds-api-adapters-71.0.0/lib/gds_api/middleware/govuk_header_sniffer.rb:12:in `call'\n/usr/local/bundle/gems/gds-api-adapters-71.0.0/lib/gds_api/middleware/govuk_header_sniffer.rb:12:in `call'\n/usr/local/bundle/gems/gds-api-adapters-71.0.0/lib/gds_api/middleware/govuk_header_sniffer.rb:12:in `call'\n/usr/local/bundle/gems/gds-api-adapters-71.0.0/lib/gds_api/middleware/govuk_header_sniffer.rb:12:in `call'\n/usr/local/bundle/gems/gds-api-adapters-71.0.0/lib/gds_api/middleware/govuk_header_sniffer.rb:12:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/tempfile_reaper.rb:15:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/etag.rb:27:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/conditional_get.rb:27:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/head.rb:12:in `call'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/http/permissions_policy.rb:22:in `call'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/http/content_security_policy.rb:18:in `call'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/middleware/cookies.rb:689:in `call'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/middleware/callbacks.rb:27:in `block in call'\n/usr/local/bundle/gems/activesupport-6.1.3.1/lib/active_support/callbacks.rb:98:in `run_callbacks'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/middleware/callbacks.rb:26:in `call'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/middleware/actionable_exceptions.rb:18:in `call'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/middleware/debug_exceptions.rb:29:in `call'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/middleware/show_exceptions.rb:33:in `call'\n/usr/local/bundle/gems/logstasher-2.1.5/lib/logstasher/rails_ext/rack/logger.rb:16:in `call_app'\n/usr/local/bundle/gems/railties-6.1.3.1/lib/rails/rack/logger.rb:26:in `block in call'\n/usr/local/bundle/gems/activesupport-6.1.3.1/lib/active_support/tagged_logging.rb:99:in `block in tagged'\n/usr/local/bundle/gems/activesupport-6.1.3.1/lib/active_support/tagged_logging.rb:37:in `tagged'\n/usr/local/bundle/gems/activesupport-6.1.3.1/lib/active_support/tagged_logging.rb:99:in `tagged'\n/usr/local/bundle/gems/railties-6.1.3.1/lib/rails/rack/logger.rb:26:in `call'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/middleware/remote_ip.rb:81:in `call'\n/usr/local/bundle/gems/request_store-1.5.0/lib/request_store/middleware.rb:19:in `call'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/middleware/request_id.rb:26:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/method_override.rb:24:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/runtime.rb:22:in `call'\n/usr/local/bundle/gems/activesupport-6.1.3.1/lib/active_support/cache/strategy/local_cache_middleware.rb:29:in `call'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/middleware/executor.rb:14:in `call'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/middleware/static.rb:24:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/sendfile.rb:110:in `call'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/middleware/host_authorization.rb:92:in `call'\n/usr/local/bundle/gems/sentry-raven-3.1.2/lib/raven/integrations/rack.rb:51:in `call'\n/usr/local/bundle/gems/railties-6.1.3.1/lib/rails/engine.rb:539:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/tempfile_reaper.rb:15:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/lint.rb:50:in `_call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/lint.rb:38:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/show_exceptions.rb:23:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/common_logger.rb:38:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/chunked.rb:98:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/content_length.rb:17:in `call'\n/usr/local/bundle/gems/unicorn-5.8.0/lib/unicorn/http_server.rb:634:in `process_client'\n/usr/local/bundle/gems/unicorn-5.8.0/lib/unicorn/http_server.rb:732:in `worker_loop'\n/usr/local/bundle/gems/unicorn-5.8.0/lib/unicorn/http_server.rb:548:in `spawn_missing_workers'\n/usr/local/bundle/gems/unicorn-5.8.0/lib/unicorn/http_server.rb:144:in `start'\n/usr/local/bundle/gems/unicorn-5.8.0/bin/unicorn:128:in `\u003ctop (required)\u003e'\n/usr/local/bundle/bin/unicorn:23:in `load'\n/usr/local/bundle/bin/unicorn:23:in `\u003ctop (required)\u003e'\n/usr/local/lib/ruby/2.7.0/bundler/cli/exec.rb:63:in `load'\n/usr/local/lib/ruby/2.7.0/bundler/cli/exec.rb:63:in `kernel_load'\n/usr/local/lib/ruby/2.7.0/bundler/cli/exec.rb:28:in `run'\n/usr/local/lib/ruby/2.7.0/bundler/cli.rb:476:in `exec'\n/usr/local/lib/ruby/2.7.0/bundler/vendor/thor/lib/thor/command.rb:27:in `run'\n/usr/local/lib/ruby/2.7.0/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'\n/usr/local/lib/ruby/2.7.0/bundler/vendor/thor/lib/thor.rb:399:in `dispatch'\n/usr/local/lib/ruby/2.7.0/bundler/cli.rb:30:in `dispatch'\n/usr/local/lib/ruby/2.7.0/bundler/vendor/thor/lib/thor/base.rb:476:in `start'\n/usr/local/lib/ruby/2.7.0/bundler/cli.rb:24:in `start'\n/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/libexec/bundle:46:in `block in \u003ctop (required)\u003e'\n/usr/local/lib/ruby/2.7.0/bundler/friendly_errors.rb:123:in `with_friendly_errors'\n/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/libexec/bundle:34:in `\u003ctop (required)\u003e'\n/usr/local/bin/bundle:23:in `load'\n/usr/local/bin/bundle:23:in `\u003cmain\u003e'","ip":"172.18.0.56","route":"taxons#show","request_id":"3bc5124d-a54a-4f4d-bcb8-2654263d1893","request":"HEAD /removed-taxon-631090f9-bdc4-464f-b7f0-ebc64b59ebb9 HTTP/1.1","govuk_request_id":"1ef2e0465504cf9e92cd9a3f70c8668d","varnish_id":null,"govuk_app_config":"2.10.0","source":"172.18.0.45","tags":["request","exception"],"@timestamp":"2021-04-15T16:21:13.419Z","@Version":"1"} nginx-proxy_1 | 2021-04-15T16:21:13.421570925Z nginx.1 | collections.dev.gov.uk 172.18.0.21 - - [15/Apr/2021:16:21:13 +0000] "HEAD /removed-taxon-631090f9-bdc4-464f-b7f0-ebc64b59ebb9 HTTP/1.1" 500 0 "-" "Ruby" nginx-proxy_1 | 2021-04-15T16:21:13.421816946Z nginx.1 | www.dev.gov.uk 172.18.0.56 - - [15/Apr/2021:16:21:13 +0000] "HEAD /removed-taxon-631090f9-bdc4-464f-b7f0-ebc64b59ebb9 HTTP/1.1" 500 0 "-" "Ruby" router_1 | 2021-04-15T16:21:14.312704993Z 2021/04/15 16:21:14 router: registered /removed-taxon-631090f9-bdc4-464f-b7f0-ebc64b59ebb9 (prefix: false) -> /redirection-taxon-5f7df643-880c-4af1-94c5-e386399a996c ``` Ideally Collections should be able to handle a content item that is a redirect (there is precedence in Government Frontend [2]) - as there is never a guarantee Router and Content Store are synchronised. However fixing that is another, bigger, job so I'm doing what will fix this test now. [1]: alphagov/router#195 [2]: https://github.com/alphagov/government-frontend/blob/1c060844f48f1b6b5be9833b0b9fc875d8207429/app/controllers/content_items_controller.rb#L8
This test will wait for a recently published piece of content to change from a 200 response to a 301 redirect. However an assumption of this test is that the content has made it into router and is returning a 200 response. This assumption, while presumptuous, used to hold until the recent changes to Router which switched to a polling rather than a notifying mechanism [1]. To resolve this I've set this wait condition to be able to handle the state where router hasn't updated yet. [1]: alphagov/router#195
This PR adds functionality to Router so that it takes charge of updating its own routes, polling MongoDB every 30 seconds to check for changes and performing the swapping of route tables if updates are found to have occurred. This change takes place at the same time as we stop requests being sent by Router API to force updates to Router instances, allowing us to decouple these apps and make it far easier to run in ECS (as Router API no longer needs to know where to find Router instances).
For more information around these changes, view RFC 135.