Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update routes within router #195

Merged
merged 1 commit into from
Apr 1, 2021
Merged

Conversation

karlbaker02
Copy link
Contributor

@karlbaker02 karlbaker02 commented Mar 22, 2021

This PR adds functionality to Router so that it takes charge of updating its own routes, polling MongoDB every 30 seconds to check for changes and performing the swapping of route tables if updates are found to have occurred. This change takes place at the same time as we stop requests being sent by Router API to force updates to Router instances, allowing us to decouple these apps and make it far easier to run in ECS (as Router API no longer needs to know where to find Router instances).

For more information around these changes, view RFC 135.

@karlbaker02 karlbaker02 force-pushed the update-routes-within-router branch 8 times, most recently from ddf2256 to 593731d Compare March 24, 2021 14:26
@karlbaker02 karlbaker02 marked this pull request as ready for review March 24, 2021 14:27
@karlbaker02
Copy link
Contributor Author

Not sure how much logging we want to keep from what I've added: all the useful stuff that I used to debug I've made logDebug and everything else kept as logInfo to have what's hopefully the minimally-viable amount of logs to know when routes are / are not updated. Happy to change this as people see fit.

Copy link
Contributor

@sengi sengi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't fully reviewed but spotted a couple of things I thought worth mentioning - hopefully useful.

@sengi
Copy link
Contributor

sengi commented Mar 24, 2021

Not sure how much logging we want to keep from what I've added: all the useful stuff that I used to debug I've made logDebug and everything else kept as logInfo to have what's hopefully the minimally-viable amount of logs to know when routes are / are not updated. Happy to change this as people see fit.

I think the logging is fine for now, but (just as an optional suggestion for later) you could get better value out of the log lines that you already have by adding specifics to them. For example I would consider either including the timestamps in the updates found / no updates found messages (so that someone reading the log can tell why it decided it found / didn't find an update) or removing them. I agree that Info feels like the right level for those messages, FWIW.

On a related note, it'd be great to add a Prometheus metric for the last-updated timestamp. Not for this PR though - just mentioning as a possibility for later.

@karlbaker02
Copy link
Contributor Author

karlbaker02 commented Mar 25, 2021

Not sure how much logging we want to keep from what I've added: all the useful stuff that I used to debug I've made logDebug and everything else kept as logInfo to have what's hopefully the minimally-viable amount of logs to know when routes are / are not updated. Happy to change this as people see fit.

I think the logging is fine for now, but (just as an optional suggestion for later) you could get better value out of the log lines that you already have by adding specifics to them. For example I would consider either including the timestamps in the updates found / no updates found messages (so that someone reading the log can tell why it decided it found / didn't find an update) or removing them. I agree that Info feels like the right level for those messages, FWIW.

On a related note, it'd be great to add a Prometheus metric for the last-updated timestamp. Not for this PR though - just mentioning as a possibility for later.

I did think about the specifics, actually timestamp of when the event happened, but then I figured that we already have this information recorded (it must be by the logging functions) in Kibana, as well as timestamps provided by Kibana itself (or at least visible therein). Saying that, I'd be happy to add some specifics in.

The Prometheus metric sounds like a good thing.

@karlbaker02 karlbaker02 requested a review from sengi March 25, 2021 14:34
@karlbaker02 karlbaker02 force-pushed the update-routes-within-router branch from 5047250 to d06cebd Compare March 26, 2021 15:52
Copy link
Contributor

@sengi sengi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having a bit of crisis of conscience with this review if I'm honest. I'm used to a much higher bar for well-structured code and testing, and the lack of tests for this really worry me. On the other hand, the existing tests are in a poor state to begin with and this is an essentially unowned code base with a lot of pre-existing issues, and I sense that you're feeling pressure to get this out the door.

I'll therefore defer to your judgement as to where to set the bar. As an experienced GOV.UK developer you no doubt have a better sense of where that bar is than I do. So if you're happy to roll this out, then I don't want to block you. I appreciate you've done some manual testing and let the branch soak in Integration for a bit.

(In case you're wondering what I would do if it were my change: I'd probably split up and simplify pollAndReload to make it a more obvious "if shouldReload() then reload" loop, rip out those dubious pre-existing/copied recovers and the anonymous functions that are only there because of them, and add some tests for all the stuff I'd touched - which would then be easier/possible to write once the nontrivial stuff is no longer buried in anon functions. Nothing stopping us doing that later, of course.)

@karlbaker02
Copy link
Contributor Author

I'm having a bit of crisis of conscience with this review if I'm honest. I'm used to a much higher bar for well-structured code and testing, and the lack of tests for this really worry me. On the other hand, the existing tests are in a poor state to begin with and this is an essentially unowned code base with a lot of pre-existing issues, and I sense that you're feeling pressure to get this out the door.

I'll therefore defer to your judgement as to where to set the bar. As an experienced GOV.UK developer you no doubt have a better sense of where that bar is than I do. So if you're happy to roll this out, then I don't want to block you. I appreciate you've done some manual testing and let the branch soak in Integration for a bit.

(In case you're wondering what I would do if it were my change: I'd probably split up and simplify pollAndReload to make it a more obvious "if shouldReload() then reload" loop, rip out those dubious pre-existing/copied recovers and the anonymous functions that are only there because of them, and add some tests for all the stuff I'd touched - which would then be easier/possible to write once the nontrivial stuff is no longer buried in anon functions. Nothing stopping us doing that later, of course.)

Thanks for this Chris. I felt like we might have wanted to get this out of the door as it has been around for a little while now, but I hear what you say and I think you're right that it could benefit from some tests. I was initially happy to use the existing tests as they cover the behaviour of reloading routes and they caught quite a few issues during development, which gave me confidence that they had good coverage. Still, it probably is worth spending a bit more time adding some additional tests (there are currently no unit tests around Router for example, though as documented in the README, this was a design choice at the time Router was written). I'll get working on this and push up another commit once done - it might not cover everything you've mentioned but I think it should at least make a good start.

@karlbaker02 karlbaker02 force-pushed the update-routes-within-router branch 3 times, most recently from c17d3b4 to 678ef63 Compare April 1, 2021 09:25
This commit adds functionality to Router so that it takes charge of updating its own routes, polling MongoDB every few seconds (determined by the new environment variable `ROUTER_MONGO_POLL_INTERVAL` - defaults to `2s`) to check for changes and performing the swapping of route tables if updates are found to have occurred. This change will enable us to stop requests being sent by Router API to force updates to Router instances, allowing us to decouple these apps and make it far easier to run in ECS (as Router API no longer needs to know where to find Router instances).

[For more information around these changes, view RFC 135](https://github.com/alphagov/govuk-rfcs/blob/main/rfc-135-updating-routes-in-router.md).
@karlbaker02 karlbaker02 merged commit 811ed82 into master Apr 1, 2021
@karlbaker02 karlbaker02 deleted the update-routes-within-router branch April 1, 2021 14:36
kevindew added a commit to alphagov/publishing-e2e-tests that referenced this pull request Apr 15, 2021
This test seems to reveal a somewhat nasty behaviour in collections when
a taxon is being redirected, this appears to be caused in the timeframe
when Router and Content Store are out of sync. I imagine this is exasperated
by the recent change to router which polls for updates [1].

My understanding of the situation in this flaky test is as follows:

- A publisher redirects a taxon which updates the Publishing API,
  Content Store and Router API
- A request comes in for the redirected taxon, Router still thinks
  collection is rendering it
- Collections doesn't know what to do when it loads a content item from
  the Content Store that it expects to be a taxon but is instead a
  redirect, so it raises a 500
- Some time later Router polls the Router API database and updates the
  path to be a redirect.

This can be seen somewhat by the log output from a failed test

```
content-store_1                         | 2021-04-15T16:21:13.418660237Z {"method":"GET","path":"/content/removed-taxon-631090f9-bdc4-464f-b7f0-ebc64b59ebb9","format":"json","controller":"content_items","action":"show","status":200,"duration":4.2,"view":0.72,"db":0.0,"ip":"172.18.0.11","route":"content_items#show","request_id":"1a1b7ea5-cc0f-413e-bbaf-1d3d407a4c27","govuk_dependency_resolution_source_content_id":null,"source":"172.18.0.28","tags":["request"],"@timestamp":"2021-04-15T16:21:13.418Z","@Version":"1"}
nginx-proxy_1                           | 2021-04-15T16:21:13.419322763Z nginx.1    | content-store.dev.gov.uk 172.18.0.45 - - [15/Apr/2021:16:21:13 +0000] "GET /content/removed-taxon-631090f9-bdc4-464f-b7f0-ebc64b59ebb9 HTTP/1.1" 200 723 "-" "gds-api-adapters/71.0.0 (collections)"
content-store_1                         | 2021-04-15T16:21:13.419390573Z 172.18.0.45 - - [15/Apr/2021:16:21:13 +0000] "GET /content/removed-taxon-631090f9-bdc4-464f-b7f0-ebc64b59ebb9 HTTP/1.1" 200 - 0.0062
collections_1                           | 2021-04-15T16:21:13.420484859Z {"method":"HEAD","path":"/removed-taxon-631090f9-bdc4-464f-b7f0-ebc64b59ebb9","format":"*/*","controller":"taxons","action":"show","status":500,"duration":8.17,"error":"RuntimeError\nTried to render a taxon page for content item that is not a taxon\n/app/app/models/taxon.rb:26:in `find'\n/app/app/controllers/taxons_controller.rb:16:in `taxon'\n/app/app/controllers/taxons_controller.rb:5:in `show'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_controller/metal/basic_implicit_render.rb:6:in `send_action'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/abstract_controller/base.rb:228:in `process_action'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_controller/metal/rendering.rb:30:in `process_action'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/abstract_controller/callbacks.rb:42:in `block in process_action'\n/usr/local/bundle/gems/activesupport-6.1.3.1/lib/active_support/callbacks.rb:117:in `block in run_callbacks'\n/usr/local/bundle/gems/sentry-raven-3.1.2/lib/raven/integrations/rails/controller_transaction.rb:7:in `block in included'\n/usr/local/bundle/gems/activesupport-6.1.3.1/lib/active_support/callbacks.rb:126:in `instance_exec'\n/usr/local/bundle/gems/activesupport-6.1.3.1/lib/active_support/callbacks.rb:126:in `block in run_callbacks'\n/usr/local/bundle/gems/activesupport-6.1.3.1/lib/active_support/callbacks.rb:137:in `run_callbacks'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/abstract_controller/callbacks.rb:41:in `process_action'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_controller/metal/rescue.rb:22:in `process_action'\n/usr/local/bundle/gems/logstasher-2.1.5/lib/logstasher/rails_ext/action_controller/metal/instrumentation.rb:40:in `block in process_action'\n/usr/local/bundle/gems/activesupport-6.1.3.1/lib/active_support/notifications.rb:203:in `block in instrument'\n/usr/local/bundle/gems/activesupport-6.1.3.1/lib/active_support/notifications/instrumenter.rb:24:in `instrument'\n/usr/local/bundle/gems/activesupport-6.1.3.1/lib/active_support/notifications.rb:203:in `instrument'\n/usr/local/bundle/gems/logstasher-2.1.5/lib/logstasher/rails_ext/action_controller/metal/instrumentation.rb:27:in `process_action'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_controller/metal/params_wrapper.rb:249:in `process_action'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/abstract_controller/base.rb:165:in `process'\n/usr/local/bundle/gems/actionview-6.1.3.1/lib/action_view/rendering.rb:39:in `process'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_controller/metal.rb:190:in `dispatch'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_controller/metal.rb:254:in `dispatch'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/routing/route_set.rb:50:in `dispatch'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/routing/route_set.rb:33:in `serve'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/journey/router.rb:50:in `block in serve'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/journey/router.rb:32:in `each'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/journey/router.rb:32:in `serve'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/routing/route_set.rb:842:in `call'\n/usr/local/bundle/gems/slimmer-15.5.0/lib/slimmer/app.rb:33:in `call'\n/usr/local/bundle/gems/gds-api-adapters-71.0.0/lib/gds_api/middleware/govuk_header_sniffer.rb:12:in `call'\n/usr/local/bundle/gems/gds-api-adapters-71.0.0/lib/gds_api/middleware/govuk_header_sniffer.rb:12:in `call'\n/usr/local/bundle/gems/gds-api-adapters-71.0.0/lib/gds_api/middleware/govuk_header_sniffer.rb:12:in `call'\n/usr/local/bundle/gems/gds-api-adapters-71.0.0/lib/gds_api/middleware/govuk_header_sniffer.rb:12:in `call'\n/usr/local/bundle/gems/gds-api-adapters-71.0.0/lib/gds_api/middleware/govuk_header_sniffer.rb:12:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/tempfile_reaper.rb:15:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/etag.rb:27:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/conditional_get.rb:27:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/head.rb:12:in `call'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/http/permissions_policy.rb:22:in `call'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/http/content_security_policy.rb:18:in `call'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/middleware/cookies.rb:689:in `call'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/middleware/callbacks.rb:27:in `block in call'\n/usr/local/bundle/gems/activesupport-6.1.3.1/lib/active_support/callbacks.rb:98:in `run_callbacks'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/middleware/callbacks.rb:26:in `call'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/middleware/actionable_exceptions.rb:18:in `call'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/middleware/debug_exceptions.rb:29:in `call'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/middleware/show_exceptions.rb:33:in `call'\n/usr/local/bundle/gems/logstasher-2.1.5/lib/logstasher/rails_ext/rack/logger.rb:16:in `call_app'\n/usr/local/bundle/gems/railties-6.1.3.1/lib/rails/rack/logger.rb:26:in `block in call'\n/usr/local/bundle/gems/activesupport-6.1.3.1/lib/active_support/tagged_logging.rb:99:in `block in tagged'\n/usr/local/bundle/gems/activesupport-6.1.3.1/lib/active_support/tagged_logging.rb:37:in `tagged'\n/usr/local/bundle/gems/activesupport-6.1.3.1/lib/active_support/tagged_logging.rb:99:in `tagged'\n/usr/local/bundle/gems/railties-6.1.3.1/lib/rails/rack/logger.rb:26:in `call'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/middleware/remote_ip.rb:81:in `call'\n/usr/local/bundle/gems/request_store-1.5.0/lib/request_store/middleware.rb:19:in `call'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/middleware/request_id.rb:26:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/method_override.rb:24:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/runtime.rb:22:in `call'\n/usr/local/bundle/gems/activesupport-6.1.3.1/lib/active_support/cache/strategy/local_cache_middleware.rb:29:in `call'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/middleware/executor.rb:14:in `call'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/middleware/static.rb:24:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/sendfile.rb:110:in `call'\n/usr/local/bundle/gems/actionpack-6.1.3.1/lib/action_dispatch/middleware/host_authorization.rb:92:in `call'\n/usr/local/bundle/gems/sentry-raven-3.1.2/lib/raven/integrations/rack.rb:51:in `call'\n/usr/local/bundle/gems/railties-6.1.3.1/lib/rails/engine.rb:539:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/tempfile_reaper.rb:15:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/lint.rb:50:in `_call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/lint.rb:38:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/show_exceptions.rb:23:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/common_logger.rb:38:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/chunked.rb:98:in `call'\n/usr/local/bundle/gems/rack-2.2.3/lib/rack/content_length.rb:17:in `call'\n/usr/local/bundle/gems/unicorn-5.8.0/lib/unicorn/http_server.rb:634:in `process_client'\n/usr/local/bundle/gems/unicorn-5.8.0/lib/unicorn/http_server.rb:732:in `worker_loop'\n/usr/local/bundle/gems/unicorn-5.8.0/lib/unicorn/http_server.rb:548:in `spawn_missing_workers'\n/usr/local/bundle/gems/unicorn-5.8.0/lib/unicorn/http_server.rb:144:in `start'\n/usr/local/bundle/gems/unicorn-5.8.0/bin/unicorn:128:in `\u003ctop (required)\u003e'\n/usr/local/bundle/bin/unicorn:23:in `load'\n/usr/local/bundle/bin/unicorn:23:in `\u003ctop (required)\u003e'\n/usr/local/lib/ruby/2.7.0/bundler/cli/exec.rb:63:in `load'\n/usr/local/lib/ruby/2.7.0/bundler/cli/exec.rb:63:in `kernel_load'\n/usr/local/lib/ruby/2.7.0/bundler/cli/exec.rb:28:in `run'\n/usr/local/lib/ruby/2.7.0/bundler/cli.rb:476:in `exec'\n/usr/local/lib/ruby/2.7.0/bundler/vendor/thor/lib/thor/command.rb:27:in `run'\n/usr/local/lib/ruby/2.7.0/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'\n/usr/local/lib/ruby/2.7.0/bundler/vendor/thor/lib/thor.rb:399:in `dispatch'\n/usr/local/lib/ruby/2.7.0/bundler/cli.rb:30:in `dispatch'\n/usr/local/lib/ruby/2.7.0/bundler/vendor/thor/lib/thor/base.rb:476:in `start'\n/usr/local/lib/ruby/2.7.0/bundler/cli.rb:24:in `start'\n/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/libexec/bundle:46:in `block in \u003ctop (required)\u003e'\n/usr/local/lib/ruby/2.7.0/bundler/friendly_errors.rb:123:in `with_friendly_errors'\n/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/libexec/bundle:34:in `\u003ctop (required)\u003e'\n/usr/local/bin/bundle:23:in `load'\n/usr/local/bin/bundle:23:in `\u003cmain\u003e'","ip":"172.18.0.56","route":"taxons#show","request_id":"3bc5124d-a54a-4f4d-bcb8-2654263d1893","request":"HEAD /removed-taxon-631090f9-bdc4-464f-b7f0-ebc64b59ebb9 HTTP/1.1","govuk_request_id":"1ef2e0465504cf9e92cd9a3f70c8668d","varnish_id":null,"govuk_app_config":"2.10.0","source":"172.18.0.45","tags":["request","exception"],"@timestamp":"2021-04-15T16:21:13.419Z","@Version":"1"}
nginx-proxy_1                           | 2021-04-15T16:21:13.421570925Z nginx.1    | collections.dev.gov.uk 172.18.0.21 - - [15/Apr/2021:16:21:13 +0000] "HEAD /removed-taxon-631090f9-bdc4-464f-b7f0-ebc64b59ebb9 HTTP/1.1" 500 0 "-" "Ruby"
nginx-proxy_1                           | 2021-04-15T16:21:13.421816946Z nginx.1    | www.dev.gov.uk 172.18.0.56 - - [15/Apr/2021:16:21:13 +0000] "HEAD /removed-taxon-631090f9-bdc4-464f-b7f0-ebc64b59ebb9 HTTP/1.1" 500 0 "-" "Ruby"
router_1                                | 2021-04-15T16:21:14.312704993Z 2021/04/15 16:21:14 router: registered /removed-taxon-631090f9-bdc4-464f-b7f0-ebc64b59ebb9 (prefix: false) -> /redirection-taxon-5f7df643-880c-4af1-94c5-e386399a996c
```

Ideally Collections should be able to handle a content item that is a
redirect (there is precedence in Government Frontend [2]) - as there is
never a guarantee Router and Content Store are synchronised. However
fixing that is another, bigger, job so I'm doing what will fix this test
now.

[1]: alphagov/router#195
[2]: https://github.com/alphagov/government-frontend/blob/1c060844f48f1b6b5be9833b0b9fc875d8207429/app/controllers/content_items_controller.rb#L8
kevindew added a commit to alphagov/publishing-e2e-tests that referenced this pull request Apr 16, 2021
This test will wait for a recently published piece of content to change
from a 200 response to a 301 redirect. However an assumption of this
test is that the content has made it into router and is returning a 200
response. This assumption, while presumptuous, used to hold until the
recent changes to Router which switched to a polling rather than a
notifying mechanism [1].

To resolve this I've set this wait condition to be able to handle the
state where router hasn't updated yet.

[1]: alphagov/router#195
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants