Blockerbot scheduled task does not run successfully in Production #92

Open
1 task
prehnRA opened this issue Jun 6, 2019 · 5 comments

@prehnRA
Contributor

prehnRA commented Jun 6, 2019

Background

We have implemented the Blockerbot command as both a slash command and a scheduled task in Slax. The command works in testing when run manually, but has not been running successfully on the schedule. We need to resolve this.

Here's an excerpt from the logs that may help explain what is happening:

2019-06-06T14:25:00+00:00 slax[cmd.v46.zt79r]: ** (FunctionClauseError) no function clause matching in Access.get/3
2019-06-06T14:25:00+00:00 slax[cmd.v46.zt79r]:     (elixir) lib/access.ex:320: Access.get({"documentation_url", "https://developer.github.com/v3"}, "number", nil)
2019-06-06T14:25:00+00:00 slax[cmd.v46.zt79r]:     (elixir) lib/enum.ex:1331: anonymous fn/3 in Enum.map/2
2019-06-06T14:25:00+00:00 slax[cmd.v46.zt79r]:     (stdlib) maps.erl:257: :maps.fold_1/3
2019-06-06T14:25:00+00:00 slax[cmd.v46.zt79r]:     (elixir) lib/enum.ex:1956: Enum.map/2
2019-06-06T14:25:00+00:00 slax[cmd.v46.zt79r]:     (slax) lib/slax/event_sink.ex:17: Slax.EventSink.fetch_issues_events/2
2019-06-06T14:25:00+00:00 slax[cmd.v46.zt79r]:     (slax) lib/slax/commands/latency.ex:19: Slax.Commands.Latency.text_for_org_and_repo/3
2019-06-06T14:25:00+00:00 slax[cmd.v46.zt79r]:     (slax) lib/slax/scheduler.ex:22: Slax.Scheduler.send_repo_to_channel/3
2019-06-06T14:25:00+00:00 slax[cmd.v46.zt79r]:     (elixir) lib/enum.ex:769: Enum."-each/2-lists^foreach/1-0-"/2
2019-06-06T14:25:00+00:00 slax[cmd.v46.zt79r]: Function: #Function<7.52749505/0 in Quantum.Executor.run/4>
2019-06-06T14:25:00+00:00 slax[cmd.v46.zt79r]:     Args: []
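The tuple in the first frame of the trace is what `Enum.map/2` produces when it is given a map: each element passed to the function is a `{key, value}` pair. Below is a minimal sketch that reproduces that shape. It assumes the GitHub error body was a JSON object decoded into a map. The `"message"` value is a placeholder, and only the `"documentation_url"` pair comes from the log.

```elixir
# Assumed shape of a decoded GitHub error body. The "message" value is a
# placeholder; only the "documentation_url" pair appears in the log.
error_body = %{
  "message" => "placeholder error message",
  "documentation_url" => "https://developer.github.com/v3"
}

# Enumerating a map hands each {key, value} tuple to the function.
items = Enum.map(error_body, fn item -> item end)
IO.inspect(items)

# Treating one of those tuples as an issue map fails inside Access.get/3
# (a FunctionClauseError on the Elixir version in the log).
outcome =
  try do
    Enum.map(error_body, fn issue -> issue["number"] end)
  rescue
    _error -> :raised
  end

IO.inspect(outcome)
```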

Notes

  • Blockerbot is deployed already. Those logs are from production in the cluster.
  • You can manually run the command with /blocker latency

Scenario

  • WHEN it is 9:25am
  • THEN any channels with Blockerbot enabled should receive an update
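The log lines above are stamped 14:25 UTC. That matches 9:25am in US Central Daylight Time (UTC-5), although the Central-time reading is only an inference from the timestamps. Quantum is the scheduler that appears in the trace, and it evaluates cron expressions in UTC by default. A schedule for this scenario could therefore look roughly like the fragment below. `Slax.Blockerbot.run/0` is a hypothetical job function and does not come from Slax's actual config.

```elixir
# config/config.exs (fragment). Older Mix projects start with `use Mix.Config`.
import Config

config :slax, Slax.Scheduler,
  jobs: [
    # Cron fields: minute hour day-of-month month day-of-week.
    # 14:25 UTC = 9:25am CDT (UTC-5); during standard time (UTC-6) this
    # same entry would fire at 8:25am local.
    # Slax.Blockerbot.run/0 is a hypothetical job function.
    {"25 14 * * *", {Slax.Blockerbot, :run, []}}
  ]
```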

@jpatricknola
Contributor

I didn't have enough time to get set up, but from looking at it I have some initial questions. Given that the scheduler works correctly in the local environment but not in production, I'm looking at two files in particular:

  1. config.exs
  2. prod_runtime_config.exs

I'm wondering whether the runtime file needs the equivalent of this setting from the config file (screenshot below), or whether import_config is not behaving as expected.

Screen Shot 2019-07-02 at 1 53 27 PM
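In a release, `System.get_env/1` calls inside config.exs generally run when the release is built and not when it boots. That is the usual reason a separate runtime config file exists. One way to avoid that ambiguity is to read the token at the moment it is needed and fail with a clear message if it is missing, so that `nil` never reaches the HTTP client. Below is a minimal sketch. `GithubToken` and `fetch/1` are hypothetical names and are not Slax's actual configuration code.

```elixir
defmodule GithubToken do
  @moduledoc """
  Hypothetical helper: reads the GitHub token from the environment at
  runtime and reports a clear error instead of returning nil.
  """

  @spec fetch(String.t()) :: {:ok, String.t()} | {:error, String.t()}
  def fetch(env_var \\ "GITHUB_API_TOKEN") do
    case System.get_env(env_var) do
      nil -> {:error, "#{env_var} is not set"}
      "" -> {:error, "#{env_var} is empty"}
      token -> {:ok, token}
    end
  end
end
```

If a request matches on `{:ok, token} = GithubToken.fetch()` before it is sent, a missing token becomes an immediate, descriptive failure. Without that check, the request goes out unauthenticated.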

@grossvogel
Contributor

grossvogel commented Jul 2, 2019

I got a bit curious, so I thought I'd drive by and offer what little insight I have. The error definitely seems to be triggered on this line, and it looks like {"documentation_url", "https://developer.github.com/v3"} is appearing as one of the items in the issues parameter of the function. That doesn't look like an issue to me, so I think the caller of this function is misusing it.

Perhaps we have a single issue instead of the expected list of issues?
Or perhaps we have something else entirely that this function wasn't expecting?

@fireside68 fireside68 self-assigned this Jul 15, 2019

@jwietelmann

Quoting @grossvogel: "Perhaps we have a single issue instead of the expected list of issues?"

This is what I would investigate first. It makes the most sense given the symptoms.

@jwietelmann

Actually, I think what's happening is that an API error of some kind goes unhandled and we're eating it: https://github.com/revelrylabs/slax/blob/master/lib/slax/github.ex#L162

@jwietelmann

jwietelmann commented Jul 15, 2019

After a long pairing session with @fireside68 ...

Complete analysis

  • GITHUB_API_TOKEN, an environment variable that should hold a GitHub personal access token (ideally for a read-only machine user), was never set.
  • The code for extracting that value from the app's configuration was never correct.
  • The API was returning an error response (because the request was not authenticated).
  • We were eating the error and silently returning the response body to a caller that was expecting a list of issues.
  • The caller had no guards to guarantee it was working with a list.
  • The caller function would then enumerate the key/value pairs of the error response.
  • Eventually it would raise a full-on runtime error (the FunctionClauseError in Access.get/3 from the log) when it reached code that expected an issue map but got a key/value tuple.
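At the core of this chain, an error body was returned as though it were data. The sketch below shows the alternative. It assumes that the HTTP client returns a status code and a JSON-decoded body. `IssuesResponse.handle/2` is a hypothetical name and is not the function at github.ex#L162.

```elixir
defmodule IssuesResponse do
  @moduledoc """
  Hypothetical response handler: tags results so callers cannot mistake
  an API error body for a list of issues.
  """

  # Success only when the status is 200 *and* the body is a list.
  def handle(200, body) when is_list(body), do: {:ok, body}

  # GitHub error bodies carry a "message" key; surface it with the status.
  def handle(status, %{"message" => message}), do: {:error, {status, message}}

  # Anything else (unexpected shape or status) is also an error.
  def handle(status, body), do: {:error, {status, body}}
end
```

A caller then matches on `{:ok, issues}` or `{:error, {status, reason}}` and can log the failure rather than enumerate it.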

Solution

  • Generate a personal access token for the revelry-machine-readonly user, and set GITHUB_API_TOKEN to that value in Deis: Done.
  • Fix the configuration fetching code: Unblocking blockerbot 🚘 #106
  • Add a guard to make sure the function that is supposed to operate on a list of issue maps has indeed received a list, so errors are raised earlier and more clearly: Unblocking blockerbot 🚘 #106
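The list guard described in the last bullet can be sketched as follows. `IssueEvents.issue_numbers/1` is a hypothetical stand-in for the code that enumerates issues (the trace points at `Slax.EventSink.fetch_issues_events/2`). Because of the `is_list/1` guard, a non-list argument fails at the function boundary, with a FunctionClauseError that names this function. Without the guard, the failure surfaces deep inside Access.get/3.

```elixir
defmodule IssueEvents do
  @moduledoc "Hypothetical stand-in for code that enumerates issue maps."

  # The guard rejects anything that is not a list (such as an error map)
  # before enumeration begins.
  def issue_numbers(issues) when is_list(issues) do
    # Each element must be a map with a "number" key; other shapes raise
    # a FunctionClauseError from this anonymous function.
    Enum.map(issues, fn %{"number" => number} -> number end)
  end
end
```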
