Attaching to Batch job that failed to start results in ResourceNotFoundException #405

Closed
joverlee521 opened this issue Oct 31, 2024 · 4 comments
Labels
bug Something isn't working


@joverlee521
Contributor

Current Behavior

Slack context

Attaching to a Batch job that was submitted but failed to start results in an uncaught error:

$ nextstrain build --aws-batch --attach d3ab7b14-e7ca-4795-a9d5-29281b00acaa .
Attaching to Nextstrain AWS Batch Job ID: d3ab7b14-e7ca-4795-a9d5-29281b00acaa
Job is FAILED
Traceback (most recent call last):
  File "runpy", line 196, in _run_module_as_main
  File "runpy", line 86, in _run_code
  File "nextstrain.cli.__main__", line 43, in <module>
  File "nextstrain.cli.__main__", line 19, in main
  File "nextstrain.cli", line 37, in run
  File "nextstrain.cli.command.build", line 300, in run
  File "nextstrain.cli.runner", line 290, in run
  File "nextstrain.cli.runner.aws_batch", line 407, in run
  File "nextstrain.cli.runner.aws_batch.jobs", line 125, in log_entries
  File "nextstrain.cli.runner.aws_batch.logs", line 36, in fetch_stream
  File "botocore.paginate", line 269, in __iter__
  File "botocore.paginate", line 357, in _make_request
  File "botocore.client", line 569, in _api_call
  File "botocore.client", line 1023, in _make_api_call
botocore.errorfactory.ResourceNotFoundException: An error occurred (ResourceNotFoundException) when calling the FilterLogEvents operation: The specified log stream does not exist.
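The traceback shows the error escaping while botocore iterates the FilterLogEvents paginator: a job that never started never created its log stream. Below is a minimal sketch of treating that one error code as "no logs" while letting every other error propagate. It assumes botocore's `ClientError` keeps the parsed error under `.response["Error"]["Code"]`, and it checks that attribute by duck typing so the sketch runs without botocore installed. The helper names and the `fetch_pages` callable are hypothetical, not nextstrain's actual code.

```python
from typing import Callable, Iterable, Iterator


def is_missing_log_stream(error: BaseException) -> bool:
    """True if *error* looks like a botocore ClientError for a missing resource.

    botocore's ClientError exposes the parsed error response as `.response`,
    with the error code under response["Error"]["Code"].
    """
    response = getattr(error, "response", None) or {}
    return response.get("Error", {}).get("Code") == "ResourceNotFoundException"


def fetch_stream_or_nothing(fetch_pages: Callable[[], Iterable[dict]]) -> Iterator[dict]:
    """Yield log events from each page, or nothing if the stream doesn't exist.

    *fetch_pages* is a hypothetical zero-argument callable standing in for a
    FilterLogEvents paginator's .paginate(...) call. The error is raised while
    iterating pages, so the whole loop sits inside the try.
    """
    try:
        for page in fetch_pages():
            yield from page.get("events", [])
    except Exception as error:
        if is_missing_log_stream(error):
            return  # job never started, so there is no log stream to read
        raise  # anything else (throttling, auth, network) is still an error
```

With boto3, `fetch_pages` might be something like `lambda: client.get_paginator("filter_log_events").paginate(logGroupName=group, logStreamNames=[stream])`, where `group` and `stream` are placeholders.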

Expected behavior

Handle the unexpected ResourceNotFoundException and report the underlying failure reason of the job.
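One way to report the underlying failure reason is to read it from the job description instead of the logs. A sketch, assuming `job` is a single entry from the AWS Batch DescribeJobs response (fields `status`, `statusReason`, `container.reason`, `container.logStreamName`); the function names are hypothetical. Per the AWS Batch API reference, a container attempt only receives a log stream name once it reaches RUNNING, so a missing `logStreamName` is a signal to skip log fetching entirely.

```python
from typing import Optional


def log_stream_name(job: dict) -> Optional[str]:
    """The job's CloudWatch Logs stream name, or None if it never got one."""
    return job.get("container", {}).get("logStreamName")


def failure_message(job: dict) -> str:
    """Summarize a Batch job's status and any failure reasons it reports."""
    status = job.get("status", "UNKNOWN")
    # Job-level reason first, then the container-level reason, if present.
    reasons = [job.get("statusReason"), job.get("container", {}).get("reason")]
    details = "; ".join(r for r in reasons if r)
    return f"Job is {status}: {details}" if details else f"Job is {status}"
```

An attach command could then print `failure_message(job)` and return early when `log_stream_name(job)` is `None`, rather than asking CloudWatch Logs for a stream that was never created.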

joverlee521 added the bug (Something isn't working) label Oct 31, 2024
@joverlee521
Contributor Author

The error is unhandled because ResourceNotFoundException is not one of the exception types the LogWatcher expects:

except (ClientError, BotocoreConnectionError):

@joverlee521
Contributor Author

Nvm, this is because the exception is re-raised further down:

except (ClientError, BotocoreConnectionError):
    failure_count += 1
    if failure_count > MAX_FAILURES and not success_count:
        raise

But then it is not handled outside?
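The snippet implements a failure-tolerance rule: count errors, and re-raise only once failures exceed a limit *and* nothing has ever succeeded. A standalone sketch of that rule, assuming `ConnectionError` and `TimeoutError` as stand-ins for `ClientError` and `BotocoreConnectionError`; `poll` and `MAX_FAILURES = 3` are hypothetical, not nextstrain's LogWatcher code.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")

MAX_FAILURES = 3  # hypothetical limit


def poll(fetch: Callable[[], T], attempts: int,
         max_failures: int = MAX_FAILURES, delay: float = 0.0) -> List[T]:
    """Call *fetch* up to *attempts* times, tolerating transient errors."""
    results: List[T] = []
    success_count = 0
    failure_count = 0
    for _ in range(attempts):
        try:
            results.append(fetch())
            success_count += 1
        except (ConnectionError, TimeoutError):  # stand-ins for the botocore errors
            failure_count += 1
            # Give up only if we keep failing and have never succeeded once.
            if failure_count > max_failures and not success_count:
                raise
        time.sleep(delay)
    return results
```

The bare `raise` hands the error to whoever called `poll`, so something further out still has to catch it; otherwise it surfaces as an uncaught traceback.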

@joverlee521
Contributor Author

Hah, I was completely off in tracking down the error here 🤦‍♀️

This will be resolved by #406

@tsibley
Member

tsibley commented Nov 1, 2024

Ahh, I didn't see this issue thread until just now! (well after diagnosing and making the fix)

tsibley closed this as completed Nov 1, 2024