
Exception handler python hook #2496

Closed
nicku33 opened this issue May 28, 2020 · 5 comments
Labels
enhancement New feature or request

Comments


nicku33 commented May 28, 2020

Describe the feature

I'd like to be able to register a function to be called whenever an exception occurs with details about the context it occurred in as well as the raw Exception instance. This is so that we can pass it on to our alerting systems as soon as they happen, with the data structure of the actual exception. Rollbar, for example, uses the stack trace itself to fingerprint exceptions.

Describe alternatives you've considered

We can parse job output, but that seems fragile. As well, with long dbt jobs we have to wait for the end to get alert output. Getting it as they happen would be better.

Additional context

Any database, really.

Who will this benefit?

Data engineers and ops who have pre-existing error processing and logging systems.

@nicku33 nicku33 added enhancement New feature or request triage labels May 28, 2020

nicku33 commented May 28, 2020

(happy to work on this patch ourselves, and apologies if there is a way to do this already that I missed)

@drewbanin drewbanin removed the triage label May 28, 2020
@drewbanin
Contributor

hey @nicku33 - I buy the use case, but I think we might want to opt for a different implementation than the one you're describing. Just an FYI: you can run dbt with --log-format=json to emit JSON log lines. That might look like:

dbt --debug --log-format=json run

When you do that, any exceptions will be present in the log lines. I think you should find an exc_info key in a log line that contains the full exception and stack trace. I'd recommend piping that output into a process or script that fires off alerts or triggers actions when errors occur. You buy that?
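A minimal sketch of the piping approach described above. It assumes each log line is a standalone JSON object and that exceptions surface under an "exc_info" key, as mentioned in the comment; the "message" key and the alerting call are illustrative placeholders, not confirmed dbt API.

```python
import json
import subprocess
import sys


def iter_error_events(lines):
    """Yield parsed log records that carry exception details.

    Skips blank and non-JSON lines, and keeps only records that
    have a truthy "exc_info" field (assumed key name).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # non-JSON noise mixed into the stream
        if record.get("exc_info"):
            yield record


if __name__ == "__main__":
    # Stream dbt's JSON logs and react to errors as they happen,
    # rather than waiting for the end of a long run.
    proc = subprocess.Popen(
        ["dbt", "--debug", "--log-format=json", "run"],
        stdout=subprocess.PIPE,
        text=True,
    )
    for event in iter_error_events(proc.stdout):
        # Replace with a call into your alerting system (e.g. Rollbar),
        # passing along the stack trace for fingerprinting.
        print("ERROR:", event.get("message"), file=sys.stderr)
```

Because the generator consumes the pipe line by line, alerts fire as soon as dbt flushes the log line, not at the end of the job.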


nicku33 commented May 28, 2020

OK, I'll try that; it's easier to parse JSON anyway. However, will that emit at the time of the exception or only at the end?

Also, what different implementation were you thinking of? You don't like code injection, I take it?

Slightly related, lmk if worth a ticket, but we've found a need to define transient exceptions in our own script so we know when to retry (transaction conflicts, transient network issues; we're on Redshift now).

I think others have mentioned this issue: #1630

These could be hard-coded per database, or possibly use https://en.wikipedia.org/wiki/SQLSTATE codes, but there is some judgment involved as well, depending on the nature of the job. In our own scripts we've implemented an error handler which passes back whether the error is believed to be transient and how often to retry.
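The error handler described above could be sketched roughly like this. The set of "transient" SQLSTATE classes is an assumption drawn from the standard's class prefixes (08 = connection exception, 40 = transaction rollback, 57 = operator intervention); as the comment notes, real policy will vary by warehouse and job.

```python
# SQLSTATE class prefixes (first two characters) we treat as retryable.
# These class meanings come from the SQLSTATE standard; whether each one
# is safe to retry is a judgment call for your own jobs.
TRANSIENT_SQLSTATE_CLASSES = {"08", "40", "57"}


def is_transient(sqlstate: str) -> bool:
    """Return True when the two-character SQLSTATE class suggests a retry."""
    return bool(sqlstate) and sqlstate[:2] in TRANSIENT_SQLSTATE_CLASSES


def retry_policy(sqlstate: str, attempt: int, max_attempts: int = 3):
    """Decide whether to retry and how many seconds to back off.

    Returns (should_retry, backoff_seconds). Non-transient errors and
    exhausted attempts get (False, 0).
    """
    if not is_transient(sqlstate) or attempt >= max_attempts:
        return (False, 0)
    return (True, 2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
```

For example, a Redshift serializable-isolation conflict (SQLSTATE 40001) would be retried with backoff, while a syntax error (42601) would fail immediately.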

@drewbanin
Contributor

hey @nicku33 - these exceptions should be logged immediately when they occur, not at the end of the run!

We don't really design dbt for programmatic Python use, and while I do think we'll do a better job here in the future, we basically provide no guarantees or helpful interfaces for calling dbt from Python today. I do think that a Python hook for model completion (be it a success or an error) would be a really natural part of this API when we do build it out, though.

See our current thinking on retrying failed runs over here #2465


nicku33 commented Jun 2, 2020

Yes, you are right, of course. We switched to JSON output and are parsing stdout as the lines are emitted. A bit icky, and I look forward to the API hook, but much better than before.

Regarding retry on transient issues, let me look closer at how dbt works; if I have an idea I'll fork, try it, and open an issue if it works. #2465, I guess, would allow one to rerun just the failed models until the exit code was non-zero? Might be fine, but if there's a more elegant way to retry earlier it would save some grief with dependent jobs.

Thanks for your response, it helped us. Closing the ticket.

@nicku33 nicku33 closed this as completed Jun 2, 2020