When downloading the tar file from GitHub, the file can be malformed or incomplete. The most likely case is that the file on the local filesystem contains an error message rather than a real tarball. When this happens, dbt deps throws the following error:
Traceback (most recent call last):
  File "/usr/src/app/sinter/clients/dbt.py", line 1200, in call
    dbt_main.handle(command + extra_args)
  File "/usr/local/lib/python3.8/dist-packages/dbt/main.py", line 159, in handle
    res, success = handle_and_check(args)
  File "/usr/local/lib/python3.8/dist-packages/dbt/main.py", line 205, in handle_and_check
    task, res = run_from_args(parsed)
  File "/usr/local/lib/python3.8/dist-packages/dbt/main.py", line 258, in run_from_args
    results = task.run()
  File "/usr/local/lib/python3.8/dist-packages/dbt/task/deps.py", line 66, in run
    package.install(self.config, renderer)
  File "/usr/local/lib/python3.8/dist-packages/dbt/deps/registry.py", line 74, in install
    system.untar_package(tar_path, deps_path, package_name)
  File "/usr/local/lib/python3.8/dist-packages/dbt/clients/system.py", line 489, in untar_package
    with tarfile.open(tar_path, 'r') as tarball:
  File "/usr/lib/python3.8/tarfile.py", line 1608, in open
    raise ReadError("file could not be opened successfully")
tarfile.ReadError: file could not be opened successfully
This typically indicates a transient problem with GitHub itself, but in the context of dbt Cloud it causes a run to fail when a retry might have succeeded.
Expected Behavior
Identify the bad tar file and retry the download. This could be done by checking that the archive lists successfully (tar -tzf foo.tar.gz >/dev/null or something similar) or, rather than spending 2x the effort (check, then untar), by simply handling the failure to untar.
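The second option (handle the untar failure and re-download) could look roughly like the sketch below. It is not dbt's actual implementation: `untar_with_retry`, the `download` callable, and the attempt and backoff parameters are hypothetical names chosen for illustration. Only `tarfile.open`, `extractall`, and `tarfile.ReadError` come from the standard library and the traceback above.

```python
import tarfile
import time
from pathlib import Path
from typing import Callable


def untar_with_retry(
    download: Callable[[Path], None],  # hypothetical: writes the tarball to the given path
    tar_path: Path,
    dest_dir: Path,
    max_attempts: int = 3,
    backoff_seconds: float = 1.0,
) -> int:
    """Download and extract, re-downloading if the file is not a readable tar.

    Returns the number of attempts that were needed.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    for attempt in range(1, max_attempts + 1):
        download(tar_path)
        try:
            with tarfile.open(tar_path, "r") as tarball:
                tarball.extractall(dest_dir)
            return attempt
        except (tarfile.ReadError, EOFError):
            # ReadError: not a tar at all (e.g. an error message saved as .tar.gz).
            # EOFError: a compressed stream that ends early can surface this way.
            if attempt == max_attempts:
                raise
            tar_path.unlink(missing_ok=True)  # discard the bad file before retrying
            time.sleep(backoff_seconds * attempt)
    raise ValueError("max_attempts must be at least 1")
```

This avoids the separate listing pass: the extraction attempt itself is the validity check. One caveat of this sketch is that a truncated archive may leave partially extracted files in `dest_dir` before the retry.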
Optionally, when the tar file is bad, log a certain number of its trailing bytes/characters so we can tell whether it is just a truncated tar (the download is being cut off) or something like an error message in a JSON response being written into the tar file. I think the download iterates over whatever GitHub gives us and drops it into a .tar.gz file.
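A minimal sketch of that diagnostic, assuming the bad file is still on disk when the failure is handled. The helper names (`read_tail`, `starts_with_gzip_magic`, `describe_bad_tarball`) and the 200-byte default are hypothetical and not part of dbt. The gzip check relies on the two-byte gzip magic number `1f 8b`.

```python
import os


def read_tail(path: str, num_bytes: int = 200) -> bytes:
    """Return up to the last num_bytes bytes of the file at path."""
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        fh.seek(max(0, size - num_bytes))
        return fh.read()


def starts_with_gzip_magic(path: str) -> bool:
    """True if the file begins with the gzip magic number (0x1f 0x8b)."""
    with open(path, "rb") as fh:
        return fh.read(2) == b"\x1f\x8b"


def describe_bad_tarball(path: str, num_bytes: int = 200) -> str:
    """Build a log-friendly summary of a file that failed to untar."""
    size = os.path.getsize(path)
    tail = read_tail(path, num_bytes)
    # repr() of the bytes keeps binary content safe to put in a log line.
    return (
        f"{path}: {size} bytes, gzip header: {starts_with_gzip_magic(path)}, "
        f"last {len(tail)} bytes: {tail!r}"
    )
```

A file with a gzip header that still fails to untar points toward truncation, while a file without one whose tail reads as text points toward an error response saved in place of the archive.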
Steps To Reproduce
No response
Relevant log output
No response
Environment
No response
What database are you using dbt with?
No response
Additional Context
No response
github-actions bot changed the title from "[Bug] dbt deps does not handle bad git tarball download" to "[CT-51] [Bug] dbt deps does not handle bad git tarball download" on Jan 17, 2022
Thanks so much for the detailed issue @barberscott!
This is definitely something we want to get resolved. We've added it as a task for the 1.0.2 release and I'm actively working on resolving it. I'm going to add some retry logic at the very least. I'm still thinking about whether, and how, logging part of the tarfile is the right solution.