-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CT-352] catch and retry malformed json #4982
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide. |
core/dbt/clients/registry.py
Outdated
try: | ||
# the quotes in the response were converted to single quotes which is not | ||
# valid json. json.dumps() will fix that. | ||
json.loads(json.dumps(jsonData)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels bad. Any suggestions on a better way to do this welcome!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment doesn't state this yet but the api/v1/index.json
path also returns a list objects that causes it fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait... I think it should be valid JSON... it's just a list... according to jsonlint dot com it's valid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jtcohen6 my understanding is json.resp()
returns a json object which really means it's returning a python list in this case. So json.loads(jsonData)
freaks out with TypeError: the JSON object must be str, bytes or bytearray, not list
. I've updated all this confusing logic I added to just straight check if json.resp() is a list or dict. That's all we ever expect it to be. Since we control the HUB and the only time it's not one of those two things something else is going wrong, this is safe. It's also much clearer as to what's happening.
core/dbt/clients/registry.py
Outdated
expected_type = dict | ||
expected_keys = ["name", "packages", "downloads"] | ||
return _get_with_retries( | ||
"api/v1/{}/{}.json".format(name, version), registry_base_url, expected_type, expected_keys | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can avoid a lot of superfluous requests by making this a filter on the (cached) package
request response, rather than a separate request. This is what I'm trying over in #4983:
dbt-core/core/dbt/clients/registry.py
Lines 85 to 87 in 25d4c96
def package_version(name, version, registry_base_url=None): | |
response = package(name, registry_base_url) | |
return response["versions"][version] |
But we're not validating ["name", "packages", "downloads"]
as expected keys of the package
response there, just the presence of a versions
key (which contains a list of version dictionaries, each of which should contain those keys).
Maybe we should take a hybrid approach like:
- First try grabbing the specific version info from the (cached)
package
response - Check to ensure that the expected keys are present (
["name", "packages", "downloads"]
) - If they're missing for some reason, fall back to making the current version-specific request, with retries:
"api/v1/{}/{}.json"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check to ensure that the expected keys are present (["name", "packages", "downloads"])
Another way of doing this: RegistryPackageMetadata.from_dict()
here, on the specific version response, instead of in dbt.deps.registry._fetch_metadata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me! Added a bunch of optional suggestions as well.
core/dbt/clients/registry.py
Outdated
error_msg = ( | ||
f"Request error: The response type of {type(response)} is not valid: {resp.text}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's put what we expected in this error message
core/dbt/clients/registry.py
Outdated
|
||
def package(name, registry_base_url=None): | ||
response = _get_with_retries("api/v1/{}.json".format(name), registry_base_url) | ||
expected_keys = ["name", "versions"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we seem to only use this as a set, maybe we could just create it as a set using {}
instead of []
core/dbt/clients/registry.py
Outdated
if not set(expected_keys).issubset(response) and not set(expected_version_keys).issubset( | ||
response["versions"] | ||
): | ||
error_msg = f"Request error: Expected the response to contain keys {expected_keys} but one or more are missing: {resp.text}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] we can use set diff to discover and explicitly report these keys. There's only 3 keys right now so if we don't want to bother I'm fine with that too.
def _get(path, registry_base_url=None): | ||
url = _get_url(path, registry_base_url) | ||
def _get(package_name, registry_base_url=None): | ||
url = _get_url(package_name, registry_base_url) | ||
fire_event(RegistryProgressMakingGETRequest(url=url)) | ||
resp = requests.get(url, timeout=30) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this line throw at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. But we're catching all exceptions from requests in the retry logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets add a comment to that effect then.
core/dbt/clients/registry.py
Outdated
def package(package_name, registry_base_url=None): | ||
# returns a dictionary of metadata for all versions of a package | ||
response = _get_with_retries(package_name, registry_base_url) | ||
return response["versions"] | ||
|
||
|
||
def package_version(package_name, version, registry_base_url=None): | ||
# returns the metadata of a specific version of a package | ||
response = package(package_name, registry_base_url) | ||
return response[version] | ||
|
||
|
||
def get_available_versions(package_name): | ||
# returns a list of all available versions of a package | ||
response = package(package_name) | ||
return list(response) | ||
|
||
|
||
def _get_index(registry_base_url=None): | ||
|
||
url = _get_url("index", registry_base_url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like these functions and how they abstract over retries. Can we add type hints to them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! This is a huge improvement
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-1.0.latest 1.0.latest
# Navigate to the new working tree
cd .worktrees/backport-1.0.latest
# Create a new branch
git switch --create backport-4982-to-1.0.latest
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick --mainline 1 7f953a6d48a4db254b17fd3929d267e4b6069228
# Push it to GitHub
git push --set-upstream origin backport-4982-to-1.0.latest
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-1.0.latest Then, create a pull request where the |
resolves #4849, resolves #4342
Description
There are still some exceptions when trying to install deps that fail without retrying. This PR tries to catch the rest of them.
It also caches the HUB response to reduce calls. See #4983 for some more detailed context.
Checklist