Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hostname format checker might raise ValueError #1121

Open
jvtm opened this issue Jul 10, 2023 · 9 comments
Open

hostname format checker might raise ValueError #1121

jvtm opened this issue Jul 10, 2023 · 9 comments
Labels
Bug Something doesn't work the way it should. Needs Test Upstream Issues that need to have a test added to https://github.com/json-schema-org/JSON-Schema-Test-Suite

Comments

@jvtm
Copy link
Contributor

jvtm commented Jul 10, 2023

At least on empty string format: hostname validator raises ValueError and not the expected ValidationError.

>>> from jsonschema.validators import validator_for
>>> schema = {"$schema": "https://json-schema.org/draft/2020-12/schema", "type": "string", "format": "hostname"}
>>> vcls = validator_for(schema)
>>> validator = vcls(schema, format_checker=vcls.FORMAT_CHECKER)
>>> validator.validate("")
....
    for error in errors:
  File ".../jsonschema/_validators.py", line 238, in format
    validator.format_checker.check(instance, format)
  File ".../jsonschema/_format.py", line 135, in check
    result = func(instance)
             ^^^^^^^^^^^^^^
  File ".../jsonschema/_format.py", line 276, in is_host_name
    return FQDN(instance).is_valid
           ^^^^^^^^^^^^^^
  File ".../fqdn/__init__.py", line 44, in __init__
    raise ValueError("fqdn must be str")
ValueError: fqdn must be str

This also fails on iter_errors() usage.

This might be enough to fix it:

--- a/jsonschema/_format.py
+++ b/jsonschema/_format.py
@@ -270,6 +270,7 @@ with suppress(ImportError):
         draft7="hostname",
         draft201909="hostname",
         draft202012="hostname",
+        raises=ValueError,
     )
     def is_host_name(instance: object) -> bool:
         if not isinstance(instance, str):

Sorry, no time to properly test + send a PR at this point in time.

@Julian
Copy link
Member

Julian commented Jul 11, 2023

Thanks, indeed the first step here is to add a test to the official test suite. Help welcome whenever you care to.

@Julian Julian added the Bug Something doesn't work the way it should. label Jul 11, 2023
jvtm added a commit to jvtm/JSON-Schema-Test-Suite that referenced this issue Jul 13, 2023
Assert that hostname format validation fails gracefully on empty strings.

This is especially for Python `jsonschema` library that raises an unexpected
ValueError exception on `hostname` check (python-jsonschema/jsonschema#1121).

Adds similar test to:
 * draft3: host-name
 * draft4: hostname
 * draft6: hosntame
 * draft7: hostname, idn-hostname
 * draft2019-09: hostname, idn-hostname
 * draft2020-12: hostname, idn-hostname
 * draft-next: hostname, idn-hostname
jvtm added a commit to jvtm/JSON-Schema-Test-Suite that referenced this issue Jul 13, 2023
Assert that hostname format validation fails gracefully on empty strings.

This is especially for Python `jsonschema` library that raises an unexpected
ValueError exception on `hostname` check (python-jsonschema/jsonschema#1121).

Adds similar test for:
 * draft3: host-name
 * draft4: hostname
 * draft6: hosntame
 * draft7: hostname, idn-hostname
 * draft2019-09: hostname, idn-hostname
 * draft2020-12: hostname, idn-hostname
 * draft-next: hostname, idn-hostname
jvtm added a commit to jvtm/jsonschema that referenced this issue Jul 13, 2023
Doing hostname format check on empty string seems to raise a ValueError:

    >>> from jsonschema.validators import validator_for
    >>> schema = {"$schema": "https://json-schema.org/draft/2020-12/schema", "type": "string", "format": "hostname"}
    >>> vcls = validator_for(schema)
    >>> validator = vcls(schema, format_checker=vcls.FORMAT_CHECKER)
    >>> list(validator.iter_errors(""))
    ...
      File "lib/python3.10/site-packages/jsonschema/_format.py", line 276, in is_host_name
        return FQDN(instance).is_valid
      File "lib/python3.10/site-packages/fqdn/__init__.py", line 44, in __init__
        raise ValueError("fqdn must be str")
    ValueError: fqdn must be str

Fix by adding `raises=ValueError` to the related `@_checks_drafts` decorator call.

See also json-schema-org/JSON-Schema-Test-Suite#677.

Fixes python-jsonschema#1121.
@jvtm
Copy link
Contributor Author

jvtm commented Jul 13, 2023

Discussion in json-schema-org/JSON-Schema-Test-Suite#677 pointed out that zero length strings are actually valid in the scope of given RFC's. So, my initial easy PR needs a bit more work.

As the zero length / empty string is valid, also idn-hostname fails in at least the following scenarios (using the underlying idna library for some quick checks):

>>> import idna
>>> idna.encode("")
idna.core.IDNAError: Empty domain
>>> idna.encode(".")
idna.core.IDNAError: Empty Label

To be continued...

@jvtm
Copy link
Contributor Author

jvtm commented Jul 13, 2023

@Julian I think the easiest patch for now is to do a fast pre-check / return True on special cases on jsonschema, before calling the external libraries.

Something like

if instance in ("", "."):
    return True

At the moment idna raises exceptions on those too legit cases, and fqdn gives out the ValueError.

Even with that fix I think it makes sense to add raises=ValueError to the hostname / fqdn check.

🤔

@Julian
Copy link
Member

Julian commented Jul 13, 2023

I think the easiest patch for now is to do a fast pre-check

Yep that seems right to me.

Even with that fix I think it makes sense to add raises=ValueError to the hostname / fqdn check.

I'm somewhat OK with this but can you try to read through the source code of those libs and see whether there appear to be any explicit code paths which will really hit that? As I say I'm probably OK adding it regardless, but it's a bit nice to be thorough while we're here.

@jvtm
Copy link
Contributor Author

jvtm commented Jul 13, 2023

And then there is the Python standard lib idna encoding... oh my 🤯

>>> "hello".encode("idna")
b'hello'

>>> "hello.".encode("idna")
b'hello.'

>>> "".encode("idna")
b''

>>> ".".encode("idna")
Traceback (most recent call last):
  File "lib/python3.11/encodings/idna.py", line 163, in encode
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long)

I need a short break from all this... maybe resuming tomorrow.

@Julian Julian added the Needs Test Upstream Issues that need to have a test added to https://github.com/json-schema-org/JSON-Schema-Test-Suite label Jan 29, 2024
@shane-kearns
Copy link

I have encountered this regression when upgrading from 3.0.1 to 4.21.1 (a big jump as older versions don't support python 3.12)
Where the schema includes:

"address": {
    "type": "string",
    "anyOf": [
        {
            "format": "hostname"
        },
        {
            "format": "ipv4"
        },
        {
            "format": "ipv6"
        }
    ]
},

Previously passing an empty string would have resulted in a 'format' error, it now leaks the ValueError exception instead of being able to enumerate all the errors in the document being validated.

    errors = list(self.VALIDATOR.iter_errors(spec))
tests\.virtualenv\Lib\site-packages\jsonschema\validators.py:371: in iter_errors
    for error in errors:
tests\.virtualenv\Lib\site-packages\jsonschema\_keywords.py:296: in properties
    yield from validator.descend(
tests\.virtualenv\Lib\site-packages\jsonschema\validators.py:419: in descend
    for error in errors:
tests\.virtualenv\Lib\site-packages\jsonschema\_keywords.py:296: in properties
    yield from validator.descend(
tests\.virtualenv\Lib\site-packages\jsonschema\validators.py:419: in descend
    for error in errors:
tests\.virtualenv\Lib\site-packages\jsonschema\_keywords.py:296: in properties
    yield from validator.descend(
tests\.virtualenv\Lib\site-packages\jsonschema\validators.py:419: in descend
    for error in errors:
tests\.virtualenv\Lib\site-packages\jsonschema\_keywords.py:296: in properties
    yield from validator.descend(
tests\.virtualenv\Lib\site-packages\jsonschema\validators.py:419: in descend
    for error in errors:
tests\.virtualenv\Lib\site-packages\jsonschema\_keywords.py:340: in anyOf
    errs = list(validator.descend(instance, subschema, schema_path=index))
tests\.virtualenv\Lib\site-packages\jsonschema\validators.py:419: in descend
    for error in errors:
tests\.virtualenv\Lib\site-packages\jsonschema\_keywords.py:226: in format
    validator.format_checker.check(instance, format)
tests\.virtualenv\Lib\site-packages\jsonschema\_format.py:137: in check
    result = func(instance)
tests\.virtualenv\Lib\site-packages\jsonschema\_format.py:277: in is_host_name
    return FQDN(instance, min_labels=1).is_valid
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _


    def __init__(self, fqdn, *nothing, **kwargs):
        if nothing:
            raise ValueError("got extra positional parameter, try kwargs")
        unknown_kwargs = set(kwargs.keys()) - {"allow_underscores", "min_labels"}
        if unknown_kwargs:
            raise ValueError("got extra kwargs: {}".format(unknown_kwargs))

        if not (fqdn and isinstance(fqdn, str)):
>           raise ValueError("fqdn must be str")
E           ValueError: fqdn must be str

I've written a longer comment on the tests as to what I think the behaviour should be, but not "crashing" (leaking exceptions) when enumerating errors would be a good start here.

@shane-kearns
Copy link

I believe that ypcrts/fqdn#45 is the root cause, but catching a ValueError defensively here is probably the right thing to do.

@Julian
Copy link
Member

Julian commented Mar 4, 2024

Thanks for filing that. Yeah I'm happy with catching the ValueError in the interim (but I'd wait till the test is merged upstream).

@sirosen
Copy link
Member

sirosen commented Jan 17, 2025

I accidentally pushed a commit which I worked up to resolve this to the wrong place and it closed this (smart-commit). 🤦

Sorry about the noise.

@sirosen sirosen reopened this Jan 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something doesn't work the way it should. Needs Test Upstream Issues that need to have a test added to https://github.com/json-schema-org/JSON-Schema-Test-Suite
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants