Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BugFix] packaging.metadata.Metadata.from_raw(validate=True/False) may return empty result #851

Closed
wants to merge 4 commits into from

Conversation

DEKHTIARJonathan
Copy link

I found either a bug or a very unexpected results inside packaging.metadata.Metadata.from_raw(validate=True/False)

This function behaves totally different if validate is True. Which clearly does not make intuitive sense.

Here is a demonstration of the bug.

Long story short, if validate is True, from_raw() returns an empty dictionary. Which is incorrect.
getattr() is the part that actually cause this trouble. So removing it should fix the problem, I'm not entirely clear why it was here in the first place though.

https://colab.research.google.com/drive/151JEthOIXDKV1gdFLaJ-MFpPBFQbN1Q_#scrollTo=a54IxVQgS9HP

@DEKHTIARJonathan DEKHTIARJonathan changed the title [BugFix] metadata.py [BugFix] packaging.metadata.Metadata.from_raw(validate=True/False) may return empty result Oct 28, 2024
@brettcannon
Copy link
Member

Unfortunately you can't remove that call as that's how we trigger the validation code for each attribute. Plus you didn't update any tests which also means we don't know if the problem you're seeing was actually fixed.

Did you want to keep trying to fix it in this PR by adding a test or did you want to open an issue with a reproducer to at least track the bug?

@DEKHTIARJonathan
Copy link
Author

Oh I see so the bug might be at a different place ... I was wondering the purpose of that line.
Would you mind providing a "comment" I could insert above that getattr call to explain what it's doing.

I'll keep digging to find why it fails

Co-authored-by: Brett Cannon <brett@python.org>
@DEKHTIARJonathan
Copy link
Author

@brettcannon actually I just found the bug:

del instance._raw[self.name] # type: ignore[misc]

class _Validator(Generic[T]):
       [...]
       
    def __get__(self, instance: Metadata, _owner: type[Metadata]) -> T:
        [...]
      
        cache[self.name] = value
        try:
            del instance._raw[self.name]  # type: ignore[misc]
        except KeyError:
            pass

So it seems on purpose ... Can you explain maybe why it's explicitly deleted ?

@brettcannon
Copy link
Member

Oh I see so the bug might be at a different place ...

Yes.

Would you mind providing a "comment" I could insert above that getattr call to explain what it's doing.

I tried adding a comment, but since you can't comment on deleted lines I had to insert it the link above and I hope I got the dedenting correct.

I'll keep digging to find why it fails

👍 Having a test case will probably help. You can see what tests we have in https://github.com/pypa/packaging/blob/main/tests/test_metadata.py and where validate=True.

@DEKHTIARJonathan
Copy link
Author

@brettcannon actually just above I found the place responsible for that deletion. Though I'm confused why it was manually deleted. Any explanation ?

@brettcannon
Copy link
Member

brettcannon commented Oct 28, 2024

So it seems on purpose ... Can you explain maybe why it's explicitly deleted ?

Because you don't need the raw data anymore because the appropriate value is cached by the descriptor.

Did you actually try using the object and found none of the data was there? Or are you just assuming there's a problem simply because ._raw is empty? You are accessing a private attribute after all. 😉

@DEKHTIARJonathan
Copy link
Author

DEKHTIARJonathan commented Oct 28, 2024

To me there's two problems:

  1. Having an object behave a widely differently depending if the validation is executed is confusing.

    1. Either that's correct that ._raw should be empty and consequently should be wiped out if no validation
    2. Or that's not correct and there's a bug with that del ... call.
  2. I hear you on the private argument, but __dict__ is not a very clean interface (all the data is duplicated because _raw is not deleted:

# {'_raw': {},  # if validation is True

{'_raw': {
          'classifiers': ['Development Status :: 3 - Alpha',
                          'Intended Audience :: Developers',
                          'Topic :: Utilities',
                          'License :: OSI Approved :: Apache Software License',
                          'Operating System :: OS Independent',
                          'Programming Language :: Python :: 3',
                          '],
          'metadata_version': '2.1',
          'name': 'dummy_project',
 },
 'classifiers': ['Development Status :: 3 - Alpha',
                 'Intended Audience :: Developers',
                 'Topic :: Utilities',
                 'License :: OSI Approved :: Apache Software License',
                 'Operating System :: OS Independent',
                 'Programming Language :: Python :: 3',],
 'metadata_version': '2.1',
 'name': 'dummy_project'
 }

It's quite messy. I decided to access _raw because it was cleaner. But I agree.

Maybe the better way is to delete ins._raw all together at the end. I updated the PR in that direction.

@brettcannon
Copy link
Member

Having an object behave a widely differently depending if the validation is executed is confusing.

  1. Either that's correct that ._raw should be empty and consequently should be wiped out if no validation
  2. Or that's not correct and there's a bug with that del ... call.

Or 3. metadata is messy and not everyone cares whether all fields are valid, and so we make it lazy for for the subset of fields you care about.

This API was designed over a span of 18 months with a lot of input from users who work with metadata regularly, so this feature set is very much on purpose.

2. I hear you on the private argument, but __dict__ is not a very clean interface

Hence why Metadata is an object with attribute names, so I'm not understanding your point here. _raw is an implementation detail and thus you really shouldn't care what we do with it behind the scenes.

Maybe the better way is to delete ins._raw all together at the end.

You can't do that as you break lazy access and validation of fields. Please run the test suite locally to see what tests you are breaking with your proposed changes.

Because there isn't a bug here I'm going to close the PR. If there's an API question, please open an issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants