Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warn for string cmp with new string warning assumptions #34

Conversation

nanjekyejoannah
Copy link

No description provided.

@ltratt ltratt self-assigned this Dec 15, 2023
@@ -1209,8 +1209,11 @@ string_richcompare(PyStringObject *a, PyStringObject *b, int op)
PyObject *result;

/* Make sure both arguments are strings. */
if (!(PyString_Check(a) && PyString_Check(b))) {
if (!(PyString_Check(a) && PyString_Check(b)) ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the extra spaces are intended?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spaces removed

result = Py_NotImplemented;
if (PyErr_WarnPy3k("comparing unicode and byte strings has different semantics in 3.x: convert the second string to byte.", 1) < 0) {
goto out;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

goto out indented 1 level too many?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the condition to the existing if.

result = Py_NotImplemented;
if (PyErr_WarnPy3k("comparing unicode and byte strings has different semantics in 3.x: convert the second string to byte.", 1) < 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure but won't this warning trigger if b is anything but a string (i.e. not just bytes)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hope this test clears any doubt:

>>> "test str" == u"test unicode"
__main__:1: DeprecationWarning: comparing unicode and byte strings has different semantics in 3.x: convert the second string to byte.
False
>>> u"test str" == "test unicode"
__main__:1: DeprecationWarning: comparing unicode and byte strings has different semantics in 3.x: convert the first string to bytes.
False
>>> "test str" == "test unicode"
False

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And just to check: things like "abc" == 123 and "abc" == object() don't go through this code path?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They do actually with this logic, me checks notes on google drive again, isnt this the whole point of our assumption on tracking byteness? Please correct me if I confused something.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember what we decided :) But partly because the warning says "comparing unicode and byte strings" it might be odd if the user gets that message when they've compared unicode and (say) an integer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess (but it is a guess!) is that if comparing "unicode with non-{unicode, bytes}" we don't want to print anything out, because such comparisons don't have changed semantics?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After doing some tests, I agree, applies to unicode case below too I guess.
Will modify the check for both.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excluded these other types:

>>> "abc" == 123
False
>>> "abc" == object()
False
>>> u"abc" == 123
False
>>> u"abc" == object()
False
>>> u"test str" == "test unicode"
__main__:1: DeprecationWarning: comparing unicode and byte strings has different semantics in 3.x: convert the first string to bytes.
False
>>> "test str" == u"test unicode"
__main__:1: DeprecationWarning: comparing unicode and byte strings has different semantics in 3.x: convert the second string to byte.
False

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an indirect call to Unicode compare through string compare though. I followed this and the only lead I have is the unicodecmp slot. I dont want to investigate it now but I have put it to my todo list. Slots are fragile and brittle, so taking this as design constraint for now until I get through some urgent things. My next string PR will hopefully handle, but there is a fix I first want to get in.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See test to understand this.

@nanjekyejoannah nanjekyejoannah force-pushed the warn_string_cmp_with_recent_assumptions branch from 0bee79a to 21b3ff7 Compare December 25, 2023 16:41
@nanjekyejoannah
Copy link
Author

If this is in good shape, you can help merge so that I submit another fix PR.

@ltratt
Copy link
Member

ltratt commented Dec 25, 2023

One comment (#34 (comment)) and then we're probably ready to go.

@ltratt
Copy link
Member

ltratt commented Dec 27, 2023

Please squash.

@nanjekyejoannah nanjekyejoannah force-pushed the warn_string_cmp_with_recent_assumptions branch from 39ad149 to 29ae1a9 Compare December 30, 2023 02:31
@nanjekyejoannah
Copy link
Author

Done

@ltratt ltratt added this pull request to the merge queue Dec 30, 2023
Merged via the queue into softdevteam:regression_fix with commit bb5d72c Dec 30, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants