Remove numeric underscore normalization #549
Just as a changelog for this issue: I've changed the title and provided alternative solutions other than just removing the option (added the 'Possible fixes' section).
I suspect I may have missed the original issue where this would have been more appropriate to bring up, but if this is being considered for reversal or modification, I'd like to point out that not all numbers have the same, shall we say, texture. For example, there's nothing locale-specific about this number:
which black by default now helpfully "fixes" to
Another example of not all numbers having the same texture: inserting underscores into integers feels odd to me at times when the integer in question is a hardcoded ID. For example, I'd prefer to see
I think in corner cases (this is not happening too often, or I'm not aware of something?) such as I would argue removal of Anyway, even as an "opponent" of this normalization, if it stays I still think it shouldn't be optional, for the reasons stated by @underyx.
Here is a link to the discussion that happened when the option was first introduced: #529
I like fix option 2: "Remove the whole feature of numeric-underscore-normalization; if it's not a universally enforceable standard, maybe it has no right to be a part of black." There are clear examples documented above where numeric underscores not only don't make sense but obscure the intent of the code. The first time Black reformatted some code with them, I pushed the PR, saw the underscores in the numeric literals, and was confused about where they'd come from. It wasn't until a couple of days later that I even heard underscores were now allowed in numeric literals (as of Python 3.6), so at first I thought it was a syntax error. I see them as being much like implicit string concatenation; these two lines are equivalent:
If I add the parentheses and split a string literal across multiple lines, Black is absolutely permitted to rearrange the surrounding whitespace for consistency. However, it'd be rather invasive of Black to break the literal up in the first place, even when it violates the 88-character line limit. Similarly, for numeric literals, if I choose to put underscores in them, I'd expect Black to respect that and leave them alone, because changing them likely changes what I intend to express by putting them in. If I leave underscores out, it's probably because, given what the value represents (e.g. a user ID or a date), I don't intend the numeric literal to look like a Western-style value with commas every three digits.
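A minimal Python illustration of this analogy, assuming Python 3.6+ (where PEP 515 allows underscores in numeric literals); the variable names are made up for the example. Each pair of spellings compiles to the same value, so the only thing that differs is what the source text tells the reader.

```python
# Implicit string concatenation: adjacent string literals are joined at
# compile time, so splitting a string across lines only changes the source.
one_line = "numeric literals are values"
split = (
    "numeric literals "
    "are values"
)

# PEP 515 (Python 3.6+): underscores between digits are purely visual.
plain = 20180711
grouped = 20_180_711

print(one_line == split)  # True
print(plain == grouped)   # True
print(grouped)            # 20180711: the underscores exist only in the source
```

Because the parsed value carries no trace of the underscores, any grouping a formatter inserts is purely a statement about how the number should be read.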
I also agree with option 2: "Remove the whole feature of numeric-underscore-normalization; if it's not a universally enforceable standard, maybe it has no right to be a part of black." While the philosophy of Black is to be uncompromising on the things it decides to enforce, that doesn't mean it has to enforce every aspect of your code. Most of the decisions it makes are neither here nor there, but this feature as it stands can make the code much harder for teams to read, either because of its poor handling of dates written as numbers (and what about telephone number literals?) or because of localisation. I can see that being a barrier to adoption for Black.
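To make the localisation point concrete: not every region groups digits in threes. The sketch below contrasts Python's built-in `_` grouping in format specs (always threes) with the Indian lakh/crore convention, which groups the last three digits and then pairs. `group_indian` is a hypothetical helper written for this illustration, not part of Black or the standard library.

```python
def group_indian(n: int, sep: str = "_") -> str:
    """Group a non-negative int Indian-style: last three digits, then pairs.

    Hypothetical helper for illustration only.
    """
    digits = str(n)
    if len(digits) <= 3:
        return digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    # Peel off two-digit groups from the right of the remaining head.
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    if head:
        groups.insert(0, head)
    return sep.join(groups + [tail])


value = 10_000_000
print(format(value, "_"))   # 10_000_000  (groups of three)
print(group_indian(value))  # 1_00_00_000 (one crore, Indian grouping)
# Both spellings are valid Python literals for the same number.
print(1_00_00_000 == 10_000_000)  # True
```

PEP 515 permits an underscore between any two digits, so both spellings are legal; a formatter that enforces one of them is choosing a locale for the whole codebase.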
I mentioned this earlier, though not in this thread: I consider altering numeric literals to be the same as altering a quoted string. Why would you do it? Even if one could determine the correct context (date? locale?), which one can't, one still couldn't determine the human intent.
Is this option still available? I don't see it in the command-line options section of the docs. I'm getting used to using black for formatting and have gone all in on one project, but I just noticed the underscores in the numbers and, wow, that's weird. It's the first time I've seen this, and I'd like to turn off this formatting if possible. My understanding is that you can configure black's formatting behavior by adding entries to the black section of the
@monocongo If you want to contribute arguments for or against numeric underscores that are more substantial than "that's weird" (that is, why you think it's weird, other than "not what I'm used to"), please do so here. Otherwise, a new issue would be more appropriate, as this one seems dedicated to discussing whether this option is worth keeping or should be removed and, if it is removed, what the strict default should be. In this issue, several people have argued that this feature does not deliver good results for a couple of common cases (searching for a number in the codebase, Unix timestamp formatting) and some less common ones (AFAIK, dates as numbers and IDs as numbers).
It's weird because it detracts from readability, IMHO. It breaks things when I have meaningful numbers such as 20180711 (for July 11th, 2018): it turns that into 201_807_11 (and I'd expect the grouping to start from the other end, i.e. 20_180_711, since that's more akin to 20,180,711). I can see how this may provide a benefit in some cases, but in my opinion, it's taking things a bit too far. (Of course, you can get used to anything with enough exposure; it just took me aback when I first saw it, as I wasn't aware of PEP 515.) It's not my call; I'm just glad that this is one of the few options where users have discretion, hence my comment supporting keeping this option available and asking for confirmation that it's still in play.
Black does format
Thanks for the follow up/clarification. You're right, I was mistaken. The grouping by three goes in the opposite direction to the right of the decimal point, and logically so. I shouldn't have commented about the behavior from a (mistaken) extrapolation. |
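For reference, the direction of grouping described above can be sketched as follows. `group_digits` is a hypothetical helper that approximates the behaviour discussed in this thread (integer digits grouped from the right, fractional digits grouped from the left); it is not Black's actual implementation, and it only handles plain decimal literals without signs, exponents, prefixes, or complex suffixes.

```python
def group_digits(literal: str) -> str:
    """Regroup a plain decimal int or float literal into threes (sketch).

    Assumes input like "20180711" or "3.14159265": no sign, exponent,
    0x/0o/0b prefix, or j suffix.
    """
    digits = literal.replace("_", "")  # drop any existing separators first
    int_part, dot, frac_part = digits.partition(".")
    # Integer digits are grouped from the right, like 20,180,711.
    grouped_int = format(int(int_part), "_") if int_part else ""
    # Fractional digits are grouped from the left, starting at the point.
    grouped_frac = "_".join(
        frac_part[i : i + 3] for i in range(0, len(frac_part), 3)
    )
    return grouped_int + dot + grouped_frac


print(group_digits("20180711"))    # 20_180_711
print(group_digits("201_807_11"))  # 20_180_711 (existing underscores are redone)
print(group_digits("3.14159265"))  # 3.141_592_65
```

Note that the second call discards whatever grouping the author chose, which is exactly the behaviour several commenters object to.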
For the record, I also agree this rule seems weird and wrong to apply globally. Numeric underscores generally only make sense when the number is a "natural" number, but that is not the sole purpose of every number in Python, and the cases where it isn't are not obscure.
I would love to see a source for this statement. Links to some open-source projects with examples of these would be a great help; then at least we could try to estimate how common these practices are.
At the doctor, so no links. But the place I've personally run into this is primarily in tests, where I have specific IDs, dates, timestamps, etc. written as hardcoded literals.
I use hardcoded literals in tests way more often than anywhere else tbh.
On Nov 19, 2018, at 2:08 PM, rooterkyberian wrote:
How many hardcoded number literals that are not "natural" numbers are there in the codebase? We have listed issue numbers (which I have personally never seen as Python literals), Unix timestamps (but how often do we hardcode these?), and dates as numbers (do we have examples of this being used in the wild?).
We skip numeric normalization at Instagram. Some data analysis found that a sizable majority of the numeric constants in our codebase were object IDs, for which greppability far outweighs formatting as a natural number. We also found other cases such as phone numbers and timestamps. My opinion is that numeric formatting is out of scope for a formatter, because it is impossible to do correctly without understanding the meaning of the numeric literal. Asserting that all numeric literals "should" represent natural numbers is an opinionated bridge too far; that's no longer an issue of code formatting at all.
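A small sketch of the greppability problem, using a made-up object ID: once the literal has been regrouped, a plain text search for the bare digits no longer finds it. `find_number` is a hypothetical workaround that tolerates optional underscores between digits; it is not a feature of Black or grep.

```python
import re

user_id = 17841400123456789  # hypothetical object ID
before = f"USER_ID = {user_id}\n"   # as written by the developer
after = f"USER_ID = {user_id:_}\n"  # after grouping by threes

needle = "17841400123456789"
print(needle in before)  # True
print(needle in after)   # False: 17_841_400_123_456_789 no longer matches


def find_number(needle: str, text: str) -> bool:
    """Return True if the digits of `needle` occur in `text`, ignoring underscores."""
    # Build a pattern like 1_?7_?8... so an optional "_" may sit between digits.
    pattern = "_?".join(re.escape(ch) for ch in needle)
    return re.search(rf"\b{pattern}\b", text) is not None


print(find_number(needle, after))  # True
```

Every search tool and every developer would need a workaround like this, which is why a codebase dominated by IDs is better served by leaving the literals alone.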
Great points all around. I think it's safe to say there is overwhelming support for removing the feature, so I have updated my issue title and description to reflect this.
I don't understand the logic behind the feature. Numbers without underscores are not reformatted by black, while numbers with predefined underscores are.
I expected the opposite.
@yech1990 what you have mentioned here is due
Both are in the same file. I use Python 3.7.
The option used to preserve underscores in numeric literals has been removed from Black. Black no longer normalizes numeric literals to include _ separators. See: psf/black#549 psf/black#696 Signed-off-by: Enrico Usai <usai@amazon.com>
Below is the original issue, which was arguing for "doing something about --skip-numeric-underscore-normalization." Since then, a consensus seems to have formed amongst users that the whole feature should just be removed, because grouping by threes is not appropriate for numeric literals such as object IDs, phone numbers, dates, and much more. Please read the comments for a more detailed explanation.
Hey! As I understand it, this option was added to make it possible to write 'localized' code that respects the developer's regional numeric format.
Localized code formatting
I disagree with the premise that this is something to be considered. Even though my local date format is not ISO, I'm using YYYY-MM-DD in code. Even though my local format uses a decimal comma, I'm fine with writing 3.14 instead of 3,14.
The black philosophy
The README makes some promises:
Adding options like this, even ones with only a minor effect, goes directly against these promises. Each option is a new discussion to have. Each option is something new to discover: consider a case where I want to make a quick edit to a constant in someone's blackened open-source project using the in-browser editor on github.com. I know how to write black-compliant code, so I can risk committing something without cloning the project and running black. But now that this option exists, I either need to check the project's config first to see their preference for literals, or guess and risk a CI failure wasting my time.
And yes, I honestly believe this is a slippery slope.
You might say, well, that's okay, everyone using black would just use the defaults unless they are in an aforementioned non-three digit grouping region, right? Well…
Developers are a lost cause
We had a team at my company bump the black version a couple of days ago. I glanced at their project today and was horrified to find skip-numeric-underscore-normalization = true in their config. Apparently they saw the option in the changelog and … Really, developers are a lost cause when it comes to code style arguments. Just don't let them have them, please.
Possible fixes
1. Remove the --skip-numeric-underscore-normalization option.
2. Remove the whole feature of numeric-underscore-normalization; if it's not a universally enforceable standard, maybe it has no right to be a part of black.
3. Rewrite the documentation for these options in a stricter style, stating restrictive rules on when you as a developer are allowed to use them. Something like this: