Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove numeric underscore normalization #549

Closed
underyx opened this issue Oct 3, 2018 · 22 comments
Closed

Remove numeric underscore normalization #549

underyx opened this issue Oct 3, 2018 · 22 comments

Comments

@underyx
Copy link

underyx commented Oct 3, 2018

Below is the original issue, which was arguing for "doing something about --skip-numeric-underscore-normalization." Since then, a consensus seems to have formed amongst users that the who feature should just be removed, because grouping by threes is not appropriate for numeric literals such as object IDs, phone numbers, dates, and much more. Please read the comments for a more detailed explanation.


Hey! As I understand, this option was added to keep the possibility to write 'localized' code, respecting the developers' regional numeric format.

Localized code formatting

I disagree with the premise that this is something to be considered. Even though my local date format is not ISO, I'm using YYYY-MM-DD in code. Even though my local format marks decimal places with a comma, I'm fine with writing 3.14 instead of 3,14.

The black philosophy

The README makes some promises:

Black is the uncompromising Python code formatter. By using it, you agree to cede control over minutiae of hand-formatting. In return, Black gives you speed, determinism, […]. You will save time and mental energy for more important matters.

Blackened code looks the same regardless of the project you're reading. Formatting becomes transparent after a while and you can focus on the content instead.

Adding options like this, while having only a minor effect, goes directly against these promises. Each option is a new discussion to have. Each option is something new to discover: consider if I want to make a quick edit to a constant in someone's blackened open source project on the github.com in-browser editor. I know how to write black-compliant code, so I can risk not cloning the project and committing something without running black. But now that this option is there, I either need to look for the project's config first to see their preference for literals, or guess and risk a CI failure wasting my time.

And yes, I honestly believe this is a slippery slope.

You might say, well, that's okay, everyone using black would just use the defaults unless they are in an aforementioned non-three digit grouping region, right? Well…

Developers are a lost cause

We had a team at my company bump the black version a couple days ago. I glanced at their project today and was horrified to find skip-numeric-underscore-normalization = true in their config. Apparently they saw the option in the changelog and

  • held a poll on whether to keep this on or off
  • got a 50-50% split for/against in the results
  • said 'we may remove the option if there is consensus between our developers'

Really, developers are a lost cause when it comes to code style arguments. Just don't let them have them, please.

Possible fixes

  1. Remove the --skip-numeric-underscore-normalization option.

  2. Remove the whole feature of numeric-underscore-normalization; if it's not a universally enforceable standard, maybe it has no right to be a part of black.

  3. Rewrite the style of documentation of these options. Mention restrictive rules on when you as a developer are allowed to use it. Something like this:

    Configuring black

    Some special circumstances might require project specific configuration. But if you're using black, to not undermine all the benefits black gives you and the community, you must agree to only use these options if we permit it here.

    --skip-numeric-underscore-normalization

    You may use this if: Your codebase is worked on only by developers from regions where the local number format does not group by three digits.
    Effect: Numbers can be written both like 100_000_000 and 1_0000_0000

    --skip-string-normalization

    You may use this if: Your codebase predates black with pre-existing string conventions, and adoption would be unreasonably difficult if all strings had to instantly be reformatted.
    Effect: Strings can be written both like 'hello' and "hello"

@underyx underyx changed the title Remove --skip-numeric-underscore-normalization Remove the --skip-numeric-underscore-normalization option Oct 3, 2018
@underyx underyx changed the title Remove the --skip-numeric-underscore-normalization option Do something about the --skip-numeric-underscore-normalization option Oct 3, 2018
@underyx
Copy link
Author

underyx commented Oct 3, 2018

Just as a changelog for this issue: I've changed the title and provided alternative solutions other than just removing the option (added the 'Possible fixes' section.)

@glyph
Copy link

glyph commented Oct 3, 2018

I suspect I may have missed the original issue where this would have been more appropriate to bring up, but if this is being considered for reversal or modification, I'd like to point out that not all numbers have the same, shall we say, texture.

For example, there's nothing locale-specific about this number:

actually_a_date = 2018_05_01

which black by default now helpfully "fixes" to

actually_a_date = 20_180_501

@csvoss
Copy link

csvoss commented Oct 3, 2018

Another example of not all numbers having the same texture: Inserting underscores into integers feels odd to me at times when the integer in question is a hardcoded ID.

For example, I'd prefer to see Issue(issue_id=1142395) in my code than to see Issue(issue_id=1_142_395), because the former makes it easier to grep for 1142395 and because the latter causes me to mentally read out "issue id one million one hundred forty two thousand ..." which feels not quite right.

@rooterkyberian
Copy link

I think in corner cases (this is not happening too often, or I'm not aware of something?) such as actually_a_date = 2018_05_01 we can just disable formatting for particular line.

I would argue removal of numeric-underscore-normalization is the best course of action. It still is a very fresh Python feature, so I assume we have yet to explore how useful (or troublesome) it is and thus we shouldn't force it on anyone. The inability to search on number is especially worrying to me.
And to make a case against that it is given that black has to have some opinion, I would like to say, that even with things like line length, the issue always was not if we should limit it, just a particular number. Here we do not even know if we want to use underscores at all anywhere in the code.

Anyway, even as an "opponent" of this normalization, if it stays I still think it shouldn't be optional because of the reasons stated by @underyx .

@rooterkyberian
Copy link

Here is a link to discussion which happened when the option was first introduced: #529

@pzelnip
Copy link

pzelnip commented Oct 10, 2018

I like fix option 2 - "Remove the whole feature of numeric-underscore-normalization; if it's not a universally enforceable standard, maybe it has no right to be a part of black." There are clear examples documented above where numeric underscores not only don't make sense but obfuscate the intent of the code.

The first time Black reformatted some code with them I pushed the PR & saw the underscores in the numeric literals and was confused where they'd come from. It wasn't until a couple days later I had even heard that underscores in numeric literals were allowed now in Python 3.6 so at first I thought it was a syntax error.

I see them as being much like automatic string concatenation, these two lines are equivalent:

x = "this is a really really  really realllllllllly looooooooooooooooooooooooooooooooooooooooooong line"
x = (
    "this is a really really  really realllllllllly "
    "looooooooooooooooooooooooooooooooooooooooooong line"
)

If I add the parenthesis & split a string literal into multiple lines Black is absolutely permitted to rearrange the surrounding whitespace for consistency. However, It'd be rather invasive of Black to break the literal up in the first place even though it violates the 88 char line rule.

Similarly for numeric literals, if I choose to put underscores in them then I'd expect Black to respect that & leave them alone because changing them likely changes what I'm intending to express by putting them in. If I leave underscores out, then it's probable that's because my intent is to not make the numeric literal look like a western-style value with commas every 3 digits because of what the value represents (ex: a user id, or a date, etc)

@harrybiddle
Copy link

I also agree with option 2: "Remove the whole feature of numeric-underscore-normalization; if it's not a universally enforceable standard, maybe it has no right to be a part of black."

While the philosophy of Black is to be uncompromising on the things it decides to enforce, that doesn't mean that it has to enforce all aspects of your code. With most decisions it chooses to make are neither here nor there, the feature as it stands can make it much harder for teams to read the code: either because of the poor date format support (and what about telephone numbers literals??) or because of localisation. I can see that being a barrier to adoption for Black.

@ghost
Copy link

ghost commented Oct 26, 2018

I mentioned earlier, but not in this thread: I consider altering numeric literals to be the same as altering a quoted string. Why would you do it? Even if one could determine the correct context (date? locale?) which one can't, one can't determine the human intent.

@coxley
Copy link
Contributor

coxley commented Oct 26, 2018

First time I've noticed this but it breaks readability and usability of UNIX timestamps. Makes it impossible to double-check timestamps from code

$ date -d @1_534_291_200
date: invalid date ‘@1_534_291_200’

image

@monocongo
Copy link

monocongo commented Nov 11, 2018

Is this option still available? I don't see it in the command line options section of the docs. I'm getting used to using black for formatting and I have gone all in on one project, but I just noticed the underscores in the numbers and wow, that's weird. It's the first time I've seen this, and I'd like to omit this formatting if possible. My understanding is that you can configure black's formatting behavior by adding entries into the black section of the pyproject.toml file, and I have added skip-numeric-underscore-normalization = true. Is this all that's required, once this option is available?

@rooterkyberian
Copy link

rooterkyberian commented Nov 11, 2018

@monocongo
Yes it is available, see the readme or black --help.
When enabled it ignores whatever formatting for numbers is currently in the files. Hence, once formatted by whatever tool, after enabling --skip-numeric-underscore-normalization black won't make any alternation to the files anymore in this regard.
If you want to remove all underscores from your number literals from your code, you could try modifying black by adding .replace('_', '') at the end of https://github.com/ambv/black/blob/5c2dd96a69a935cf45acbdf2ffabbd39b27d38fa/black.py#L2603 . I haven't test it, but so try it at your own risk.

If you want to contribute some arguments for and against numeric underscore that are more substantial than "that's weird" (that is, e.g. why you think its weird, other than "not what I'm used to") then please do so here, but otherwise a new issue will be more appropriative as this one seems to be dedicated to discussion if this option is worth keeping or should it be removed - and if that happens, what a strict default should be.
black idea is to be opinionated tool, feeling of "weirdness" is unavoidable, for at least some of the users. What we can do about that, is ensure - by a rational dialog - that the formatting decisions driving black behavior prove a right choice for the most use cases.

In this particular issue there are some voices that this particular feature has does not deliver a good results for couple of common cases (searching for a number in code base, unix timestamp formatting) and less common ones (AFAIK - date as a number; some kind of ID as a number).
There is also whole issue of black idea being "no customization" and then, options meant for "easing old project conversion" being used in brand new projects.

@monocongo
Copy link

It's weird because it detracts from readability, IMHO. It breaks things like when I have numbers that are meaningful such as 20180711 (for July 11th, 2018), it turns that into 201_807_11 (and I'd expect the grouping to start from the other end, i.e. 20_180_711, since that's more akin to 20,180,711). I can see how this may provide benefit in some cases, but in my opinion, it's taking things a bit too far. (Of course you can get used to anything with enough exposure, it just took me aback when I first saw it, I wasn't aware of PEP515.) It's not my call, I'm just glad that this is one of the few options where users have discretion, hence my comment in support of keeping this option available and for confirmation that it's still in play.

@JelleZijlstra
Copy link
Collaborator

Black does format 20180711 as 20_180_711; if you see something else, please report this as a bug.

@monocongo
Copy link

Thanks for the follow up/clarification. You're right, I was mistaken. The grouping by three goes in the opposite direction to the right of the decimal point, and logically so. I shouldn't have commented about the behavior from a (mistaken) extrapolation.

@dstufft
Copy link

dstufft commented Nov 19, 2018

For the record, I also agree this rule seems weird and wrong to apply globally. Numeric underscores generally only apply in cases where the number is a "natural" number, but that is not the sole purpose of all numbers in Python nor are the cases where it's not the sole purpose, obscure.

@rooterkyberian
Copy link

nor are the cases where it's not the sole purpose, obscure.

It would be love to see a source for this statement.
How many hard coded number literals that are not "natural" numbers are in the code base? We have listed issue numbers (which personally I have never seen as python literals), unix timestamps (but how often we hardcode these?), and dates as a number (do we have some examples where this used in the wild?).

Links to some opensource projects which have examples of these would be great help, at least then we can try to estimate how common these practices are.

@dstufft
Copy link

dstufft commented Nov 19, 2018 via email

@carljm
Copy link
Collaborator

carljm commented Nov 19, 2018

We skip numeric normalization at Instagram. Some data analysis found that a sizable majority of the numeric constants in our codebase were object ids for which greppability far outweighs formatting as a natural number. We also found other cases such as phone numbers and timestamps.

My opinion is that numeric formatting is out of scope for a formatter, because it is impossible to do correctly without understanding the meaning of the numeric literal. Asserting that all numeric literals "should" represent natural numbers is an opinionated bridge too far; that's no longer an issue of code formatting at all.

@underyx underyx changed the title Do something about the --skip-numeric-underscore-normalization option Remove numeric underscore normalization Nov 19, 2018
@underyx
Copy link
Author

underyx commented Nov 19, 2018

Great points all around. I think it's safe to say we have overwhelming support of removal of the feature, so I have updated my issue title and description to reflect this.

@y9c
Copy link

y9c commented Nov 20, 2018

I don't understand the logic behind the feature.

Number without underscore will not be formatted by black, while number with pre-defined underscore will be formatted.

num = 100000000 ----black---> num=100000000
package_version = 2018_11 ----black---> package_version = 201_811

What I expected was the opposite situation.

@rooterkyberian
Copy link

rooterkyberian commented Nov 20, 2018

@yech1990 what you have mentioned here is due py36 support detection. If you had both lines in the same file, both of them would get formatted with underscores.
You can force black to assume py36 features are available by adding --py36 as a command line argument. This behavior is not specific to this feature - it also used with things like inserting trailing commas after **kwargs.

@y9c
Copy link

y9c commented Nov 20, 2018

Both are in the same file.

I use python 3.7

@rooterkyberian

enrico-usai added a commit to enrico-usai/aws-parallelcluster that referenced this issue Mar 15, 2019
The option used to preserve underscores in numeric literals
has been removed from Black.

Black no longer normalizes numeric literals to include _ separators.

See:
psf/black#549
psf/black#696
Signed-off-by: Enrico Usai <usai@amazon.com>
enrico-usai added a commit to aws/aws-parallelcluster that referenced this issue Mar 15, 2019
The option used to preserve underscores in numeric literals
has been removed from Black.

Black no longer normalizes numeric literals to include _ separators.

See:
psf/black#549
psf/black#696
Signed-off-by: Enrico Usai <usai@amazon.com>
laggron42 added a commit to laggron42/Laggrons-Dumb-Cogs that referenced this issue Mar 15, 2019
laggron42 added a commit to laggron42/Laggrons-Dumb-Cogs that referenced this issue May 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests