Remove numeric underscore normalization #549

underyx · 2018-10-03T12:19:23Z

Below is the original issue, which was arguing for "doing something about --skip-numeric-underscore-normalization." Since then, a consensus seems to have formed amongst users that the who feature should just be removed, because grouping by threes is not appropriate for numeric literals such as object IDs, phone numbers, dates, and much more. Please read the comments for a more detailed explanation.

Hey! As I understand, this option was added to keep the possibility to write 'localized' code, respecting the developers' regional numeric format.

Localized code formatting

I disagree with the premise that this is something to be considered. Even though my local date format is not ISO, I'm using YYYY-MM-DD in code. Even though my local format marks decimal places with a comma, I'm fine with writing 3.14 instead of 3,14.

The black philosophy

The README makes some promises:

Black is the uncompromising Python code formatter. By using it, you agree to cede control over minutiae of hand-formatting. In return, Black gives you speed, determinism, […]. You will save time and mental energy for more important matters.

Blackened code looks the same regardless of the project you're reading. Formatting becomes transparent after a while and you can focus on the content instead.

Adding options like this, while having only a minor effect, goes directly against these promises. Each option is a new discussion to have. Each option is something new to discover: consider if I want to make a quick edit to a constant in someone's blackened open source project on the github.com in-browser editor. I know how to write black-compliant code, so I can risk not cloning the project and committing something without running black. But now that this option is there, I either need to look for the project's config first to see their preference for literals, or guess and risk a CI failure wasting my time.

And yes, I honestly believe this is a slippery slope.

You might say, well, that's okay, everyone using black would just use the defaults unless they are in an aforementioned non-three digit grouping region, right? Well…

Developers are a lost cause

We had a team at my company bump the black version a couple days ago. I glanced at their project today and was horrified to find skip-numeric-underscore-normalization = true in their config. Apparently they saw the option in the changelog and

held a poll on whether to keep this on or off
got a 50-50% split for/against in the results
said 'we may remove the option if there is consensus between our developers'

Really, developers are a lost cause when it comes to code style arguments. Just don't let them have them, please.

Possible fixes

Remove the --skip-numeric-underscore-normalization option.
Remove the whole feature of numeric-underscore-normalization; if it's not a universally enforceable standard, maybe it has no right to be a part of black.
Rewrite the style of documentation of these options. Mention restrictive rules on when you as a developer are allowed to use it. Something like this:

Configuring black

Some special circumstances might require project specific configuration. But if you're using black, to not undermine all the benefits black gives you and the community, you must agree to only use these options if we permit it here.

--skip-numeric-underscore-normalization

You may use this if: Your codebase is worked on only by developers from regions where the local number format does not group by three digits.
Effect: Numbers can be written both like 100_000_000 and 1_0000_0000

--skip-string-normalization

You may use this if: Your codebase predates black with pre-existing string conventions, and adoption would be unreasonably difficult if all strings had to instantly be reformatted.
Effect: Strings can be written both like 'hello' and "hello"

The text was updated successfully, but these errors were encountered:

underyx · 2018-10-03T12:48:38Z

Just as a changelog for this issue: I've changed the title and provided alternative solutions other than just removing the option (added the 'Possible fixes' section.)

glyph · 2018-10-03T21:42:40Z

I suspect I may have missed the original issue where this would have been more appropriate to bring up, but if this is being considered for reversal or modification, I'd like to point out that not all numbers have the same, shall we say, texture.

For example, there's nothing locale-specific about this number:

actually_a_date = 2018_05_01

which black by default now helpfully "fixes" to

actually_a_date = 20_180_501

csvoss · 2018-10-03T22:33:40Z

Another example of not all numbers having the same texture: Inserting underscores into integers feels odd to me at times when the integer in question is a hardcoded ID.

For example, I'd prefer to see Issue(issue_id=1142395) in my code than to see Issue(issue_id=1_142_395), because the former makes it easier to grep for 1142395 and because the latter causes me to mentally read out "issue id one million one hundred forty two thousand ..." which feels not quite right.

rooterkyberian · 2018-10-04T15:03:52Z

I think in corner cases (this is not happening too often, or I'm not aware of something?) such as actually_a_date = 2018_05_01 we can just disable formatting for particular line.

I would argue removal of numeric-underscore-normalization is the best course of action. It still is a very fresh Python feature, so I assume we have yet to explore how useful (or troublesome) it is and thus we shouldn't force it on anyone. The inability to search on number is especially worrying to me.
And to make a case against that it is given that black has to have some opinion, I would like to say, that even with things like line length, the issue always was not if we should limit it, just a particular number. Here we do not even know if we want to use underscores at all anywhere in the code.

Anyway, even as an "opponent" of this normalization, if it stays I still think it shouldn't be optional because of the reasons stated by @underyx .

rooterkyberian · 2018-10-04T15:28:00Z

Here is a link to discussion which happened when the option was first introduced: #529

pzelnip · 2018-10-10T20:42:35Z

I like fix option 2 - "Remove the whole feature of numeric-underscore-normalization; if it's not a universally enforceable standard, maybe it has no right to be a part of black." There are clear examples documented above where numeric underscores not only don't make sense but obfuscate the intent of the code.

The first time Black reformatted some code with them I pushed the PR & saw the underscores in the numeric literals and was confused where they'd come from. It wasn't until a couple days later I had even heard that underscores in numeric literals were allowed now in Python 3.6 so at first I thought it was a syntax error.

I see them as being much like automatic string concatenation, these two lines are equivalent:

x = "this is a really really  really realllllllllly looooooooooooooooooooooooooooooooooooooooooong line"
x = (
    "this is a really really  really realllllllllly "
    "looooooooooooooooooooooooooooooooooooooooooong line"
)

If I add the parenthesis & split a string literal into multiple lines Black is absolutely permitted to rearrange the surrounding whitespace for consistency. However, It'd be rather invasive of Black to break the literal up in the first place even though it violates the 88 char line rule.

Similarly for numeric literals, if I choose to put underscores in them then I'd expect Black to respect that & leave them alone because changing them likely changes what I'm intending to express by putting them in. If I leave underscores out, then it's probable that's because my intent is to not make the numeric literal look like a western-style value with commas every 3 digits because of what the value represents (ex: a user id, or a date, etc)

harrybiddle · 2018-10-24T09:50:30Z

I also agree with option 2: "Remove the whole feature of numeric-underscore-normalization; if it's not a universally enforceable standard, maybe it has no right to be a part of black."

While the philosophy of Black is to be uncompromising on the things it decides to enforce, that doesn't mean that it has to enforce all aspects of your code. With most decisions it chooses to make are neither here nor there, the feature as it stands can make it much harder for teams to read the code: either because of the poor date format support (and what about telephone numbers literals??) or because of localisation. I can see that being a barrier to adoption for Black.

ghost · 2018-10-26T01:18:32Z

I mentioned earlier, but not in this thread: I consider altering numeric literals to be the same as altering a quoted string. Why would you do it? Even if one could determine the correct context (date? locale?) which one can't, one can't determine the human intent.

coxley · 2018-10-26T21:23:28Z

First time I've noticed this but it breaks readability and usability of UNIX timestamps. Makes it impossible to double-check timestamps from code

$ date -d @1_534_291_200
date: invalid date ‘@1_534_291_200’

monocongo · 2018-11-11T22:54:03Z

Is this option still available? I don't see it in the command line options section of the docs. I'm getting used to using black for formatting and I have gone all in on one project, but I just noticed the underscores in the numbers and wow, that's weird. It's the first time I've seen this, and I'd like to omit this formatting if possible. My understanding is that you can configure black's formatting behavior by adding entries into the black section of the pyproject.toml file, and I have added skip-numeric-underscore-normalization = true. Is this all that's required, once this option is available?

rooterkyberian · 2018-11-11T23:26:41Z

@monocongo
Yes it is available, see the readme or black --help.
When enabled it ignores whatever formatting for numbers is currently in the files. Hence, once formatted by whatever tool, after enabling --skip-numeric-underscore-normalization black won't make any alternation to the files anymore in this regard.
If you want to remove all underscores from your number literals from your code, you could try modifying black by adding .replace('_', '') at the end of https://github.com/ambv/black/blob/5c2dd96a69a935cf45acbdf2ffabbd39b27d38fa/black.py#L2603 . I haven't test it, but so try it at your own risk.

If you want to contribute some arguments for and against numeric underscore that are more substantial than "that's weird" (that is, e.g. why you think its weird, other than "not what I'm used to") then please do so here, but otherwise a new issue will be more appropriative as this one seems to be dedicated to discussion if this option is worth keeping or should it be removed - and if that happens, what a strict default should be.
black idea is to be opinionated tool, feeling of "weirdness" is unavoidable, for at least some of the users. What we can do about that, is ensure - by a rational dialog - that the formatting decisions driving black behavior prove a right choice for the most use cases.

In this particular issue there are some voices that this particular feature has does not deliver a good results for couple of common cases (searching for a number in code base, unix timestamp formatting) and less common ones (AFAIK - date as a number; some kind of ID as a number).
There is also whole issue of black idea being "no customization" and then, options meant for "easing old project conversion" being used in brand new projects.

monocongo · 2018-11-11T23:54:09Z

It's weird because it detracts from readability, IMHO. It breaks things like when I have numbers that are meaningful such as 20180711 (for July 11th, 2018), it turns that into 201_807_11 (and I'd expect the grouping to start from the other end, i.e. 20_180_711, since that's more akin to 20,180,711). I can see how this may provide benefit in some cases, but in my opinion, it's taking things a bit too far. (Of course you can get used to anything with enough exposure, it just took me aback when I first saw it, I wasn't aware of PEP515.) It's not my call, I'm just glad that this is one of the few options where users have discretion, hence my comment in support of keeping this option available and for confirmation that it's still in play.

JelleZijlstra · 2018-11-12T04:39:56Z

Black does format 20180711 as 20_180_711; if you see something else, please report this as a bug.

monocongo · 2018-11-12T16:03:39Z

Thanks for the follow up/clarification. You're right, I was mistaken. The grouping by three goes in the opposite direction to the right of the decimal point, and logically so. I shouldn't have commented about the behavior from a (mistaken) extrapolation.

dstufft · 2018-11-19T14:27:14Z

For the record, I also agree this rule seems weird and wrong to apply globally. Numeric underscores generally only apply in cases where the number is a "natural" number, but that is not the sole purpose of all numbers in Python nor are the cases where it's not the sole purpose, obscure.

rooterkyberian · 2018-11-19T19:08:22Z

nor are the cases where it's not the sole purpose, obscure.

It would be love to see a source for this statement.
How many hard coded number literals that are not "natural" numbers are in the code base? We have listed issue numbers (which personally I have never seen as python literals), unix timestamps (but how often we hardcode these?), and dates as a number (do we have some examples where this used in the wild?).

Links to some opensource projects which have examples of these would be great help, at least then we can try to estimate how common these practices are.

dstufft · 2018-11-19T19:30:36Z

At the doctor so no links. But the place I’ve personally ran into this has been primarily in tests, where I have specific ID, dates, time stamps, etc written as hardcoded literals. I use hardcoded literals in tests way more often than anywhere else tbh.

…

Sent from my iPhone

On Nov 19, 2018, at 2:08 PM, rooterkyberian ***@***.***> wrote: nor are the cases where it's not the sole purpose, obscure. It would be love to see a source for this statement. How many hard coded number literals that are not "natural" numbers are in the code base? We have listed issue numbers (which personally I have never seen as python literals), unix timestamps (but how often we hardcode these?), and dates as a number (do we have some examples where this used in the wild?). Links to some opensource projects which have examples of these would be great help, at least then we can try to estimate how common these practices are. — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

carljm · 2018-11-19T20:29:05Z

We skip numeric normalization at Instagram. Some data analysis found that a sizable majority of the numeric constants in our codebase were object ids for which greppability far outweighs formatting as a natural number. We also found other cases such as phone numbers and timestamps.

My opinion is that numeric formatting is out of scope for a formatter, because it is impossible to do correctly without understanding the meaning of the numeric literal. Asserting that all numeric literals "should" represent natural numbers is an opinionated bridge too far; that's no longer an issue of code formatting at all.

underyx · 2018-11-19T22:33:49Z

Great points all around. I think it's safe to say we have overwhelming support of removal of the feature, so I have updated my issue title and description to reflect this.

y9c · 2018-11-20T14:41:21Z

I don't understand the logic behind the feature.

Number without underscore will not be formatted by black, while number with pre-defined underscore will be formatted.

num = 100000000 ----black---> num=100000000
package_version = 2018_11 ----black---> package_version = 201_811

What I expected was the opposite situation.

rooterkyberian · 2018-11-20T14:56:53Z

@yech1990 what you have mentioned here is due py36 support detection. If you had both lines in the same file, both of them would get formatted with underscores.
You can force black to assume py36 features are available by adding --py36 as a command line argument. This behavior is not specific to this feature - it also used with things like inserting trailing commas after **kwargs.

y9c · 2018-11-20T15:14:49Z

Both are in the same file.

I use python 3.7

@rooterkyberian

The option used to preserve underscores in numeric literals has been removed from Black. Black no longer normalizes numeric literals to include _ separators. See: psf/black#549 psf/black#696 Signed-off-by: Enrico Usai <usai@amazon.com>

See psf/black#549

underyx changed the title ~~Remove --skip-numeric-underscore-normalization~~ Remove the --skip-numeric-underscore-normalization option Oct 3, 2018

underyx changed the title ~~Remove the --skip-numeric-underscore-normalization option~~ Do something about the --skip-numeric-underscore-normalization option Oct 3, 2018

cas-- mentioned this issue Nov 10, 2018

Add --single-quote option #594

Closed

underyx changed the title ~~Do something about the --skip-numeric-underscore-normalization option~~ Remove numeric underscore normalization Nov 19, 2018

zsol mentioned this issue Feb 6, 2019

Remove numeric underscore normalization #696

Merged

JelleZijlstra closed this as completed in #696 Feb 7, 2019

enrico-usai mentioned this issue Mar 15, 2019

Remove skip-numeric-underscore-normalization option from black aws/aws-parallelcluster#932

Merged

laggron42 added a commit to laggron42/Laggrons-Dumb-Cogs that referenced this issue Mar 15, 2019

Remove --skip-numeric-underscore-normalization flag from black commands

7afaa3a

See psf/black#549

underyx mentioned this issue Mar 19, 2019

Mention in docs that int formatting only adds underscores above 10000 #546

Closed

evelyn9191 mentioned this issue Jul 12, 2019

Fix: Extract XML and CSV correctly rossumai/elisctl#73

Merged

laggron42 added a commit to laggron42/Laggrons-Dumb-Cogs that referenced this issue May 22, 2020

Remove --skip-numeric-underscore-normalization flag from black commands

327f3eb

See psf/black#549

topecongiro mentioned this issue May 24, 2020

Formatting big numbers with thousand separator rust-lang/rustfmt#3949

Closed

ichard26 mentioned this issue Dec 29, 2020

pyserial 115200 converts into 115_200 #1890

Closed

JelleZijlstra mentioned this issue Mar 26, 2022

Remove underscores in numeric literals #2957

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove numeric underscore normalization #549

Remove numeric underscore normalization #549

underyx commented Oct 3, 2018 •

edited

Loading

Configuring black

`--skip-numeric-underscore-normalization`

`--skip-string-normalization`

underyx commented Oct 3, 2018 •

edited

Loading

glyph commented Oct 3, 2018

csvoss commented Oct 3, 2018

rooterkyberian commented Oct 4, 2018

rooterkyberian commented Oct 4, 2018

pzelnip commented Oct 10, 2018

harrybiddle commented Oct 24, 2018

ghost commented Oct 26, 2018

coxley commented Oct 26, 2018

monocongo commented Nov 11, 2018 •

edited

Loading

rooterkyberian commented Nov 11, 2018 •

edited

Loading

monocongo commented Nov 11, 2018

JelleZijlstra commented Nov 12, 2018

monocongo commented Nov 12, 2018

dstufft commented Nov 19, 2018

rooterkyberian commented Nov 19, 2018

dstufft commented Nov 19, 2018 via email

carljm commented Nov 19, 2018

underyx commented Nov 19, 2018

y9c commented Nov 20, 2018

rooterkyberian commented Nov 20, 2018 •

edited

Loading

y9c commented Nov 20, 2018 •

edited

Loading

Remove numeric underscore normalization #549

Remove numeric underscore normalization #549

Comments

underyx commented Oct 3, 2018 • edited Loading

Localized code formatting

The black philosophy

Developers are a lost cause

Possible fixes

Configuring black

--skip-numeric-underscore-normalization

--skip-string-normalization

underyx commented Oct 3, 2018 • edited Loading

glyph commented Oct 3, 2018

csvoss commented Oct 3, 2018

rooterkyberian commented Oct 4, 2018

rooterkyberian commented Oct 4, 2018

pzelnip commented Oct 10, 2018

harrybiddle commented Oct 24, 2018

ghost commented Oct 26, 2018

coxley commented Oct 26, 2018

monocongo commented Nov 11, 2018 • edited Loading

rooterkyberian commented Nov 11, 2018 • edited Loading

monocongo commented Nov 11, 2018

JelleZijlstra commented Nov 12, 2018

monocongo commented Nov 12, 2018

dstufft commented Nov 19, 2018

rooterkyberian commented Nov 19, 2018

dstufft commented Nov 19, 2018 via email

carljm commented Nov 19, 2018

underyx commented Nov 19, 2018

y9c commented Nov 20, 2018

rooterkyberian commented Nov 20, 2018 • edited Loading

y9c commented Nov 20, 2018 • edited Loading

underyx commented Oct 3, 2018 •

edited

Loading

`--skip-numeric-underscore-normalization`

`--skip-string-normalization`

underyx commented Oct 3, 2018 •

edited

Loading

monocongo commented Nov 11, 2018 •

edited

Loading

rooterkyberian commented Nov 11, 2018 •

edited

Loading

rooterkyberian commented Nov 20, 2018 •

edited

Loading

y9c commented Nov 20, 2018 •

edited

Loading