Compile time textwrap.dedent() equivalent for str or bytes literals #81087
A common Python pattern is to keep code indented so it looks tidy. When a triple-quoted multiline string must not carry that leading whitespace, the usual answer is to call textwrap.dedent("""long multiline constant"""). Rather than doing this computation at runtime, it would make sense to do it at compilation time. A natural suggestion is a new letter prefix for multiline string literals that triggers this. It is probably not worth "wasting" a letter on this, so I'll understand if we reject the idea, but it would be nicer than importing textwrap and calling it all over the place just for this purpose. There are many workarounds, but an actual syntax would enable writing code that looks like this:
class Castle:
    def __init__(self, name, lyrics=None):
        if not lyrics:
            lyrics = df"""\
                We're knights of the round table
                We dance whene'er we're able
                We do routines and scenes
                With footwork impeccable.
                We dine well here in {name}
                We eat ham and jam and spam a lot.
                """
        self._name = name
        self._lyrics = lyrics
This avoids generating a larger temporary string literal that always sits in memory in the code object and gets converted at runtime to the desired dedented form via a textwrap.dedent() call. I chose "d" as the letter to mean dedent. I don't have a strong preference if we ever do make this a feature. |
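To make the runtime cost concrete, here is a minimal sketch of today's workaround. The function name `default_lyrics` is made up for illustration; the only real APIs used are `textwrap.dedent` and the code object's `co_consts`. It suggests that the indented literal is what the code object stores, and that the dedented text is recomputed on every call.

```python
import textwrap


def default_lyrics():
    # The literal keeps its source indentation; dedent() runs on every call.
    return textwrap.dedent("""\
        We're knights of the round table
        We dance whene'er we're able
        """)


# String constants stored in the compiled function.
stored = [c for c in default_lyrics.__code__.co_consts if isinstance(c, str)]

print(repr(default_lyrics()))  # dedented text, produced at runtime
print(stored)                  # the stored literal still carries the indentation
```

A compile-time dedent would store only the shorter, already dedented string.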
I agree that this is a recurring need and would be nice to have. |
+1 There's a long thread on something similar here: https://mail.python.org/pipermail/python-ideas/2018-March/049564.html Carrying over into the following month: https://mail.python.org/pipermail/python-ideas/2018-April/049582.html Here's an even older thread: https://mail.python.org/pipermail/python-ideas/2010-November/008589.html In the more recent thread, I suggested that we give strings a dedent method. When called on a literal, the peephole optimizer may do the dedent at compile time. Whether it does or not is a "quality of implementation" factor. The idea is to avoid the combinatorial explosion of yet another string prefix:
while still making string dedents easily discoverable, and with a sufficiently good interpreter, string literals will be dedented at compile time avoiding any runtime cost: https://mail.python.org/pipermail/python-ideas/2018-March/049578.html |
Oh good, I thought this had come up before. Your method idea that could be optimized on literals makes a lot of sense, and is notably more readable than yet another letter prefix. |
Hi, I have been looking to get more acquainted with the peephole optimizer. Is it okay if I work on this? |
I'd say go for it. We can't guarantee we'll accept the feature yet, but I think the .dedent() method with an optimization pass approach is worthwhile making a proof of concept of regardless. |
For the record, I just came across this proposed feature for Java: https://openjdk.java.net/jeps/8222530
It seems to be similar to Python triple-quoted strings except that the The JEP proposal says:
which matches my own experience: *most* but not all of my indented Note that there are a couple of major difference between the JEP
The JEP also mentions considering multi-line string literals as Swift https://stackoverflow.com/questions/29483365/what-is-the-syntax-for-a-multiline-string-literal I mention these for completeness, not to suggest them as alternatives. |
Thanks, it's actually good to see this being a feature accepted in other languages. |
Hi @steven.daprano, @gregory.p.smith. I added the first version of my PR for review. One issue with it is that in:
def f():
    return " foo".dedent()
f will have both " foo" and "foo" in its constants even if the first is not used anymore. Removing it requires looping over the code once more while marking the constants seen in a set, and I was not sure if this was OK. |
Perform the optimization at the AST level, not in the peepholer. |
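As a rough illustration of what folding at the AST level means, the sketch below uses the pure-Python `ast` module. It is an assumption-laden stand-in: `str.dedent()` does not exist, so it folds `textwrap.dedent(<string literal>)` calls instead, it assumes the name `textwrap` is not rebound, and the class name `DedentFolder` is hypothetical. CPython's real AST optimizer is written in C and is not shown here.

```python
import ast
import textwrap


class DedentFolder(ast.NodeTransformer):
    """Fold textwrap.dedent(<str literal>) calls into plain constants."""

    def visit_Call(self, node):
        self.generic_visit(node)
        func = node.func
        is_dedent = (
            isinstance(func, ast.Attribute)
            and func.attr == "dedent"
            and isinstance(func.value, ast.Name)
            and func.value.id == "textwrap"
        )
        if (
            is_dedent
            and len(node.args) == 1
            and not node.keywords
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            # Replace the whole call with the already dedented constant.
            folded = ast.Constant(value=textwrap.dedent(node.args[0].value))
            return ast.copy_location(folded, node)
        return node


source = '''
import textwrap

def f():
    return textwrap.dedent("""\\
        hello
          world
        """)
'''

tree = ast.fix_missing_locations(DedentFolder().visit(ast.parse(source)))
namespace = {}
exec(compile(tree, "<folded>", "exec"), namespace)
f = namespace["f"]

print(repr(f()))
print(f.__code__.co_consts)  # only the dedented string is stored
```

Because the call node is replaced before code generation, the indented original never reaches the constants table, which is the leftover-constant problem described for the peephole approach.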
That seems to be what happens with other folded constants: >>> def f():
... return 99.0 + 0.9
...
>>> f.__code__.co_consts
(None, 99.0, 0.9, 99.9) so I guess that this is okay for a first draft. One difference is that (But not me, sorry, I don't have enough C.)
That should probably be a new issue. |
Serhiy's message crossed with mine -- you should probably listen to |
Thanks, this makes more sense. |
And mine crossed with yours, sorry. I will update my PR shortly. |
Thanks @serhiy.storchaka, it's far easier to do here. I pushed the patch to the attached PR. Is there a reason the other optimisations in the Peephole optimizer are not done in the AST? |
The optimization that can be done in the AST is done in the AST. |
While the string method works pretty well, I do not think this is the best way. If 98% of multiline strings will need deindenting, it is better to do it by default. For the 2% that do not need deindentation, it can be prohibited by adding a backslash followed by a newline at the first position (except the start of the string). For example: smile = '''\
XX
XX X
X
XXX X
X
XX X
XX
\\
''' Yes, this is breaking change. But we have import from 3.9. Implement |
Regardless of what we do for literals, a dedent() method will help for
And that gives us plenty of time to decide whether or not making it the |
Agreed, I'm in favor of going forward with this .dedent() optimization approach today. If we were to attempt a default indented multi-line str and bytes literal behavior change in the future (a much harder decision to make as it is a breaking change), that is its own issue and probably PEP worthy. |
I've tried PR 13455 a bit, and I find this way nicer than textwrap.dedent(...). In the following it is clear that dedent is applied after formatting:
It might be unclear for the following especially if
Could it be made clearer with the peephole optimiser (and tested, which I don't believe it is now) that dedent applies after formatting? Alternative modifications/suggestions/notes:
|
Oh, please, please, please PLEASE let's not over-sell this! There is no Such constant-folding optimizations can only occur with literals, since f"{'abc' if random.random() > 0.5 else 'xyz'}" So we don't know how many spaces each line begins with until after the f"""{m:5d}
{n:5d}""" Unless we over-sell the peephole optimization part, there shouldn't be x, X = 'spam', 'eggs'
f"{x}".upper()
# returns 'SPAM' not 'eggs'
We should certainly make that clear that Personally, I think we should soft-sell on the compile-time optimization
I don't see how it will make any difference in the common case. And the
I don't think so, but eventually it might. |
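The point about f-strings can be checked with the existing `textwrap.dedent`, used here as a stand-in for the proposed method. The helper name `render` is made up. The sketch suggests that the common margin depends on the formatted values, so it can only be computed after formatting, at runtime.

```python
import textwrap


def render(m, n):
    # The width of the leading padding depends on the runtime values,
    # so the margin cannot be known when the source is compiled.
    return textwrap.dedent(f"""{m:5d}
{n:5d}""")


print(repr(render(1, 12)))     # the two lines share a three-space margin
print(repr(render(1, 12345)))  # no common margin, nothing is removed
```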
Sorry, I didn't want to give you a heart attack. The optimisation has been mentioned, and you never know what people get excited about.
Well, in here we might get that, but I kind of want to see how this is taught or explained; what I want to avoid is tutorials or examples saying that
OK, thanks. Again, just being cautious, and I see this is targeted at 3.9, so there is plenty of time. |
Can we dedent docstrings too? Is there any string like inspect.cleandoc(s) != inspect.cleandoc(s.dedent())? |
I think dedenting docstring content by default would be a great thing to do. But that's a separate issue, it isn't quite the same as .dedent() due to the first line. I filed https://bugs.python.org/issue37102 to track that. |
We should consider dedicated syntax for compile-time dedenting:
|
Another option not using a new letter: A triple-backtick token. def foo():
value =
|
I think it would be better to use backtick quotes for f-strings instead of the f prefix. This would stress the special nature of f-strings (they are not literals, but expressions). But there was strong opposition to using backticks anywhere in Python syntax. |
A related issue (which I believe has no topic in this forum yet) is substituting an expression that results in a multiline string into a multiline f-string while matching its indentation. https://stackoverflow.com/a/57189263/2976410 I.e. ideally we would have:
def make_g_code():
    nl='\n'
    return d"""\
        def g():
            {nl.join(something(i) for i in range(n))}
            return something_else
        """
This still has issues. The newline needs to be put into a variable, for instance. When using this template to generate code in other languages, a great many use braces for delimiting blocks, and those need to be escaped inside f-strings. An implementation that works with spaces only (it does not suit my case, where mixed indentation is possible) is here: http://code.activestate.com/recipes/578835-string-templates-with-adaptive-indenting/ Please let me know if this is the wrong place to comment on this issue. |
(assigning to me as I want to help remi.lapeyre's .dedent() method PR move forward) |
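Returning to the adaptive-indentation request raised just before this comment, one possible runtime workaround is sketched below. The helper `fill` and its whole-line placeholder convention are assumptions made for illustration and are not part of any proposal in this thread. It dedents the template first, then indents each multi-line value with whatever whitespace precedes its placeholder, so tabs or mixed indentation are reused as written.

```python
import re
import textwrap


def fill(template, **values):
    """Dedent the template, then expand {name} placeholders that sit alone
    on a line, indenting the value to the placeholder's own indentation."""

    def expand(match):
        indent, name = match.group(1), match.group(2)
        return textwrap.indent(str(values[name]), indent)

    dedented = textwrap.dedent(template)
    return re.sub(r"^([ \t]*)\{(\w+)\}$", expand, dedented, flags=re.MULTILINE)


code = fill(
    """\
    def g():
        {body}
        return something_else
    """,
    body="\n".join(f"step({i})" for i in range(3)),
)
print(code)
```

Since only whole-line placeholders are expanded, braces elsewhere in the template need no escaping, which may help when generating code for brace-delimited languages.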
I am opposed to more prefix characters. Are any string literal optimizations done now at compile time, like for 'abc'.upper? It does not exactly make sense to me to do just .dedent. But without the optimization, adding str.dedent without removing textwrap.dedent does not add a whole lot. (But it does add something, so I am mildly positive on the idea.) |
I read to the end of the patch and found astfold_dedent. A small point in favor of textwrap.dedent is immediately announcing that the string will be dedented. |
A string prefix would be a large language change that would need to go through Python-Ideas, a PEP and Steering Council approval. Let's not go there :-) A new string method is a comparatively small new feature that probably won't need a PEP or Steering Council approval. A compile-time optimization is an implementation feature, not a language feature. Whether devs want to follow up with more string optimizations like 'spam'.upper() is entirely up to them. Backwards compatibility implies that textwrap.dedent is not going away any time soon, but it should probably become a thin wrapper around str.dedent at some point. |
I expect several phases here:
(1) Add a .dedent() method to str (and bytes?). Behaviors to consider mirroring are textwrap.dedent() and inspect.cleandoc(). Given their utility and similarities, it makes sense to offer both behaviors; the behavior could be selected by a kwarg passed to the method. https://docs.python.org/3/library/textwrap.html#textwrap.dedent https://docs.python.org/3/library/inspect.html#inspect.cleandoc
(2a) Ponder the d" prefix, but in general I expect sentiment to be against yet another letter prefix. They aren't pretty. This would need a PEP. Someone would need to champion it.
(2b) Ponder making cleandoc dedenting automatic for docstrings. This would be enabled on a per-file basis via a
(3) Optimizations when .dedent() is called on a constant? Nice to have,
But I suggest we land (1) first as its own base implementation PR. Then consider the follow-ons in parallel. I believe the current patch contains (1)+(3) right now. If so, we should simplify it to (1) and make (3) an immediate follow-up, as saving the runtime cost and data space is worthwhile. Ultimate end state: probably 1+2b+3, but even 1+3 or 1+2b is a nice win. |
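Phase (1) with a behavior-selecting kwarg might look roughly like the free function below. This is only a sketch: the function name `dedent` and the keyword `cleandoc` are assumptions, and the real proposal is a method on str. It simply delegates to the two existing standard-library functions so their differences are visible side by side.

```python
import inspect
import textwrap


def dedent(text, *, cleandoc=False):
    """Hypothetical stand-in for the proposed str.dedent()."""
    if cleandoc:
        # Ignores the first line's indentation and strips blank edge lines.
        return inspect.cleandoc(text)
    # Removes only whitespace common to every non-blank line.
    return textwrap.dedent(text)


sample = """Summary line
    detail one
    detail two
    """

print(repr(dedent(sample)))
print(repr(dedent(sample, cleandoc=True)))
```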
I don't think we need two algorithms here. I'm +1 on adding a str.dedent() that mirrors only inspect.cleandoc(). |
Multiline string literals were added recently in Java (https://openjdk.java.net/jeps/378). The semantics are similar to Julia's: automatic dedenting and ignoring the first blank line. Multiline string literals have similar semantics in Swift (https://docs.swift.org/swift-book/LanguageGuide/StringsAndCharacters.html). |
I withdraw this. If we add str.dedent(), it must not be optimized for triple-quoted literals. Auto dedenting is very nice to have. It can be different from inspect.cleandoc(). We may be able to cleandoc() automatically, even without We already have a separate issue for docstrings. And auto dedenting will need a PEP. How about focusing on str.dedent() and changing the issue title? |
One benefit of using a compile time feature over a runtime method is that the former allows for more predictable dedenting by first dedenting and only then interpolating variables. For example, the following code does not dedent the test string at all:
import textwrap
foo = "\n".join(["aaa", "bbb"])
test = textwrap.dedent(
    f"""
    block xxx:
        {foo}
    """
)
It would be much nicer if we had syntactical dedents based on the leftmost non-whitespace column in the string, as supported by the Nix language, for example.
test = (
    df"""
    block xxx:
    {textwrap.indent(foo, ' '*4)}
    """
)
It would be even nicer if the interpolated strings could be indented based on the column the interpolation occurs in, but that can at least be done at runtime with only minor inconvenience. |
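The ordering problem described above can be reproduced with the standard library alone. In this sketch the variable names `after`, `template` and `before` are made up; it contrasts dedenting after interpolation with dedenting a plain template first and interpolating second, which approximates at runtime what the proposed `df` literal would do.

```python
import textwrap

foo = "\n".join(["aaa", "bbb"])

# Dedent after interpolation: "bbb" lands in column 0, so the common
# margin is empty and nothing is removed.
after = textwrap.dedent(f"""
    block xxx:
        {foo}
    """)

# Dedent the template first, then interpolate the indented value.
template = textwrap.dedent("""
    block xxx:
    {body}
    """)
before = template.format(body=textwrap.indent(foo, " " * 4))

print(repr(after))
print(repr(before))
```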
This feature (or one very like it) has been requested again, this time with an "i" prefix: https://discuss.python.org/t/indented-multi-line-string-literals/9846/1 |
What do folks think about taking an approach similar to Scala's stripMargin: >>> my_str = """
... |Foo
... | Bar
... | Baz
... """.stripmargin()
Foo
Bar
Baz Seems it could be achieved with only the addition of a new method to |
How is stripmargin different from dedent?
|
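One way to see the difference being asked about is a small sketch of the Scala-style behavior. The function `strip_margin` below is hypothetical and only approximates Scala's `stripMargin`: it removes leading whitespace up to and including an explicit margin character on each line, whereas a dedent infers the margin from the common leading whitespace.

```python
import re


def strip_margin(text, margin="|"):
    # Drop leading blanks plus the margin character at the start of each line.
    pattern = re.compile(r"^[ \t]*" + re.escape(margin), flags=re.MULTILINE)
    return pattern.sub("", text)


my_str = strip_margin("""
    |Foo
    |  Bar
    |Baz
    """)
print(repr(my_str))
```

Lines without the margin character, such as the final whitespace-only line here, are left untouched, which is one place where the result differs from a dedent.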
@stevendaprano my apologies, I did not realize |
The proposal is to make dedent a string method, and then allow
interpreters to optimise it into a compile-time operation (perhaps in
the peephole optimiser), *not* to use a function like this:
```
import textwrap

def func(x):
    # An expression statement, not a string literal, so no docstring is set.
    textwrap.dedent("""blah blah blah
        multiple lines
        """
    )
```
That won't work because it won't be seen as a docstring.
This issue hasn't been touched for 18 months. It looks like this issue
is languishing, possibly over the question whether the method
should implement textwrap.dedent or inspect.cleandoc.
|
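The remark above that a dedent call is not seen as a docstring can be checked directly. In this small sketch the function names `with_call` and `with_literal` are made up; only a bare string literal as the first statement populates `__doc__`.

```python
import textwrap


def with_call(x):
    # A call expression as the first statement is not a docstring.
    textwrap.dedent("""blah blah blah
        multiple lines
        """)


def with_literal(x):
    """blah blah blah
    multiple lines
    """


print(with_call.__doc__)     # None
print(with_literal.__doc__)  # the literal text
```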
That question is valid but seems a mere implementation detail. A PR implementing the optimization would help motivate making that decision. I'd still love it to be automatic for docstrings as it'd save memory for everyone in the world, but that can be considered a separate follow-up feature. |
cleandoc and dedent are functionally different, so it's not just an
implementation detail, it's a difference of semantics.
Looking at the output of the two, I think cleandoc is the more correct
behaviour for docstrings:
```
import inspect
import textwrap

s = """First
    Second
    Third
      Indented fourth
    Fifth.
    """
inspect.cleandoc(s)  # --> 'First\nSecond\nThird\n  Indented fourth\nFifth.'
textwrap.dedent(s)   # --> 'First\n    Second\n    Third\n      Indented fourth\n    Fifth.\n'
```
> I'd still love it to be automatic for docstrings as it'd save memory for everyone in the world,
What are the consequences of just automatically dedenting docstrings?
Technically it's a breaking change, but is there code or people who rely
on the current leading whitespace?
help() already reformats the docstring, as does inspect, so maybe we
should seriously consider just changing the rules for docstrings to
automatically dedent them at function build time. It would probably have
to go through a future import first.
|
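As a small check of the statement that inspect already reformats docstrings, the sketch below compares the raw `__doc__` with `inspect.getdoc`. The function `area` is made up. Whether the raw value still carries source indentation depends on the Python version (newer CPython releases, 3.13 onward as far as I know, strip it at compile time), so the example only relies on the cleaned result.

```python
import inspect
import math


def area(radius):
    """Return the area of a circle.

    The radius must be non-negative.
    """
    return math.pi * radius ** 2


raw = area.__doc__              # indentation may or may not be present here
cleaned = inspect.getdoc(area)  # always cleaned with inspect.cleandoc()

print(repr(raw))
print(repr(cleaned))
```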
This issue is primarily about general multiline string dedenting.
I don't think special-casing the first line is needed for general dedenting. |
On Tue, Jan 10, 2023 at 10:02:47PM -0800, Inada Naoki wrote:
> Looking at the output of the two, I think cleandoc is the more correct behaviour for docstrings:
> This issue is primarily about general multiline string dedenting.
But we already have a solution for that: the textwrap module.
The textwrap module is inconvenient to use with docstrings, which is
also when we want to put extra indents in the text block to make it
look good in the .py file, but we need to remove that leading
space before displaying the text.
I agree that we should *also* think about other uses of dedent, but we
should not forget docstrings. They are an important use-case, and as far
as I am concerned, they are the driving motivation for moving dedent
into the str type as a method.
|
As I wrote in this comment, new syntax can be different from I prefer having the best multiline syntax by learning from Julia, Java, Swift, etc... Please read this comment too. Compile time docstring cleandoc is tracked in #81283. |
@Carreau "Is this a supposed to deprecating textwrap.dedent?" One of the removed uses in the PR is a 16 line, about 500 char, """-quoted string literal in idlelib. At the moment I don't really like having 'dedent' moved from the starting line to the last. (I could use inspect.cleandoc instead. Or literal concatenation as elsewhere in the file.) |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.