Report unicode #7609
Nice!
I'm in favor of using the box characters, but I am against adding a flag for this. We should just switch to Unicode without a flag. And if for some reason someone is using a non-Unicode terminal, we can either let TerminalWriter handle it (it will print ugly Unicode escapes) or add an automatic ASCII fallback if really necessary.
src/_pytest/terminal.py:
```python
"--unicode",
action="store_true",
default=False,
help="Use unicode characters for horizontal rules.",
```
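pytest's own option parser is built on argparse, so plain argparse can stand in to show what `action="store_true", default=False` gives the code in this diff: a boolean that is False unless the flag is passed. A minimal sketch; `rule_char` is a hypothetical helper that mirrors the `"\u2500" if ... else "-"` conditional used in this PR's write_sep calls.

```python
import argparse

# Stand-in for the option from the diff; plain argparse is used so the
# sketch runs without pytest installed.
parser = argparse.ArgumentParser(prog="example")
parser.add_argument(
    "--unicode",
    action="store_true",  # flag present -> True, absent -> the default
    default=False,
    help="Use Unicode characters for horizontal rules.",
)


def rule_char(options: argparse.Namespace) -> str:
    """Hypothetical helper: pick the rule's fill character from the option."""
    return "\u2500" if options.unicode else "-"


# ascii() keeps the printed output pure ASCII whatever the terminal supports.
print(ascii(rule_char(parser.parse_args([]))))
print(ascii(rule_char(parser.parse_args(["--unicode"]))))
```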
In prose, "Unicode" should start with a capital letter:
```diff
-help="Use unicode characters for horizontal rules.",
+help="Use Unicode characters for horizontal rules.",
```
So should I update the help message, or remove the option entirely?
src/_pytest/terminal.py:
```python
    **markup: bool
) -> None:
    self.write_sep(
        "\u2500" if self.config.option.unicode else "-", title, fullwidth, **markup
```
The escaping is not needed here; I suggest using the character directly (same for the others):
```diff
-"\u2500" if self.config.option.unicode else "-", title, fullwidth, **markup
+"─" if self.config.option.unicode else "-", title, fullwidth, **markup
```
But when we use the character directly in the file, we have to make sure that the file uses the right encoding (i.e. UTF-8). It seems, though, that we already have Unicode characters in the pytest source code:
```python
>>> import pathlib
>>> max(max(pys.read_text(encoding="utf-8")) for pys in pathlib.Path(".").glob("*/*.py"))
'😊'
```
However, I find ─ to be visually nearly indistinguishable from - (at least in non-proportional fonts), so the source code might be misleading. Using the escaped character clearly shows what's going on.
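When look-alike characters are a concern, a small scan can list every non-ASCII character in a source file together with its Unicode name. A minimal sketch; `non_ascii_chars` is a hypothetical helper, run here on an inline string rather than on a real file (for a file, pass it `path.read_text(encoding="utf-8")`).

```python
import unicodedata
from typing import List, Tuple


def non_ascii_chars(text: str) -> List[Tuple[int, int, str, str]]:
    """Return (line, column, char, Unicode name) for each non-ASCII character."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:
                found.append((lineno, col, ch, unicodedata.name(ch, "<unnamed>")))
    return found


source = 'sep = "\u2500"\ndash = "-"\n'
for lineno, col, ch, name in non_ascii_chars(source):
    print(f"line {lineno}, col {col}: U+{ord(ch):04X} {name}")
# line 1, col 8: U+2500 BOX DRAWINGS LIGHT HORIZONTAL
```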
And of course we could always use a \N escape, i.e.:
```python
"\N{BOX DRAWINGS LIGHT HORIZONTAL}"
```
The file encoding is UTF-8, so it's not a problem.

> However, I find ─ to be visually nearly indistinguishable from -

I'd like to have the symbol because then I can see what it is. Maybe leave the symbol, but add a clarifying comment with its Unicode name?
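The three spellings discussed above all produce the same one-character string, and `unicodedata.name` returns exactly the name such a clarifying comment would carry. A quick check (the literal in the first line assumes the file is saved as UTF-8):

```python
import unicodedata

direct = "─"                                 # the literal character
escaped = "\u2500"                           # numeric escape
named = "\N{BOX DRAWINGS LIGHT HORIZONTAL}"  # named escape

assert direct == escaped == named
assert direct != "-"  # not the ASCII hyphen-minus, despite appearances

print(unicodedata.name(escaped))  # BOX DRAWINGS LIGHT HORIZONTAL
print(unicodedata.name("-"))      # HYPHEN-MINUS
```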
src/_pytest/terminal.py:
```python
    **markup: bool
) -> None:
    self.write_sep(
        "\u2049" if self.config.option.unicode else "!", title, fullwidth, **markup
```
Not sure about this character "⁉"; I think this one can be kept as is.
OK, I've updated the pull request to use "!" in both the ASCII and the Unicode case.
IMHO we should have a clean/proper ASCII fallback. As shown in #7475, there is still a variety of conditions where it's not possible to print Unicode to the terminal.
The simplest solution for that would be to install a codec error handling callback that translates fancy Unicode characters to simple ASCII characters, i.e. something like:
```python
import codecs

# Box-drawing characters and their ASCII equivalents.
replacements = [
    ("\u2500", "-"),  # BOX DRAWINGS LIGHT HORIZONTAL
    ("\u2550", "="),  # BOX DRAWINGS DOUBLE HORIZONTAL
    ("\u2581", "_"),  # LOWER ONE EIGHTH BLOCK
]


def simplify(exc):
    if not isinstance(exc, UnicodeEncodeError):
        raise TypeError(f"can't handle {exc}")
    # The slice of the input that the target encoding could not encode.
    text = exc.object[exc.start:exc.end]
    for (u, a) in replacements:
        text = text.replace(u, a)
    # Escape whatever is still non-ASCII, then resume after the failed slice.
    text = text.encode("ascii", "backslashreplace").decode("ascii")
    return (text, exc.end)


codecs.register_error("simplify", simplify)
```
Then we could use:
```python
>>> print("\u2500\u2550\u2581 foo".encode("ascii", "simplify"))
b'-=_ foo'
```
It might be simpler for
But which Unicode character should that be? It might be one that the target encoding can encode, in which case we'd use the target encoding but might later encounter unencodable characters and fail anyway. Or it might be one that we can't encode, in which case we'd fall back to ASCII although the encoding might be able to encode some non-ASCII characters. Or would you decide whether to fall back to ASCII on every single call?
But it's totally transparent (except that you have to use the error handler name). And when you have a properly configured Unicode terminal (which should almost always be the case these days), there is zero performance overhead (the error handler doesn't even have to be looked up).
My view is that pytest should not care about non-Unicode terminals except for not crashing. That said, we might want to wait until we drop Python 3.5, since IIUC the stdout encoding on Windows is still broken in that version. Currently the "not crashing" part is done by
Okay, that got me convinced!
I will complain then. I do use pytest with Python 3.5 on Windows (and also on GitHub Actions with Python 3.7, where there is also no Unicode output; see #7475). I do consider a
Maybe we can start discussing dropping support for Python 3.5. According to the linked page, it reaches end-of-life on 2020-09-13, which is a few weeks from now.
Did we confirm that GitHub Actions stdout really doesn't support Unicode? It sounds very unlikely to me.
The issue there was a crash, which I agree we should avoid. The question is whether anyone would complain if it's not a crash, just ugly output. If it's only broken output on CI systems, maybe people won't even notice. I don't think there are many developers left with terminals that only support ASCII, but I may be wrong about that.
This is very common in schools (I know that it would affect me and many of my students), and it is part of the general pattern where UX problems are disproportionately bad for newcomers who don't know how to handle or avoid them 😕
OK, then I'll update the pull request to use the error callback and remove the option.
This will be replaced with the proper ASCII characters by the callback. And when you're using e.g. an ISO-8859-1 terminal, you will still get the characters from the Latin-1 range (U+0080 to U+00FF). If you only have an ASCII terminal, those characters will be replaced by escape sequences instead. This differs from the current behaviour, which, in case of an encoding error, switches to escape sequences for all non-ASCII characters, even if the terminal would be able to encode some of them.
…rs instead. Re-encode failing output via a codec error callback to support terminals that are not Unicode-capable. Update tests so that they handle both Unicode and ASCII output.
OK, I've updated the patch to remove the … Note that changing the … There's a test failing in …
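A sketch of the behaviour described here, using a variant of the `simplify` handler proposed earlier in the thread (rewritten with a dict lookup; the three mapped characters are the same). The same text encodes to Latin-1 with ± kept as a real byte, and to ASCII with ± escaped; the rule characters become ASCII in both cases.

```python
import codecs

# Same three replacements as the handler proposed in the thread.
REPLACEMENTS = {"\u2500": "-", "\u2550": "=", "\u2581": "_"}


def simplify(exc):
    """Codec error handler: map rule characters to ASCII, escape the rest."""
    if not isinstance(exc, UnicodeEncodeError):
        raise TypeError(f"can't handle {exc}")
    # Only the run of characters the target encoding rejected is passed in.
    chunk = exc.object[exc.start:exc.end]
    chunk = "".join(REPLACEMENTS.get(ch, ch) for ch in chunk)
    # Anything still unmapped becomes a backslash escape, which is pure ASCII.
    return chunk.encode("ascii", "backslashreplace").decode("ascii"), exc.end


codecs.register_error("simplify", simplify)

text = "\u2500\u2500 \u00b1 \u2500\u2500"  # "── ± ──"
print(text.encode("latin-1", "simplify"))  # b'-- \xb1 --'
print(text.encode("ascii", "simplify"))    # b'-- \\xb1 --'
```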
The
What sort of machines or configuration issues did you run into? Maybe we can mitigate them, or give clear guidance about them. The
I think applying the replacements in
Hi everyone, sorry for joining the discussion late. I definitely believe we need to improve the pytest reporting; it has grown organically over the years and sometimes feels a bit dated. However, I'm 👎 on this change for the following reasons:
In summary, the change in visuals is pretty minimal compared to the effort, the possible breakages for users, and the increased complexity. I wouldn't like to make a release and have a ton of breakages for the minimal improvement this provides; I don't think it is worth the trade-off. I would much rather keep things simple for now and start thinking about delegating rich output/Unicode handling to a third-party library, like
Have you seen pytest-sugar? https://pypi.org/project/pytest-sugar/
Sorry @graingert, who's that addressed to?
In general for the thread, a plugin that uses non-ASCII drawing characters
This doesn't change any pytest configuration, so I'm not sure why the patch should break them. Or do you mean "system configuration"? For non-Unicode-capable systems, the code now escapes unencodable characters as it did before (now it uses a codec error callback; before, it used the unicode-escape codec). But after the patch, pytest would print non-ASCII characters much more often, so indeed the chance of breaking on non-Unicode-capable systems is higher than with ASCII-only output.
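The difference between a whole-message fallback and a per-character error handler shows up with the standard codecs alone. This sketch is not pytest's code; it only contrasts the unicode_escape codec mentioned here with the built-in backslashreplace error handler on a Latin-1 target.

```python
msg = "\u00b1 \u2500\n"  # "± ─" followed by a newline

# Whole-message fallback: every non-ASCII character is escaped, and so are
# control characters such as the newline.
print(msg.encode("unicode_escape"))               # b'\\xb1 \\u2500\\n'

# Per-character error handler: only characters the target encoding cannot
# represent are escaped; "±" survives on Latin-1 and the newline is untouched.
print(msg.encode("latin-1", "backslashreplace"))  # b'\xb1 \\u2500\n'
```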
IMHO the changes don't make the code more brittle. But of course the safest approach would be to set the new codec error handler on the output streams themselves:
```python
import codecs


def simplify(exc):
    if not isinstance(exc, UnicodeEncodeError):
        raise TypeError("can't handle {}".format(exc))
    replacements = [
        ("\u2500", "-"),
        ("\u2550", "="),
        ("\u2581", "_"),
    ]
    # Replace the known box-drawing characters, then escape anything else.
    text = exc.object[exc.start : exc.end]
    for (u, a) in replacements:
        text = text.replace(u, a)
    text = text.encode("ascii", "backslashreplace").decode("ascii")
    return (text, exc.end)


codecs.register_error("simplify", simplify)
```
And then we can do:
```python
import sys

sys.stdout.reconfigure(errors="simplify")
sys.stderr.reconfigure(errors="simplify")
```
Since this requires Python 3.7, we can't do that yet; without it we're back to the status quo (or at least to the version with the codec error callback).
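The `reconfigure()` step can be tried safely on an in-memory stream instead of the real `sys.stdout`, so nothing about the process's own output changes. A minimal sketch under that assumption: it uses the built-in `backslashreplace` handler, but any name registered with `codecs.register_error` (such as the proposed `simplify`) could be passed the same way.

```python
import io
import sys

assert sys.version_info >= (3, 7), "TextIOWrapper.reconfigure() needs Python 3.7+"

# In-memory stand-in for a terminal whose encoding is plain ASCII.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="ascii", errors="strict", newline="\n")

try:
    stream.write("\u2500 rule\n")
except UnicodeEncodeError:
    print("strict stream: UnicodeEncodeError")

# Change only the error handler; the encoding stays ASCII.
stream.reconfigure(errors="backslashreplace")
stream.write("\u2500 rule\n")
stream.flush()
print(raw.getvalue())  # b'\\u2500 rule\n'
```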
True.
Another approach would be to keep using the ASCII characters but move them into class attributes of the
If that approach seems reasonable to you, I can update the pull request to implement that.
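A minimal sketch of the class-attribute idea, with hypothetical names throughout (`RuleReporter`, `section_char` and the others are not pytest's API): the fill characters live on the class, so a subclass can switch to box-drawing characters by overriding attributes, with no option or encoding logic in the base class.

```python
import io


class RuleReporter:
    """Hypothetical reporter; the attribute names are illustrative only."""

    # Fill characters for the different kinds of horizontal rule.
    section_char = "="
    subsection_char = "-"
    failure_char = "_"

    def __init__(self, out: io.TextIOBase, width: int = 40) -> None:
        self.out = out
        self.width = width

    def rule(self, char: str, title: str) -> None:
        # Centre the title between equal runs of a one-character fill.
        n = max((self.width - len(title) - 2) // 2, 1)
        self.out.write(f"{char * n} {title} {char * n}\n")

    def section(self, title: str) -> None:
        self.rule(self.section_char, title)


class BoxDrawingReporter(RuleReporter):
    """Opting in to box-drawing characters only requires overriding attributes."""

    section_char = "\u2550"     # BOX DRAWINGS DOUBLE HORIZONTAL
    subsection_char = "\u2500"  # BOX DRAWINGS LIGHT HORIZONTAL


buf = io.StringIO()
RuleReporter(buf, width=20).section("test session")
BoxDrawingReporter(buf, width=20).section("test session")
```

The design keeps the ASCII defaults untouched for everyone, which is the point of this approach: any Unicode opt-in lives outside the core.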
OK, that is a completely different level of effort/benefit and probably requires major refactoring of
Also note that
Which means that
Hey @doerwalter,
Sorry, yes, that's what I meant.
Technically yes, as far as our testing can cover, but in our experience we can't cover all possible cases in our suite.
I agree, and for me the chance of this breaking users vs. the small improvement in the UX is not worth the trade-off. Users run pytest in many, many environments: local terminals, CI, Docker containers, embedded systems, etc., so I would be surprised if this didn't break something.
Oh, I agree, but I suspect we would have major gains too, including better graphics plus getting rid of a lot of custom code.
I'm sure … In summary, my main point is that we would be baking in a custom solution, which always has a maintenance cost, for something with very minimal gains; I don't think it is worth the potential breakage that this brings. Having said all that, I definitely appreciate the effort you've put into implementing this and accommodating reviewers' concerns. 👍
Move default rule characters into class attributes of TerminalReporter.
OK, I've updated the pull request again. I've reverted the changes to the default rule characters and the use of the codec error callback (along with the changes to the tests). I've moved the characters used for printing rules into class attributes of
Does the pull request have any chance of being accepted in this form?
All great points, thanks!
I'm 👍. While one could argue that … Again, thanks for your patience in dealing with this PR; I know it can be frustrating when there are many conflicting opinions during a review.
@nicoddemus IMHO, there are three different approaches to Unicode output:
I'm definitely leaning more towards 1. than towards 3., especially for a test runner. However, we already have non-ASCII output (the ± comes to mind, from #1441). This turned out to be a problem on Python 2 (#2111) but, from what I remember, hasn't really bitten anyone since it was introduced in 2016. Then again, ± is also part of Latin-1, so it might be less problematic than the box-drawing characters originally part of this PR. Additionally, this would probably also break
This might not be 100% possible, as there might be Unicode characters in the user's test file.
But degrading gracefully would IMHO be better than crashing and burning.
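To make the Latin-1 point from this discussion concrete, the sketch below checks which encodings can represent ± versus the light horizontal box-drawing character. The choice of encodings (ASCII, Latin-1, Windows cp1252 and the old IBM PC code page cp437) is ours, for illustration only.

```python
import unicodedata

ENCODINGS = ("ascii", "latin-1", "cp1252", "cp437")


def can_encode(ch: str, encoding: str) -> bool:
    """True if ``encoding`` can represent ``ch`` without any error handler."""
    try:
        ch.encode(encoding)  # default errors="strict"
    except UnicodeEncodeError:
        return False
    return True


for ch in ("\u00b1", "\u2500"):
    supported = [enc for enc in ENCODINGS if can_encode(ch, enc)]
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {supported}")
# U+00B1 PLUS-MINUS SIGN: ['latin-1', 'cp1252', 'cp437']
# U+2500 BOX DRAWINGS LIGHT HORIZONTAL: ['cp437']
```

So ± survives the common Western single-byte encodings, while the box-drawing rule character only survives where the code page happens to include it.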
We can of course always wait until pytest requires at least Python 3.7 and then use … But the current version of the pull request doesn't introduce any Unicode characters anymore anyway.
Hey @The-Compiler and @doerwalter,
Mostly yes, but an important part for me is that it was not worth the small improvement vs. the chance of breakage. But you bring up another point about pytester, which would require further changes.
Indeed, I think the current version allows plugins to extend it and further improves the code, because we now centralize the drawing of box characters in well-named functions/variables. 👍
Hi @doerwalter,
The intention of this PR has shifted from using Unicode characters for the hrules in pytest to adding new APIs to … When we add new public APIs (which can't be changed without a lot of pain), we need to carefully consider the design to make sure it's the best we can come up with.
Since this PR has a lot of comments discussing things irrelevant to its current state, would you mind re-submitting it as a new PR where we can have a more targeted discussion? Thanks!
Done: #7647
This pull request implements feature request #7608: "Use unicode box drawing characters for horizontal rules in pytest output".
When the option --unicode is specified, Unicode box-drawing characters are used for printing horizontal rules (instead of the ASCII characters -, =, _ and !).
This closes #7608.
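A horizontal rule with a centred title, as printed by these write_sep calls, can be pictured with a small helper. This is a hypothetical sketch for illustration, not pytest's implementation, and it assumes every fill character occupies exactly one terminal column.

```python
def make_rule(sepchar: str, title: str = "", fullwidth: int = 40) -> str:
    """Hypothetical helper: a line of ``sepchar`` with ``title`` centred in it."""
    if not title:
        return sepchar * (fullwidth // len(sepchar))
    # Fill on each side: the width left after the title and its two
    # surrounding spaces, split evenly, but always at least one repetition.
    n = max((fullwidth - len(title) - 2) // (2 * len(sepchar)), 1)
    fill = sepchar * n
    return f"{fill} {title} {fill}"


print(make_rule("=", "test session starts", 40))
```

The sketch does not pad an odd leftover column, so a line can come out one character short of `fullwidth`; swapping "-" for "\u2500" changes only the fill, not the layout.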