Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Print matrices using unicode large delimiters (on demand) #18270

Closed
gagern mannequin opened this issue Apr 21, 2015 · 28 comments
Closed

Print matrices using unicode large delimiters (on demand) #18270

gagern mannequin opened this issue Apr 21, 2015 · 28 comments

Comments

@gagern
Copy link
Mannequin

gagern mannequin commented Apr 21, 2015

Unicode provides nice symbols which can be combined to form large delimiters. It would be nice if we could print matrices using these, instead of the same ASCII brackets on all the lines. So instead of

[1 2|3]
[4 5|6]
[7 8|9]

I'd like to see one of these:

⎛1 2│3⎞    ⎡1 2│3⎤
⎜4 5│6⎟ or ⎢4 5│6⎥
⎝7 8│9⎠    ⎣7 8│9⎦

Perhaps it's best to do this in small increments: start with a keyword argument to the str method of matrices, then later on make this the default. So the ticket here is only for on-demand support of this feature, not for its automatic use by default. For later reference, see #14733 which switched the banner to Unicode, and thereby made the choice that Sage may look broken on non-Unicode terminals.

Component: user interface

Keywords: unicode matrix

Author: Martin von Gagern

Branch/Commit: 6b6f089

Reviewer: Volker Braun

Issue created by migration from https://trac.sagemath.org/ticket/18270

@gagern gagern mannequin added this to the sage-6.7 milestone Apr 21, 2015
@gagern gagern mannequin added c: user interface labels Apr 21, 2015
@gagern
Copy link
Mannequin Author

gagern mannequin commented Apr 21, 2015

Branch: u/gagern/MatrixUnicodeDelimiters

@gagern
Copy link
Mannequin Author

gagern mannequin commented Apr 21, 2015

New commits:

e174830Trac #18270: Introduce unicode_symbols keyword argument to Matrix.str

@gagern
Copy link
Mannequin Author

gagern mannequin commented Apr 21, 2015

Commit: e174830

@gagern gagern mannequin added the s: needs review label Apr 21, 2015
@videlec
Copy link
Contributor

videlec commented Apr 21, 2015

comment:3

This banner is already a mess. If I ssh + screen + sage at my laboratory, then I got a ugly

�����������������������������������

And then I am not able to see anything else (but still inputting Sage commands work).

The new matrices look nice though. For the on-demand feature it would be nice to have a global flag allowing (or avoiding) unicode

$ sage --with-no-unicode

Vincent

@videlec
Copy link
Contributor

videlec commented Apr 21, 2015

comment:4

Why not only unicode for the argument name?

@gagern
Copy link
Mannequin Author

gagern mannequin commented Apr 21, 2015

comment:5

Replying to @videlec:

Why not only unicode for the argument name?

Because in Python 2 unicode names a type as well, and I feared that at some point we might want to convert element types to unicode, thus have to call that coercion function.

@gagern
Copy link
Mannequin Author

gagern mannequin commented Apr 22, 2015

comment:6

Replying to @videlec:

This banner is already a mess. If I ssh + screen + sage at my laboratory, …

Perhaps you should file that as a bug, so it can be addressed? Does using screen -U help? What does locale print on your client's terminal, inside the ssh and inside screen respectively?

For the on-demand feature

I meant “on-demand” as opposed to “automatic”, with the demand being expressed for each matrix that gets printed, i.e. by using the keyword argument from my commit. I believe you're talking about something far more automatic here.

it would be nice to have a global flag allowing (or avoiding) unicode

The canonical way, at least on Linux, would be to inspect LC_CTYPE facet of the current locale, and detect whether that refers to UTF8 or not. If one used this to automatically choose a sane default, then setting the LC_CTYPE environment variable manually would serve the same function as the switch you suggest:

$ LC_CTYPE=en_US      sage  # without unicode
$ LC_CTYPE=en_US.utf8 sage  # with unicode

Of course, using en_US in this example is just the common default. I guess you as well as I might be using a different setting there in practice. Adding a switch might still make sense to add visibility to this feature. But all of this should probably be discussed in a follow-up ticket once we get the change here accepted.

@videlec
Copy link
Contributor

videlec commented Apr 22, 2015

comment:7

Replying to @gagern:

Replying to @videlec:

This banner is already a mess. If I ssh + screen + sage at my laboratory, …

Perhaps you should file that as a bug, so it can be addressed? Does using screen -U help? What does locale print on your client's terminal, inside the ssh and inside screen respectively?

Indeed, screen -U solves the problem. My locale on the client is

LANG=en_GB.UTF-8
LANGUAGE=en_GB:en
LC_CTYPE="en_GB.UTF-8"
...

and on the remote is

LANG=C
LANGUAGE=fr_FR:
LC_CTYPE="C"
...

For the on-demand feature

I meant “on-demand” as opposed to “automatic”, with the demand being expressed for each matrix that gets printed, i.e. by using the keyword argument from my commit. I believe you're talking about something far more automatic here.

it would be nice to have a global flag allowing (or avoiding) unicode

The canonical way, at least on Linux, would be to inspect LC_CTYPE facet of the current locale, and detect whether that refers to UTF8 or not. If one used this to automatically choose a sane default, then setting the LC_CTYPE environment variable manually would serve the same function as the switch you suggest:

$ LC_CTYPE=en_US      sage  # without unicode
$ LC_CTYPE=en_US.utf8 sage  # with unicode

Of course, using en_US in this example is just the common default. I guess you as well as I might be using a different setting there in practice. Adding a switch might still make sense to add visibility to this feature. But all of this should probably be discussed in a follow-up ticket once we get the change here accepted.

I like your suggestion very much.

Vincent

@videlec
Copy link
Contributor

videlec commented Apr 22, 2015

comment:8

review comment:

  • I would very much prefer that you avoid replace as well as the final step which deals with the top and bottom line. You should rather define all the characters needed as variables at the begining (as you did for left_bracket, right_bracket, etc). That would help for readability and if at some point we decide that all these characters are arguments of the function.
  • why not curly bracket?

Vincent

@gagern
Copy link
Mannequin Author

gagern mannequin commented Apr 22, 2015

comment:9

Replying to @videlec:

review comment:

  • I would very much prefer that you avoid replace as well as the final step which deals with the top and bottom line. You should rather define all the characters needed as variables at the begining (as you did for left_bracket, right_bracket, etc). That would help for readability and if at some point we decide that all these characters are arguments of the function.

Indeed when I started this patch, I had a line like

 top_left_bracket, mid_left_bracket, … = u"⎡⎢⎣⎤⎥⎦"

but the long names made things very hard to read. And shorter alternatives, like tlb, make things a bit hard to understand and therefore harder to maintain. Quite the opposite of “help for readability”, in my opinion. But if you insist, I can go with this approach, using the short names.

Keep in mind that the left_bracket and right_bracket strings, which have been there before, are not only representing the brackets, but also collecting the vertical lines at the ends.

Plugging the delimiting brackets in the right rows is a non-trivial affair, and would therefore in my opinion be far harder to maintain. The possibility of an arbitrary number of horizontal lines at either end complicates things considerably. I haven't been able to come up with a reasonably simple code snippet to achieve correct symbols automatically while building these strings. So unless you absolutely insist on this point, or can give a good reason on why my approach has very undesirable reprecussions, or can suggest some formulation, I'd rather stick with my approach. Or adjust it in such a way that it assembles the matrix body without any brackets, and then adds all the bracket parts in a single loop instead of replacing existing incorrect parts.

  • why not curly bracket?

Because I don't encounter them in my day-to-day work. Can you give me a hint as to where these might occur? For an even number of rows (except two), big curly brackets would look slightly unsymmetric, but I doubt that should be a concern.

@videlec
Copy link
Contributor

videlec commented Apr 22, 2015

comment:10

Replying to @gagern:

Replying to @videlec:

review comment:

  • I would very much prefer that you avoid replace as well as the final step which deals with the top and bottom line. You should rather define all the characters needed as variables at the begining (as you did for left_bracket, right_bracket, etc). That would help for readability and if at some point we decide that all these characters are arguments of the function.

Indeed when I started this patch, I had a line like

 top_left_bracket, mid_left_bracket, … = u"⎡⎢⎣⎤⎥⎦"

but the long names made things very hard to read. And shorter alternatives, like tlb, make things a bit hard to understand and therefore harder to maintain. Quite the opposite of “help for readability”, in my opinion. But if you insist, I can go with this approach, using the short names.

If there are comments it is fine. For example

tlb = u"⎡"    # top left bracket
...

or

# we set delimiters as a string composed of 6 characters
#   - top left bracket (tlb)
#   ...
if unicode_symbols:
    delimiters = u"⎡⎢⎣⎤⎥⎦"
else:
    delimiters = "[[[]]]"
tlb, ... = delimiters
  • why not curly bracket?

Because I don't encounter them in my day-to-day work. Can you give me a hint as to where these might occur? For an even number of rows (except two), big curly brackets would look slightly unsymmetric, but I doubt that should be a concern.

By "curly", I meant only using ⎧ ⎫ ⎩ ⎭ instead of your more angular version. The term was probably wrong. See

sage: m = matrix([[2,1],[0,1]])
sage: view(m)

... no angle...

@gagern
Copy link
Mannequin Author

gagern mannequin commented Apr 22, 2015

comment:11

Replying to @videlec:

If there are comments it is fine. For example …

OK, I can do that. It would be 11 characters: the six you used, plus two for single-row matrices, plus three line-drawing characters. The latter could be a separate string if you prefer. I'd make it one string because I'm lazy, but that's a weak argument.

By "curly", I meant only using ⎧ ⎫ ⎩ ⎭ instead of your more angular version.

Ah, round parentheses. Personally, I much prefer these round ones as well. I had the impression that in the USA especially, the square brackets are somewhat more common. But that impression (based mostly on Wikipedia) may be wrong, and in any case the comparison with the LaTeX rendering is a strong motivations to make things consistent and satisfy my personal preference at the same time. We might of course let the user choose between these alternatives, by allowing a string like "round" or "square" instead of True for the keyword argument.

@vbraun
Copy link
Member

vbraun commented May 3, 2015

comment:12

I've made #18357: Unicode Art.

IMHO its

  • British: round bracket ()
  • American: parentheses ()
  • square bracket []
  • curly bracket {}
    I also prefer round for matrices. Could you also change it to unicode=<bool> instead of unicode_symbol (shorter and clearer).

@vbraun
Copy link
Member

vbraun commented May 3, 2015

comment:13

Fixed:

  • British: round bracket ()
  • American: parentheses ()
  • square bracket []
  • curly brace {}

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented May 4, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

e4e5f4bImproved unicode delimiters for matrices, allow choosing the shape

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented May 4, 2015

Changed commit from e174830 to e4e5f4b

@gagern

This comment has been minimized.

@gagern gagern mannequin added s: needs review and removed s: needs work labels May 4, 2015
@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented May 4, 2015

Changed commit from e4e5f4b to 206b036

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented May 4, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

206b036Format link to trac ticket

@vbraun
Copy link
Member

vbraun commented May 5, 2015

Reviewer: Volker Braun

@vbraun
Copy link
Member

vbraun commented May 5, 2015

comment:17

Looks great!

I would prefer unicodedata.lookup('RIGHT SQUARE BRACKET EXTENSION') over , pasting the unicode in makes it hard to review that the correct code point is used. I'll refactor that in #18357 if you don't mind.

@vbraun
Copy link
Member

vbraun commented May 5, 2015

comment:18

PDF docs don't build:

! Package inputenc Error: Unicode char \u8:⎛ not set up for use with LaTeX.

See the inputenc package documentation for explanation.
Type  H <return>  for immediate help.
 ...                                              
                                                  
l.7663 \PYG{g+go}{⎛1 2\textbar{}3⎞}
                                       
? 
! Emergency stop.
 ...                                              
                                                  
l.7663 \PYG{g+go}{⎛1 2\textbar{}3⎞}

@vbraun
Copy link
Member

vbraun commented May 5, 2015

comment:19

You just have to add a suitable DeclareUnicodeCharacter to src/doc/common/conf.py

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented May 5, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

6b6f089TeX rendering for unicode big delimiters

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented May 5, 2015

Changed commit from 206b036 to 6b6f089

@gagern
Copy link
Mannequin Author

gagern mannequin commented May 5, 2015

comment:21

This is my attempt to make things look reasonably useful, even though the vertical spacing and placement is far from optimal. But seeing how many other unicode symbols are represented with crude ASCII work-arounds, I think the amount of effort I put into this should be at least on a similar level.

I just filed #18370 about better unicode support by switching to XeTeX or luaTeX. I have no experience with these, but I've read on several occasions that they offer far superior Unicode support. In this sense, I hope that my unicode symbol declarations will be a temporary workaround, although I have no idea just how temporary.

@gagern gagern mannequin added s: needs review and removed s: needs work labels May 5, 2015
@vbraun
Copy link
Member

vbraun commented May 5, 2015

comment:22

Sounds good. The main point of running latex on the docs is to validate the markup, nobody is going to print out the pdf.

@vbraun
Copy link
Member

vbraun commented May 6, 2015

Changed branch from u/gagern/MatrixUnicodeDelimiters to 6b6f089

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants