Set pd.options.display.max_columns=0 by default #17023

Merged
merged 21 commits into from
Mar 28, 2018

cbrnr
Contributor

@cbrnr cbrnr commented Jul 19, 2017

Update: Remove everything related to max_rows and only deal with max_columns in this PR.

Changed max_columns to 0 (automatically adapt the number of displayed columns to the actual terminal width) when run in a terminal and max_rows to 20 (because I'd like to see the "whole" data frame at a glance like in R's tibble).

@TomAugspurger
Contributor

Could you provide some before / after screenshots? This will need some feedback from the wider community, since the visual display of DataFrames is an API grey-zone; prepare for bike-shedding 😄

We still have some situations where we can't detect the terminal width reliably. We need to make sure the output is handled as well as possible there.

I'm +1 for reducing the number of rows displayed. Typically I use 10 rows.
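
To illustrate the terminal-width concern raised above, here is a minimal sketch that uses only the Python standard library. It is not pandas' own detection code. The helper name `usable_terminal_width` and the default of 80 columns are illustrative assumptions.

```python
import shutil


def usable_terminal_width(default=80):
    """Return a positive column count, using `default` when detection fails."""
    # shutil.get_terminal_size consults the COLUMNS environment variable and
    # the attached terminal, and returns the fallback if neither is usable.
    size = shutil.get_terminal_size(fallback=(default, 24))
    # Guard against environments that report a width of zero.
    return size.columns if size.columns > 0 else default


print(usable_terminal_width())
```

When output is piped or captured (for example in CI), no terminal is attached, so some explicit fallback like this is what decides the width.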

@TomAugspurger TomAugspurger added the Output-Formatting __repr__ of pandas objects, to_string label Jul 19, 2017
@cbrnr
Contributor Author

cbrnr commented Jul 19, 2017

Here's the current output when printing a data frame with shape (5, 10) in a terminal with 100 characters width:

[screenshot: before]

And here is the same data frame after the proposed change:

[screenshot: after]

@chris-b1
Contributor

I'm OK with this. I do think it needs a big note in the whatsnew, with instructions on how to change back (maybe a ref to IPython config too). Also, it looks like some tests will need to be adjusted.

                    validator=is_instance_factory([type(None), int]))
 cf.register_option('max_categories', 8, pc_max_categories_doc,
                    validator=is_int)
 cf.register_option('max_colwidth', 50, max_colwidth_doc, validator=is_int)
-cf.register_option('max_columns', 20, pc_max_cols_doc,
+cf.register_option('max_columns', 0, pc_max_cols_doc,
Contributor

hmm, this should be None to auto-detect (0 might do the same though)

Contributor Author

So should I change 0 to None or leave it as is?

Contributor Author

A quick test shows that None means there is no limit (i.e. display all columns). So I guess this should remain 0.

Contributor

hmm, so 0 is NOT the same as None here ? that is very odd. can you show an example

Contributor Author

Well, I tried setting this value to None after importing pandas, i.e.

import pandas as pd
pd.options.display.max_columns = None

And this results in all columns printed out (so no ellipsis to mark skipped columns in the output). But I will try setting the value in config_init.py as well.

Contributor Author

I can confirm that None and 0 have different meanings. None prints all columns, whereas 0 prints columns that fit within the terminal width.
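
The difference described in this thread can be sketched with `pd.option_context`. This is an illustration under stated assumptions, not code from the PR: the 5 x 30 frame and its `col_NN` column names are made up, and `display.width` is pinned to 80 so the result does not depend on the machine. The value `0` is deliberately not executed here because its output depends on the detected terminal width.

```python
import pandas as pd

# Hypothetical frame, wide enough to exceed an 80-character line.
df = pd.DataFrame(
    [[0] * 30 for _ in range(5)],
    columns=["col_%02d" % i for i in range(30)],
)

# A positive limit truncates the middle columns and marks the gap with "...".
with pd.option_context("display.max_columns", 6, "display.width", 80):
    limited = repr(df)

# None removes the limit: every column is printed, wrapped over several blocks.
with pd.option_context("display.max_columns", None, "display.width", 80):
    unlimited = repr(df)

print(limited)
print(unlimited)
```

Setting `display.max_columns` to `0` instead asks pandas to show as many columns as fit the terminal it detects, which is the behavior this PR makes the default for terminal sessions.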

@jreback
Contributor

jreback commented Jul 20, 2017

yeah changing the column default to auto-detect is fine. I personally use an even smaller default for max_rows, but 20 looks fine. Pls update docs, a before/after screen shot that we can include in the what's new would be good (IOW your above ones). You will have to fix some tests.

@cbrnr
Contributor Author

cbrnr commented Jul 20, 2017

Do you really want screenshots or could we mimic the old and new behavior with markdown? If you want screenshots let me modify them so that my username doesn't show up. Regarding the tests, I'll see what I can do (I certainly didn't expect to break so many tests just by changing one value 😄).

@jreback
Contributor

jreback commented Jul 20, 2017

@cbrnr

the issue is that we are now auto-detecting, so the actual terminal width matters. yes, certainly we can 'set' it so it works and show it in markdown. I think screenshots might be clearer here though.

@cbrnr
Contributor Author

cbrnr commented Jul 20, 2017

I see, I'll provide new screenshots then. Thanks for pointing out the issue, of course this makes a huge difference!

@cbrnr
Contributor Author

cbrnr commented Jul 20, 2017

Also, the IPython QtConsole doesn't play nicely with pd.options.display.max_columns = 0:

screen shot 2017-07-20 at 15 23 22

@chris-b1
Contributor

@takluyver - assuming this hasn't changed, but do you know offhand if it's still not possible to detect terminal size running in the qtconsole? Found an older SO answer from you, thanks!
https://stackoverflow.com/questions/27813132/determining-terminal-width-in-ipython-qtconsole

@takluyver
Contributor

No, sorry. It's a conceptual mismatch, not just a technical one. The kernel is producing output for (potentially) several frontends which may be receiving it at the moment, and for applications which may later display saved copies of that output. So questions about the shape of 'the' output area don't really make sense in the Jupyter protocol.

As I see it, the real issue is that the Qt console doesn't understand any structured way of representing a table. We turned off its HTML support because it's just too limited and tends to break richer HTML written for the notebook frontend. I have occasionally advocated for a 'simple HTML' repr option which the Qt console would display, but it's never been high priority.

In the long run, I think our plan is to make an HTML console and use QtWebkit to embed it in Qt applications. Then it should be able to display HTML tables.

@cbrnr
Contributor Author

cbrnr commented Jul 21, 2017

Thanks @takluyver - so this isn't going to work until HTML tables are rendered (which would be awesome BTW). Is it possible to determine if Pandas is running in a real terminal or not? Could someone point me to the relevant code parts?

@takluyver
Contributor

It's possible to distinguish terminal IPython from IPython as a kernel for a Jupyter frontend, something like this:

try:
    ip = get_ipython()
except NameError:
    ... # Not IPython
else:
    if hasattr(ip, 'kernel'):
        ... # IPython as a Jupyter kernel
    else:
        ... # IPython terminal interface
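
The same check can be wrapped in a small reusable function. This is a sketch only: the names `classify_shell` and `detect_frontend` and the returned labels are illustrative assumptions, not part of IPython or pandas.

```python
def classify_shell(ip):
    """Map the object returned by get_ipython() (or None) to a label."""
    if ip is None:
        return "plain Python"
    if hasattr(ip, "kernel"):
        return "Jupyter kernel"
    return "IPython terminal"


def detect_frontend():
    """Classify the running session using the check quoted above."""
    try:
        # get_ipython is defined by IPython; a NameError means plain Python.
        ip = get_ipython()  # noqa: F821
    except NameError:
        ip = None
    return classify_shell(ip)


print(detect_frontend())
```

Separating the classification from the lookup makes the logic testable with stand-in objects, without starting IPython.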

@jreback
Contributor

jreback commented Sep 23, 2017

@cbrnr can you rebase this and compose a note for the what's new?

@cbrnr
Contributor Author

cbrnr commented Sep 24, 2017

Sure, but many tests need to be fixed and I don't know if I have the time to do that. I guess this change should be added to the API changes section?

@jreback
Contributor

jreback commented Sep 24, 2017

@cbrnr this would need its own sub-section in API

yes would need to fix any tests.

@cbrnr
Contributor Author

cbrnr commented Oct 18, 2017

cf #16800 #4907

@cbrnr
Contributor Author

cbrnr commented Nov 7, 2017

Many tests rely on calling str on a data frame with the current default max number of columns. I'm not sure this will be easy to fix. This would be easy to fix if Pandas supported a pandasrc config file as proposed in #4907.

@jreback
Contributor

jreback commented Nov 7, 2017

> Many tests rely on calling str on a data frame with the current default max number of columns. I'm not sure this will be easy to fix. This would be easy to fix if Pandas supported a pandasrc config file as proposed in #4907.

That has nothing to do with that issue: all options already have defaults.
For testing, you need to set up the specific conditions each test relies on,
generally using pd.option_context.
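
The advice to pin display options per test can be sketched as follows. This is a minimal illustration, not code from the PR: the helper `repr_with_old_defaults` and the test function are hypothetical names, and the values 20 and 60 are the old defaults for `max_columns` and `max_rows` mentioned later in this thread.

```python
import pandas as pd


def repr_with_old_defaults(df):
    """Render df with the display limits pinned to the old defaults."""
    with pd.option_context("display.max_columns", 20, "display.max_rows", 60):
        return repr(df)


def test_repr_does_not_depend_on_global_defaults():
    df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
    before = pd.get_option("display.max_columns")
    text = repr_with_old_defaults(df)
    assert "a" in text and "b" in text
    # option_context restores the previous value when the block exits.
    assert pd.get_option("display.max_columns") == before


test_repr_does_not_depend_on_global_defaults()
```

Because the options are set inside the test, the expected string no longer changes when the library default or the terminal does.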

@cbrnr
Contributor Author

cbrnr commented Nov 9, 2017

OK, I've fixed almost all tests. Only 2 tests still fail, but I'm not sure if these failures are related to my changes:

  • pandas/tests/tseries/test_timezones.py:1290: AssertionError
  • pandas/tests/scalar/test_timestamp.py:1110: AssertionError

Here's the complete test output:

___________________________________ TestTimestamp.test_timestamp ___________________________________
[gw1] darwin -- Python 3.6.3 /Users/clemens/anaconda/envs/pandas_dev/bin/python
self = <pandas.tests.scalar.test_timestamp.TestTimestamp object at 0x10ac29940>

    def test_timestamp(self):
        # GH#17329
        # tz-naive --> treat it as if it were UTC for purposes of timestamp()
        ts = Timestamp.now()
        uts = ts.replace(tzinfo=utc)
        assert ts.timestamp() == uts.timestamp()
    
        tsc = Timestamp('2014-10-11 11:00:01.12345678', tz='US/Central')
        utsc = tsc.tz_convert('UTC')
        # utsc is a different representation of the same time
        assert tsc.timestamp() == utsc.timestamp()
    
        if PY3:
            # should agree with datetime.timestamp method
            dt = ts.to_pydatetime()
>           assert dt.timestamp() == ts.timestamp()
E           AssertionError: assert 1510231568.085538 == 1510235168.085538
E            +  where 1510231568.085538 = <built-in method timestamp of datetime.datetime object at 0x10c197d50>()
E            +    where <built-in method timestamp of datetime.datetime object at 0x10c197d50> = datetime.datetime(2017, 11, 9, 13, 46, 8, 85538).timestamp
E            +  and   1510235168.085538 = <built-in method timestamp of Timestamp object at 0x10c16fb10>()
E            +    where <built-in method timestamp of Timestamp object at 0x10c16fb10> = Timestamp('2017-11-09 13:46:08.085538').timestamp

pandas/tests/scalar/test_timestamp.py:1110: AssertionError
________________________________ TestTimeZones.test_replace_tzinfo _________________________________
[gw1] darwin -- Python 3.6.3 /Users/clemens/anaconda/envs/pandas_dev/bin/python
self = <pandas.tests.tseries.test_timezones.TestTimeZones object at 0x10e8f2f98>

    def test_replace_tzinfo(self):
        # GH 15683
        dt = datetime(2016, 3, 27, 1)
        tzinfo = pytz.timezone('CET').localize(dt, is_dst=False).tzinfo
    
        result_dt = dt.replace(tzinfo=tzinfo)
        result_pd = Timestamp(dt).replace(tzinfo=tzinfo)
    
        if hasattr(result_dt, 'timestamp'):  # New method in Py 3.3
            assert result_dt.timestamp() == result_pd.timestamp()
        assert result_dt == result_pd
        assert result_dt == result_pd.to_pydatetime()
    
        result_dt = dt.replace(tzinfo=tzinfo).replace(tzinfo=None)
        result_pd = Timestamp(dt).replace(tzinfo=tzinfo).replace(tzinfo=None)
    
        if hasattr(result_dt, 'timestamp'):  # New method in Py 3.3
>           assert result_dt.timestamp() == result_pd.timestamp()
E           AssertionError: assert 1459036800.0 == 1459040400.0
E            +  where 1459036800.0 = <built-in method timestamp of datetime.datetime object at 0x10e8effd0>()
E            +    where <built-in method timestamp of datetime.datetime object at 0x10e8effd0> = datetime.datetime(2016, 3, 27, 1, 0).timestamp
E            +  and   1459040400.0 = <built-in method timestamp of Timestamp object at 0x10e8fcf48>()
E            +    where <built-in method timestamp of Timestamp object at 0x10e8fcf48> = Timestamp('2016-03-27 01:00:00').timestamp

pandas/tests/tseries/test_timezones.py:1290: AssertionError

Any ideas?

@jreback
Contributor

jreback commented Nov 9, 2017

@cbrnr ignore those, see #18037

Python's .timestamp() uses the local timezone when converting naive datetimes, so these tests need to be put into a consistent tz so they work for everyone.
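
The timezone dependence can be reproduced with the standard library alone. The sketch below uses the datetime from the failing test log above, without the microseconds. The variable names are illustrative.

```python
import calendar
from datetime import datetime, timezone

naive = datetime(2017, 11, 9, 13, 46, 8)

# Naive datetime: .timestamp() interprets the fields as local time,
# so this number changes with the machine's timezone setting.
local_epoch = naive.timestamp()

# Attaching UTC explicitly gives the same answer on every machine.
utc_epoch = naive.replace(tzinfo=timezone.utc).timestamp()

# calendar.timegm also treats the time tuple as UTC.
utc_epoch_int = calendar.timegm(naive.timetuple())

print(utc_epoch, utc_epoch_int, local_epoch - utc_epoch)
```

In the log above, the two compared values differ by exactly 3600 seconds, which is what a machine one hour ahead of UTC would produce.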

@cbrnr
Contributor Author

cbrnr commented Nov 9, 2017

OK cool, so let's see whether the CIs come back happy (except for these 2 timezone-related ones). Could you help me with the whats_new entry (because we've agreed that this should be prominently visible)?

Also, I hope that my changes to the tests are OK, I mostly set the values for max_columns and max_rows to their old defaults 20 and 60, respectively.

@jreback
Contributor

jreback commented Nov 9, 2017

whatsnew, make a new subsection. then put a screen shot of the before and one of the after. then it should read as if you are a user wanting to know whether this change will affect you (e.g. if you use ipython, the interpreter, etc).

@jreback
Contributor

jreback commented Nov 9, 2017

for 0.22

@cbrnr
Contributor Author

cbrnr commented Nov 9, 2017

A new subsection under "New features"? It's not really a new feature, but it doesn't fit into the other categories either.

@jreback
Contributor

jreback commented Nov 9, 2017

under api breaking changes

@codecov

codecov bot commented Nov 9, 2017

Codecov Report

Merging #17023 into master will decrease coverage by 0.04%.
The diff coverage is 66.66%.

@@            Coverage Diff             @@
##           master   #17023      +/-   ##
==========================================
- Coverage   91.42%   91.38%   -0.05%     
==========================================
  Files         163      163              
  Lines       50068    50071       +3     
==========================================
- Hits        45776    45755      -21     
- Misses       4292     4316      +24

| Flag | Coverage Δ |
|---|---|
| #multiple | 89.18% <66.66%> (-0.03%) ⬇️ |
| #single | 40.39% <66.66%> (-0.04%) ⬇️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/core/config_init.py | 96.09% <66.66%> (-2.26%) ⬇️ |
| pandas/io/gbq.py | 25% <0%> (-58.34%) ⬇️ |
| pandas/plotting/_converter.py | 63.38% <0%> (-1.82%) ⬇️ |
| pandas/core/frame.py | 97.8% <0%> (-0.1%) ⬇️ |
| pandas/core/groupby.py | 92.02% <0%> (-0.02%) ⬇️ |
| pandas/io/formats/format.py | 96.01% <0%> (ø) ⬆️ |
| pandas/core/generic.py | 95.72% <0%> (ø) ⬆️ |

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8dac633...85d6225. Read the comment docs.

@codecov

codecov bot commented Nov 9, 2017

Codecov Report

Merging #17023 into master will increase coverage by 0.01%.
The diff coverage is 73.33%.

@@            Coverage Diff             @@
##           master   #17023      +/-   ##
==========================================
+ Coverage   91.82%   91.84%   +0.01%     
==========================================
  Files         152      152              
  Lines       49235    49245      +10     
==========================================
+ Hits        45212    45230      +18     
+ Misses       4023     4015       -8

| Flag | Coverage Δ |
|---|---|
| #multiple | 90.23% <73.33%> (+0.01%) ⬆️ |
| #single | 41.89% <66.66%> (ø) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/io/formats/format.py | 98.24% <100%> (ø) ⬆️ |
| pandas/io/formats/terminal.py | 20.98% <66.66%> (+4.54%) ⬆️ |
| pandas/core/config_init.py | 99.24% <80%> (-0.76%) ⬇️ |
| pandas/core/arrays/categorical.py | 96.19% <0%> (-0.02%) ⬇️ |
| pandas/core/indexes/datetimes.py | 95.73% <0%> (-0.01%) ⬇️ |
| pandas/core/indexes/period.py | 92.61% <0%> (ø) ⬆️ |
| pandas/core/strings.py | 98.32% <0%> (ø) ⬆️ |
| pandas/core/frame.py | 97.18% <0%> (ø) ⬆️ |
| pandas/core/dtypes/missing.py | 91.07% <0%> (ø) ⬆️ |
| pandas/core/generic.py | 95.85% <0%> (ø) ⬆️ |

... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6c0c277...f795914. Read the comment docs.

@jorisvandenbossche
Member

jorisvandenbossche commented Mar 27, 2018

> I moved the images to _static. Do I need to change anything when I refer to them, or does
>
> .. image:: print_df_old.png
>
> keep working?

I think you need to add _static/ to the path (at least, that's how we do it for all other images -> yes, the path is relative to the source file, or absolute to the main source directory)

@cbrnr
Contributor Author

cbrnr commented Mar 27, 2018

Regarding the deleted images (that I locally moved to doc/source/_static), this path is in .gitignore, which is why they are gone. How should I proceed?

@jorisvandenbossche
Member

@cbrnr you need to add them by force (git add --force) to override the ignore rule (we ignore that directory because sphinx adds more images there that should be ignored)

@jorisvandenbossche
Member

> we ignore it because sphinx adds more images there that should be ignored

We could also decide to move our actual images somewhere else, to not have this confusion, but that's for another PR.

@cbrnr
Contributor Author

cbrnr commented Mar 27, 2018

Got it, the images are back.

@cbrnr
Contributor Author

cbrnr commented Mar 27, 2018

What do you mean with adding _static/ to the path? Do I need to modify the link I use in the .rst file?

@jorisvandenbossche
Member

> What do you mean with adding _static/ to the path? Do I need to modify the link I use in the .rst file?

Yes. You can always check if the images are included in the output with python doc/make.py --single whatsnew

@cbrnr
Contributor Author

cbrnr commented Mar 27, 2018

Nice, thanks! Images are now correctly embedded.

@cbrnr
Contributor Author

cbrnr commented Mar 27, 2018

I think I already asked that, but is it possible to see the HTML docs built by a CI service? I know that other projects use CircleCI for this purpose (so that it is not necessary to set up everything locally).

@jorisvandenbossche
Member

> I think I already asked that, but is it possible to see the HTML docs built by a CI service? I know that other projects use CircleCI for this purpose (so that it is not necessary to set up everything locally).

No, it is currently not possible. Open issue about this: #17921


pd.options.display.max_columns = 20

.. _whatsnew_0230.api:
Member

I think this one sneaked in due to merge conflict? (anyhow it can be removed)

Contributor Author

I'm sorry, what do you mean?

Member

According to the diff, you added this line. But this should not be added (therefore I assumed you added it by accident while updating against master with rebasing/merging). But so you can just remove this line.

Contributor Author

You mean line 690 (.. _whatsnew_0230.api:)?

Member

yes, that's the line on which I commented. There is no header following for which that link would make sense (there is actually already another link label on line 692)

Contributor Author

I thought there should be a section header before introducing the subsections. At least that's how it is done with .. _whatsnew_0230.api_breaking: in line 350 (but there is a heading after that). In any case, I'm happy to delete it.

Member

Yes, that is correct. But I don't understand the relation with this line? This link is just floating with no section or subsection header following it. You already have a header with link at line 661 - 664 ?

Contributor Author

It's for the section below:

.. _whatsnew_0230.api:

.. _whatsnew_0230.api.datetimelike:

Datetimelike API Changes
^^^^^^^^^^^^^^^^^^^^^^^^

Member

But that header already has the "whatsnew_0230.api.datetimelike" label, it does not need two labels.

Contributor Author

OK, done.

Member

@jorisvandenbossche jorisvandenbossche left a comment

Apart from my last two comments, looks good!

@pep8speaks

Hello @cbrnr! Thanks for updating the PR.

Line 648:80: E501 line too long (81 > 79 characters)

@@ -625,7 +625,7 @@ def to_string(self):
 max_len += size_tr_col  # Need to make space for largest row
 # plus truncate dot col
 dif = max_len - self.w
-adj_dif = dif
+adj_dif = dif + 1  # see GH PR #17023
Member

can you put it on the line above?

Contributor Author

You mean

dif = max_len - self.w  # see GH PR #17023
adj_dif = dif

?

dif is never used so we might as well skip it completely.

Member

No, I just meant to put the comment on its own line, not on the same line after the code, like

# '+ 1' to avoid too wide repr (GH PR #17023)
adj_dif = dif + 1

Contributor Author

I'm sorry, of course, I'll change that.

@jorisvandenbossche jorisvandenbossche merged commit c9e8f59 into pandas-dev:master Mar 28, 2018
@jorisvandenbossche
Member

@cbrnr Thanks a lot for this (and for your patience getting this merged :))

@cbrnr cbrnr deleted the nicer_display_defaults branch March 28, 2018 08:30
javadnoorb pushed a commit to javadnoorb/pandas that referenced this pull request Mar 29, 2018
Change `max_columns` to `0` (automatically adapt the number of displayed columns to the actual terminal width)
dworvos pushed a commit to dworvos/pandas that referenced this pull request Apr 2, 2018
Change `max_columns` to `0` (automatically adapt the number of displayed columns to the actual terminal width)
kornilova203 pushed a commit to kornilova203/pandas that referenced this pull request Apr 23, 2018
Change `max_columns` to `0` (automatically adapt the number of displayed columns to the actual terminal width)
Labels
Output-Formatting __repr__ of pandas objects, to_string
Development

Successfully merging this pull request may close these issues.

Can't pd.options.display.max_columns = 0 by default?
7 participants