PERF: avoid calling .values to know the result dtype in eval() #44791

Closed

jorisvandenbossche
Member

Currently, the pd.eval(..) expression parser calls .values several times just to know the dtype of the resulting array. That call actually constructs the array (which can be costly), whereas we can know the dtype without constructing it.

TODO: I need to update the as_array_dtype method to ensure it always returns a np.dtype (now it can also return an extension dtype)
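To illustrate the idea of getting the result dtype from the column dtypes alone, here is a minimal sketch using only NumPy. The helper name `infer_interleaved_dtype` is hypothetical and is not the `as_array_dtype` method from this PR. `np.result_type` is used as a stand-in for the promotion step; pandas' own rules are not identical to it in every case, and extension dtypes are not handled here.

```python
import numpy as np


def infer_interleaved_dtype(column_dtypes):
    """Hypothetical helper: dtype a combined 2D array would get,
    computed from the column dtypes without building the array."""
    if len(column_dtypes) == 0:
        # Assumption: mirrors the empty case in the reviewed diff (float64).
        return np.dtype(np.float64)
    # Stand-in promotion rule; pandas may promote differently in some cases.
    return np.result_type(*column_dtypes)


columns = {
    "a": np.array([1, 2, 3], dtype=np.int64),
    "b": np.array([0.5, 1.5, 2.5], dtype=np.float64),
}

# Cheap: only looks at the dtypes.
inferred = infer_interleaved_dtype([col.dtype for col in columns.values()])

# Costly: copies all the data into one 2D array just to read its dtype.
materialized = np.column_stack(list(columns.values()))

assert inferred == materialized.dtype
```

The two approaches agree on the dtype here, but only the second one allocates and fills a new array.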

@jorisvandenbossche jorisvandenbossche added the Performance Memory or execution speed performance label Dec 6, 2021
@jorisvandenbossche jorisvandenbossche marked this pull request as draft December 6, 2021 20:01
@jorisvandenbossche jorisvandenbossche added the expressions pd.eval, query label Dec 6, 2021
    (i.e. calling ``mgr.as_array()`` or ``df.values``).
    """
    if len(self.arrays) == 0:
        return np.dtype(float)
Member

can this be explicitly np.float64? i never know how this will behave on windows or 32bit builds
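As a quick check on the platform concern: Python's `float` is a C double, so `np.dtype(float)` should resolve to `float64` regardless of platform. The dtype that can differ by platform and NumPy version is `np.dtype(int)`. A small sketch to confirm this on a given build (the variable names are illustrative only):

```python
import numpy as np

float_dtype = np.dtype(float)

# Python's float maps to a 64-bit double.
assert float_dtype == np.dtype(np.float64)
assert float_dtype.itemsize == 8

# By contrast, the default integer width is not fixed across builds.
int_dtype = np.dtype(int)
print(int_dtype, int_dtype.itemsize)
```

Spelling it as `np.float64` is still a reasonable readability choice, since it removes the need to know this.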

@@ -1549,6 +1549,22 @@ def as_array(

        return arr.transpose()

    def as_array_dtype(self):
Member

could be shared in the base class?

@github-actions
Contributor

github-actions bot commented Jan 6, 2022

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jan 6, 2022
@jreback
Contributor

jreback commented Jan 16, 2022

@jorisvandenbossche can you merge master and update

@jreback jreback removed the Stale label Jan 16, 2022
@jbrockmendel
Member

@jorisvandenbossche can you address comments and rebase

@github-actions
Contributor

github-actions bot commented Mar 4, 2022

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Mar 4, 2022
@jreback
Contributor

jreback commented Mar 4, 2022

prob a reasonable change but closing as stale

@jreback jreback closed this Mar 4, 2022
Labels
expressions pd.eval, query Performance Memory or execution speed performance Stale
3 participants