Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: unify logic for form_blocks and make_blocks #19189

Merged
merged 8 commits into from
Jan 18, 2018

Conversation

jbrockmendel
Copy link
Member

In the background part of the intention here is to make things like #19174 easier.

@codecov
Copy link

codecov bot commented Jan 11, 2018

Codecov Report

Merging #19189 into master will increase coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19189      +/-   ##
==========================================
+ Coverage   91.53%   91.56%   +0.03%     
==========================================
  Files         147      148       +1     
  Lines       48797    48870      +73     
==========================================
+ Hits        44664    44749      +85     
+ Misses       4133     4121      -12
Flag Coverage Δ
#multiple 89.94% <100%> (+0.03%) ⬆️
#single 41.66% <97.82%> (+0.05%) ⬆️
Impacted Files Coverage Δ
pandas/core/internals.py 94.53% <100%> (+0.03%) ⬆️
pandas/core/dtypes/cast.py 87.98% <0%> (-0.68%) ⬇️
pandas/core/strings.py 98.17% <0%> (-0.3%) ⬇️
pandas/core/indexes/timedeltas.py 90.6% <0%> (-0.04%) ⬇️
pandas/errors/__init__.py 100% <0%> (ø) ⬆️
pandas/core/reshape/reshape.py 100% <0%> (ø) ⬆️
pandas/core/series.py 94.61% <0%> (ø) ⬆️
pandas/io/json/json.py 92.6% <0%> (ø) ⬆️
pandas/core/frame.py 97.62% <0%> (ø) ⬆️
pandas/io/common.py 69.06% <0%> (ø) ⬆️
... and 21 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8347ff8...9d5bd26. Read the comment docs.

@@ -2914,37 +2914,54 @@ def sparse_reindex(self, new_index):
placement=self.mgr_locs)


_block_type_map = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than constructing this manually and adding code, we already have a mapping by using the block class name, or could add shortnames to each class. In any event, this can be autmatically constructed by interation over the blocks (you have to do this at the end of the file).

Alternatively, can have a registry (a dict), that each class registers itself), but that is slightly more complicated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, can have a registry (a dict), that each class registers itself), but that is slightly more complicated.

That's what I have in mind longer-term, which is why I went with a module-level dict for this. But you're right that is ways away.

rather than constructing this manually and adding code, we already have a mapping by using the block class name, or could add shortnames to each class. In any event, this can be autmatically constructed by interation over the blocks (you have to do this at the end of the file).

Are you thinking something like:

globs = globals()
_block_type_map = {x: globs[x] for x in globs if inspect.issclass(globs[x]) and issubclass(globs[x], Block)}

If so, this is one of the class of things that I'll begrudgingly implement if you insist.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not at all, simply define

register_block as a function which sets the registry = {} to Block.name -> Block
then this is pretty easy

'datetime_tz': DatetimeTZBlock}


def _get_block_type(values, dtype=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no reason to privatize this, pls add a doc-string

datetime_items = []
datetime_tz_items = []
cat_items = []
items_dict = {'float': [],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a default dict

object_items.append((i, k, v))
block_type = _get_block_type(v)

if block_type == 'datetime' and v.dtype != _NS_DTYPE:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then don't do it

Copy link
Member Author

@jbrockmendel jbrockmendel Jan 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The status quo does this (line 4697), so it merits double-checking. If we remove this line, then in the relevant case we will end up calling (inside simple_blockify) v = v.astype(_NS_DTYPE) which should be equivalent to this, but presumably less performant. I'm OK with that, but whoever put this here might have had a reason.

@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions Internals Related to non-user accessible pandas implementation Clean labels Jan 12, 2018
@jorisvandenbossche jorisvandenbossche changed the title unify logic for form_blocks and make_blocks CLN: unify logic for form_blocks and make_blocks Jan 12, 2018
@jbrockmendel
Copy link
Member Author

Decided to move _block_type_map inside make_block and punt on the registry idea for now.

if is_sparse(values):
block_type = 'sparse'
elif issubclass(vtype, np.floating):
block_type = 'float'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems even more complicated. just return the block type klass directly here, no need to return a string which we have a dict for.

@jreback jreback added this to the 0.23.0 milestone Jan 18, 2018
@jreback jreback merged commit 4086e42 into pandas-dev:master Jan 18, 2018
@jreback
Copy link
Contributor

jreback commented Jan 18, 2018

thanks

@jbrockmendel jbrockmendel deleted the internalize2 branch February 11, 2018 21:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Dtype Conversions Unexpected or buggy dtype conversions Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

form_blocks vs make_block inconsistency
2 participants