CLN: unify logic for form_blocks and make_blocks #19189
Codecov Report
@@ Coverage Diff @@
## master #19189 +/- ##
==========================================
+ Coverage 91.53% 91.56% +0.03%
==========================================
Files 147 148 +1
Lines 48797 48870 +73
==========================================
+ Hits 44664 44749 +85
+ Misses 4133 4121 -12
pandas/core/internals.py
Outdated
@@ -2914,37 +2914,54 @@ def sparse_reindex(self, new_index):
placement=self.mgr_locs)


_block_type_map = {
rather than constructing this manually and adding code, we already have a mapping by using the block class name, or could add shortnames to each class. In any event, this can be automatically constructed by iterating over the blocks (you have to do this at the end of the file).
Alternatively, can have a registry (a dict) that each class registers itself in, but that is slightly more complicated.
Alternatively, can have a registry (a dict) that each class registers itself in, but that is slightly more complicated.
That's what I have in mind longer-term, which is why I went with a module-level dict for this. But you're right that it is a ways away.
rather than constructing this manually and adding code, we already have a mapping by using the block class name, or could add shortnames to each class. In any event, this can be automatically constructed by iterating over the blocks (you have to do this at the end of the file).
Are you thinking something like:
import inspect
globs = globals()
_block_type_map = {x: globs[x] for x in globs if inspect.isclass(globs[x]) and issubclass(globs[x], Block)}
If so, this is one of the class of things that I'll begrudgingly implement if you insist.
not at all, simply define register_block as a function which adds a Block.name -> Block entry to a registry = {}; then this is pretty easy
pandas/core/internals.py
Outdated
'datetime_tz': DatetimeTZBlock}


def _get_block_type(values, dtype=None):
no reason to privatize this, pls add a doc-string
pandas/core/internals.py
Outdated
datetime_items = []
datetime_tz_items = []
cat_items = []
items_dict = {'float': [],
use a default dict
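For illustration, the defaultdict suggestion might look roughly like this. The helper `group_items` and the toy classifier are hypothetical and only stand in for the grouping loop in form_blocks and for `_get_block_type`; they are not the PR's code.

```python
from collections import defaultdict


def group_items(columns, get_block_type):
    """Group (position, name, values) tuples by the key get_block_type returns."""
    # Missing keys start as empty lists, so no per-type initialisation is needed.
    items_dict = defaultdict(list)
    for i, (name, values) in enumerate(columns):
        items_dict[get_block_type(values)].append((i, name, values))
    return items_dict


def toy_block_type(values):
    """Toy classifier (an assumption): keys on the type of the first element."""
    return 'float' if isinstance(values[0], float) else 'object'


grouped = group_items(
    [('a', [1.0, 2.0]), ('b', ['x', 'y']), ('c', [3.0])],
    toy_block_type,
)
```

One thing to keep in mind with this approach is that merely reading a missing key (e.g. `grouped['int']`) creates it, so membership checks should use `in` or `.get`.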
pandas/core/internals.py
Outdated
object_items.append((i, k, v))
block_type = _get_block_type(v)

if block_type == 'datetime' and v.dtype != _NS_DTYPE:
then don't do it
The status quo does this (line 4697), so it merits double-checking. If we remove this line, then in the relevant case we will end up calling (inside simple_blockify) v = v.astype(_NS_DTYPE), which should be equivalent to this, but presumably less performant. I'm OK with that, but whoever put this here might have had a reason.
Decided to move _block_type_map inside make_block and punt on the registry idea for now.
pandas/core/internals.py
Outdated
if is_sparse(values):
block_type = 'sparse'
elif issubclass(vtype, np.floating):
block_type = 'float'
this seems even more complicated. just return the block type klass directly here, no need to return a string which we have a dict for.
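As a rough sketch of what "return the klass directly" could look like: the block classes below are empty stand-ins, the function name drops the leading underscore as requested earlier in the review, and only a few dtype branches are shown. This is a simplified illustration, not the function that was eventually merged.

```python
import numpy as np


# Empty stand-ins for the real block classes (assumptions for illustration).
class Block(object):
    pass


class FloatBlock(Block):
    pass


class IntBlock(Block):
    pass


class ObjectBlock(Block):
    pass


def get_block_type(values, dtype=None):
    """Return the Block subclass to use for values.

    If dtype is given it takes precedence over values.dtype.
    """
    dtype = np.dtype(dtype) if dtype is not None else values.dtype
    vtype = dtype.type
    if issubclass(vtype, np.floating):
        return FloatBlock
    elif issubclass(vtype, np.integer):
        return IntBlock
    # Fallback for everything this simplified sketch does not handle.
    return ObjectBlock
```

A caller can then instantiate the result directly, without an intermediate string-to-class lookup.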
thanks
In the background part of the intention here is to make things like #19174 easier.
git diff upstream/master -u -- "*.py" | flake8 --diff