Where/mask methods for Series #2337

Closed
wants to merge 7 commits into from

Conversation

jreback
Contributor

@jreback jreback commented Nov 23, 2012

add where and mask methods for Series, analogous to the DataFrame methods added for 0.9.1
passes all tests

where is equivalent to: s[cond].reindex_like(s).fillna(other)

In [7]: s = pd.Series(np.random.rand(5))

In [8]: s
Out[8]: 
0    0.638664
1    0.574688
2    0.460510
3    0.641840
4    0.044129

In [10]: s[0:2] = -s[0:2]

In [11]: s
Out[11]: 
0   -0.638664
1   -0.574688
2    0.460510
3    0.641840
4    0.044129

boolean selection

In [12]: s[s>0]
Out[12]: 
2    0.460510
3    0.641840
4    0.044129

In [13]: s.where(s>0)
Out[13]: 
0         NaN
1         NaN
2    0.460510
3    0.641840
4    0.044129

In [14]: s.where(s>0,-s)
Out[14]: 
0    0.638664
1    0.574688
2    0.460510
3    0.641840
4    0.044129

In [15]: s.mask(s<=0)
Out[15]: 
0         NaN
1         NaN
2    0.460510
3    0.641840
4    0.044129
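
The equivalence stated at the top of this comment can be checked directly. This is a minimal sketch, assuming a recent pandas and a series with no pre-existing NaN values (fillna would fill those as well, so the two forms would differ there); the variable names via_where and via_reindex are illustrative only.

```python
import pandas as pd

# Same values as the session above, after the first two were negated.
s = pd.Series([-0.638664, -0.574688, 0.460510, 0.641840, 0.044129])
cond = s > 0

# where keeps the original shape and fills the rejected slots with `other`.
via_where = s.where(cond, 0.0)

# The spelled-out form: select, restore the original index, fill the gaps.
via_reindex = s[cond].reindex_like(s).fillna(0.0)
assert via_where.equals(via_reindex)

# mask is where with the condition inverted.
assert s.mask(~cond).equals(s.where(cond))

# Passing a Series as `other` takes the replacement values from it.
assert s.where(cond, -s).equals(s.abs())
```

The last assertion matches In [14] above, where s.where(s>0,-s) returns the absolute values.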

supports setting as well (though not used anywhere explicitly)

In [16]: s2 = s.copy()

In [17]: s2.where(s2>0,inplace=True)
Out[17]: 
0         NaN
1         NaN
2    0.460510
3    0.641840
4    0.044129

In [18]: s2
Out[18]: 
0         NaN
1         NaN
2    0.460510
3    0.641840
4    0.044129

  1. added __str__ (to do __repr__)
  2. row removal in tables is much faster if rows are consecutive
  3. added Term class, refactored Selection (this is backwards compatible)
     Term is a concise way of specifying conditions for queries, e.g.

        Term(dict(field = 'index', op = '>', value = '20121114'))
        Term('index', '20121114')
        Term('index', '>', '20121114')
        Term('index', ['20121114','20121114'])
        Term('index', datetime(2012,11,14))
        Term('index>20121114')

     updated tests for same

  this should close GH pandas-dev#1996
…e (see test_append)

this is the result of incompatibility testing on the index_kind
  think about doing this automagically for tables
…of index columns minimum size

changed pytables version test for indexing around a bit
added Col class to manage the column conversions
added alias to the Term class; you can specify the nominal indexers (e.g. index in DataFrame, major_axis/minor_axis or alias in Panel)
updated docs for pytables to reflect these changes
updated docs for indexing to incorporate whatsnew 0.9.1 for where and mask
…d for the cond with a shape like the original
@jreback jreback closed this Nov 23, 2012
@jreback jreback reopened this Nov 23, 2012
changhiskhan added a commit that referenced this pull request Nov 24, 2012
@changhiskhan
Contributor

I cherry-picked this.
I changed the implementation of where a little bit as _set_value would have failed for non-scalar other.
I also added a few more test cases.

Thanks for the PR!

@jreback
Contributor Author

jreback commented Nov 24, 2012

added docs for these in commit: 2d57979

@changhiskhan
Contributor

cherry-picked. Thank you!

@durden
Contributor

durden commented Nov 28, 2012

Are the where and mask methods supposed to be included in 0.9.1? I'm not seeing them, but maybe I've got something setup incorrectly?

>>> pandas.__version__
'0.9.1'
>>> df['col2'].where
Traceback (most recent call last):
  File "<ipython-input-19-885524901cf2>", line 1, in <module>
    df['col2'].where
AttributeError: 'Series' object has no attribute 'where'
>>> dir(df['col2'])
['T', '_AXIS_ALIASES', '_AXIS_NAMES', '_AXIS_NUMBERS', '__abs__', '__add__', '__and__', '__array__', '__array_finalize__', '__array_interface__', '__array_prepare__', '__array_priority__', '__array_struct__', '__array_wrap__', '__class__', '__contains__', '__copy__', '__deepcopy__', '__delattr__', '__delitem__', '__delslice__', '__dict__', '__div__', '__divmod__', '__doc__', '__eq__', '__float__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getslice__', '__gt__', '__hash__', '__hex__', '__iadd__', '__iand__', '__idiv__', '__ifloordiv__', '__ilshift__', '__imod__', '__imul__', '__index__', '__init__', '__int__', '__invert__', '__ior__', '__ipow__', '__irshift__', '__isub__', '__iter__', '__itruediv__', '__ixor__', '__le__', '__len__', '__long__', '__lshift__', '__lt__', '__mod__', '__module__', '__mul__', '__ne__', '__neg__', '__new__', '__nonzero__', '__oct__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdiv__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__setitem__', '__setslice__', '__setstate__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__weakref__', '__xor__', '_agg_by_level', '_binop', '_can_hold_na', '_check_bool_indexer', '_constructor', '_get_axis', '_get_axis_name', '_get_axis_number', '_get_repr', '_get_val_at', '_get_values', '_get_values_tuple', '_get_with', '_index', '_ix', '_reindex_indexer', '_repr_footer', '_set_labels', '_set_values', '_set_with', '_tidy_repr', 'abs', 'add', 'align', 'all', 'any', 'append', 'apply', 'argmax', 'argmin', 'argsort', 'asfreq', 'asof', 'astype', 'at_time', 'autocorr', 'base', 'between', 'between_time', 'byteswap', 'choose', 'clip', 'clip_lower', 'clip_upper', 'combine', 'combine_first', 'compress', 'conj', 'conjugate', 'copy', 'corr', 'count', 'cov', 'ctypes', 
'cummax', 'cummin', 'cumprod', 'cumsum', 'data', 'describe', 'diagonal', 'diff', 'div', 'dot', 'drop', 'dropna', 'dtype', 'dump', 'dumps', 'fill', 'fillna', 'first', 'first_valid_index', 'flags', 'flat', 'flatten', 'from_array', 'from_csv', 'get', 'get_value', 'getfield', 'groupby', 'head', 'hist', 'idxmax', 'idxmin', 'iget', 'iget_value', 'imag', 'index', 'interpolate', 'irow', 'isin', 'isnull', 'item', 'itemset', 'itemsize', 'iteritems', 'iterkv', 'ix', 'keys', 'kurt', 'last', 'last_valid_index', 'load', 'mad', 'map', 'max', 'mean', 'median', 'min', 'mul', 'name', 'nbytes', 'ndim', 'newbyteorder', 'nonzero', 'notnull', 'nunique', 'order', 'pct_change', 'plot', 'prod', 'ptp', 'put', 'quantile', 'rank', 'ravel', 'real', 'reindex', 'reindex_like', 'rename', 'reorder_levels', 'repeat', 'replace', 'resample', 'reset_index', 'reshape', 'resize', 'round', 'save', 'searchsorted', 'select', 'set_value', 'setasflat', 'setfield', 'setflags', 'shape', 'shift', 'size', 'skew', 'sort', 'sort_index', 'sortlevel', 'squeeze', 'std', 'str', 'strides', 'sub', 'sum', 'swapaxes', 'swaplevel', 'tail', 'take', 'to_csv', 'to_dict', 'to_sparse', 'to_string', 'tofile', 'tolist', 'tostring', 'trace', 'transpose', 'truncate', 'tshift', 'tz_convert', 'tz_localize', 'unique', 'unstack', 'update', 'valid', 'value_counts', 'values', 'var', 'view', 'weekday']

@jreback
Contributor Author

jreback commented Nov 28, 2012

in 0.9.1 they only exist in a DataFrame
0.10.0 will support Series as well

docs will explain the slight difference

eg

s[cond] will return a subset of rows
but df[cond] returns a same sized frame
if s is a series and df is a frame
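
A small sketch of that difference, assuming a recent pandas in which Series.where is available; the data and the names subset, same_size_s and same_size_df are made up for illustration.

```python
import pandas as pd

s = pd.Series([-1.0, 2.0, 3.0])
df = pd.DataFrame({"a": [-1.0, 2.0], "b": [3.0, -4.0]})

# Boolean indexing on a Series drops the rows that fail the condition.
subset = s[s > 0]

# where keeps every row and puts NaN where the condition is False.
same_size_s = s.where(s > 0)

# Indexing a DataFrame with a boolean DataFrame keeps the shape,
# which is the same result as df.where(df > 0).
same_size_df = df[df > 0]

print(len(subset), len(same_size_s), same_size_df.shape)
```

So s.where(cond) is the Series counterpart of df[cond], while s[cond] stays a plain row filter.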

@durden
Contributor

durden commented Nov 28, 2012

@jreback Ah, that clears up the confusion. Thanks!
