Pan Deng: Integrating pandas.Panel and xarray Features #127

Merged: 3 commits merged on Mar 25, 2016

@OXPHOS OXPHOS commented Mar 23, 2016

  • I contacted NumFOCUS mentors before writing my proposal
  • I showed you my contribution (it can be any form: proof-of-concept project idea, some sample code or just a link to your commits from other project)
  • I linked to my sample contribution from the proposal
  • I linked to my opened issues in numfocus/gsoc repository from the proposal

OXPHOS commented Mar 23, 2016

Hey, I finished my proposal draft for the Panel/xarray integration project. Would you mind having a look at it when you have time? Thanks so much! @shoyer @jreback


## Technical Details

Most of my proposal is meant to be carried out with features currently implemented in pandas and xarray. For the PCA part, to improve performance, I might switch to C++ and the Eigen3 library.

Currently xarray is pure Python, so this would be a very large change. Using Cython and/or Numba might be acceptable. I suspect that xarray will eventually have to go down this route. @shoyer can shed some more light.

xarray is currently pure Python built on top of NumPy and dask.array. Indeed, Cython or Numba might be acceptable, but Numba at least would need to be an optional dependency.

For PCA in particular, it would make sense to wrap existing implementations/wrappers in SciPy rather than rolling your own.
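
As a rough illustration of the suggestion above, PCA can be built on SciPy's existing SVD routine instead of a hand-written C++/Eigen3 implementation. This is only a minimal sketch: the helper name `pca`, its signature, and the random test data are hypothetical and not part of the proposal, pandas, or xarray.

```python
import numpy as np
from scipy import linalg


def pca(data, n_components):
    """Hypothetical helper: PCA of a (samples, features) array via SciPy's SVD."""
    # Center each feature (column) so the SVD describes variance around the mean.
    centered = data - data.mean(axis=0)
    # Thin SVD: centered == U @ diag(s) @ Vt, singular values sorted descending.
    _, s, vt = linalg.svd(centered, full_matrices=False)
    # Rows of Vt are the principal axes; keep the leading ones.
    components = vt[:n_components]
    # Variance explained by each kept component (sample variance, ddof=1).
    explained_variance = s[:n_components] ** 2 / (data.shape[0] - 1)
    # Project the centered data onto the kept axes.
    scores = centered @ components.T
    return components, explained_variance, scores


# Made-up data purely for demonstration.
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 4))
components, variance, scores = pca(data, n_components=2)
```

For labelled data, the same function could presumably be applied to the underlying NumPy array of a DataArray or DataFrame, with the labels reattached afterwards.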

I get it. I have no experience with Cython or SciPy, but I will check them out.

shoyer commented Mar 23, 2016

One feature that would have a lot of impact would be bringing groupby performance in xarray up to par with pandas: pydata/xarray#659

OXPHOS commented Mar 23, 2016

Honestly, I didn't find much to transfer from pandas to xarray, and I feel like I am trying to dig something out of nowhere. Or should I actually focus on add-on features like cumsum, prod and rank, rather than spending too much on basic ones like MultiIndex?

Also, I have the following todo list from pydata/xarray#702. Can I include all of these in my proposal?
- [ ] Make levels accessible as coordinate variables (e.g., ds['time'] can pull out the 'time' level of a multi-index)
- [ ] Make isel_points/sel_points return objects with a MultiIndex? (probably after the previous TODO, so we can preserve basic backwards compatibility)
- [ ] Add set_index/reset_index/swaplevel to make it easier to create and manipulate multi-indexes

@rgaiacs rgaiacs merged commit b643470 into numfocus:master Mar 25, 2016
@rgaiacs rgaiacs added the Pandas label Mar 25, 2016