Commit

Rename to flox
dcherian committed Nov 16, 2021
1 parent 27a46f7 commit 248d4b3
Showing 11 changed files with 27 additions and 21 deletions.
34 changes: 20 additions & 14 deletions README.md
@@ -1,31 +1,37 @@
-[![GitHub Workflow CI Status](https://img.shields.io/github/workflow/status/dcherian/dask_groupby/CI?logo=github&style=for-the-badge)](https://github.com/dcherian/dask_groupby/actions)[![GitHub Workflow Code Style Status](https://img.shields.io/github/workflow/status/dcherian/dask_groupby/code-style?label=Code%20Style&style=for-the-badge)](https://github.com/dcherian/dask_groupby/actions)[![image](https://img.shields.io/codecov/c/github/dcherian/dask_groupby.svg?style=for-the-badge)](https://codecov.io/gh/dcherian/dask_groupby)
+[![GitHub Workflow CI Status](https://img.shields.io/github/workflow/status/dcherian/flox/CI?logo=github&style=for-the-badge)](https://github.com/dcherian/flox/actions)[![GitHub Workflow Code Style Status](https://img.shields.io/github/workflow/status/dcherian/flox/code-style?label=Code%20Style&style=for-the-badge)](https://github.com/dcherian/flox/actions)[![image](https://img.shields.io/codecov/c/github/dcherian/flox.svg?style=for-the-badge)](https://codecov.io/gh/dcherian/flox)

-# dask_groupby
+# flox

+This project explores strategies for fast GroupBy reductions with dask.array. It used to be called `dask_groupby`
+
-This repo explores strategies for a distributed GroupBy with dask
-arrays. It was motivated by
-
-1. Dask Dataframe GroupBy
-[blogpost](https://blog.dask.org/2019/10/08/df-groupby)
-2. numpy_groupies in Xarray
-[issue](https://github.com/pydata/xarray/issues/4473)
-
(See a
[presentation](https://docs.google.com/presentation/d/1muj5Yzjw-zY8c6agjyNBd2JspfANadGSDvdd6nae4jg/edit?usp=sharing)
about this package).

+## Acknowledgements
+
+This work was funded in part by NASA-ACCESS 80NSSC18M0156 "Community tools for analysis of NASA Earth Observing System
+Data in the Cloud" (PI J. Hamman), and [NCAR's Earth System Data Science Initiative](https://ncar.github.io/esds/).
+It was motivated by many discussions in the [Pangeo](https://pangeo.io) community.
+
## API

There are three functions
-1. `groupby_reduce(dask_array, by_dask_array, "mean")`
+1. `flox.groupby_reduce(dask_array, by_dask_array, "mean")`
"pure" dask array interface
-2. `xarray_groupby_reduce(groupby_object, "mean")`
-xarray groupby interface that accepts a GroupBy object for convenience
-3. `xarray_reduce(xarray_object, by_dataarray, "mean")`
+1. `flox.xarray.xarray_reduce(xarray_object, by_dataarray, "mean")`
"pure" xarray interface

## Implementation

+This repo explores strategies for a distributed GroupBy with dask
+arrays. It was motivated by
+
+1. Dask Dataframe GroupBy
+[blogpost](https://blog.dask.org/2019/10/08/df-groupby)
+2. numpy_groupies in Xarray
+[issue](https://github.com/pydata/xarray/issues/4473)
+
The core GroupBy operation is outsourced to
[numpy_groupies](https://github.com/ml31415/numpy-groupies). The GroupBy
reduction is first applied blockwise. Those intermediate results are
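
The README excerpt above is cut off right after saying that the reduction is first applied blockwise. As a rough illustration of that general idea only, the sketch below computes a grouped mean with plain NumPy: each block is reduced to per-group sums and counts, and those intermediates are then combined. It assumes the intermediates are merged by simple addition, uses `np.bincount` in place of numpy_groupies, and the function names `blockwise_group_sums` and `combine_mean` are hypothetical; none of this is taken from the flox source.

```python
import numpy as np


def blockwise_group_sums(values, labels, ngroups, blocksize):
    """Reduce each block independently to per-group (sums, counts).

    Hypothetical helper for illustration; not part of flox.
    """
    intermediates = []
    for start in range(0, len(values), blocksize):
        block_values = values[start:start + blocksize]
        block_labels = labels[start:start + blocksize]
        # Per-group sum and count for this block only.
        sums = np.bincount(block_labels, weights=block_values, minlength=ngroups)
        counts = np.bincount(block_labels, minlength=ngroups)
        intermediates.append((sums, counts))
    return intermediates


def combine_mean(intermediates):
    """Combine per-block intermediates into the final grouped mean.

    Assumes every group has at least one member overall.
    """
    total = sum(sums for sums, _ in intermediates)
    count = sum(counts for _, counts in intermediates)
    return total / count


values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
labels = np.array([0, 1, 0, 1, 0, 1])  # integer group codes
result = combine_mean(
    blockwise_group_sums(values, labels, ngroups=2, blocksize=4)
)
```

A mean cannot be combined directly across blocks, which is why the sketch carries sums and counts as intermediates and divides only at the end.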
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -19,7 +19,7 @@ profile = "black"
skip_gitignore = true
float_to_top = true
combine_as_imports = true
-known_first_party = "dask_groupby"
+known_first_party = "flox"
known_third_party = [
"dask",
"numpy",
@@ -34,7 +34,7 @@ known_third_party = [
[tool.mypy]
allow_redefinition = true
exclude = "properties|asv_bench|doc|tests|flycheck"
-files = "dask_groupby/*.py"
+files = "flox/*.py"
show_error_codes = true

[[tool.mypy.overrides]]
6 changes: 3 additions & 3 deletions setup.cfg
@@ -1,13 +1,13 @@
[metadata]
-name = dask_groupby
-author = dask_groupby Developers
+name = flox
+author = flox Developers
author_email = deepak@cherian.net
license = Apache
description = GroupBy operations for dask.array
long_description = file: README.md
long_description_content_type=text/markdown

-url = https://github.com/dcherian/dask_groupby
+url = https://github.com/dcherian/flox
classifiers =
Development Status :: 4 - Beta
License :: OSI Approved :: Apache Software License
2 changes: 1 addition & 1 deletion tests/test_core.py
@@ -5,7 +5,7 @@
from dask.array import from_array
from numpy_groupies.aggregate_numpy import aggregate

-from dask_groupby.core import (
+from flox.core import (
_get_optimal_chunks_for_groups,
find_group_cohorts,
groupby_reduce,
2 changes: 1 addition & 1 deletion tests/test_xarray.py
@@ -4,7 +4,7 @@
import pytest
import xarray as xr

-from dask_groupby.xarray import (
+from flox.xarray import (
rechunk_to_group_boundaries,
resample_reduce,
xarray_groupby_reduce,