Commit a6f4f3c
Merge remote-tracking branch 'upstream/main' into zarr-v3
grlee77 committed Apr 11, 2022
2 parents 64bddef + c12cb3d
Showing 16 changed files with 129 additions and 21 deletions.
8 changes: 5 additions & 3 deletions .github/stale.yml
@@ -1,7 +1,7 @@
# Configuration for probot-stale - https://github.com/probot/stale

# Number of days of inactivity before an Issue or Pull Request becomes stale
daysUntilStale: 700 # start with a large number and reduce shortly
daysUntilStale: 600 # start with a large number and reduce shortly

# Number of days of inactivity before an Issue or Pull Request with the stale label is closed.
# Set to false to disable. If disabled, issues still need to be closed manually, but will remain marked as stale.
@@ -31,6 +31,9 @@ markComment: |
If this issue remains relevant, please comment here or remove the `stale` label; otherwise it will be marked as closed automatically
closeComment: |
The stalebot didn't hear anything for a while, so it closed this. Please reopen if this is still an issue.
# Comment to post when removing the stale label.
# unmarkComment: >
# Your comment here.
@@ -40,8 +43,7 @@ markComment: |
# Your comment here.

# Limit the number of actions per hour, from 1-30. Default is 30
limitPerRun: 1 # start with a small number

limitPerRun: 2 # start with a small number

# Limit to only `issues` or `pulls`
# only: issues
2 changes: 1 addition & 1 deletion .github/workflows/benchmarks.yml
@@ -67,7 +67,7 @@ jobs:
cp benchmarks/README_CI.md benchmarks.log .asv/results/
working-directory: ${{ env.ASV_DIR }}

- uses: actions/upload-artifact@v2
- uses: actions/upload-artifact@v3
if: always()
with:
name: asv-benchmark-results-${{ runner.os }}
2 changes: 1 addition & 1 deletion .github/workflows/ci-additional.yaml
@@ -109,7 +109,7 @@ jobs:
$PYTEST_EXTRA_FLAGS
- name: Upload code coverage to Codecov
uses: codecov/codecov-action@v2.1.0
uses: codecov/codecov-action@v3.0.0
with:
file: ./coverage.xml
flags: unittests,${{ matrix.env }}
6 changes: 3 additions & 3 deletions .github/workflows/ci.yaml
@@ -98,13 +98,13 @@ jobs:

- name: Upload test results
if: always()
uses: actions/upload-artifact@v2
uses: actions/upload-artifact@v3
with:
name: Test results for ${{ runner.os }}-${{ matrix.python-version }}
path: pytest.xml

- name: Upload code coverage to Codecov
uses: codecov/codecov-action@v2.1.0
uses: codecov/codecov-action@v3.0.0
with:
file: ./coverage.xml
flags: unittests
@@ -118,7 +118,7 @@ jobs:
if: github.repository == 'pydata/xarray'
steps:
- name: Upload
uses: actions/upload-artifact@v2
uses: actions/upload-artifact@v3
with:
name: Event File
path: ${{ github.event_path }}
6 changes: 3 additions & 3 deletions .github/workflows/pypi-release.yaml
@@ -41,7 +41,7 @@ jobs:
else
echo "✅ Looks good"
fi
- uses: actions/upload-artifact@v2
- uses: actions/upload-artifact@v3
with:
name: releases
path: dist
@@ -54,7 +54,7 @@ jobs:
name: Install Python
with:
python-version: 3.8
- uses: actions/download-artifact@v2
- uses: actions/download-artifact@v3
with:
name: releases
path: dist
@@ -85,7 +85,7 @@ jobs:
if: github.event_name == 'release'
runs-on: ubuntu-latest
steps:
- uses: actions/download-artifact@v2
- uses: actions/download-artifact@v3
with:
name: releases
path: dist
4 changes: 2 additions & 2 deletions .github/workflows/upstream-dev-ci.yaml
@@ -92,7 +92,7 @@ jobs:
&& steps.status.outcome == 'failure'
&& github.event_name == 'schedule'
&& github.repository == 'pydata/xarray'
uses: actions/upload-artifact@v2
uses: actions/upload-artifact@v3
with:
name: output-${{ matrix.python-version }}-log
path: output-${{ matrix.python-version }}-log
@@ -114,7 +114,7 @@ jobs:
- uses: actions/setup-python@v3
with:
python-version: "3.x"
- uses: actions/download-artifact@v2
- uses: actions/download-artifact@v3
with:
path: /tmp/workspace/logs
- name: Move all log files into a single directory
3 changes: 3 additions & 0 deletions doc/whats-new.rst
@@ -36,6 +36,9 @@ New Features
elements which trigger summarization rather than full repr in (numpy) array
detailed views of the html repr (:pull:`6400`).
By `Benoît Bovy <https://github.com/benbovy>`_.
- Allow passing chunks in **kwargs form to :py:meth:`Dataset.chunk`, :py:meth:`DataArray.chunk`, and
:py:meth:`Variable.chunk`. (:pull:`6471`)
By `Tom Nicholas <https://github.com/TomNicholas>`_.
Breaking changes
~~~~~~~~~~~~~~~~
2 changes: 1 addition & 1 deletion xarray/conventions.py
@@ -770,7 +770,7 @@ def _encode_coordinates(variables, attributes, non_dim_coord_names):
# this will copy coordinates from encoding to attrs if "coordinates" in attrs
# after the next line, "coordinates" is never in encoding
# we get support for attrs["coordinates"] for free.
coords_str = pop_to(encoding, attrs, "coordinates")
coords_str = pop_to(encoding, attrs, "coordinates") or attrs.get("coordinates")
if not coords_str and variable_coordinates[name]:
coordinates_text = " ".join(
str(coord_name)
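The one-line change to `_encode_coordinates` makes a user-supplied `attrs["coordinates"]` survive encoding instead of being silently replaced by the computed value. A minimal stdlib sketch of the `pop_to`-plus-fallback logic (the helper names here only mirror xarray's internals; this is not the library's actual code):

```python
def pop_to(source: dict, dest: dict, key: str):
    """Move `key` from `source` into `dest`, erroring on conflict.

    Approximates xarray's internal `pop_to` helper: returns the moved
    value, or None if `key` was absent from `source`.
    """
    value = source.pop(key, None)
    if value is not None:
        if key in dest:
            raise ValueError(f"{key!r} found in both source and dest")
        dest[key] = value
    return value


def resolve_coordinates(encoding: dict, attrs: dict, computed: str) -> str:
    # After the patch: an explicit attrs["coordinates"] set by the user
    # is kept even when pop_to returns None (nothing was in encoding).
    coords_str = pop_to(encoding, attrs, "coordinates") or attrs.get("coordinates")
    if not coords_str and computed:
        coords_str = computed
    return coords_str


# A user-defined "coordinates" attribute now wins over the computed one.
print(resolve_coordinates({}, {"coordinates": "time lon lat"}, "time"))  # time lon lat
```

This matches the regression test added below for GH6310, where `"time lon lat"` set by the user is expected to round-trip unchanged.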
18 changes: 17 additions & 1 deletion xarray/core/dataarray.py
@@ -1113,6 +1113,7 @@ def chunk(
name_prefix: str = "xarray-",
token: str = None,
lock: bool = False,
**chunks_kwargs: Any,
) -> DataArray:
"""Coerce this array's data into a dask arrays with the given chunks.
@@ -1136,13 +1137,28 @@
lock : optional
Passed on to :py:func:`dask.array.from_array`, if the array is not
already a dask array.
**chunks_kwargs : {dim: chunks, ...}, optional
The keyword arguments form of ``chunks``.
One of chunks or chunks_kwargs must be provided.
Returns
-------
chunked : xarray.DataArray
"""
if isinstance(chunks, (tuple, list)):
if chunks is None:
warnings.warn(
"None value for 'chunks' is deprecated. "
"It will raise an error in the future. Use instead '{}'",
category=FutureWarning,
)
chunks = {}

if isinstance(chunks, (float, str, int)):
chunks = dict.fromkeys(self.dims, chunks)
elif isinstance(chunks, (tuple, list)):
chunks = dict(zip(self.dims, chunks))
else:
chunks = either_dict_or_kwargs(chunks, chunks_kwargs, "chunk")

ds = self._to_temp_dataset().chunk(
chunks, name_prefix=name_prefix, token=token, lock=lock
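The new branches in `DataArray.chunk` normalize every accepted form of `chunks` — scalar, tuple, mapping, or keyword arguments — into one dict keyed by dimension name. A standalone sketch of that dispatch (stdlib only; the helper name and error message are illustrative, not xarray's code):

```python
def normalize_chunks(dims, chunks=None, **chunks_kwargs):
    """Normalize scalar / tuple / dict / kwargs chunk specs to one dict."""
    if chunks is None:
        chunks = {}  # xarray additionally warns that None is deprecated here
    if isinstance(chunks, (float, str, int)):
        # One size for every dimension, e.g. chunks=5 or chunks="auto".
        chunks = dict.fromkeys(dims, chunks)
    elif isinstance(chunks, (tuple, list)):
        # Positional sizes, paired with dims in order.
        chunks = dict(zip(dims, chunks))
    else:
        # Dict form and kwargs form are mutually exclusive.
        if chunks and chunks_kwargs:
            raise ValueError("cannot specify both chunks and chunks_kwargs")
        chunks = dict(chunks) if chunks else chunks_kwargs
    return chunks


print(normalize_chunks(("x", "y"), 5))         # {'x': 5, 'y': 5}
print(normalize_chunks(("x", "y"), (2, 3)))    # {'x': 2, 'y': 3}
print(normalize_chunks(("x", "y"), x=2, y=3))  # {'x': 2, 'y': 3}
```

The kwargs path is what the new `**chunks_kwargs: Any` parameter enables, so `da.chunk(x=2, y=3)` now means the same thing as `da.chunk({"x": 2, "y": 3})`.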
10 changes: 8 additions & 2 deletions xarray/core/dataset.py
@@ -2000,6 +2000,7 @@ def chunk(
name_prefix: str = "xarray-",
token: str = None,
lock: bool = False,
**chunks_kwargs: Any,
) -> Dataset:
"""Coerce all arrays in this dataset into dask arrays with the given
chunks.
@@ -2013,7 +2014,7 @@
Parameters
----------
chunks : int, "auto" or mapping of hashable to int, optional
chunks : int, tuple of int, "auto" or mapping of hashable to int, optional
Chunk sizes along each dimension, e.g., ``5``, ``"auto"``, or
``{"x": 5, "y": 5}``.
name_prefix : str, optional
@@ -2023,6 +2024,9 @@
lock : optional
Passed on to :py:func:`dask.array.from_array`, if the array is not
already a dask array.
**chunks_kwargs : {dim: chunks, ...}, optional
The keyword arguments form of ``chunks``.
One of chunks or chunks_kwargs must be provided
Returns
-------
@@ -2034,7 +2038,7 @@
Dataset.chunksizes
xarray.unify_chunks
"""
if chunks is None:
if chunks is None and chunks_kwargs is None:
warnings.warn(
"None value for 'chunks' is deprecated. "
"It will raise an error in the future. Use instead '{}'",
@@ -2044,6 +2048,8 @@

if isinstance(chunks, (Number, str, int)):
chunks = dict.fromkeys(self.dims, chunks)
else:
chunks = either_dict_or_kwargs(chunks, chunks_kwargs, "chunk")

bad_dims = chunks.keys() - self.dims.keys()
if bad_dims:
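`Dataset.chunk` applies the same scalar/kwargs normalization and then validates that every chunk key names a real dimension (`bad_dims = chunks.keys() - self.dims.keys()`). A stdlib stand-in for that validation step (function name and message are illustrative, not the library code):

```python
import warnings


def chunk_dataset(dims, chunks=None, **chunks_kwargs):
    """Normalize a chunk spec, then validate it against a dataset's dims."""
    if chunks is None and not chunks_kwargs:
        warnings.warn("None value for 'chunks' is deprecated", FutureWarning)
        chunks = {}
    if isinstance(chunks, (int, float, str)):
        chunks = dict.fromkeys(dims, chunks)  # e.g. chunks=5 or "auto"
    elif chunks is None or chunks == {}:
        chunks = dict(chunks_kwargs)  # keyword form, e.g. chunk(time=10)
    # Reject names that are not dimensions of the dataset.
    bad_dims = chunks.keys() - set(dims)
    if bad_dims:
        raise ValueError(f"some chunks keys are not dimensions: {bad_dims!r}")
    return chunks


print(chunk_dataset(("time", "lat"), time=10))  # {'time': 10}
```

This mirrors the test added below: `data.chunk(**expected_chunks).chunks == expected_chunks`.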
2 changes: 1 addition & 1 deletion xarray/core/utils.py
@@ -266,7 +266,7 @@ def either_dict_or_kwargs(
kw_kwargs: Mapping[str, T],
func_name: str,
) -> Mapping[Hashable, T]:
if pos_kwargs is None:
if pos_kwargs is None or pos_kwargs == {}:
# Need an explicit cast to appease mypy due to invariance; see
# https://github.com/python/mypy/issues/6228
return cast(Mapping[Hashable, T], kw_kwargs)
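The `pos_kwargs == {}` addition lets an empty positional dict — now the default for `chunk()` — fall through to the keyword form instead of shadowing it. A minimal re-implementation of the dispatch (type annotations and the mypy cast elided; the real helper lives in `xarray/core/utils.py`):

```python
def either_dict_or_kwargs(pos_kwargs, kw_kwargs, func_name):
    """Accept options as either one positional mapping or as **kwargs."""
    # After the patch an empty dict defers to the keyword form, so
    # chunk(x=5) works even though chunks defaults to {}.
    if pos_kwargs is None or pos_kwargs == {}:
        return kw_kwargs
    if kw_kwargs:
        raise ValueError(
            f"cannot specify both keyword and positional arguments to .{func_name}"
        )
    return pos_kwargs


print(either_dict_or_kwargs({}, {"x": 5}, "chunk"))  # {'x': 5}
print(either_dict_or_kwargs({"x": 2}, {}, "chunk"))  # {'x': 2}
```

Without the `== {}` check, `chunk(x=5)` would return the empty default dict and silently ignore the keyword arguments.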
24 changes: 22 additions & 2 deletions xarray/core/variable.py
@@ -5,7 +5,7 @@
import numbers
import warnings
from datetime import timedelta
from typing import TYPE_CHECKING, Any, Hashable, Mapping, Sequence
from typing import TYPE_CHECKING, Any, Hashable, Literal, Mapping, Sequence

import numpy as np
import pandas as pd
@@ -1012,7 +1012,19 @@ def chunksizes(self) -> Mapping[Any, tuple[int, ...]]:

_array_counter = itertools.count()

def chunk(self, chunks={}, name=None, lock=False):
def chunk(
self,
chunks: (
int
| Literal["auto"]
| tuple[int, ...]
| tuple[tuple[int, ...], ...]
| Mapping[Any, None | int | tuple[int, ...]]
) = {},
name: str = None,
lock: bool = False,
**chunks_kwargs: Any,
) -> Variable:
"""Coerce this array's data into a dask array with the given chunks.
If this variable is a non-dask array, it will be converted to dask
@@ -1034,6 +1046,9 @@ def chunk(self, chunks={}, name=None, lock=False):
lock : optional
Passed on to :py:func:`dask.array.from_array`, if the array is not
already a dask array.
**chunks_kwargs : {dim: chunks, ...}, optional
The keyword arguments form of ``chunks``.
One of chunks or chunks_kwargs must be provided.
Returns
-------
@@ -1049,6 +1064,11 @@
)
chunks = {}

if isinstance(chunks, (float, str, int, tuple, list)):
pass # dask.array.from_array can handle these directly
else:
chunks = either_dict_or_kwargs(chunks, chunks_kwargs, "chunk")

if utils.is_dict_like(chunks):
chunks = {self.get_axis_num(dim): chunk for dim, chunk in chunks.items()}

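`Variable.chunk` is the lowest layer: scalars, tuples, and lists pass straight through to `dask.array.from_array`, while name-keyed mappings (including the new kwargs form) are converted to the axis-number keys dask expects via `get_axis_num`. A sketch of that last step, with `get_axis_num` approximated by `dims.index` (stdlib only, not xarray's implementation):

```python
def to_dask_chunks(dims, chunks, **chunks_kwargs):
    """Turn a name-keyed chunk mapping into dask's axis-keyed form."""
    if isinstance(chunks, (float, str, int, tuple, list)):
        return chunks  # dask.array.from_array handles these directly
    if not chunks:
        chunks = chunks_kwargs  # keyword form, e.g. chunk(x=2)
    # Map dimension names to axis numbers, as Variable.get_axis_num does.
    return {dims.index(dim): size for dim, size in chunks.items()}


print(to_dask_chunks(["x", "y"], {"y": 2}))  # {1: 2}
print(to_dask_chunks(["x", "y"], (3, 3)))    # (3, 3)
print(to_dask_chunks(["x", "y"], {}, x=2))   # {0: 2}
```

The pass-through branch is why the diff widens `chunks`'s annotation to include `tuple[int, ...]` and `tuple[tuple[int, ...], ...]` alongside the mapping form.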
19 changes: 19 additions & 0 deletions xarray/tests/test_conventions.py
@@ -128,6 +128,25 @@ def test_multidimensional_coordinates(self) -> None:
# Should not have any global coordinates.
assert "coordinates" not in attrs

def test_var_with_coord_attr(self) -> None:
# regression test for GH6310
# don't overwrite user-defined "coordinates" attributes
orig = Dataset(
{"values": ("time", np.zeros(2), {"coordinates": "time lon lat"})},
coords={
"time": ("time", np.zeros(2)),
"lat": ("time", np.zeros(2)),
"lon": ("time", np.zeros(2)),
},
)
# Encode the coordinates, as they would be in a netCDF output file.
enc, attrs = conventions.encode_dataset_coordinates(orig)
# Make sure we have the right coordinates for each variable.
values_coords = enc["values"].attrs.get("coordinates", "")
assert set(values_coords.split()) == {"time", "lat", "lon"}
# Should not have any global coordinates.
assert "coordinates" not in attrs

def test_do_not_overwrite_user_coordinates(self) -> None:
orig = Dataset(
coords={"x": [0, 1, 2], "y": ("x", [5, 6, 7]), "z": ("x", [8, 9, 10])},
5 changes: 5 additions & 0 deletions xarray/tests/test_dataarray.py
@@ -804,6 +804,11 @@ def test_chunk(self):
assert isinstance(blocked.data, da.Array)
assert "testname_" in blocked.data.name

# test kwargs form of chunks
blocked = unblocked.chunk(dim_0=3, dim_1=3)
assert blocked.chunks == ((3,), (3, 1))
assert blocked.data.name != first_dask_name

def test_isel(self):
assert_identical(self.dv[0], self.dv.isel(x=0))
assert_identical(self.dv, self.dv.isel(x=slice(None)))
5 changes: 4 additions & 1 deletion xarray/tests/test_dataset.py
@@ -921,6 +921,9 @@ def test_chunk(self):
expected_chunks = {"dim1": (8,), "dim2": (9,), "dim3": (10,)}
assert reblocked.chunks == expected_chunks

# test kwargs form of chunks
assert data.chunk(**expected_chunks).chunks == expected_chunks

def get_dask_names(ds):
return {k: v.data.name for k, v in ds.items()}

@@ -947,7 +950,7 @@ def get_dask_names(ds):
new_dask_names = get_dask_names(reblocked)
assert reblocked.chunks == expected_chunks
assert_identical(reblocked, data)
# recuhnking with same chunk sizes should not change names
# rechunking with same chunk sizes should not change names
for k, v in new_dask_names.items():
assert v == orig_dask_names[k]

34 changes: 34 additions & 0 deletions xarray/tests/test_variable.py
@@ -2154,6 +2154,40 @@ def test_coarsen_keep_attrs(self, operation="mean"):
class TestVariableWithDask(VariableSubclassobjects):
cls = staticmethod(lambda *args: Variable(*args).chunk())

def test_chunk(self):
unblocked = Variable(["dim_0", "dim_1"], np.ones((3, 4)))
assert unblocked.chunks is None

blocked = unblocked.chunk()
assert blocked.chunks == ((3,), (4,))
first_dask_name = blocked.data.name

blocked = unblocked.chunk(chunks=((2, 1), (2, 2)))
assert blocked.chunks == ((2, 1), (2, 2))
assert blocked.data.name != first_dask_name

blocked = unblocked.chunk(chunks=(3, 3))
assert blocked.chunks == ((3,), (3, 1))
assert blocked.data.name != first_dask_name

# name doesn't change when rechunking by same amount
# this fails if ReprObject doesn't have __dask_tokenize__ defined
assert unblocked.chunk(2).data.name == unblocked.chunk(2).data.name

assert blocked.load().chunks is None

# Check that kwargs are passed
import dask.array as da

blocked = unblocked.chunk(name="testname_")
assert isinstance(blocked.data, da.Array)
assert "testname_" in blocked.data.name

# test kwargs form of chunks
blocked = unblocked.chunk(dim_0=3, dim_1=3)
assert blocked.chunks == ((3,), (3, 1))
assert blocked.data.name != first_dask_name

@pytest.mark.xfail
def test_0d_object_array_with_list(self):
super().test_0d_object_array_with_list()
