
MAINT: normalize NDArray to tensors, add a special-case for out= NDArrays #108

Merged: ev-br merged 1 commit into main from ndarray_annot on Apr 6, 2023

Conversation

@ev-br (Collaborator) commented Apr 6, 2023

Arguments annotated as NDArray get normalized to their Tensors for further processing by implementer functions.
out= arguments, however, are special (the original array needs to be preserved) and are never seen by implementers, so they stay ndarrays.

So the annotations and their corresponding normalizations are (see the sketch after this list):

  • ArrayLike : a generic numpy array_like --- convert to Tensor, pass the Tensor to implementers
  • NDArray : ndarray exactly --- check that the type is ndarray, extract its Tensor, pass the Tensor to implementers
  • OutArray : ndarray exactly --- check that the type is ndarray, return the ndarray, handle it at the normalizer level
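
A minimal sketch of what the three normalizers do (hypothetical helper names and a stand-in ndarray wrapper; the real machinery lives in torch_np/_normalizations.py):

import torch

class ndarray:
    # stand-in for torch_np's wrapper class; the real one holds a Tensor
    def __init__(self, tensor):
        self.tensor = tensor

def normalize_array_like(arg):
    # ArrayLike: accept anything array_like, hand implementers a Tensor
    return torch.as_tensor(arg)

def normalize_ndarray(arg):
    # NDArray: must already be an ndarray; implementers still only see its Tensor
    if not isinstance(arg, ndarray):
        raise TypeError("expected an ndarray")
    return arg.tensor

def normalize_outarray(arg):
    # OutArray: must be an ndarray and stays one; the normalizer layer
    # copies the result into arg.tensor and returns the original object
    if not isinstance(arg, ndarray):
        raise TypeError("'out' must be an array")
    return arg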

@ev-br ev-br requested a review from lezcano April 6, 2023 08:27
@lezcano (Collaborator) left a comment

Why did you change NDArray to OutArray?

Note that the only use of out is through out.tensor:

out.tensor.copy_(result)
return out

My best guess is that the Optional[NDArray] annotation we used before was not correct when we had tuples or lists as the return.

For reasons like this, I proposed to move the out= implementation out of the function into the decorator itself, as this input is different from all the others. Now, to avoid the extra packing-unpacking code, we need to make an exception and leak ndarrays into _funcs_impl.py, but well...

Resolved review thread on torch_np/_normalizations.py (shown below).
@ev-br (Collaborator, Author) commented Apr 6, 2023

Before this PR:

  • The out: Optional[NDArray] annotation was special. The NDArray annotation only checks that the argument is an ndarray, isinstance(arg, ndarray), and returns the array. This worked just fine because out arrays never made it to implementers, and maybe_copy_out worked at the normalizer level, exactly via out.tensor.copy_. All was well.

  • A single exception, copyto(dst: NDArray, src: ArrayLike), leaked an ndarray to implementers.

Now, if we make the NDArray annotation return a tensor, we clean up this single exception, but get a serious problem across the board with the out= semantics. Once we normalize the out array into a Tensor, we lose the original array object. Then maybe_copy_out copies data into a tensor --- and wraps this tensor into a new array. So assert result is out is broken across the board.
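
A minimal sketch of the failure mode (hypothetical names; ndarray is a stand-in for the torch_np wrapper):

import torch

class ndarray:
    def __init__(self, tensor):
        self.tensor = tensor

def maybe_copy_out_tensor_only(out_tensor, result):
    # if out were normalized to its Tensor, we could only re-wrap it afterwards
    out_tensor.copy_(result)
    return ndarray(out_tensor)   # a brand-new wrapper object

def maybe_copy_out(out, result):
    # with the original ndarray preserved, object identity survives
    out.tensor.copy_(result)
    return out

out = ndarray(torch.zeros(3))
print(maybe_copy_out(out, torch.ones(3)) is out)                     # True
print(maybe_copy_out_tensor_only(out.tensor, torch.ones(3)) is out)  # False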

The OutArray annotation fixes this. At the end of the day, an out array is different from a generic ndarray argument. NumPy almost never requires strict ndarrays --- this is literally the only exception.

For reasons like this, I proposed to move the out= implementation out of the function into the decorator itself, as this input is different from all the others.

We discussed this a couple of times, actually. This breaks using out as a positional argument, which numpy allows.

So my inclination would be to just accept copyto being special, add a relevant comment to its source and be done with it. But I see that my practicality/purity meter is consistently off.

@ev-br (Collaborator, Author) commented Apr 6, 2023

My best guess is that the Optional[NDArray] annotation we used before was not correct when we had tuples or lists as the return

Yes. This is separate, though. I'm fixing this using divmod as a guinea pig now. The current stumbling block is how to check that tuple[Optional[NDArray], Optional[NDArray]] is a typing.Sequence. Googling around suggests checking the argument itself, not the annotation, but we cannot do that (e.g. a list is an ArrayLike but also a Sequence).

@lezcano (Collaborator) commented Apr 6, 2023

We discussed this a couple of times, actually. This breaks using out as a positional argument, which numpy allows.

As mentioned in those discussions, it can be done. Here's a simplified example:

import inspect

def add_out(f):
    def wrapper(*args, **kwargs):
        out = None  # so the print below cannot hit an unbound name when no out is given
        if "out" in kwargs:
            # out passed as a keyword argument
            out = kwargs.pop("out")
        elif len(args) > len(inspect.getfullargspec(f).args):
            # more positionals than f declares: treat the trailing one as out
            args, out = args[:-1], args[-1]
        print(out)
        result = f(*args, **kwargs)
        return result
    return wrapper


@add_out
def f(x, *, y=3):
    pass

f(2, 7)      # out picked up positionally
f(2, out=8)  # out picked up as a keyword

This could go under a @normalize(has_out=True) flag.

@ev-br (Collaborator, Author) commented Apr 6, 2023

This is still brittle because there is no guarantee that out is the last positional arg. It really can be anywhere in the argument list.

@lezcano (Collaborator) commented Apr 6, 2023

Fine. Let's just have it like this then.

As a separate point, the recursive step in maybe_copy_to calls a non-existent function called copy_to.

As for the typing issue, why don't you check that it's exactly a tuple, rather than trying to generalize to a Sequence?

@lezcano (Collaborator) commented Apr 6, 2023

As for how to do that last bit, there's no nice way of doing it in the typing module for all I know.
This is how I did it in PrimTorch:
https://github.com/pytorch/pytorch/blob/b8cf01013975dfabe40966b8dc4555a2e069336c/torch/_decomp/__init__.py#L81-L86

@ev-br (Collaborator, Author) commented Apr 6, 2023

Fine. Let's just have it like this then.

This PR, with a special OutArray or an exception in copyto?

@lezcano (Collaborator) commented Apr 6, 2023

This PR as is looks alright. Let's add a comment explaining the rationale behind this, though (out not being kwarg-only and being allowed at arbitrary positions).

@ev-br (Collaborator, Author) commented Apr 6, 2023

Re copy_to and sequences --- yes, I've noticed and am fixing it. As a side note, we need to set up coverage to smoke out these unusual code paths.

PrimTorch's way is slick indeed. I was hoping there was a way to avoid looking up __origin__, but hey. My current issue is to cook up an incantation that covers both Sequence[ArrayLike] and tuple[Optional[NDArray], Optional[NDArray]] (the latter being the out= annotation for nout=2 ufuncs like divmod).

@lezcano (Collaborator) commented Apr 6, 2023

Do the primtorch trick plus:

>>> import typing
>>> issubclass(tuple, typing.Sequence)
True
>>> issubclass(list, typing.Sequence)
True
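
Putting the two together, one possible shape of the check (a sketch with illustrative names, not the actual torch_np code):

import typing
from collections.abc import Sequence

def annotation_is_sequence(annotation):
    # typing.get_origin maps a parameterized generic to its runtime class:
    # tuple[X, Y] -> tuple, typing.Sequence[X] -> collections.abc.Sequence,
    # typing.Optional[X] -> typing.Union (not a class, hence the guard).
    # Note str/bytes would also pass this check if they ever showed up here.
    origin = typing.get_origin(annotation)
    return isinstance(origin, type) and issubclass(origin, Sequence)

print(annotation_is_sequence(typing.Sequence[int]))              # True
print(annotation_is_sequence(tuple[typing.Optional[int], ...]))  # True
print(annotation_is_sequence(typing.Optional[int]))              # False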

Base automatically changed from dtypes_in_ufuncs to main April 6, 2023 10:54
@ev-br ev-br merged commit 3887e26 into main Apr 6, 2023
@ev-br ev-br deleted the ndarray_annot branch April 6, 2023 11:08
The review thread is on this snippet in torch_np/_normalizations.py:

from ._ndarray import ndarray

if not isinstance(arg, ndarray):
    raise TypeError("'out' must be an array")
A Collaborator left a review comment:

We should improve the error message using the name. Just patch this into the next PR.
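
For instance, something along these lines (a sketch; it assumes the normalizer knows the parameter name, here called name):

if not isinstance(arg, ndarray):
    raise TypeError(f"'{name}' must be an ndarray, got {type(arg).__name__}")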


@ev-br (Collaborator, Author) commented Apr 6, 2023

OK, gh-109 does a minimal fix for out= being tuple-only, and avoids doing one more refactor. Let's keep the primtorch trick for if/when we hit the limitations of the current infra.
