Allow generators and iterators #194

Merged on Dec 18, 2020 (26 commits)

Commits
10c5c2b
Allow generators and iterators in evaluate
dkarrasch Dec 5, 2020
d3dd6e4
fix test
dkarrasch Dec 5, 2020
27699ff
fix one type-thing
dkarrasch Dec 5, 2020
e127fb4
include result_type proposal, add hamming tests
dkarrasch Dec 5, 2020
948291d
include renyi_divergence, haversine, bregman
dkarrasch Dec 6, 2020
243b7b0
include bhattacharyya / hellinger
dkarrasch Dec 6, 2020
8101bb3
Update test/test_dists.jl
dkarrasch Dec 8, 2020
8f44a30
include some review comments
dkarrasch Dec 8, 2020
c055d4d
relax parameter types
dkarrasch Dec 8, 2020
0227942
clean up UnionMetric evaluate
dkarrasch Dec 8, 2020
9b34ed9
include iterator-based pair- and colwise
dkarrasch Dec 12, 2020
85cdb1b
simplify/optimize pairwise
dkarrasch Dec 12, 2020
46a91c2
include generic result_type tests
dkarrasch Dec 13, 2020
8fb5108
Revert "clean up UnionMetric evaluate"
dkarrasch Dec 13, 2020
5d04ff0
minor UnionMetric edits
dkarrasch Dec 14, 2020
27994eb
include comments from code review
dkarrasch Dec 14, 2020
6e4c09b
add colwise & pairwise docstrings
dkarrasch Dec 14, 2020
5b096d4
Apply suggestions from code review
dkarrasch Dec 15, 2020
518fd3d
simplify _eltype, add a note to colwise docstring
dkarrasch Dec 15, 2020
18f17af
fix typo
dkarrasch Dec 15, 2020
8640d36
transpose -> permutedims
dkarrasch Dec 15, 2020
5be0415
handle CartesianIndex
dkarrasch Dec 15, 2020
4333bda
increase code coverage
dkarrasch Dec 15, 2020
8219ad0
fix docstrings
dkarrasch Dec 16, 2020
5402ed8
Revert "handle CartesianIndex"
dkarrasch Dec 16, 2020
1350d47
rm redundant tests
dkarrasch Dec 17, 2020
84 changes: 71 additions & 13 deletions src/generic.jl
@@ -32,10 +32,9 @@ evaluate(dist::PreMetric, a, b) = dist(a, b)
Infer the result type of metric `dist` with input type `Ta` and `Tb`, or input
data `a` and `b`.
"""
# result_type(::PreMetric, ::Type, ::Type) = Float64 # fallback in Distances
result_type(dist, a, b) = result_type(dist, _eltype(a), _eltype(b))
result_type(f, a::Type, b::Type) = typeof(f(oneunit(a), oneunit(b))) # don't require `PreMetric` subtyping

result_type(dist, a, b) = result_type(dist, _eltype(a), _eltype(b))
# description of approach:
# (a) for generic iterators, rely on Base.IteratorEltype(a)
_eltype(a) = __eltype(Base.IteratorEltype(a), a)
@@ -48,6 +47,15 @@ _eltype(T::Type) = T

# Generic column-wise evaluation

"""
colwise!(r::AbstractMatrix, metric::PreMetric, a, b)

Compute distances between corresponding elements of the iterable collections
`a` and `b` according to distance `metric`, and store the result in `r`.

`a` and `b` must have the same number of elements, `r` must be a vector of length
`length(a) == length(b)`.
"""
function colwise!(r::AbstractArray, metric::PreMetric, a, b)
require_one_based_indexing(r)
n = length(a)
@@ -79,6 +87,18 @@ function colwise!(r::AbstractArray, metric::PreMetric, a::AbstractMatrix, b::Abs
r
end

"""
colwise!(r::AbstractMatrix, metric::PreMetric,
a::AbstractMatrix, b::AbstractMatrix)

Compute distances between each corresponding columns of `a` and `b` according
to distance `metric`, and store the result in `r`. Exactly one of `a` or `b`
can be a vector, in which case the distance between that vector and all columns
of the other matrix are computed.

`a` and `b` must have the same number of columns if neither of the two is a
vector. `r` must be a vector of length `maximum(size(a, 2), size(b, 2))`.
"""
function colwise!(r::AbstractArray, metric::PreMetric, a::AbstractMatrix, b::AbstractMatrix)
require_one_based_indexing(r, a, b)
n = get_common_ncols(a, b)
@@ -89,16 +109,32 @@ function colwise!(r::AbstractArray, metric::PreMetric, a::AbstractMatrix, b::Abs
r
end

function colwise!(r::AbstractArray, metric::SemiMetric, a::AbstractMatrix, b::AbstractVector)
colwise!(r, metric, b, a)
end
"""
colwise(r::AbstractMatrix, metric::PreMetric, a, b)

Compute distances between corresponding elements of the iterable collections
`a` and `b` according to distance `metric`.

`a` and `b` must have the same number of elements, `r` must be a vector of length
`length(a) == length(b)`.
"""
function colwise(metric::PreMetric, a, b)
n = get_common_length(a, b)
r = Vector{result_type(metric, a, b)}(undef, n)
colwise!(r, metric, a, b)
end

"""
colwise(r::AbstractMatrix, metric::PreMetric,
a::AbstractMatrix, b::AbstractMatrix)

Compute distances between each corresponding columns of `a` and `b` according
Member

Something to note is that these methods are inconsistent with the ones treating a and b as iterators of columns: a matrix of vectors will be treated differently from a vector of the same vectors. That's probably OK in practice, but that's one of the reasons why I'd like to move toward requiring users to write pairwise(d, eachcol(a), eachcol(b)) explicitly in the longer term. That way we won't need dims anymore.

Member Author

That would be very nice, because it redirects the matrix-based method to the iterator-based method, and one could get rid of the matrix-based ones. The only issue I see is that for the specialized *Euclidean distances (and a few others), where we need the underlying matrix for performance reasons, I don't seem to be able to unwrap it from the eachcol generator.

Member

Fortunately JuliaLang/julia#32310 should allow us to retrieve the underlying matrix!

Member Author

Yes, I looked at that one a bit today. Will they let it go into v1.6? I wonder how the ecosystem is going to adapt to v1.6 being the new LTS, and how fast packages will really drop support for pre-1.6 versions. In many cases, there is no hard reason, only soft ones.
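
The redirection discussed in this thread can be sketched in plain Julia. This is a minimal illustration under stated assumptions, not the Distances.jl implementation: `euclid`, `iter_pairwise`, and `matrix_pairwise` are hypothetical names introduced here, and the sketch ignores the performance concern about specialized Euclidean methods raised above.

```julia
# Hypothetical stand-ins for illustration; not the Distances.jl implementation.
euclid(x, y) = sqrt(sum(abs2, x .- y))

# Iterator-based pairwise: needs only `length` and iteration on `a` and `b`.
function iter_pairwise(d, a, b)
    T = typeof(d(first(a), first(b)))      # assumes non-empty inputs
    r = Matrix{T}(undef, length(a), length(b))
    for (j, bj) in enumerate(b), (i, ai) in enumerate(a)
        r[i, j] = d(ai, bj)
    end
    return r
end

# Matrix-based method that simply forwards to the iterator-based one.
matrix_pairwise(d, A::AbstractMatrix, B::AbstractMatrix) =
    iter_pairwise(d, eachcol(A), eachcol(B))

A = [0.0 3.0; 0.0 4.0]   # columns (0, 0) and (3, 4)
B = [0.0 0.0; 0.0 1.0]   # columns (0, 0) and (0, 1)
R = matrix_pairwise(euclid, A, B)
```

Here `R[i, j]` holds the distance between column `i` of `A` and column `j` of `B`, so `R[2, 1]` is the distance from (3, 4) to the origin. With this layout the column-wise matrix method carries no logic of its own, which is why a `dims` keyword would no longer be needed.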

to distance `metric`. Exactly one of `a` or `b` can be a vector, in which case
the distance between that vector and all columns of the other matrix are computed.

`a` and `b` must have the same number of columns if neither of the two is a
vector. `r` must be a vector of length `maximum(size(a, 2), size(b, 2))`.
"""
function colwise(metric::PreMetric, a::AbstractMatrix, b::AbstractMatrix)
n = get_common_ncols(a, b)
r = Vector{result_type(metric, a, b)}(undef, n)
@@ -200,7 +236,7 @@ in `a` and `b` according to distance `metric`, and store the result in `r`.
If a single matrix `a` is provided, compute distances between its rows or columns.

`a` and `b` must have the same numbers of columns if `dims=1`, or of rows if `dims=2`.
`r` must be a square matrix with size `size(a, dims) == size(b, dims)`.
`r` must be a matrix with size `size(a, dims) × size(b, dims)`.
"""
function pairwise!(r::AbstractMatrix, metric::PreMetric,
a::AbstractMatrix, b::AbstractMatrix;
@@ -245,8 +281,22 @@ function pairwise!(r::AbstractMatrix, metric::PreMetric, a::AbstractMatrix;
end
end

"""
pairwise!(r::AbstractMatrix, metric::PreMetric, a, b)
pairwise!(r::AbstractMatrix, metric::PreMetric, a)

Compute distances between each element of collection `a` and each element of
collection `b` according to distance `metric`, and store the result in `r`.
If a single iterable `a` is provided, compute distances between its elements.

`r` must be a matrix with size `length(a) × length(b)`.
"""
pairwise!(r::AbstractMatrix, metric::PreMetric, a, b) = _pairwise!(r, metric, a, b)
pairwise!(r::AbstractMatrix, metric::PreMetric, a) = _pairwise!(r, metric, a)

"""
pairwise(metric::PreMetric, a::AbstractMatrix, b::AbstractMatrix=a; dims)
pairwise(metric::PreMetric, a::AbstractMatrix; dims)

Compute distances between each pair of rows (if `dims=1`) or columns (if `dims=2`)
in `a` and `b` according to distance `metric`. If a single matrix `a` is provided,
Member

Something to note is that these methods are inconsistent with the ones treating a and b as iterators of columns: a matrix of vectors will be treated differently from a vector of the same vectors. That's probably OK in practice, but that's one of the reasons why I'd like to move toward requiring users to write pairwise(d, eachcol(a), eachcol(b)) explicitly in the longer term. That way we won't need dims anymore.

Member Author

I'm not exactly sure I understand the inconsistency, actually. Could you please sketch an application case?

Member

For example, pairwise(d, [a, b, c, d]) vs. pairwise(d, reshape([a, b, c, d], 2, 2)) with a, b, c and d vectors of numbers.

Member Author

Ok, then I understood you correctly. Of the two calls you mentioned, only the first one works. The second one fails because it treats a, b, c and d like numbers, and then calls like abs(a) (or whatever is necessary) fail.

Member

Yes. But if we were fully consistent, the second call would be equivalent to the first one, since matrices are just one kind of iterator.
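
The consistency argument in this thread can be made concrete with a small self-contained sketch. It assumes a purely iterator-based pairwise with no matrix special-casing; `cityblock` and `iter_pairwise` are hypothetical helpers defined here and are not the package's methods.

```julia
# Hypothetical helpers for illustration; not the Distances.jl implementation.
cityblock(x, y) = sum(abs, x .- y)

# Pairwise over arbitrary collections, using only `length` and iteration.
function iter_pairwise(d, a, b)
    T = typeof(d(first(a), first(b)))      # assumes non-empty inputs
    r = Matrix{T}(undef, length(a), length(b))
    for (j, bj) in enumerate(b), (i, ai) in enumerate(a)
        r[i, j] = d(ai, bj)
    end
    return r
end

vecs = [[0, 0], [1, 0], [0, 2], [3, 3]]   # vector of four vectors
mat  = reshape(vecs, 2, 2)                # 2×2 matrix of the same vectors

# Iterating a matrix visits its elements in column-major order, so both
# containers yield the same sequence of vectors.
same_order = all(x == y for (x, y) in zip(vecs, mat))

r_vec = iter_pairwise(cityblock, vecs, vecs)
r_mat = iter_pairwise(cityblock, mat, mat)
```

Under this assumption `r_vec` and `r_mat` are the same 4×4 matrix, because the helper never inspects the container's shape. The matrix-specific methods in the diff instead read the matrix entries as the coordinates themselves, which is the inconsistency being described.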

@@ -264,13 +314,6 @@ function pairwise(metric::PreMetric, a::AbstractMatrix, b::AbstractMatrix;
pairwise!(r, metric, a, b, dims=dims)
end

function pairwise(metric::PreMetric, a, b)
m = length(a)
n = length(b)
r = Matrix{result_type(metric, a, b)}(undef, m, n)
_pairwise!(r, metric, a, b)
end

function pairwise(metric::PreMetric, a::AbstractMatrix;
dims::Union{Nothing,Integer}=nothing)
dims = deprecated_dims(dims)
@@ -280,6 +323,21 @@ function pairwise(metric::PreMetric, a::AbstractMatrix;
pairwise!(r, metric, a, dims=dims)
end

"""
pairwise(metric::PreMetric, a, b)
pairwise(metric::PreMetric, a)

Compute distances between each element of collection `a` and each element of
collection `b` according to distance `metric`. If a single iterable `a` is
provided, compute distances between its elements.
"""
function pairwise(metric::PreMetric, a, b)
m = length(a)
n = length(b)
r = Matrix{result_type(metric, a, b)}(undef, m, n)
_pairwise!(r, metric, a, b)
end

function pairwise(metric::PreMetric, a)
n = length(a)
r = Matrix{result_type(metric, a, a)}(undef, n, n)