
d3.join? #52

Open
mbostock opened this issue Mar 15, 2017 · 6 comments

Comments

@mbostock
Member

mbostock commented Mar 15, 2017

Imagine you’re joining a TSV file to a GeoJSON feature collection. A typical way of doing that might be to create a Map and then use array.forEach:

var map = new Map(rates.map(d => [d.id, +d.rate]));
collection.features.forEach(f => f.properties.rate = map.get(f.id));

It’d be neat if there was a simple way to join two arrays of objects and invoke a function for each joined row.

Option 1:

d3.join(collection.features, rates, (a, b) => a.properties.rate = +b.rate);

This doesn’t really work because it would assume that d => d.id is always the key function, and in practice you’d want to be able to specify key functions for both the left and the right arrays. I suppose you could require calling array.map on your arrays before passing them to d3.join, but then it becomes even less useful than just using a Map as above.
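A minimal sketch of Option 1, for illustration only: `joinById` is a hypothetical name, the key is hard-coded to `d => d.id` on both sides, and the `code` column on the rate rows is an assumed example of a mismatched field. The remapping step is the extra `array.map` the comment describes.

```javascript
// Hypothetical Option 1: the key is hard-coded to d => d.id on both sides.
function joinById(left, right, reduce) {
  // Index the right-hand rows by id (the last row wins on duplicate ids).
  const index = new Map(right.map(b => [b.id, b]));
  for (const a of left) {
    if (index.has(a.id)) reduce(a, index.get(a.id));
  }
}

const features = [
  { id: "01", properties: {} },
  { id: "02", properties: {} }
];

// Assumed TSV rows keyed by a "code" column rather than "id".
const rates = [
  { code: "01", rate: "5.1" },
  { code: "02", rate: "3.7" }
];

// Workaround: remap the rows so that the fixed key function applies.
joinById(
  features,
  rates.map(d => ({ id: d.code, rate: d.rate })),
  (a, b) => a.properties.rate = +b.rate
);

console.log(features.map(f => f.properties.rate)); // [ 5.1, 3.7 ]
```

At that point the extra `map` is about as much code as building the Map directly.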

I think we should avoid too many unnamed arguments to a single function, especially optional ones, so the following Option 2 probably isn’t a good idea:

d3.join(collection.features, a => a.id, rates, b => b.id, (a, b) => a.properties.rate = +b.rate);
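A sketch of what the Option 2 call site would require, assuming a plain hash join (index the right array in a Map, then probe it with each left row). The function name `join` here is hypothetical, not d3 API; only argument position tells the two key functions apart.

```javascript
// Hypothetical Option 2: five positional arguments; only their position
// distinguishes the left key function from the right one.
function join(left, leftKey, right, rightKey, reduce) {
  // Hash join: index the right rows by key, then probe with each left row.
  const index = new Map(right.map(b => [rightKey(b), b]));
  for (const a of left) {
    const k = leftKey(a);
    if (index.has(k)) reduce(a, index.get(k));
  }
}

const features = [{ id: "06", properties: {} }, { id: "48", properties: {} }];
const rates = [{ id: "48", rate: "4.2" }, { id: "06", rate: "5.5" }];

join(features, a => a.id, rates, b => b.id, (a, b) => a.properties.rate = +b.rate);
// features[0].properties.rate is now 5.5; features[1].properties.rate is 4.2
```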

A verbose Option 3, a bit like d3.nest:

d3.join()
    .leftKey(a => a.id)
    .rightKey(b => b.id)
    .reduce((a, b) => a.properties.rate = +b.rate)
    (collection.features, rates);
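A minimal sketch of how an Option 3 operator could be built with the chainable getter/setter closure pattern common across d3 modules (as in d3.line). Everything here is hypothetical: `join`, the default `d => d.id` keys, and the `(a, b) => [a, b]` default reducer (borrowed from d3.cross) are assumptions, and unmatched rows are simply dropped (an inner join).

```javascript
// Hypothetical Option 3: a configurable operator built from closures.
function join() {
  let leftKey = d => d.id;        // assumed default
  let rightKey = d => d.id;       // assumed default
  let reduce = (a, b) => [a, b];  // assumed default, as in d3.cross

  function op(left, right) {
    // Hash join: index the right rows, then probe with each left row.
    const index = new Map(right.map(b => [rightKey(b), b]));
    const results = [];
    for (const a of left) {
      const k = leftKey(a);
      if (index.has(k)) results.push(reduce(a, index.get(k)));
    }
    return results;
  }

  // Each method sets a value (returning op for chaining) when given an
  // argument, and returns the current value when called with none.
  op.leftKey = function(f) { return arguments.length ? (leftKey = f, op) : leftKey; };
  op.rightKey = function(f) { return arguments.length ? (rightKey = f, op) : rightKey; };
  op.reduce = function(f) { return arguments.length ? (reduce = f, op) : reduce; };
  return op;
}

const features = [{ id: "06", properties: {} }, { id: "48", properties: {} }];
const rates = [{ id: "06", rate: "5.5" }];

const rateJoin = join()
  .leftKey(a => a.id)
  .rightKey(b => b.id)
  .reduce((a, b) => a.properties.rate = +b.rate);

const result = rateJoin(features, rates);
console.log(result); // [ 5.5 ]
```

Because each method doubles as a getter, adding a shared `key` method raises the question of what `key()` should return with no arguments: with two separately configurable keys there is no single obvious value.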

An enhancement of Option 3 with a convenience for setting the left and right keys to the same function:

d3.join()
    .key(d => d.id)
    .reduce((a, b) => a.properties.rate = +b.rate)
    (collection.features, rates);

But what would join.key with no arguments return?

A further or alternative enhancement of Option 3 that passes the shared left and right key to the constructor:

d3.join(d => d.id)
    .reduce((a, b) => a.properties.rate = +b.rate)
    (collection.features, rates);

A slightly icky problem here is the default case. Unlike d3.nest, there’s a reasonable default join, but using it requires extra parens:

d3.join()(rates, collection.features);
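A sketch of the constructor-key variant, under assumptions like those above: a hypothetical `join`, default key `d => d.id`, default reducer `(a, b) => [a, b]`, and inner-join semantics. It makes the default case concrete: `join()` only builds the operator, so a second call is needed to apply it.

```javascript
// Hypothetical constructor-key variant: one key function for both sides.
function join(key = d => d.id) {    // assumed default key
  let reduce = (a, b) => [a, b];    // assumed default reducer

  function op(left, right) {
    const index = new Map(right.map(b => [key(b), b]));
    const results = [];
    for (const a of left) {
      const k = key(a);
      if (index.has(k)) results.push(reduce(a, index.get(k)));
    }
    return results;
  }
  op.reduce = function(f) { return arguments.length ? (reduce = f, op) : reduce; };
  return op;
}

const features = [{ id: "06", properties: {} }];
const rates = [{ id: "06", rate: "5.5" }];

// Custom reducer: configure, then apply.
join(d => d.id).reduce((a, b) => a.properties.rate = +b.rate)(features, rates);

// Default case: join() builds the operator and the second call applies it,
// hence the extra parentheses.
const pairs = join()(features, rates); // [[features[0], rates[0]]]
```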

Option 4 uses immutable closures, like d3-interpolate’s interpolate.gamma. These are nice because you don’t need extra parens in the default case:

d3.join(rates, collection.features);

With a custom reducer:

d3.join.reduce((a, b) => a.properties.rate = +b.rate)(collection.features, rates);

With a custom key and reducer (everything is named!):

d3.join
    .key(d => d.id)
    .reduce((a, b) => a.properties.rate = +b.rate)
    (collection.features, rates)

With this approach, join.key can easily take two functions if you want separate keys for left and right. (You could have separate join.leftKey and join.rightKey, but I don’t think it’s necessary.) You can’t call join.key as an accessor as you can in Option 3, so there’s no question of what sort of return value makes sense: it always constructs a new join operator.

Also there’s the question of what join(A, B) should return. Nothing? Maybe an array of results returned by the reducer, similar to d3.cross? With the same default reducer of (a, b) => [a, b]?
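A sketch of Option 4, assuming each `.key` or `.reduce` call returns a brand-new operator rather than mutating the existing one, in the spirit of interpolate.gamma. The factory name `joinWith` is hypothetical, and the return value follows the d3.cross-style suggestion: an array of reducer results, with `(a, b) => [a, b]` as the default reducer.

```javascript
// Hypothetical Option 4: operators are immutable. key() and reduce() never
// modify an operator; each returns a new one.
function joinWith(leftKey, rightKey, reduce) {
  function op(left, right) {
    // Hash join: index the right rows by key, then probe with each left row.
    const index = new Map(right.map(b => [rightKey(b), b]));
    const results = [];
    for (const a of left) {
      const k = leftKey(a);
      if (index.has(k)) results.push(reduce(a, index.get(k)));
    }
    return results;
  }
  // One function sets both keys; pass a second to key the right side separately.
  op.key = (lk, rk = lk) => joinWith(lk, rk, reduce);
  op.reduce = f => joinWith(leftKey, rightKey, f);
  return op;
}

// Default operator: key d => d.id, reducer (a, b) => [a, b] as in d3.cross.
const join = joinWith(d => d.id, d => d.id, (a, b) => [a, b]);

const features = [{ id: "06", properties: {} }, { id: "48", properties: {} }];
const rates = [{ id: "48", rate: "4.2" }, { id: "06", rate: "5.5" }];

// Default case: no extra parentheses needed.
const pairs = join(features, rates); // [[features[0], rates[1]], [features[1], rates[0]]]

// Everything named: custom key and reducer.
const values = join
  .key(d => d.id)
  .reduce((a, b) => a.properties.rate = +b.rate)
  (features, rates); // [5.5, 4.2]
```

Since the base `join` is never mutated, it can be shared freely; `join.reduce(f)` returns a different function and leaves `join` itself returning pairs.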

mbostock changed the title from "d3.join(" to "d3.join?" on Mar 15, 2017
@timelyportfolio

Just in case it helps to think this through, here is lodash-joins. From the R side, there are dplyr's joins.

@Firescar96

Firescar96 commented Nov 30, 2017

Sure, I'll give it a try. I'm going to use sort-merge joins for functionality like:

d3.join()
    .key(d => d.id)
    .reduce((a, b) => a.properties.rate = +b.rate)
    (dataset1, dataset2)
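A sketch of a sort-merge inner join, written independently of the implementation linked later in the thread. It assumes a single key function shared by both inputs and keys that compare consistently with `<` and `>` (all strings or all numbers). Both inputs are sorted by key and then scanned in step; each run of equal keys on one side is paired with the matching run on the other, which handles many-to-many matches.

```javascript
// Sort-merge inner join: sort both inputs by key, then walk them in step.
function sortMergeJoin(left, right, key, reduce = (a, b) => [a, b]) {
  const cmp = (a, b) => (key(a) < key(b) ? -1 : key(a) > key(b) ? 1 : 0);
  const L = left.slice().sort(cmp);   // sort copies; the inputs keep their order
  const R = right.slice().sort(cmp);
  const results = [];
  let i = 0;
  let j = 0;
  while (i < L.length && j < R.length) {
    const c = cmp(L[i], R[j]);
    if (c < 0) i++;          // left key is smaller: advance left
    else if (c > 0) j++;     // right key is smaller: advance right
    else {
      // Equal keys: find the run of right rows sharing this key, then pair
      // it with every left row in the matching run.
      const pivot = L[i];
      let jEnd = j;
      while (jEnd < R.length && cmp(R[jEnd], pivot) === 0) jEnd++;
      while (i < L.length && cmp(L[i], pivot) === 0) {
        for (let m = j; m < jEnd; m++) results.push(reduce(L[i], R[m]));
        i++;
      }
      j = jEnd;
    }
  }
  return results;
}

const left = [{ id: 3, name: "c" }, { id: 1, name: "a" }, { id: 2, name: "b" }];
const right = [{ id: 2, v: 20 }, { id: 3, v: 30 }, { id: 4, v: 40 }, { id: 3, v: 31 }];

const rows = sortMergeJoin(left, right, d => d.id, (a, b) => a.name + b.v);
console.log(rows); // [ 'b20', 'c30', 'c31' ]
```

Sorting dominates the cost (O(n log n + m log m) plus the size of the output), and results come back in key order rather than in the original order of the left array.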

@mbostock, I'm not sure what you meant by the case of join.key with no arguments.

EDIT: Ah, after looking at other d3 modules as a guide for how to implement this, I see what you mean. I'll use this syntax instead:

let joinOp = d3.join(dataset1, dataset2)
    .key(d => d.id)
    .reduce((a, b) => a.properties.rate = +b.rate)

console.log(joinOp.apply())
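A hypothetical sketch of the data-first style above, where the datasets are bound when the operator is created and `apply()` runs the join. The default key, the default reducer, and the inner-join behavior are assumptions; the implementation in Firescar96's fork may differ.

```javascript
// Hypothetical data-first operator: datasets are bound at construction,
// key() and reduce() configure it, and apply() runs the join.
function join(left, right) {
  let keyFn = d => d.id;            // assumed default key
  let reducer = (a, b) => [a, b];   // assumed default reducer
  const op = {
    key(f) { keyFn = f; return op; },
    reduce(f) { reducer = f; return op; },
    apply() {
      const index = new Map(right.map(b => [keyFn(b), b]));
      const results = [];
      for (const a of left) {
        const k = keyFn(a);
        if (index.has(k)) results.push(reducer(a, index.get(k)));
      }
      return results;
    }
  };
  return op;
}

const dataset1 = [{ id: 1, properties: {} }, { id: 2, properties: {} }];
const dataset2 = [{ id: 2, rate: "0.5" }];

const joinOp = join(dataset1, dataset2)
  .key(d => d.id)
  .reduce((a, b) => a.properties.rate = +b.rate);

console.log(joinOp.apply()); // [ 0.5 ]
```

One trade-off of binding the data first is that a configured operator cannot be reused on different arrays without rebuilding it.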

@Firescar96

https://github.com/Firescar96/d3-array/tree/master/src/join
https://github.com/Firescar96/d3-array/blob/master/test/join-test.js

I haven't written docs for it yet, but until I do, the tests should show how it works. I went with the Cartesian product via a nested-loop join as the default. There's also a sort-merge join option, or you can write your own.
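A minimal sketch of a nested-loop join, which is one reading of "the Cartesian product via a nested-loop join": every `(a, b)` pair is visited, and a predicate decides which pairs to keep. The function name, its default predicate (keep every pair) and its default reducer are assumptions. It costs O(n × m) predicate calls, but unlike a key-based hash or sort-merge join it works with any predicate.

```javascript
// Nested-loop join: visit every (a, b) pair (the Cartesian product) and keep
// the pairs that satisfy a predicate.
function nestedLoopJoin(left, right, match = () => true, reduce = (a, b) => [a, b]) {
  const results = [];
  for (const a of left) {
    for (const b of right) {
      if (match(a, b)) results.push(reduce(a, b));
    }
  }
  return results;
}

// Default predicate: the full Cartesian product, like d3.cross.
const all = nestedLoopJoin([1, 2], ["x", "y"]); // [[1, "x"], [1, "y"], [2, "x"], [2, "y"]]

// Any predicate works, including non-equality conditions such as a range test.
const inRange = nestedLoopJoin(
  [{ id: "a", start: 0, end: 5 }],
  [{ t: 3 }, { t: 7 }],
  (a, b) => a.start <= b.t && b.t < a.end,
  (a, b) => `${a.id}@${b.t}`
); // ["a@3"]
```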

@Firescar96

Benchmarks against lodash-joins, using their tests
[Benchmark screenshot from 2017-12-06; image not preserved]

@Fil
Member

Fil commented Jun 24, 2020

My feeling after reading this several times over the last month is that it appears too complex (for me) to memorize the arguments and how this would work. I'd fear not really seeing what's happening; I'd personally rather create an index on the key first and then join "manually". Detailed examples using ES6 and d3.group, d3.rollup, or d3.index (#136) would suit me better than the "magic" of d3.join. There is also a risk of (mental) collision with selection.join.
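A sketch of the index-first approach suggested here, written in plain ES6 so the index construction is visible; `d3.group(rates, d => d.id)` builds an equivalent map from key to arrays of rows. The sample data, including a second year of rates for one id, is made up to show a one-to-many match.

```javascript
const features = [{ id: "06", properties: {} }, { id: "48", properties: {} }];
const rates = [
  { id: "06", year: 2016, rate: "5.5" },
  { id: "06", year: 2017, rate: "4.8" },
  { id: "48", year: 2017, rate: "4.2" }
];

// Step 1: index the rows by key, Map<id, row[]>.
// d3.group(rates, d => d.id) would build an equivalent map.
const byId = new Map();
for (const r of rates) {
  if (!byId.has(r.id)) byId.set(r.id, []);
  byId.get(r.id).push(r);
}

// Step 2: join "manually". The loop makes it explicit what happens when a
// feature has zero, one, or several matching rows.
for (const f of features) {
  const rows = byId.get(f.id) || [];
  f.properties.rates = rows.map(r => +r.rate);
}

const joined = features.map(f => f.properties.rates); // [[5.5, 4.8], [4.2]]
```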

@curran

curran commented Jun 24, 2020

Would we ever want an async join? Each record, for example, making an API request?
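One possible shape for an async join, assuming the right-hand side is an async lookup function called once per left row rather than an array. `joinAsync` and `fetchRate` are hypothetical, and `fetchRate` is a stub standing in for a real API request. All lookups run concurrently via `Promise.all`, and an unmatched row passes `undefined` to the reducer (left-join semantics).

```javascript
// Hypothetical async join: the right side is an async lookup, called once
// per left row; all lookups run concurrently via Promise.all.
async function joinAsync(left, key, lookup, reduce) {
  const matches = await Promise.all(left.map(a => lookup(key(a))));
  return left.map((a, i) => reduce(a, matches[i])); // unmatched rows get undefined
}

// Stub standing in for an API request (hypothetical; a real one might call fetch).
const fakeRates = new Map([["06", 5.5], ["48", 4.2]]);
const fetchRate = id =>
  new Promise(resolve => setTimeout(() => resolve(fakeRates.get(id)), 10));

const features = [{ id: "06", properties: {} }, { id: "48", properties: {} }];

joinAsync(features, f => f.id, fetchRate, (f, rate) => (f.properties.rate = rate))
  .then(values => console.log(values)); // [ 5.5, 4.2 ]
```

A real version would probably want to cap concurrency and deduplicate repeated keys before issuing requests.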


5 participants