Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic routes: Can we optimize the route matching rather than pulling up all routes for an app? #77

Open
treeder opened this issue Aug 29, 2016 · 3 comments
Assignees

Comments

@treeder
Copy link
Contributor

treeder commented Aug 29, 2016

This is tricky because of this: #71

ie: https://github.com/iron-io/functions/pull/74/files#diff-e180f07f5c320b82004c2bb065eedef0R99

The problem is we'd like to query for a single route, but let's say the route is defined as:

/blogs/:blog_id/comments/:comment_id

And a request comes in for:

/blogs/123/comments/456

How do we query the database for the route metadata with only that information? Could query for /blogs/123/comments/456, which is the static default way. But that wouldn't match. We could try all variations, eg /blogs/123/comments/* and /blogs/123/*/456, and /blogs/123/*/*, and so on, but that wouldn't be very efficient.

To summarize, the problem here is:

Given a path and only a path, eg: /blogs/travisblog/comments/123 what should you query for? Not knowing what routes exist before hand.

@treeder treeder mentioned this issue Aug 29, 2016
@henriquechehad henriquechehad self-assigned this Sep 12, 2016
@seiflotfy seiflotfy added this to the Beta - Major milestone Sep 13, 2016
@seiflotfy seiflotfy modified the milestones: Alpha 2, Alpha 1 Oct 14, 2016
@seiflotfy seiflotfy self-assigned this Oct 31, 2016
@ucirello ucirello assigned ucirello and unassigned seiflotfy Nov 15, 2016
ucirello added a commit that referenced this issue Nov 15, 2016
@ucirello ucirello added major and removed minor labels Nov 16, 2016
@treeder treeder removed this from the Alpha 2 milestone Nov 19, 2016
@treeder treeder changed the title Can we optimize the route matching rather than pulling up all routes for an app? Dynamic routes: Can we optimize the route matching rather than pulling up all routes for an app? Nov 19, 2016
@ucirello ucirello removed their assignment Dec 13, 2016
@jmank88
Copy link
Contributor

jmank88 commented Feb 3, 2017

I came up with a scheme for postgres that at least pushes all the work to the server, so all the routes don't come over the wire each time, though I'm still not sure how to index it efficiently.

If the routes are stored with parameter names removed (Example: /blogs/:blog_id/comments/:comment_id -> /blogs/:/comment/:), then fairly simple regex queries can be used by matching on either the whole word or the colon character at each path component. Example: /blogs/123/comments/456 -> ^/(blogs|:)/(123|:)/(comments|:)/(456|:)$

CREATE TABLE routes (
  route TEXT,
  raw_route TEXT
);

-- Register routes
INSERT INTO routes (route, raw_route) VALUES 
('/blogs/:blog_id/comments/:comment_id', '/blogs/:/comments/:'),
('/blogs/:blog_id/comments', '/blogs/:/comments'),
('/blogs/:blog_id','/blogs/:'),
('/blogs','/blogs');

-- Query for '/blogs/123/comments/456'
SELECT route FROM routes WHERE raw_route ~ '^/(blogs|:)/(123|:)/(comments|:)/(456|:)$'

In theory, the same general approach could translate to BoltDB as well - store by nameless route components in nested buckets, and then search for matches one component at a time.

@jmank88 jmank88 mentioned this issue Feb 14, 2017
@jmank88
Copy link
Contributor

jmank88 commented Mar 29, 2017

I implemented a prefix-scanning solution for BoltDB: https://godoc.org/github.com/jmank88/nuts#hdr-Path_Prefix_Scans
It is much more simple and elegant than the nested buckets approach, and the code (just two functions) can be run against the existing schema. I was able to test/bench up to 1,000,000 routes.

@jan-g
Copy link

jan-g commented Jul 31, 2017

If you're happy with an algorithm linear in length of path (however, this has as many DB round-trips) then for /blogs/123/comments/456 you can look up blogs/ (and failing that, :/). Then look for blogs/123 (or blogs/:). Then blogs/:/comments. Then blogs/:/comments/456 (and blogs/:/comments/:` - which returns the final route).

With careful caching of a next-key-set in the trie path you can probably halve the number of round trips that this requires. The cost is updating the trie on route creation and deletion. Other "optimisations" might include caching complete patterns over a particular part of a sub-trie and returning those when there are fewer than some threshold number - ie, short-circuiting the complete trie traversal.

Note, this approach favours early parts of the path being matched over later ones - however, I don't think that that's too high a price to pay in practice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants