Audit the use of unsafe in uri/path.rs #413

Open
wants to merge 7 commits into master

Conversation

sbosnick
Contributor

Reorganize the one function that uses unsafe (from_shared()) to highlight that it scans each byte that it eventually passes to ByteStr::from_utf8_unchecked(). This makes it apparent (as described in the "Safety" comment) that the input to from_utf8_unchecked() is valid UTF-8, and that the call is therefore sound.

This is a part of #412.

The tests include both valid and invalid conversions, including
conversions from a [u8] that includes invalid UTF-8.
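For illustration, the invalid-UTF-8 case can also be exercised through the public API; a minimal sketch, assuming PathAndQuery's TryFrom<&[u8]> impl (not the helper used in the actual tests):

use std::convert::TryFrom;
use http::uri::PathAndQuery;

#[test]
fn rejects_invalid_utf8_bytes() {
    // 0xFF can never appear in valid UTF-8, so the conversion must fail
    assert!(PathAndQuery::try_from(&[b'a', 0xFF, b'b'][..]).is_err());
}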
The new implementation (which follows the behaviour of the old one) folds the bytes until
the fragment specifier ('#') or the end. It tracks the location of the
query specifier ('?') and the length, and returns an error for invalid
bytes. This implementation emphasizes that from_shared() scans through
the bytes exactly once.
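A minimal, self-contained sketch of that one-loop shape; the error type and the per-byte validity check are simplified stand-ins, not the crate's actual internals:

struct InvalidByte;

// Scan every byte up to the fragment specifier ('#'), recording where the
// query specifier ('?') starts and how many bytes are kept.
fn scan_path_and_query(src: &[u8]) -> Result<(Option<u16>, u16), InvalidByte> {
    src.iter()
        .take_while(|&&b| b != b'#')
        .try_fold((None, 0u16), |(query, len), &b| match b {
            b'?' if query.is_none() => Ok((Some(len), len + 1)),
            // printable ASCII stands in for the real table of URI-legal bytes;
            // accepting only ASCII is what keeps the buffer valid UTF-8
            0x21..=0x7e => Ok((query, len + 1)),
            _ => Err(InvalidByte),
        })
}

fn path_and_query(src: &[u8]) -> Result<(String, Option<u16>), InvalidByte> {
    let (query, len) = scan_path_and_query(src)?;
    let kept = &src[..len as usize];
    // Safety: scan_path_and_query visited every byte in `kept` and accepted
    // only ASCII, so `kept` is valid UTF-8.
    let s = unsafe { std::str::from_utf8_unchecked(kept) };
    Ok((s.to_owned(), query))
}

For example, path_and_query(b"a?b#c") yields ("a?b", Some(1)) in this sketch, consistent with the assertions quoted below.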
Contributor

@hawkw left a comment

Thanks for doing this! This looks good to me — at a glance, it looks like we still do the same number of iterations in both the previous and new code, so there doesn't appear to be a performance impact.

I'd still like to get @seanmonstar's eyes on this as well, though.

assert_eq!("a?b", pq_bytes(&[b'a', b'?', b'b']));
assert_eq!("a?{b}", pq_bytes(&[b'a', b'?', b'{', b'b', b'}']));
assert_eq!("a", pq_bytes(&[b'a', b'#', b'b']));
assert_eq!("a?b", pq_bytes(&[b'a', b'?', b'b', b'#', b'c']));
Contributor

May eventually be worth having property tests in addition, but definitely not a blocker.

Contributor Author

I would be happy to add property tests, but I think that should be a separate pull request. If this is a good approach, I could add an issue for this as a marker and come back to it later.
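For the record, a rough sketch of what such a property test might look like with the proptest crate, assuming PathAndQuery's public TryFrom<&[u8]> impl (illustrative only, not part of this pull request):

use std::convert::TryFrom;

use http::uri::PathAndQuery;
use proptest::prelude::*;

proptest! {
    // Property: any byte slice the parser accepts must be valid UTF-8 up to
    // the first '#', since the path and query are later exposed as &str.
    #[test]
    fn accepted_input_is_utf8(bytes in proptest::collection::vec(any::<u8>(), 0..64)) {
        if PathAndQuery::try_from(&bytes[..]).is_ok() {
            let kept = bytes.split(|&b| b == b'#').next().unwrap();
            prop_assert!(std::str::from_utf8(kept).is_ok());
        }
    }
}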

src/uri/path.rs Outdated (resolved)
Make the wording of the "Safety" comment clearer.
@sbosnick
Contributor Author

If there are any concerns about a possible performance impact, I would be happy to add benchmarks to answer that question.

src/uri/path.rs Outdated
let (query, len) = src.as_ref().iter()
// stop at the fragment specifier
.take_while(|&&c| c != b'#')
.try_fold((None, 0u16), |(query, i), &c| {
Member

Does passing these values along through the fold optimize differently? Do we even have benchmarks parsing URIs?

Also, this isn't a super strong feeling, but I kind of feel that the previous two loops are a little clearer about the fact that we're looking for two things....

Contributor Author

On the second point: the two-loop version emphasizes that we are looking for two things, while the one-loop version emphasizes that we are looking at every byte in src up to the first '#'. The latter is what we need in order to confirm the soundness of the call to ByteStr::from_utf8_unchecked() later in the function. (This is just a shift in emphasis; the two-loop version checks the same things.) It is possible to describe the two-loop version with comments on the loop control flow (how the early exits and the normal end-of-loop combine to ensure that every byte up to the first '#' gets checked), but the one-loop version accomplishes this in code rather than in comments. That is the trade-off on the code-quality question.

Unfortunately, though, there is a performance regression. (When I checked quickly in the past it didn't look like there was one, but I checked more closely after this comment and there is.) There are a few benchmarks for parsing URIs in benches/uri.rs. They don't cover all of the URI parsing code (for example, they don't cover parsing the scheme or authority), but two of them exercise the path and query parsing code. Those benchmarks show a performance regression in this branch compared with master.

I will try a version of the one-loop approach with external iteration instead of try_fold to see if that addresses the performance issue.
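For reference, the benchmarks in benches/uri.rs look roughly like the following; this is an illustrative sketch using the nightly test harness, with an input string of my own rather than one copied from the file:

#![feature(test)]
extern crate test;

use http::Uri;
use test::Bencher;

#[bench]
fn uri_parse_relative_query(b: &mut Bencher) {
    let s = "/path/to/resource?key=value&other=1";
    // report throughput (MB/s) as well as ns/iter
    b.bytes = s.len() as u64;
    b.iter(|| s.parse::<Uri>().unwrap());
}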

Contributor Author

The latest commit has resolved the performance regression (to within the margin of error).

In PathAndQuery::from_shared(), switched from internal iteration using
try_fold() to external iteration. The logic for checking each byte
remains the same. This corrected the performance regression compared
with the master branch.
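A minimal sketch of that external-iteration shape, with the same simplified stand-ins as in the earlier sketch (not the crate's actual code):

struct InvalidByte;

fn scan_path_and_query(src: &[u8]) -> Result<(Option<u16>, u16), InvalidByte> {
    let mut query = None;
    let mut len = 0u16;
    // same per-byte checks as before, but driven by an explicit loop
    for &b in src.iter().take_while(|&&b| b != b'#') {
        match b {
            b'?' if query.is_none() => query = Some(len),
            // printable ASCII as a stand-in for the real URI byte table
            0x21..=0x7e => {}
            _ => return Err(InvalidByte),
        }
        len += 1;
    }
    Ok((query, len))
}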
@sbosnick
Contributor Author

The latest commit removes the performance regression that the benchmarks in benches/uri.rs showed (to within the margin of error). It retains the one-loop version of the core parsing code but switches from internal iteration (using try_fold()) to external iteration.

It appears that the outstanding question, then, is whether this code should use a two-loop version that emphasizes parsing for two separate things (the path and the query) or a one-loop version that emphasizes checking, in turn, each byte that will eventually be passed to ByteStr::from_utf8_unchecked().

I favour the latter because it makes the soundness easier to see. I can convert this back to a two-loop version, though, if there is a reason to prefer that approach.

@seanmonstar
Member

seanmonstar commented May 15, 2020

Thanks! I tried out the benchmarks with the newest changes (commit d4718c9); here's what I see:

Master:

test uri_parse_relative_medium ... bench:         110 ns/iter (+/- 0) = 545 MB/s
test uri_parse_relative_query  ... bench:         134 ns/iter (+/- 1) = 626 MB/s
test uri_parse_slash           ... bench:          40 ns/iter (+/- 1) = 25 MB/s

d4718c9:

test uri_parse_relative_medium ... bench:         132 ns/iter (+/- 1) = 454 MB/s
test uri_parse_relative_query  ... bench:         169 ns/iter (+/- 1) = 497 MB/s
test uri_parse_slash           ... bench:          41 ns/iter (+/- 0) = 24 MB/s

This mostly eliminates the performance regressions from the refactoring,
but does so by reintroducing the two separate loops: one for parsing the
path and one for parsing the query.

Where this version differs from the original two-loop version is that it
eliminates the early exits from the two loops for matching the
fragment specifier ('#'). It instead encodes this in a take_while()
combinator. This makes it more obvious that the two loops
check each byte, which is relied on in the "Safety" justification.
@sbosnick
Contributor Author

The latest commit improves the performance of this branch to just shy of master. The results that I see on the URI benchmarks are as follows:

Master:

test uri_parse_relative_medium ... bench:         120 ns/iter (+/- 0) = 500 MB/s
test uri_parse_relative_query  ... bench:         170 ns/iter (+/- 0) = 494 MB/s
test uri_parse_slash           ... bench:          30 ns/iter (+/- 0) = 33 MB/s

27ff0cb:

test uri_parse_relative_medium ... bench:         126 ns/iter (+/- 1) = 476 MB/s
test uri_parse_relative_query  ... bench:         176 ns/iter (+/- 0) = 477 MB/s
test uri_parse_slash           ... bench:          30 ns/iter (+/- 0) = 33 MB/s

So the two benchmarks that are affected by these changes are 6 ns/iter slower as compared with master.

This change returns to a two-loop version (which optimizes better than the various one-loop variations I tried), but it uses take_while() to stop checking bytes at the fragment specifier instead of having an early exit in each of the two loops. This makes it more obvious that, between them, the two loops check every byte up to the point where the later code truncates src (this is embedded in the code itself rather than described in comments).
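A rough, self-contained sketch of that two-loop shape, again with simplified stand-ins for the crate's internals:

struct InvalidByte;

fn scan_path_and_query(src: &[u8]) -> Result<(Option<u16>, u16), InvalidByte> {
    // take_while bounds both loops at the fragment specifier, so together
    // they inspect every byte that the later truncation keeps
    let mut iter = src.iter().take_while(|&&b| b != b'#');
    let mut query = None;
    let mut len = 0u16;

    // first loop: the path, up to the query specifier
    for &b in &mut iter {
        match b {
            b'?' => {
                query = Some(len);
                len += 1;
                break;
            }
            0x21..=0x7e => len += 1,
            _ => return Err(InvalidByte),
        }
    }

    // second loop: the query, up to the fragment specifier or the end
    for &b in &mut iter {
        match b {
            0x21..=0x7e => len += 1,
            _ => return Err(InvalidByte),
        }
    }

    Ok((query, len))
}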

I don't know how crucial the URI parsing code is on the fast path of the crates that rely on http. I suggest, though, that if 6 ns/iter is an important difference, then we need more (and better) benchmarks of the URI parsing code (which I am prepared to write).
