Audit the use of unsafe in uri/path.rs #413

Open
wants to merge 7 commits into master

Conversation

sbosnick
Contributor

Reorganize the one function that uses unsafe (from_shared()) to highlight that it scans each byte that it eventually passes to ByteStr::from_utf8_unchecked(). This makes it apparent (as described in the "Safety" comment) that the input to from_utf8_unchecked() is valid UTF-8, and that the call is therefore sound.

This is a part of #412.

The tests include both valid and invalid conversions, including
conversions from a [u8] that includes invalid UTF-8.
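For illustration, the invalid-UTF-8 case can also be exercised through the public API; a minimal sketch, assuming PathAndQuery's TryFrom<&[u8]> impl (not the helper used in the actual tests):

use std::convert::TryFrom;
use http::uri::PathAndQuery;

#[test]
fn rejects_invalid_utf8_bytes() {
    // 0xFF can never appear in valid UTF-8, so the conversion must fail
    assert!(PathAndQuery::try_from(&[b'a', 0xFF, b'b'][..]).is_err());
}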
The new implementation (which follows the behaviour of the old one) folds the bytes until
the fragment specifier ('#') or the end. It tracks the location of the
query specifier ('?') and the length, and returns an error for invalid
bytes. This implementation emphasizes that from_shared() scans through
the bytes exactly once.
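A minimal, self-contained sketch of that one-loop shape; the error type and the per-byte validity check are simplified stand-ins, not the crate's actual internals:

struct InvalidByte;

// Scan every byte up to the fragment specifier ('#'), recording where the
// query specifier ('?') starts and how many bytes are kept.
fn scan_path_and_query(src: &[u8]) -> Result<(Option<u16>, u16), InvalidByte> {
    src.iter()
        .take_while(|&&b| b != b'#')
        .try_fold((None, 0u16), |(query, len), &b| match b {
            b'?' if query.is_none() => Ok((Some(len), len + 1)),
            // printable ASCII stands in for the real table of URI-legal bytes;
            // accepting only ASCII is what keeps the buffer valid UTF-8
            0x21..=0x7e => Ok((query, len + 1)),
            _ => Err(InvalidByte),
        })
}

fn path_and_query(src: &[u8]) -> Result<(String, Option<u16>), InvalidByte> {
    let (query, len) = scan_path_and_query(src)?;
    let kept = &src[..len as usize];
    // Safety: scan_path_and_query visited every byte in `kept` and accepted
    // only ASCII, so `kept` is valid UTF-8.
    let s = unsafe { std::str::from_utf8_unchecked(kept) };
    Ok((s.to_owned(), query))
}

For example, path_and_query(b"a?b#c") yields ("a?b", Some(1)) in this sketch, consistent with the assertions quoted below.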
Contributor

@hawkw left a comment

Thanks for doing this! This looks good to me — at a glance, it looks like we still do the same number of iterations in both the previous and new code, so there doesn't appear to be a performance impact.

I'd still like to get @seanmonstar's eyes on this as well, though.

assert_eq!("a?b", pq_bytes(&[b'a', b'?', b'b']));
assert_eq!("a?{b}", pq_bytes(&[b'a', b'?', b'{', b'b', b'}']));
assert_eq!("a", pq_bytes(&[b'a', b'#', b'b']));
assert_eq!("a?b", pq_bytes(&[b'a', b'?', b'b', b'#', b'c']));
Contributor

May eventually be worth having property tests in addition, but definitely not a blocker.

Contributor Author

I would be happy to add property tests, but I think that should be a separate pull request. If this is a good approach, I could add an issue for this as a marker and come back to it later.
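For the record, a rough sketch of what such a property test might look like with the proptest crate, assuming PathAndQuery's public TryFrom<&[u8]> impl (illustrative only, not part of this pull request):

use std::convert::TryFrom;

use http::uri::PathAndQuery;
use proptest::prelude::*;

proptest! {
    // Property: any byte slice the parser accepts must be valid UTF-8 up to
    // the first '#', since the path and query are later exposed as &str.
    #[test]
    fn accepted_input_is_utf8(bytes in proptest::collection::vec(any::<u8>(), 0..64)) {
        if PathAndQuery::try_from(&bytes[..]).is_ok() {
            let kept = bytes.split(|&b| b == b'#').next().unwrap();
            prop_assert!(std::str::from_utf8(kept).is_ok());
        }
    }
}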

src/uri/path.rs Outdated (resolved)
Make the wording of the "Safety" comment clearer.
@sbosnick
Contributor Author

If there are any concerns about a possible performance impact, I would be happy to add benchmarks to answer that question.

src/uri/path.rs Outdated
let (query, len) = src.as_ref().iter()
// stop at the fragment specifier
.take_while(|&&c| c != b'#')
.try_fold((None, 0u16), |(query, i), &c| {
Member

Does passing these values along through the fold optimize differently? Do we even have benchmarks parsing URIs?

Also, this isn't a super strong feeling, but I kind of feel that the previous two loops are a little clearer about the fact that we're looking for two things....

Contributor Author

On the second point: the two-loop version emphasizes that we are looking for two things, while the one-loop version emphasizes that we are looking at every byte in src up to the first '#'. The latter is what we need in order to confirm the soundness of the call to ByteStr::from_utf8_unchecked() later in the function. (This is just a shift in emphasis; the two-loop version checks the same things.) It is possible to describe the two-loop version with comments on the loop control flow (how the early exits and the normal end-of-loop combine to ensure that every byte up to the first '#' gets checked), but the one-loop version accomplishes this in code rather than in comments. That is the trade-off on the code-quality question.

Unfortunately, though, there is a performance regression. (When I checked quickly in the past it didn't look like there was one, but I checked more closely after this comment and there is.) There are a few benchmarks for parsing URIs in benches/uri.rs. They don't cover all of the URI parsing code (for example, they don't cover parsing the scheme or authority), but two of them exercise the path and query parsing code. Those benchmarks show a performance regression in this branch compared with master.

I will try a version of the one-loop approach with external iteration instead of try_fold to see if that addresses the performance issue.
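For reference, the benchmarks in benches/uri.rs look roughly like the following; this is an illustrative sketch using the nightly test harness, with an input string of my own rather than one copied from the file:

#![feature(test)]
extern crate test;

use http::Uri;
use test::Bencher;

#[bench]
fn uri_parse_relative_query(b: &mut Bencher) {
    let s = "/path/to/resource?key=value&other=1";
    // report throughput (MB/s) as well as ns/iter
    b.bytes = s.len() as u64;
    b.iter(|| s.parse::<Uri>().unwrap());
}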

Contributor Author

The latest commit has resolved the performance regression (to within the margin of error).

In PathAndQuery::from_shared(), switched from internal iteration using
try_fold() to external iteration. The logic for checking each byte
remains the same. This corrected the performance regression compared
with the master branch.
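A minimal sketch of that external-iteration shape, with the same simplified stand-ins as in the earlier sketch (not the crate's actual code):

struct InvalidByte;

fn scan_path_and_query(src: &[u8]) -> Result<(Option<u16>, u16), InvalidByte> {
    let mut query = None;
    let mut len = 0u16;
    // same per-byte checks as before, but driven by an explicit loop
    for &b in src.iter().take_while(|&&b| b != b'#') {
        match b {
            b'?' if query.is_none() => query = Some(len),
            // printable ASCII as a stand-in for the real URI byte table
            0x21..=0x7e => {}
            _ => return Err(InvalidByte),
        }
        len += 1;
    }
    Ok((query, len))
}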
@sbosnick
Contributor Author

The latest commit removes the performance regression that the benchmarks in benches/uri.rs showed (to within the margin of error). It retains the one-loop version of the core parsing code but switches from internal iteration (using try_fold()) to external iteration.

It appears that the outstanding question, then, is whether this code should use a two-loop version that emphasizes parsing for two separate things (the path and the query) or a one-loop version that emphasizes checking, in turn, each byte that will eventually be passed to ByteStr::from_utf8_unchecked().

I favour the latter because it makes the soundness easier to see. I can convert this back to a two-loop version, though, if there is a reason to prefer that approach.

@seanmonstar
Member

seanmonstar commented May 15, 2020

Thanks! I tried out the benchmarks with the newest changes (commit d4718c9); here's what I see:

Master:

test uri_parse_relative_medium ... bench:         110 ns/iter (+/- 0) = 545 MB/s
test uri_parse_relative_query  ... bench:         134 ns/iter (+/- 1) = 626 MB/s
test uri_parse_slash           ... bench:          40 ns/iter (+/- 1) = 25 MB/s

d4718c9:

test uri_parse_relative_medium ... bench:         132 ns/iter (+/- 1) = 454 MB/s
test uri_parse_relative_query  ... bench:         169 ns/iter (+/- 1) = 497 MB/s
test uri_parse_slash           ... bench:          41 ns/iter (+/- 0) = 24 MB/s

This mostly eliminates the performance regressions from the refactoring,
but does so by reintroducing the two separate loops: one for parsing the
path and one for parsing the query.

Where this version differs from the original two-loop version is that it
eliminates the early exits from the two loops for matching the
fragment specifier ('#'). It instead encodes this in a take_while()
combinator. This makes it more obvious that the two loops
check each byte, which is relied on in the "Safety" justification.
@sbosnick
Contributor Author

The latest commit improves the performance of this branch to just shy of master. The results that I see on the URI benchmarks are as follows:

Master:

test uri_parse_relative_medium ... bench:         120 ns/iter (+/- 0) = 500 MB/s
test uri_parse_relative_query  ... bench:         170 ns/iter (+/- 0) = 494 MB/s
test uri_parse_slash           ... bench:          30 ns/iter (+/- 0) = 33 MB/s

27ff0cb:

test uri_parse_relative_medium ... bench:         126 ns/iter (+/- 1) = 476 MB/s
test uri_parse_relative_query  ... bench:         176 ns/iter (+/- 0) = 477 MB/s
test uri_parse_slash           ... bench:          30 ns/iter (+/- 0) = 33 MB/s

So the two benchmarks that are affected by these changes are 6 ns/iter slower as compared with master.

This change returns to a two-loop version (which optimizes better than the various one-loop variations I tried), but it uses take_while() to stop checking bytes at the fragment specifier instead of having an early exit in each of the two loops. This makes it more obvious that, between them, the two loops check every byte up to the point where the later code truncates src (this is embedded in the code itself rather than described in comments).
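A rough, self-contained sketch of that two-loop shape, again with simplified stand-ins for the crate's internals:

struct InvalidByte;

fn scan_path_and_query(src: &[u8]) -> Result<(Option<u16>, u16), InvalidByte> {
    // take_while bounds both loops at the fragment specifier, so together
    // they inspect every byte that the later truncation keeps
    let mut iter = src.iter().take_while(|&&b| b != b'#');
    let mut query = None;
    let mut len = 0u16;

    // first loop: the path, up to the query specifier
    for &b in &mut iter {
        match b {
            b'?' => {
                query = Some(len);
                len += 1;
                break;
            }
            0x21..=0x7e => len += 1,
            _ => return Err(InvalidByte),
        }
    }

    // second loop: the query, up to the fragment specifier or the end
    for &b in &mut iter {
        match b {
            0x21..=0x7e => len += 1,
            _ => return Err(InvalidByte),
        }
    }

    Ok((query, len))
}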

I don't know how crucial the URI parsing code is on the fast path of the crates that rely on http. I suggest, though, that if 6 ns/iter is an important difference, then we need more (and better) benchmarks of the URI parsing code (which I am prepared to write).
