fix: implement valid AccountId generation from arbitrary data #9107

miraclx · 2023-05-25T00:58:10Z

Looks like the former implementation of Arbitrary for AccountId would assume all strings were well-formed AccountId's. Skipping the validation logic. Which is untrue.

This patch swaps out the derive for a custom implementation that ensures all arbitrary byte slices are validated before an AccountId is constructed.

akhi3030 · 2023-05-25T08:33:52Z

@Ekleog-NEAR: can you review this please?

Ekleog-NEAR

Thank you! I just sent a few suggestions down there :)

(Also, it’d be nice to have a proper cargo feature flag for Arbitrary rather than rely on cfg(fuzzing), but it’s orthogonal to this PR)

Ekleog-NEAR · 2023-05-25T15:46:44Z

core/account-id/src/lib.rs

+    fn arbitrary_take_rest(u: arbitrary::Unstructured<'a>) -> arbitrary::Result<Self> {
+        <&str as arbitrary::Arbitrary>::arbitrary_take_rest(u)?
+            .parse()
+            .map_err(|_| arbitrary::Error::IncorrectFormat)
+    }


AFAICT this has a lower success rate than arbitrary. So the default implementation would probably be better than this one, and this function can just be removed.

(arbitrary_take_rest is mostly used for generating arbitrary collections, where the generator knows it can take all that remains. Alternatively it’d also be possible to copy-paste the contents of the arbitrary function and replace the arbitrary::<&str> with an arbitrary_take_rest::<&str> here, but I’m not sure it’s worth the effort)

Figured the implementation of <&str as Arbitrary>::arbitrary_take_rest would also suffice for our AccountId.

Any non-Ok call to the method for &str's should also be unsuccessful when generating AccountId's.

I believe they'd have used that implementation for a reason.

You’re right that str::arbitrary_take_rest currently does the same thing as your PR; I’ve just submitted rust-fuzz/arbitrary#151, let’s see what they say :) (either way, this is certainly not a blocker, the current behavior is already much better than the old one that just generated invalid AccountId’s)

core/account-id/src/lib.rs

miraclx · 2023-05-25T16:28:03Z

Also, it’d be nice to have a proper cargo feature flag for Arbitrary rather than rely on cfg(fuzzing), but it’s orthogonal to this PR

We presently do, at least until #9106 merges. I'm replacing it with the cfg attr because it is present when fuzzing. But it forces us to bump our MSRV to 1.63 even though it's conditional and near-account-id compiles just fine at 1.61.

Nothing should break here.

frol · 2023-05-27T06:56:43Z

core/account-id/src/lib.rs

+            Err(e) => {
+                if let Some((valid_up_to, _)) = e.char {
+                    let valid = &s[..valid_up_to];
+                    debug_assert!(AccountId::validate(valid).is_ok());


@miraclx I feel this is an invalid assumption. Maybe I miss something, but wouldn’t a|xyz fail validation and this code then try to construct account id a as valid? (It is too short)

Nice catch there. Although, it won't actually construct an invalid account ID, due to the validated assertion. It would panic, and we don't want that either. I'll follow this up with a patch. Thank you!

This probably should be a fix of AccountId::validate (probably coupled with a major release as it’s technically a breaking change, even though it’s actually a bugfix):

ParseErrorKind::TooLong should return a char up-to-which it’s valid

ParseErrorKind::{InvalidChar,RedundantSeparator} should return None as a char if it’s actually not possible to make a valid account-id from a substring of the provided string

Does that make sense? (I had assumed that was the way validate worked during the review, so if we were two being tricked by this it’s probably worth fixing the root cause, unless there’s a specific reason for the current behavior of validate)

ParseErrorKind::TooLong should return a char up-to-which it’s valid

I'd argue there's nothing to gain from this information that AccountId::MAX_LEN doesn't already provide.

ParseErrorKind::{InvalidChar,RedundantSeparator} should return None as a char if it’s actually not possible to make a valid account-id from a substring of the provided string

I'm not convinced this helps either. The point of validate is to test if the complete input string slice adheres to all the rules and is safe to be represented as a NEAR AccountId. The feedback it provides is to aid eventual modification of the input by the user, and not necessarily dynamically by the program.

validate returning InvalidChar in frol's example only tells you that the second character is invalid. It doesn't provide any guarantees that what has been validated so far (when taken in isolation) will construct a valid ID. Validation is only complete after all the input has been consumed.

To do what you're suggesting, we'll have to do backtracked validation on error detection. And the only utility to that is to slice the input, (like we're doing with this arbitrary implementation). Which is only really useful if the input was computer-generated. AccountId inputs are often sourced from humans.

For example, in a_-b, a_ is perfectly valid until we detect that the next character is -. We can only return information about the current invalid character, as opposed to running a backtracked validation of the a_ as well.

We can move this outside though, and do it directly in our arbitrary implementation, in which case, all we have to do is run recursive validation on slices of the input (provided indexes from the validator). Until either the index reaches 0 or the validation returns an Ok.

✨: Even a slice of a valid AccountId isn't a valid AccountId without further validation.

I'd argue there's nothing to gain from this information that AccountId::MAX_LEN doesn't already provide

If eg. the character at MAX_LEN is a ., then the char returned would be the one before that.

I'm not convinced this helps either. The point of validate is to test if the complete input string slice adheres to all the rules and is safe to be represented as a NEAR AccountId. The feedback it provides is to aid eventual modification of the input by the user, and not necessarily dynamically by the program.

I guess the question raised here is what do you want the API of validate to actually be. Good news: I just found out that char is a private API anyway, so it’s not a big deal, and whichever option we choose here (changing validate or changing the arbitrary implementation), it’s not going to have any impact in practice.

I guess my takeaway from this discussion would be, we should rename ParseAccountError::char into something like error_on, so that it’s obvious that it doesn’t indicate the location until which the provided account-id was valid. WDYT?

Fixes unintended panic in the `Arbitrary` implementation of `AccountId`. As pointed out in #9107 (comment), certain `Unstructured` data sources like `a|bcd` or `a_-b` causes a panic with the current implementation. This patch reduces the consumed input to the limit of validity adherence. Failing otherwise. But never panics. Worth noting this implementation does not provide a fallback for input sources shorter than `AccountId::MIN_LEN` or longer than `AccountId::MAX_LEN`.

Looks like the former implementation of `Arbitrary` for `AccountId` would assume all strings were well-formed `AccountId`'s. Skipping the validation logic. Which is untrue. This patch swaps out the derive for a custom implementation that ensures all arbitrary byte slices are validated before an `AccountId` is constructed.

Fixes unintended panic in the `Arbitrary` implementation of `AccountId`. As pointed out in #9107 (comment), certain `Unstructured` data sources like `a|bcd` or `a_-b` causes a panic with the current implementation. This patch reduces the consumed input to the limit of validity adherence. Failing otherwise. But never panics. Worth noting this implementation does not provide a fallback for input sources shorter than `AccountId::MIN_LEN` or longer than `AccountId::MAX_LEN`.

Looks like the former implementation of `Arbitrary` for `AccountId` would assume all strings were well-formed `AccountId`'s. Skipping the validation logic. Which is untrue. This patch swaps out the derive for a custom implementation that ensures all arbitrary byte slices are validated before an `AccountId` is constructed.

miraclx requested a review from a team as a code owner May 25, 2023 00:58

miraclx requested a review from akhi3030 May 25, 2023 00:58

miraclx changed the base branch from master to miraclx/repin-account-id May 25, 2023 00:58

miraclx force-pushed the miraclx/fix-arbitrary-account-id branch from 5652ce6 to defbe94 Compare May 25, 2023 01:00

miraclx mentioned this pull request May 25, 2023

chore: bump crates version to 0.17.0 #9108

Merged

akhi3030 requested a review from Ekleog-NEAR May 25, 2023 08:33

Ekleog-NEAR approved these changes May 25, 2023

View reviewed changes

miraclx force-pushed the miraclx/fix-arbitrary-account-id branch from e8cdc87 to 18223ac Compare May 25, 2023 16:30

Base automatically changed from miraclx/repin-account-id to master May 26, 2023 14:53

miraclx added 6 commits May 26, 2023 16:01

fix: implement valid AccountId generation from arbitrary data

a9d9ad4

clippy

412e636

save on unnecessary allocation

d7da07d

reuse str's arbitrary_take_rest

704ca3e

drop unnecessary map

9b88f13

no early return

8d93a45

miraclx force-pushed the miraclx/fix-arbitrary-account-id branch from dcc799b to 8d93a45 Compare May 26, 2023 15:02

miraclx added the S-automerge label May 26, 2023

near-bulldozer bot merged commit a782a6a into master May 26, 2023

near-bulldozer bot deleted the miraclx/fix-arbitrary-account-id branch May 26, 2023 15:19

frol reviewed May 27, 2023

View reviewed changes

miraclx mentioned this pull request May 29, 2023

chore: fix panic on arbitrary AccountId generation #9124

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: implement valid AccountId generation from arbitrary data #9107

fix: implement valid AccountId generation from arbitrary data #9107

miraclx commented May 25, 2023

akhi3030 commented May 25, 2023

Ekleog-NEAR left a comment

Ekleog-NEAR May 25, 2023

miraclx May 25, 2023

Ekleog-NEAR May 26, 2023

miraclx commented May 25, 2023

frol May 27, 2023

miraclx May 29, 2023

Ekleog-NEAR May 29, 2023

miraclx May 29, 2023

Ekleog-NEAR May 31, 2023

fix: implement valid AccountId generation from arbitrary data #9107

fix: implement valid AccountId generation from arbitrary data #9107

Conversation

miraclx commented May 25, 2023

akhi3030 commented May 25, 2023

Ekleog-NEAR left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

miraclx commented May 25, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment