-
Notifications
You must be signed in to change notification settings - Fork 678
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: implement valid AccountId generation from arbitrary data #9107
Conversation
5652ce6
to
defbe94
Compare
@Ekleog-NEAR: can you review this please? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I just sent a few suggestions down there :)
(Also, it’d be nice to have a proper cargo feature flag for Arbitrary
rather than rely on cfg(fuzzing)
, but it’s orthogonal to this PR)
fn arbitrary_take_rest(u: arbitrary::Unstructured<'a>) -> arbitrary::Result<Self> { | ||
<&str as arbitrary::Arbitrary>::arbitrary_take_rest(u)? | ||
.parse() | ||
.map_err(|_| arbitrary::Error::IncorrectFormat) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT this has a lower success rate than arbitrary
. So the default implementation would probably be better than this one, and this function can just be removed.
(arbitrary_take_rest
is mostly used for generating arbitrary collections, where the generator knows it can take all that remains. Alternatively it’d also be possible to copy-paste the contents of the arbitrary
function and replace the arbitrary::<&str>
with an arbitrary_take_rest::<&str>
here, but I’m not sure it’s worth the effort)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Figured the implementation of <&str as Arbitrary>
::arbitrary_take_rest
would also suffice for our AccountId
.
Any non-Ok call to the method for &str
's should also be unsuccessful when generating AccountId
's.
I believe they'd have used that implementation for a reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You’re right that str::arbitrary_take_rest
currently does the same thing as your PR; I’ve just submitted rust-fuzz/arbitrary#151, let’s see what they say :) (either way, this is certainly not a blocker, the current behavior is already much better than the old one that just generated invalid AccountId’s)
We presently do, at least until #9106 merges. I'm replacing it with the cfg attr because it is present when fuzzing. But it forces us to bump our MSRV to Nothing should break here. |
e8cdc87
to
18223ac
Compare
dcc799b
to
8d93a45
Compare
Err(e) => { | ||
if let Some((valid_up_to, _)) = e.char { | ||
let valid = &s[..valid_up_to]; | ||
debug_assert!(AccountId::validate(valid).is_ok()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miraclx I feel this is an invalid assumption. Maybe I miss something, but wouldn’t a|xyz
fail validation and this code then try to construct account id a
as valid? (It is too short)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch there. Although, it won't actually construct an invalid account ID, due to the validated assertion. It would panic, and we don't want that either. I'll follow this up with a patch. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably should be a fix of AccountId::validate
(probably coupled with a major release as it’s technically a breaking change, even though it’s actually a bugfix):
ParseErrorKind::TooLong
should return a char up-to-which it’s validParseErrorKind::{InvalidChar,RedundantSeparator}
should returnNone
as a char if it’s actually not possible to make a valid account-id from a substring of the provided string
Does that make sense? (I had assumed that was the way validate
worked during the review, so if we were two being tricked by this it’s probably worth fixing the root cause, unless there’s a specific reason for the current behavior of validate
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ParseErrorKind::TooLong
should return a char up-to-which it’s valid
I'd argue there's nothing to gain from this information that AccountId::MAX_LEN
doesn't already provide.
ParseErrorKind::{InvalidChar,RedundantSeparator}
should returnNone
as a char if it’s actually not possible to make a valid account-id from a substring of the provided string
I'm not convinced this helps either. The point of validate
is to test if the complete input string slice adheres to all the rules and is safe to be represented as a NEAR AccountId
. The feedback it provides is to aid eventual modification of the input by the user, and not necessarily dynamically by the program.
validate
returning InvalidChar
in frol's example only tells you that the second character is invalid. It doesn't provide any guarantees that what has been validated so far (when taken in isolation) will construct a valid ID. Validation is only complete after all the input has been consumed.
To do what you're suggesting, we'll have to do backtracked validation on error detection. And the only utility to that is to slice the input, (like we're doing with this arbitrary implementation). Which is only really useful if the input was computer-generated. AccountId
inputs are often sourced from humans.
For example, in a_-b
, a_
is perfectly valid until we detect that the next character is -
. We can only return information about the current invalid character, as opposed to running a backtracked validation of the a_
as well.
We can move this outside though, and do it directly in our arbitrary implementation, in which case, all we have to do is run recursive validation on slices of the input (provided indexes from the validator). Until either the index reaches 0
or the validation returns an Ok
.
✨: Even a slice of a valid AccountId
isn't a valid AccountId
without further validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd argue there's nothing to gain from this information that AccountId::MAX_LEN doesn't already provide
If eg. the character at MAX_LEN is a .
, then the char returned would be the one before that.
I'm not convinced this helps either. The point of validate is to test if the complete input string slice adheres to all the rules and is safe to be represented as a NEAR AccountId. The feedback it provides is to aid eventual modification of the input by the user, and not necessarily dynamically by the program.
I guess the question raised here is what do you want the API of validate
to actually be. Good news: I just found out that char
is a private API anyway, so it’s not a big deal, and whichever option we choose here (changing validate
or changing the arbitrary
implementation), it’s not going to have any impact in practice.
I guess my takeaway from this discussion would be, we should rename ParseAccountError::char
into something like error_on
, so that it’s obvious that it doesn’t indicate the location until which the provided account-id was valid. WDYT?
Fixes unintended panic in the `Arbitrary` implementation of `AccountId`. As pointed out in #9107 (comment), certain `Unstructured` data sources like `a|bcd` or `a_-b` causes a panic with the current implementation. This patch reduces the consumed input to the limit of validity adherence. Failing otherwise. But never panics. Worth noting this implementation does not provide a fallback for input sources shorter than `AccountId::MIN_LEN` or longer than `AccountId::MAX_LEN`.
Looks like the former implementation of `Arbitrary` for `AccountId` would assume all strings were well-formed `AccountId`'s. Skipping the validation logic. Which is untrue. This patch swaps out the derive for a custom implementation that ensures all arbitrary byte slices are validated before an `AccountId` is constructed.
Fixes unintended panic in the `Arbitrary` implementation of `AccountId`. As pointed out in #9107 (comment), certain `Unstructured` data sources like `a|bcd` or `a_-b` causes a panic with the current implementation. This patch reduces the consumed input to the limit of validity adherence. Failing otherwise. But never panics. Worth noting this implementation does not provide a fallback for input sources shorter than `AccountId::MIN_LEN` or longer than `AccountId::MAX_LEN`.
Looks like the former implementation of `Arbitrary` for `AccountId` would assume all strings were well-formed `AccountId`'s. Skipping the validation logic. Which is untrue. This patch swaps out the derive for a custom implementation that ensures all arbitrary byte slices are validated before an `AccountId` is constructed.
Looks like the former implementation of
Arbitrary
forAccountId
would assume all strings were well-formedAccountId
's. Skipping the validation logic. Which is untrue.This patch swaps out the derive for a custom implementation that ensures all arbitrary byte slices are validated before an
AccountId
is constructed.