-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use more specific panic message for &str slicing errors #38066
Conversation
r? @sfackler (rust_highfive has picked a reviewer for you, use r? to override) |
c29b730
to
46a26f2
Compare
Should we show all the problems instead of just the first one? |
let ellipsis = if truncated { "[...]" } else { "" }; | ||
|
||
// 1. out of bounds | ||
let index = if !s.is_char_boundary(begin) { begin } else { end }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is confusing to me. It seems like you choose which index is going to be in the message before checking which one is out of bounds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_char_boundary
is false if the index is out of bounds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, it should be confusing, because it's not really right.
while !s.is_char_boundary(char_start) { | ||
char_start -= 1; | ||
} | ||
let ch = s[char_start..].chars().next().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can panic if the string is empty or if char_start
is at the end. I think I've convinced myself this can't happen, but at least a comment would help, or just to be defensive
if let Some(ch) = s[char_start..].chars().next() {
let char_range = char_start .. char_start + ch.len_utf8();
panic!("index {} is not a char boundary; it is inside {:?} (bytes {:?}) of `{}`{}",
index, ch, char_range, s_trunc, ellipsis);
} else {
panic!("index {} is not a char boundary in `{}`{}", index, s_trunc, ellipsis);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
char_start is a character boundary, and then using it for slicing is ok. It could not be larger than len, that case has already been handled. It could not be equal to len, then it would not be an error in the first place.
while !s.is_char_boundary(char_start) { | ||
char_start -= 1; | ||
} | ||
let ch = s[char_start..].chars().next().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if the problematic index is beyond MAX_DISPLAY_LENGTH
, the preview isn't going to be very useful. It could instead show the region of the string from max(0, index - MAX_DISPLAY_LENGTH/2) .. min(s.len(), index + MAX_DISPLAY_LENGTH/2)
with ellipses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this is unnecessary, it just wants to identify the string somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! Handling the different cases with custom messages makes this much better.
As mentioned in #38052 I'd like to try and convey that indexing a string (slice) is done using byte offsets -- and not character offsets. I think this is one of those messages people will see this in failing tests and then have a hard time recognizing why this went wrong; and then have an even harder time to find a solution like the unicode-segmentation crate. UTF-8 is hard when people don't think about it.
I know putting all that in the panic message is not a solution; but maybe you can change it to "slicing at byte 4 which is not a char boundary" or something like that?
(By the way, is there a lint to catch statically known wrong string slices?)
// 1. out of bounds | ||
if begin > s.len() || end > s.len() { | ||
let oob_index = if begin > s.len() { begin } else { end }; | ||
panic!("index {} is out of bounds of `{}`{}", oob_index, s_trunc, ellipsis); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test for this as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change the existing test to match more of the panic message
9bc3cd4
to
95e8c41
Compare
@killercup maybe just s/index/byte index/ in the panic messages. |
I like |
Separate out of bounds errors from character boundary errors, and print more details for character boundary errors. Example: &"abcαβγ"[..4] thread 'str::test_slice_fail_boundary_1' panicked at 'byte index 4 is not a char boundary; it is inside `α` (bytes 3..5) of `abcαβγ`'
9c659ea
to
d83fff3
Compare
ping @sfackler ; you're the reviewer on this PR but haven't commented. Should we move it to someone else? |
Oh sorry, I also forgot about this. Thank you steveklabnik! |
Oops, missed this! I'm a bit worried about including the entire string in the panic message. I can see that getting pretty incomprehensible for larger strings. Maybe just 20 or 30 characters of context around the index? |
This PR keeps the previous behaviour that the string piece shown is truncated to 256 bytes or less. That was just some arbitrary limit that I picked in a previous that PR introduced some truncation at all. |
Oh great, my mistake! |
@bors r+ |
📌 Commit d83fff3 has been approved by |
⌛ Testing commit d83fff3 with merge 9954355... |
Use more specific panic message for &str slicing errors Separate out of bounds errors from character boundary errors, and print more details for character boundary errors. It reports the first error it finds in: 1. begin out of bounds 2. end out of bounds 3. begin <= end violated 3. begin not char boundary 5. end not char boundary. Example: &"abcαβγ"[..4] thread 'str::test_slice_fail_boundary_1' panicked at 'byte index 4 is not a char boundary; it is inside 'α' (bytes 3..5) of `abcαβγ`' Fixes #38052
💔 Test failed - status-travis |
@bors: retry
* android timeout
…On Tue, Jan 3, 2017 at 12:54 PM, bors ***@***.***> wrote:
💔 Test failed - status-travis
<https://travis-ci.org/rust-lang/rust/builds/188617319>
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#38066 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAD95CYOl3qSCEJ6itbP_hwP8FeXEANPks5rOrV8gaJpZM4K-hVh>
.
|
⌛ Testing commit d83fff3 with merge 4682271... |
Use more specific panic message for &str slicing errors Separate out of bounds errors from character boundary errors, and print more details for character boundary errors. It reports the first error it finds in: 1. begin out of bounds 2. end out of bounds 3. begin <= end violated 3. begin not char boundary 5. end not char boundary. Example: &"abcαβγ"[..4] thread 'str::test_slice_fail_boundary_1' panicked at 'byte index 4 is not a char boundary; it is inside 'α' (bytes 3..5) of `abcαβγ`' Fixes #38052
☀️ Test successful - status-appveyor, status-travis |
Version 1.16.0 (2017-03-16) =========================== Language -------- * Lifetimes in statics and consts default to `'static`. [RFC 1623] * [The compiler's `dead_code` lint now accounts for type aliases][38051]. * [Uninhabitable enums (those without any variants) no longer permit wildcard match patterns][38069] * [Clean up semantics of `self` in an import list][38313] * [`Self` may appear in `impl` headers][38920] * [`Self` may appear in struct expressions][39282] Compiler -------- * [`rustc` now supports `--emit=metadata`, which causes rustc to emit a `.rmeta` file containing only crate metadata][38571]. This can be used by tools like the Rust Language Service to perform metadata-only builds. * [Levenshtein based typo suggestions now work in most places, while previously they worked only for fields and sometimes for local variables][38927]. Together with the overhaul of "no resolution"/"unexpected resolution" errors (#[38154]) they result in large and systematic improvement in resolution diagnostics. * [Fix `transmute::<T, U>` where `T` requires a bigger alignment than `U`][38670] * [rustc: use -Xlinker when specifying an rpath with ',' in it][38798] * [`rustc` no longer attempts to provide "consider using an explicit lifetime" suggestions][37057]. They were inaccurate. Stabilized APIs --------------- * [`VecDeque::truncate`] * [`VecDeque::resize`] * [`String::insert_str`] * [`Duration::checked_add`] * [`Duration::checked_sub`] * [`Duration::checked_div`] * [`Duration::checked_mul`] * [`str::replacen`] * [`str::repeat`] * [`SocketAddr::is_ipv4`] * [`SocketAddr::is_ipv6`] * [`IpAddr::is_ipv4`] * [`IpAddr::is_ipv6`] * [`Vec::dedup_by`] * [`Vec::dedup_by_key`] * [`Result::unwrap_or_default`] * [`<*const T>::wrapping_offset`] * [`<*mut T>::wrapping_offset`] * `CommandExt::creation_flags` * [`File::set_permissions`] * [`String::split_off`] Libraries --------- * [`[T]::binary_search` and `[T]::binary_search_by_key` now take their argument by `Borrow` parameter][37761] * [All public types in std implement `Debug`][38006] * [`IpAddr` implements `From<Ipv4Addr>` and `From<Ipv6Addr>`][38327] * [`Ipv6Addr` implements `From<[u16; 8]>`][38131] * [Ctrl-Z returns from `Stdin.read()` when reading from the console on Windows][38274] * [std: Fix partial writes in `LineWriter`][38062] * [std: Clamp max read/write sizes on Unix][38062] * [Use more specific panic message for `&str` slicing errors][38066] * [`TcpListener::set_only_v6` is deprecated][38304]. This functionality cannot be achieved in std currently. * [`writeln!`, like `println!`, now accepts a form with no string or formatting arguments, to just print a newline][38469] * [Implement `iter::Sum` and `iter::Product` for `Result`][38580] * [Reduce the size of static data in `std_unicode::tables`][38781] * [`char::EscapeDebug`, `EscapeDefault`, `EscapeUnicode`, `CaseMappingIter`, `ToLowercase`, `ToUppercase`, implement `Display`][38909] * [`Duration` implements `Sum`][38712] * [`String` implements `ToSocketAddrs`][39048] Cargo ----- * [The `cargo check` command does a type check of a project without building it][cargo/3296] * [crates.io will display CI badges from Travis and AppVeyor, if specified in Cargo.toml][cargo/3546] * [crates.io will display categories listed in Cargo.toml][cargo/3301] * [Compilation profiles accept integer values for `debug`, in addition to `true` and `false`. These are passed to `rustc` as the value to `-C debuginfo`][cargo/3534] * [Implement `cargo --version --verbose`][cargo/3604] * [All builds now output 'dep-info' build dependencies compatible with make and ninja][cargo/3557] * [Build all workspace members with `build --all`][cargo/3511] * [Document all workspace members with `doc --all`][cargo/3515] * [Path deps outside workspace are not members][cargo/3443] Misc ---- * [`rustdoc` has a `--sysroot` argument that, like `rustc`, specifies the path to the Rust implementation][38589] * [The `armv7-linux-androideabi` target no longer enables NEON extensions, per Google's ABI guide][38413] * [The stock standard library can be compiled for Redox OS][38401] * [Rust has initial SPARC support][38726]. Tier 3. No builds available. * [Rust has experimental support for Nvidia PTX][38559]. Tier 3. No builds available. * [Fix backtraces on i686-pc-windows-gnu by disabling FPO][39379] Compatibility Notes ------------------- * [Uninhabitable enums (those without any variants) no longer permit wildcard match patterns][38069] * In this release, references to uninhabited types can not be pattern-matched. This was accidentally allowed in 1.15. * [The compiler's `dead_code` lint now accounts for type aliases][38051]. * [Ctrl-Z returns from `Stdin.read()` when reading from the console on Windows][38274] * [Clean up semantics of `self` in an import list][38313] [37057]: rust-lang/rust#37057 [37761]: rust-lang/rust#37761 [38006]: rust-lang/rust#38006 [38051]: rust-lang/rust#38051 [38062]: rust-lang/rust#38062 [38062]: rust-lang/rust#38622 [38066]: rust-lang/rust#38066 [38069]: rust-lang/rust#38069 [38131]: rust-lang/rust#38131 [38154]: rust-lang/rust#38154 [38274]: rust-lang/rust#38274 [38304]: rust-lang/rust#38304 [38313]: rust-lang/rust#38313 [38314]: rust-lang/rust#38314 [38327]: rust-lang/rust#38327 [38401]: rust-lang/rust#38401 [38413]: rust-lang/rust#38413 [38469]: rust-lang/rust#38469 [38559]: rust-lang/rust#38559 [38571]: rust-lang/rust#38571 [38580]: rust-lang/rust#38580 [38589]: rust-lang/rust#38589 [38670]: rust-lang/rust#38670 [38712]: rust-lang/rust#38712 [38726]: rust-lang/rust#38726 [38781]: rust-lang/rust#38781 [38798]: rust-lang/rust#38798 [38909]: rust-lang/rust#38909 [38920]: rust-lang/rust#38920 [38927]: rust-lang/rust#38927 [39048]: rust-lang/rust#39048 [39282]: rust-lang/rust#39282 [39379]: rust-lang/rust#39379 [`<*const T>::wrapping_offset`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.wrapping_offset [`<*mut T>::wrapping_offset`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.wrapping_offset [`Duration::checked_add`]: https://doc.rust-lang.org/std/time/struct.Duration.html#method.checked_add [`Duration::checked_div`]: https://doc.rust-lang.org/std/time/struct.Duration.html#method.checked_div [`Duration::checked_mul`]: https://doc.rust-lang.org/std/time/struct.Duration.html#method.checked_mul [`Duration::checked_sub`]: https://doc.rust-lang.org/std/time/struct.Duration.html#method.checked_sub [`File::set_permissions`]: https://doc.rust-lang.org/std/fs/struct.File.html#method.set_permissions [`IpAddr::is_ipv4`]: https://doc.rust-lang.org/std/net/enum.IpAddr.html#method.is_ipv4 [`IpAddr::is_ipv6`]: https://doc.rust-lang.org/std/net/enum.IpAddr.html#method.is_ipv6 [`Result::unwrap_or_default`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.unwrap_or_default [`SocketAddr::is_ipv4`]: https://doc.rust-lang.org/std/net/enum.SocketAddr.html#method.is_ipv4 [`SocketAddr::is_ipv6`]: https://doc.rust-lang.org/std/net/enum.SocketAddr.html#method.is_ipv6 [`String::insert_str`]: https://doc.rust-lang.org/std/string/struct.String.html#method.insert_str [`String::split_off`]: https://doc.rust-lang.org/std/string/struct.String.html#method.split_off [`Vec::dedup_by_key`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.dedup_by_key [`Vec::dedup_by`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.dedup_by [`VecDeque::resize`]: https://doc.rust-lang.org/std/collections/vec_deque/struct.VecDeque.html#method.resize [`VecDeque::truncate`]: https://doc.rust-lang.org/std/collections/vec_deque/struct.VecDeque.html#method.truncate [`str::repeat`]: https://doc.rust-lang.org/std/primitive.str.html#method.repeat [`str::replacen`]: https://doc.rust-lang.org/std/primitive.str.html#method.replacen [cargo/3296]: rust-lang/cargo#3296 [cargo/3301]: rust-lang/cargo#3301 [cargo/3443]: rust-lang/cargo#3443 [cargo/3511]: rust-lang/cargo#3511 [cargo/3515]: rust-lang/cargo#3515 [cargo/3534]: rust-lang/cargo#3534 [cargo/3546]: rust-lang/cargo#3546 [cargo/3557]: rust-lang/cargo#3557 [cargo/3604]: rust-lang/cargo#3604 [RFC 1623]: https://github.com/rust-lang/rfcs/blob/master/text/1623-static.md
Separate out of bounds errors from character boundary errors, and print
more details for character boundary errors.
It reports the first error it finds in:
Example:
Fixes #38052