Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use more specific panic message for &str slicing errors #38066

Merged
merged 1 commit into from
Jan 4, 2017

Conversation

bluss
Copy link
Member

@bluss bluss commented Nov 29, 2016

Separate out of bounds errors from character boundary errors, and print
more details for character boundary errors.

It reports the first error it finds in:

  1. begin out of bounds
  2. end out of bounds
  3. begin <= end violated
  4. begin not char boundary
  5. end not char boundary.

Example:

&"abcαβγ"[..4]

thread 'str::test_slice_fail_boundary_1' panicked at 'byte index 4 is not
a char boundary; it is inside 'α' (bytes 3..5) of `abcαβγ`'

Fixes #38052

@rust-highfive
Copy link
Collaborator

r? @sfackler

(rust_highfive has picked a reviewer for you, use r? to override)

@bluss bluss force-pushed the string-slice-error branch 2 times, most recently from c29b730 to 46a26f2 Compare November 29, 2016 03:21
@durka
Copy link
Contributor

durka commented Nov 29, 2016

Should we show all the problems instead of just the first one?

let ellipsis = if truncated { "[...]" } else { "" };

// 1. out of bounds
let index = if !s.is_char_boundary(begin) { begin } else { end };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is confusing to me. It seems like you choose which index is going to be in the message before checking which one is out of bounds.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_char_boundary is false if the index is out of bounds.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, it should be confusing, because it's not really right.

while !s.is_char_boundary(char_start) {
char_start -= 1;
}
let ch = s[char_start..].chars().next().unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can panic if the string is empty or if char_start is at the end. I think I've convinced myself this can't happen, but at least a comment would help, or just to be defensive

if let Some(ch) = s[char_start..].chars().next() {
    let char_range = char_start .. char_start + ch.len_utf8();
    panic!("index {} is not a char boundary; it is inside {:?} (bytes {:?}) of `{}`{}",
           index, ch, char_range, s_trunc, ellipsis);
} else {
    panic!("index {} is not a char boundary in `{}`{}", index, s_trunc, ellipsis);
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

char_start is a character boundary, and then using it for slicing is ok. It could not be larger than len, that case has already been handled. It could not be equal to len, then it would not be an error in the first place.

while !s.is_char_boundary(char_start) {
char_start -= 1;
}
let ch = s[char_start..].chars().next().unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if the problematic index is beyond MAX_DISPLAY_LENGTH, the preview isn't going to be very useful. It could instead show the region of the string from max(0, index - MAX_DISPLAY_LENGTH/2) .. min(s.len(), index + MAX_DISPLAY_LENGTH/2) with ellipses.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this is unnecessary, it just wants to identify the string somehow.

Copy link
Member

@killercup killercup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Handling the different cases with custom messages makes this much better.

As mentioned in #38052 I'd like to try and convey that indexing a string (slice) is done using byte offsets -- and not character offsets. I think this is one of those messages people will see this in failing tests and then have a hard time recognizing why this went wrong; and then have an even harder time to find a solution like the unicode-segmentation crate. UTF-8 is hard when people don't think about it.

I know putting all that in the panic message is not a solution; but maybe you can change it to "slicing at byte 4 which is not a char boundary" or something like that?

(By the way, is there a lint to catch statically known wrong string slices?)

// 1. out of bounds
if begin > s.len() || end > s.len() {
let oob_index = if begin > s.len() { begin } else { end };
panic!("index {} is out of bounds of `{}`{}", oob_index, s_trunc, ellipsis);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test for this as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change the existing test to match more of the panic message

@bluss bluss force-pushed the string-slice-error branch from 9bc3cd4 to 95e8c41 Compare November 29, 2016 19:59
@durka
Copy link
Contributor

durka commented Nov 29, 2016

@killercup maybe just s/index/byte index/ in the panic messages.

@bluss
Copy link
Member Author

bluss commented Nov 29, 2016

I like byte index or something like that. I've gotten into the habit of saying byte offset too, but don't know which is best. char_indices says indices, and other docs say .. what? 😄

Separate out of bounds errors from character boundary errors, and print
more details for character boundary errors.

Example:

    &"abcαβγ"[..4]

    thread 'str::test_slice_fail_boundary_1' panicked at 'byte index 4 is not
    a char boundary; it is inside `α` (bytes 3..5) of `abcαβγ`'
@bluss bluss force-pushed the string-slice-error branch from 9c659ea to d83fff3 Compare November 30, 2016 18:00
@steveklabnik
Copy link
Member

ping @sfackler ; you're the reviewer on this PR but haven't commented. Should we move it to someone else?

@bluss
Copy link
Member Author

bluss commented Jan 3, 2017

Oh sorry, I also forgot about this. Thank you steveklabnik!

@sfackler
Copy link
Member

sfackler commented Jan 3, 2017

Oops, missed this!

I'm a bit worried about including the entire string in the panic message. I can see that getting pretty incomprehensible for larger strings. Maybe just 20 or 30 characters of context around the index?

@bluss
Copy link
Member Author

bluss commented Jan 3, 2017

This PR keeps the previous behaviour that the string piece shown is truncated to 256 bytes or less. That was just some arbitrary limit that I picked in a previous that PR introduced some truncation at all.

@sfackler
Copy link
Member

sfackler commented Jan 3, 2017

Oh great, my mistake!

@sfackler
Copy link
Member

sfackler commented Jan 3, 2017

@bors r+

@bors
Copy link
Contributor

bors commented Jan 3, 2017

📌 Commit d83fff3 has been approved by sfackler

@bors
Copy link
Contributor

bors commented Jan 3, 2017

⌛ Testing commit d83fff3 with merge 9954355...

bors added a commit that referenced this pull request Jan 3, 2017
Use more specific panic message for &str slicing errors

Separate out of bounds errors from character boundary errors, and print
more details for character boundary errors.

It reports the first error it finds in:

1. begin out of bounds
2. end out of bounds
3. begin <= end violated
3. begin not char boundary
5. end not char boundary.

Example:

    &"abcαβγ"[..4]

    thread 'str::test_slice_fail_boundary_1' panicked at 'byte index 4 is not
    a char boundary; it is inside 'α' (bytes 3..5) of `abcαβγ`'

Fixes #38052
@bors
Copy link
Contributor

bors commented Jan 3, 2017

💔 Test failed - status-travis

@alexcrichton
Copy link
Member

alexcrichton commented Jan 3, 2017 via email

@bors
Copy link
Contributor

bors commented Jan 3, 2017

⌛ Testing commit d83fff3 with merge 4682271...

bors added a commit that referenced this pull request Jan 3, 2017
Use more specific panic message for &str slicing errors

Separate out of bounds errors from character boundary errors, and print
more details for character boundary errors.

It reports the first error it finds in:

1. begin out of bounds
2. end out of bounds
3. begin <= end violated
3. begin not char boundary
5. end not char boundary.

Example:

    &"abcαβγ"[..4]

    thread 'str::test_slice_fail_boundary_1' panicked at 'byte index 4 is not
    a char boundary; it is inside 'α' (bytes 3..5) of `abcαβγ`'

Fixes #38052
@bors
Copy link
Contributor

bors commented Jan 4, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: sfackler
Pushing 4682271 to master...

@bors bors merged commit d83fff3 into rust-lang:master Jan 4, 2017
jsonn pushed a commit to jsonn/pkgsrc that referenced this pull request Mar 20, 2017
Version 1.16.0 (2017-03-16)
===========================

Language
--------

* Lifetimes in statics and consts default to `'static`. [RFC 1623]
* [The compiler's `dead_code` lint now accounts for type aliases][38051].
* [Uninhabitable enums (those without any variants) no longer permit wildcard
  match patterns][38069]
* [Clean up semantics of `self` in an import list][38313]
* [`Self` may appear in `impl` headers][38920]
* [`Self` may appear in struct expressions][39282]

Compiler
--------

* [`rustc` now supports `--emit=metadata`, which causes rustc to emit
  a `.rmeta` file containing only crate metadata][38571]. This can be
  used by tools like the Rust Language Service to perform
  metadata-only builds.
* [Levenshtein based typo suggestions now work in most places, while
  previously they worked only for fields and sometimes for local
  variables][38927]. Together with the overhaul of "no
  resolution"/"unexpected resolution" errors (#[38154]) they result in
  large and systematic improvement in resolution diagnostics.
* [Fix `transmute::<T, U>` where `T` requires a bigger alignment than
  `U`][38670]
* [rustc: use -Xlinker when specifying an rpath with ',' in it][38798]
* [`rustc` no longer attempts to provide "consider using an explicit
  lifetime" suggestions][37057]. They were inaccurate.

Stabilized APIs
---------------

* [`VecDeque::truncate`]
* [`VecDeque::resize`]
* [`String::insert_str`]
* [`Duration::checked_add`]
* [`Duration::checked_sub`]
* [`Duration::checked_div`]
* [`Duration::checked_mul`]
* [`str::replacen`]
* [`str::repeat`]
* [`SocketAddr::is_ipv4`]
* [`SocketAddr::is_ipv6`]
* [`IpAddr::is_ipv4`]
* [`IpAddr::is_ipv6`]
* [`Vec::dedup_by`]
* [`Vec::dedup_by_key`]
* [`Result::unwrap_or_default`]
* [`<*const T>::wrapping_offset`]
* [`<*mut T>::wrapping_offset`]
* `CommandExt::creation_flags`
* [`File::set_permissions`]
* [`String::split_off`]

Libraries
---------

* [`[T]::binary_search` and `[T]::binary_search_by_key` now take
  their argument by `Borrow` parameter][37761]
* [All public types in std implement `Debug`][38006]
* [`IpAddr` implements `From<Ipv4Addr>` and `From<Ipv6Addr>`][38327]
* [`Ipv6Addr` implements `From<[u16; 8]>`][38131]
* [Ctrl-Z returns from `Stdin.read()` when reading from the console on
  Windows][38274]
* [std: Fix partial writes in `LineWriter`][38062]
* [std: Clamp max read/write sizes on Unix][38062]
* [Use more specific panic message for `&str` slicing errors][38066]
* [`TcpListener::set_only_v6` is deprecated][38304]. This
  functionality cannot be achieved in std currently.
* [`writeln!`, like `println!`, now accepts a form with no string
  or formatting arguments, to just print a newline][38469]
* [Implement `iter::Sum` and `iter::Product` for `Result`][38580]
* [Reduce the size of static data in `std_unicode::tables`][38781]
* [`char::EscapeDebug`, `EscapeDefault`, `EscapeUnicode`,
  `CaseMappingIter`, `ToLowercase`, `ToUppercase`, implement
  `Display`][38909]
* [`Duration` implements `Sum`][38712]
* [`String` implements `ToSocketAddrs`][39048]

Cargo
-----

* [The `cargo check` command does a type check of a project without
  building it][cargo/3296]
* [crates.io will display CI badges from Travis and AppVeyor, if
  specified in Cargo.toml][cargo/3546]
* [crates.io will display categories listed in Cargo.toml][cargo/3301]
* [Compilation profiles accept integer values for `debug`, in addition
  to `true` and `false`. These are passed to `rustc` as the value to
  `-C debuginfo`][cargo/3534]
* [Implement `cargo --version --verbose`][cargo/3604]
* [All builds now output 'dep-info' build dependencies compatible with
  make and ninja][cargo/3557]
* [Build all workspace members with `build --all`][cargo/3511]
* [Document all workspace members with `doc --all`][cargo/3515]
* [Path deps outside workspace are not members][cargo/3443]

Misc
----

* [`rustdoc` has a `--sysroot` argument that, like `rustc`, specifies
  the path to the Rust implementation][38589]
* [The `armv7-linux-androideabi` target no longer enables NEON
  extensions, per Google's ABI guide][38413]
* [The stock standard library can be compiled for Redox OS][38401]
* [Rust has initial SPARC support][38726]. Tier 3. No builds
  available.
* [Rust has experimental support for Nvidia PTX][38559]. Tier 3. No
  builds available.
* [Fix backtraces on i686-pc-windows-gnu by disabling FPO][39379]

Compatibility Notes
-------------------

* [Uninhabitable enums (those without any variants) no longer permit wildcard
  match patterns][38069]
* In this release, references to uninhabited types can not be
  pattern-matched. This was accidentally allowed in 1.15.
* [The compiler's `dead_code` lint now accounts for type aliases][38051].
* [Ctrl-Z returns from `Stdin.read()` when reading from the console on
  Windows][38274]
* [Clean up semantics of `self` in an import list][38313]

[37057]: rust-lang/rust#37057
[37761]: rust-lang/rust#37761
[38006]: rust-lang/rust#38006
[38051]: rust-lang/rust#38051
[38062]: rust-lang/rust#38062
[38062]: rust-lang/rust#38622
[38066]: rust-lang/rust#38066
[38069]: rust-lang/rust#38069
[38131]: rust-lang/rust#38131
[38154]: rust-lang/rust#38154
[38274]: rust-lang/rust#38274
[38304]: rust-lang/rust#38304
[38313]: rust-lang/rust#38313
[38314]: rust-lang/rust#38314
[38327]: rust-lang/rust#38327
[38401]: rust-lang/rust#38401
[38413]: rust-lang/rust#38413
[38469]: rust-lang/rust#38469
[38559]: rust-lang/rust#38559
[38571]: rust-lang/rust#38571
[38580]: rust-lang/rust#38580
[38589]: rust-lang/rust#38589
[38670]: rust-lang/rust#38670
[38712]: rust-lang/rust#38712
[38726]: rust-lang/rust#38726
[38781]: rust-lang/rust#38781
[38798]: rust-lang/rust#38798
[38909]: rust-lang/rust#38909
[38920]: rust-lang/rust#38920
[38927]: rust-lang/rust#38927
[39048]: rust-lang/rust#39048
[39282]: rust-lang/rust#39282
[39379]: rust-lang/rust#39379
[`<*const T>::wrapping_offset`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.wrapping_offset
[`<*mut T>::wrapping_offset`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.wrapping_offset
[`Duration::checked_add`]: https://doc.rust-lang.org/std/time/struct.Duration.html#method.checked_add
[`Duration::checked_div`]: https://doc.rust-lang.org/std/time/struct.Duration.html#method.checked_div
[`Duration::checked_mul`]: https://doc.rust-lang.org/std/time/struct.Duration.html#method.checked_mul
[`Duration::checked_sub`]: https://doc.rust-lang.org/std/time/struct.Duration.html#method.checked_sub
[`File::set_permissions`]: https://doc.rust-lang.org/std/fs/struct.File.html#method.set_permissions
[`IpAddr::is_ipv4`]: https://doc.rust-lang.org/std/net/enum.IpAddr.html#method.is_ipv4
[`IpAddr::is_ipv6`]: https://doc.rust-lang.org/std/net/enum.IpAddr.html#method.is_ipv6
[`Result::unwrap_or_default`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.unwrap_or_default
[`SocketAddr::is_ipv4`]: https://doc.rust-lang.org/std/net/enum.SocketAddr.html#method.is_ipv4
[`SocketAddr::is_ipv6`]: https://doc.rust-lang.org/std/net/enum.SocketAddr.html#method.is_ipv6
[`String::insert_str`]: https://doc.rust-lang.org/std/string/struct.String.html#method.insert_str
[`String::split_off`]: https://doc.rust-lang.org/std/string/struct.String.html#method.split_off
[`Vec::dedup_by_key`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.dedup_by_key
[`Vec::dedup_by`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.dedup_by
[`VecDeque::resize`]:  https://doc.rust-lang.org/std/collections/vec_deque/struct.VecDeque.html#method.resize
[`VecDeque::truncate`]: https://doc.rust-lang.org/std/collections/vec_deque/struct.VecDeque.html#method.truncate
[`str::repeat`]: https://doc.rust-lang.org/std/primitive.str.html#method.repeat
[`str::replacen`]: https://doc.rust-lang.org/std/primitive.str.html#method.replacen
[cargo/3296]: rust-lang/cargo#3296
[cargo/3301]: rust-lang/cargo#3301
[cargo/3443]: rust-lang/cargo#3443
[cargo/3511]: rust-lang/cargo#3511
[cargo/3515]: rust-lang/cargo#3515
[cargo/3534]: rust-lang/cargo#3534
[cargo/3546]: rust-lang/cargo#3546
[cargo/3557]: rust-lang/cargo#3557
[cargo/3604]: rust-lang/cargo#3604
[RFC 1623]: https://github.com/rust-lang/rfcs/blob/master/text/1623-static.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants