-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce the size of static data in std_unicode::tables #38781
Conversation
It was used to measure before/after size in cfaf66c.
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @nikomatsakis (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I thought about doing this but was thinking more about code complexity than static size. The latter is important.
pub_string = "" | ||
if is_pub: | ||
pub_string = "pub " | ||
f.write(" %sconst %s: &'static super::SmallBoolTrie = &super::SmallBoolTrie {\n" % (pub_string, name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/checkout/src/etc/unicode.py:461: line longer than 100 chars
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
`BoolTrie` works well for sets of code points spread out through most of Unicode’s range, but is uses a lot of space for sets with few, mostly low, code points. This switches a few of its instances to a similar but simpler trie data structure. ## Before `size_of::<BoolTrie>()` is 1552, which is added to `table.r3.len() * 8 + t.r5.len() + t.r6.len() * 8`: * `Cc_table`: 1632 * `White_Space_table`: 1656 * `Pattern_White_Space_table`: 1640 * Total: 4928 bytes ## After `size_of::<SmallBoolTrie>()` is 32, which is added to `t.r1.len() + t.r2.len() * 8`: * `Cc_table`: 51 * `White_Space_table`: 273 * `Pattern_White_Space_table`: 193 * Total: 517 bytes ## Difference Every Rust program with `std` statically linked should be about 4 KB smaller.
This seems good to me, but I feel like someone from @rust-lang/libs should weigh in. |
@bors: r+ Awesome wins, thanks @SimonSapin! Also thanks for the sanity check as well @raphlinus :) |
📌 Commit 3b208d2 has been approved by |
⌛ Testing commit 3b208d2 with merge 582733d... |
💔 Test failed - status-travis |
… On Thu, Jan 5, 2017 at 8:41 AM, bors ***@***.***> wrote:
💔 Test failed - status-travis
<https://travis-ci.org/rust-lang/rust/builds/189211288>
—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
<#38781 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAD95HzvamRWAGqSFIin_-zP5f_XzaxVks5rPR0-gaJpZM4LZKKK>
.
|
⌛ Testing commit 3b208d2 with merge e9a3cb4... |
💔 Test failed - status-travis |
@bors: retry
|
Reduce the size of static data in std_unicode::tables `BoolTrie` works well for sets of code points spread out through most of Unicode’s range, but is uses a lot of space for sets with few, mostly low, code points. This switches a few of its instances to a similar but simpler trie data structure. CC @raphlinus, who wrote the original `BoolTrie`. ## Before `size_of::<BoolTrie>()` is 1552, which is added to `table.r3.len() * 8 + t.r5.len() + t.r6.len() * 8`: * `Cc_table`: 1632 * `White_Space_table`: 1656 * `Pattern_White_Space_table`: 1640 * Total: 4928 bytes ## After `size_of::<SmallBoolTrie>()` is 32, which is added to `t.r1.len() + t.r2.len() * 8`: * `Cc_table`: 51 * `White_Space_table`: 273 * `Pattern_White_Space_table`: 193 * Total: 517 bytes ## Difference Every Rust program with `std` statically linked should be about 4 KB smaller.
☀️ Test successful - status-appveyor, status-travis |
|
||
bytes_old = 0 | ||
bytes_new = 0 | ||
import fileinput, re, os, sys, operator, math | ||
|
||
preamble = '''// Copyright 2012-2016 The Rust Project Developers. See the COPYRIGHT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, looks like the copyrights are lagging behind the new year.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it matter? Also, this PR is merged, so if you want something to happen you should open a new issue or PR.
Version 1.16.0 (2017-03-16) =========================== Language -------- * Lifetimes in statics and consts default to `'static`. [RFC 1623] * [The compiler's `dead_code` lint now accounts for type aliases][38051]. * [Uninhabitable enums (those without any variants) no longer permit wildcard match patterns][38069] * [Clean up semantics of `self` in an import list][38313] * [`Self` may appear in `impl` headers][38920] * [`Self` may appear in struct expressions][39282] Compiler -------- * [`rustc` now supports `--emit=metadata`, which causes rustc to emit a `.rmeta` file containing only crate metadata][38571]. This can be used by tools like the Rust Language Service to perform metadata-only builds. * [Levenshtein based typo suggestions now work in most places, while previously they worked only for fields and sometimes for local variables][38927]. Together with the overhaul of "no resolution"/"unexpected resolution" errors (#[38154]) they result in large and systematic improvement in resolution diagnostics. * [Fix `transmute::<T, U>` where `T` requires a bigger alignment than `U`][38670] * [rustc: use -Xlinker when specifying an rpath with ',' in it][38798] * [`rustc` no longer attempts to provide "consider using an explicit lifetime" suggestions][37057]. They were inaccurate. Stabilized APIs --------------- * [`VecDeque::truncate`] * [`VecDeque::resize`] * [`String::insert_str`] * [`Duration::checked_add`] * [`Duration::checked_sub`] * [`Duration::checked_div`] * [`Duration::checked_mul`] * [`str::replacen`] * [`str::repeat`] * [`SocketAddr::is_ipv4`] * [`SocketAddr::is_ipv6`] * [`IpAddr::is_ipv4`] * [`IpAddr::is_ipv6`] * [`Vec::dedup_by`] * [`Vec::dedup_by_key`] * [`Result::unwrap_or_default`] * [`<*const T>::wrapping_offset`] * [`<*mut T>::wrapping_offset`] * `CommandExt::creation_flags` * [`File::set_permissions`] * [`String::split_off`] Libraries --------- * [`[T]::binary_search` and `[T]::binary_search_by_key` now take their argument by `Borrow` parameter][37761] * [All public types in std implement `Debug`][38006] * [`IpAddr` implements `From<Ipv4Addr>` and `From<Ipv6Addr>`][38327] * [`Ipv6Addr` implements `From<[u16; 8]>`][38131] * [Ctrl-Z returns from `Stdin.read()` when reading from the console on Windows][38274] * [std: Fix partial writes in `LineWriter`][38062] * [std: Clamp max read/write sizes on Unix][38062] * [Use more specific panic message for `&str` slicing errors][38066] * [`TcpListener::set_only_v6` is deprecated][38304]. This functionality cannot be achieved in std currently. * [`writeln!`, like `println!`, now accepts a form with no string or formatting arguments, to just print a newline][38469] * [Implement `iter::Sum` and `iter::Product` for `Result`][38580] * [Reduce the size of static data in `std_unicode::tables`][38781] * [`char::EscapeDebug`, `EscapeDefault`, `EscapeUnicode`, `CaseMappingIter`, `ToLowercase`, `ToUppercase`, implement `Display`][38909] * [`Duration` implements `Sum`][38712] * [`String` implements `ToSocketAddrs`][39048] Cargo ----- * [The `cargo check` command does a type check of a project without building it][cargo/3296] * [crates.io will display CI badges from Travis and AppVeyor, if specified in Cargo.toml][cargo/3546] * [crates.io will display categories listed in Cargo.toml][cargo/3301] * [Compilation profiles accept integer values for `debug`, in addition to `true` and `false`. These are passed to `rustc` as the value to `-C debuginfo`][cargo/3534] * [Implement `cargo --version --verbose`][cargo/3604] * [All builds now output 'dep-info' build dependencies compatible with make and ninja][cargo/3557] * [Build all workspace members with `build --all`][cargo/3511] * [Document all workspace members with `doc --all`][cargo/3515] * [Path deps outside workspace are not members][cargo/3443] Misc ---- * [`rustdoc` has a `--sysroot` argument that, like `rustc`, specifies the path to the Rust implementation][38589] * [The `armv7-linux-androideabi` target no longer enables NEON extensions, per Google's ABI guide][38413] * [The stock standard library can be compiled for Redox OS][38401] * [Rust has initial SPARC support][38726]. Tier 3. No builds available. * [Rust has experimental support for Nvidia PTX][38559]. Tier 3. No builds available. * [Fix backtraces on i686-pc-windows-gnu by disabling FPO][39379] Compatibility Notes ------------------- * [Uninhabitable enums (those without any variants) no longer permit wildcard match patterns][38069] * In this release, references to uninhabited types can not be pattern-matched. This was accidentally allowed in 1.15. * [The compiler's `dead_code` lint now accounts for type aliases][38051]. * [Ctrl-Z returns from `Stdin.read()` when reading from the console on Windows][38274] * [Clean up semantics of `self` in an import list][38313] [37057]: rust-lang/rust#37057 [37761]: rust-lang/rust#37761 [38006]: rust-lang/rust#38006 [38051]: rust-lang/rust#38051 [38062]: rust-lang/rust#38062 [38062]: rust-lang/rust#38622 [38066]: rust-lang/rust#38066 [38069]: rust-lang/rust#38069 [38131]: rust-lang/rust#38131 [38154]: rust-lang/rust#38154 [38274]: rust-lang/rust#38274 [38304]: rust-lang/rust#38304 [38313]: rust-lang/rust#38313 [38314]: rust-lang/rust#38314 [38327]: rust-lang/rust#38327 [38401]: rust-lang/rust#38401 [38413]: rust-lang/rust#38413 [38469]: rust-lang/rust#38469 [38559]: rust-lang/rust#38559 [38571]: rust-lang/rust#38571 [38580]: rust-lang/rust#38580 [38589]: rust-lang/rust#38589 [38670]: rust-lang/rust#38670 [38712]: rust-lang/rust#38712 [38726]: rust-lang/rust#38726 [38781]: rust-lang/rust#38781 [38798]: rust-lang/rust#38798 [38909]: rust-lang/rust#38909 [38920]: rust-lang/rust#38920 [38927]: rust-lang/rust#38927 [39048]: rust-lang/rust#39048 [39282]: rust-lang/rust#39282 [39379]: rust-lang/rust#39379 [`<*const T>::wrapping_offset`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.wrapping_offset [`<*mut T>::wrapping_offset`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.wrapping_offset [`Duration::checked_add`]: https://doc.rust-lang.org/std/time/struct.Duration.html#method.checked_add [`Duration::checked_div`]: https://doc.rust-lang.org/std/time/struct.Duration.html#method.checked_div [`Duration::checked_mul`]: https://doc.rust-lang.org/std/time/struct.Duration.html#method.checked_mul [`Duration::checked_sub`]: https://doc.rust-lang.org/std/time/struct.Duration.html#method.checked_sub [`File::set_permissions`]: https://doc.rust-lang.org/std/fs/struct.File.html#method.set_permissions [`IpAddr::is_ipv4`]: https://doc.rust-lang.org/std/net/enum.IpAddr.html#method.is_ipv4 [`IpAddr::is_ipv6`]: https://doc.rust-lang.org/std/net/enum.IpAddr.html#method.is_ipv6 [`Result::unwrap_or_default`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.unwrap_or_default [`SocketAddr::is_ipv4`]: https://doc.rust-lang.org/std/net/enum.SocketAddr.html#method.is_ipv4 [`SocketAddr::is_ipv6`]: https://doc.rust-lang.org/std/net/enum.SocketAddr.html#method.is_ipv6 [`String::insert_str`]: https://doc.rust-lang.org/std/string/struct.String.html#method.insert_str [`String::split_off`]: https://doc.rust-lang.org/std/string/struct.String.html#method.split_off [`Vec::dedup_by_key`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.dedup_by_key [`Vec::dedup_by`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.dedup_by [`VecDeque::resize`]: https://doc.rust-lang.org/std/collections/vec_deque/struct.VecDeque.html#method.resize [`VecDeque::truncate`]: https://doc.rust-lang.org/std/collections/vec_deque/struct.VecDeque.html#method.truncate [`str::repeat`]: https://doc.rust-lang.org/std/primitive.str.html#method.repeat [`str::replacen`]: https://doc.rust-lang.org/std/primitive.str.html#method.replacen [cargo/3296]: rust-lang/cargo#3296 [cargo/3301]: rust-lang/cargo#3301 [cargo/3443]: rust-lang/cargo#3443 [cargo/3511]: rust-lang/cargo#3511 [cargo/3515]: rust-lang/cargo#3515 [cargo/3534]: rust-lang/cargo#3534 [cargo/3546]: rust-lang/cargo#3546 [cargo/3557]: rust-lang/cargo#3557 [cargo/3604]: rust-lang/cargo#3604 [RFC 1623]: https://github.com/rust-lang/rfcs/blob/master/text/1623-static.md
BoolTrie
works well for sets of code points spread out through most of Unicode’s range, but is uses a lot of space for sets with few, mostly low, code points.This switches a few of its instances to a similar but simpler trie data structure.
CC @raphlinus, who wrote the original
BoolTrie
.Before
size_of::<BoolTrie>()
is 1552, which is added totable.r3.len() * 8 + t.r5.len() + t.r6.len() * 8
:Cc_table
: 1632White_Space_table
: 1656Pattern_White_Space_table
: 1640After
size_of::<SmallBoolTrie>()
is 32, which is added tot.r1.len() + t.r2.len() * 8
:Cc_table
: 51White_Space_table
: 273Pattern_White_Space_table
: 193Difference
Every Rust program with
std
statically linked should be about 4 KB smaller.