RFC: rename int and uint to intptr/uintptr #9940
Comments
I think Fixed integers of pseudo-arbitrary width are rarely useful. |
Seems to me I rather like the I think the machine-word-sized |
Why introduce a completely new & (so far) unused naming convention to the language? |
@Thiez: They aren't machine word size, they're pointer-size. On the x32 ABI they will be 32-bit, despite having 16 64-bit integer registers. If you want to use fixed-size integers correctly, you need upper bounds on the size. Fixed-size types named |
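To make the pointer-size point concrete, here is a minimal sketch. It assumes current Rust, where the types called int and uint in this thread are spelled isize and usize; the helper name pointer_sized_widths is made up for illustration.

```rust
use std::mem::size_of;

/// Returns the sizes in bytes of `usize`, `isize`, and a thin raw pointer.
/// The three are expected to match on a given target, whatever the width
/// of the general-purpose registers happens to be.
fn pointer_sized_widths() -> (usize, usize, usize) {
    (
        size_of::<usize>(),
        size_of::<isize>(),
        size_of::<*const u8>(),
    )
}
```

On a target such as the x32 ABI mentioned above, all three values would be 4 even though the registers are 64 bits wide.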
@thestinger fair point. Perhaps that should change as well? Since we're not really supposed to be messing around with pointers outside of unsafe blocks, perhaps a pointer-size type is deserving of an ugly name. That opens up the option of having int and uint be machine word sized... |
@cmr: I agree they're awful names. We should discourage using fixed-size types only when bounds are unknown. I think you only want these types in low-level code or for in-memory container sizes. @Thiez: I don't really think word-sized is a useful property. If the upper bound is 32-bit, 32-bit integers will likely be fastest for the use case due to wasting less cache space. |
I realize my suggestion is silly anyway as one would still need a pointer-size variable for array and vector lengths, which is a nice case for int/uint (but not when they're word-sized). Ignore it :) |
I completely agree with @thestinger A machine-word sized integer means bugs and security holes e.g. because you ran the tests on one platform then deployed on others. If one of the platforms has 16-bit int like PalmOS, that's too short to use without thinking carefully about it, so the prudent coding style forbids un-sized int and uint. (Actually the PalmOS 68000 ABI is emulated on a 32-bit ARM so it's not clear what's a machine word.) Hence the strategy of using a pointer-size integer type only in low-level code that requires it, with an ugly name. |
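The "tested on one platform, deployed on another" hazard can be sketched by fixing the width explicitly. This assumes current Rust; i16 stands in for a platform whose default integer is 16 bits, and the function names are hypothetical.

```rust
/// Area computed with a 16-bit integer, standing in for a 16-bit platform.
/// Returns `None` when the product does not fit.
fn area_16(width: i16, height: i16) -> Option<i16> {
    width.checked_mul(height)
}

/// The same computation with a 32-bit integer.
fn area_32(width: i32, height: i32) -> Option<i32> {
    width.checked_mul(height)
}
```

The same inputs (300 by 300) succeed at 32 bits and overflow at 16 bits, which is the kind of difference a test suite run on only one platform would miss.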
I agree that using int and uint should be discouraged and renaming them to a less straightforward name is better. |
I think that's a good idea. You can't really rely on very much when using I'm not so fond of the names |
IMHO, the interesting questions are: what type should be used to index into arrays, and what should it be named? Indexing into arrays is pretty common. A pointer-sized type is needed to be able to represent any index. It should presumably be unsigned. I'm not sure if there's much reason to also have a signed version. Expanding to a BigInt on overflow doesn't make much sense here. But wrapping around on over/underflow also doesn't make very much sense, I think. If you want to catch over/underflow and I think the strongest argument might be for expanding: negative or larger-than-the-address-space values don't make sense for array indexes, but the array bounds check will already catch that. Meanwhile it's versatile and generally useful for most other situations as well, not just array indexing. The downside is a performance cost relative to a type that wraps on over/underflow. (In the event of a fixed pointer-sized type, the relevant association when naming it should be that it holds any array index, not that it's pointer-sized.) Whatever this type ends up being and named, it's the one that should be in the If someone explicitly needs pointer-sized machine integers for |
Dumb question here, but what's the use of having a signed pointer-sized int at all? Could we get away with having only As for the general idea of this bug, I'm warming to it after seeing how well the removal of |
@bstrie: a signed one is needed for offsets/differences (POSIX has |
@thestinger good point. Subtracting array indexes should yield a signed value. So to reverse the question, what's the need for an unsigned array index type? Is it feasible to allocate a byte array that takes more than half the address space? |
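As an illustration of why a signed pointer-sized type is wanted for differences, here is a small sketch assuming current Rust (usize/isize rather than uint/int). The helper index_offset is hypothetical and deliberately conservative.

```rust
use std::convert::TryFrom;

/// Signed distance from index `from` to index `to`.
/// Returns `None` when the magnitude does not fit in `isize`, which can
/// only happen for spans larger than half the address space.
fn index_offset(from: usize, to: usize) -> Option<isize> {
    if to >= from {
        isize::try_from(to - from).ok()
    } else {
        // `from > to` here, so the subtraction cannot underflow.
        isize::try_from(from - to).ok().map(|d| -d)
    }
}
```

The `None` case corresponds to the question raised above: it is only reachable if an allocation can span more than half the address space.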
AFAIK the rationale for unsigned types here is to avoid the need for a dynamic check for a negative integer in every function. A bounds check only has to compare against the length, and a reserve/with_capacity function only has to check for overflow, not underflow. It just bubbles up the responsibility for handling underflow as far as possible into the caller (if it needs to check at all - it may not ever subtract from an index). |
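A rough sketch of that rationale, assuming current Rust: with an unsigned index the lookup needs one comparison, while a signed index needs an extra negativity check first. Both function names are hypothetical.

```rust
/// Unsigned index: a single comparison against the length suffices.
fn get_unsigned(xs: &[u32], i: usize) -> Option<u32> {
    if i < xs.len() {
        Some(xs[i])
    } else {
        None
    }
}

/// Signed index: a negative value has to be rejected before the
/// comparison against the length.
fn get_signed(xs: &[u32], i: isize) -> Option<u32> {
    if i < 0 {
        return None;
    }
    let i = i as usize; // safe to convert: `i` is non-negative here
    if i < xs.len() {
        Some(xs[i])
    } else {
        None
    }
}
```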
cc me I have been contemplating whether |
@pnkfelix I think the two are very closely related. (basically: if we want to use the existing int/uint as the preferred type for indexing arrays, then they should not be renamed to intptr/uintptr, but if we want to prefer a different type for that (e.g. one which checks for over/underflow), then they should be renamed.) |
To those commenting that Having +1 for this change from me. |
If there's consensus that it's bad practice to use |
@brson We already make folks choose between |
I find this thread confusing.
|
The other possibility was to use a type that doesn't wrap on over/underflow, but either traps or extends into a bigint. Which is likely to be slow, but I don't know whether it's been tested. |
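The three behaviours being compared (wrap, trap, extend) can be sketched with u8, assuming current Rust. Widening to u16 is only a stand-in for promotion to a bigint, and the function names are hypothetical.

```rust
/// Wraps around on overflow (modulo 256).
fn wrapping(a: u8, b: u8) -> u8 {
    a.wrapping_add(b)
}

/// Traps on overflow by panicking.
fn trapping(a: u8, b: u8) -> u8 {
    a.checked_add(b).expect("u8 addition overflowed")
}

/// Extends into a wider type instead of overflowing. A real
/// implementation would promote to an arbitrary-precision integer.
fn widening(a: u8, b: u8) -> u16 {
    u16::from(a) + u16::from(b)
}
```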
(What are bors?) An integer type with platform-specific overflow makes programs Intertwined issues: whether to have non-portable integer types, what to |
There's also the x32 ABI where pointers are smaller than ints. I'd remove variable-width int altogether (except ffi of course). Those who expect their code to run on 32-bit should already be thinking about overflows and use int64/bigint where appropriate, and those who know they'll only ever run on 64-bit should have no problem either way. Are there credible use cases of pointer-sized Rust ints outside ffi? |
Pointer-sized ints are required for representing pointers and indices into On Fri, Dec 6, 2013 at 6:54 AM, György Andrasek notifications@github.com wrote:
|
@nikomatsakis The idea is that int and uint aren't very useful numeric types, and that a fast bigint would be more appropriate most of the time one is actually dealing with numbers. And when one does not want a real numeric type, they probably want one of the fixed-size types anyway. |
@cmr I do not agree with the "probably want one of the fixed-size types" part of that sentence. That is not clear to me -- I think it is very common to have integers that are ultimately indices into some sort of array or tied to the size of a data structure, and for that use case it is natural to want an integer that represents "the address space of the machine". Of course I think having a nice, performant bigint library would be great, particularly for implementing "business logic" or other use cases where a "true integer" is required. But I am not sure how common that really is. |
@ecl3ctic Yes the compiler can and will help here, but I think the principle of least surprise applies. On the other hand, |
On Sat, Jan 11, 2014 at 11:37:02AM -0800, Daniel Micay wrote:
As I wrote earlier, I don't find this argument especially persuasive,
|
@nikomatsakis: As far as I know, GMP is leagues ahead of any other big integer implementation in performance. There's no doubt that it's the best open-source implementation. It has many different algorithms implemented for each operation because with very large integers it has progressively better asymptotic performance than other libraries. It also has highly optimized hand-written assembly for different revisions of many platforms too, because it's many times faster than the same code in C without specialized instructions. Intel adds relevant instructions with almost every iteration of their CPU architecture too... Haswell has MULX, Broadwell brings ADOX and ADCX, and there are many relevant SSE/AVX instructions. It's licensed under LGPL, which gives you 3 choices:
There are various clones of the library with inferior performance and a less exhaustive API but more permissive licenses. I think Rust should default to using one of these libraries and allow GMP as a drop-in alternative.
This is well-explored territory with |
@CloudiDust: The names http://en.cppreference.com/w/cpp/types/integer The |
@huonw pointed out https://github.com/wbhart/bsdnt on IRC, which seems like My thoughts for auto-overflow is make the type have an align of at least 2, On Sun, Jan 12, 2014 at 3:55 PM, Daniel Micay notifications@github.com wrote:
|
@cmr: It will incur two branches, since you need to check if you have a big integer and then check for overflow. Checking the overflow flag serializes the CPU pipeline quite a bit too. |
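The two branches can be seen in a minimal sketch of an auto-promoting integer. This assumes current Rust; the Int enum, add, and widen are hypothetical, and i128 is only a stand-in for a real big integer (it can itself overflow).

```rust
/// Hypothetical auto-promoting integer: a machine integer that falls
/// back to a wider representation on overflow.
#[derive(Debug, PartialEq)]
enum Int {
    Small(i64),
    Big(i128), // stand-in for an arbitrary-precision integer
}

/// Converts either representation to the wide form.
fn widen(v: Int) -> i128 {
    match v {
        Int::Small(x) => i128::from(x),
        Int::Big(x) => x,
    }
}

fn add(a: Int, b: Int) -> Int {
    match (a, b) {
        // Branch 1: are both operands in the small representation?
        (Int::Small(x), Int::Small(y)) => match x.checked_add(y) {
            // Branch 2: did the machine addition overflow?
            Some(sum) => Int::Small(sum),
            None => Int::Big(i128::from(x) + i128::from(y)),
        },
        // Slow path: at least one operand is already big.
        (a, b) => Int::Big(widen(a) + widen(b)),
    }
}
```

This sketch never demotes a Big result back to Small; a real design would have to decide whether to normalize.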
If you're limited to 31-bit then it seems that you'll need to use a comparison instruction rather than using the carry/overflow flag. This could be really bad for multiplication. |
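A sketch of what the 31-bit case might look like, assuming current Rust and a hypothetical layout where the payload is a 31-bit signed value held in an i32. Overflow of the 31-bit range does not set the hardware overflow flag, so it is detected with explicit comparisons; multiplication widens to 64 bits first.

```rust
// Bounds of a 31-bit signed payload (assumed layout, for illustration).
const MIN_31: i32 = -(1 << 30);
const MAX_31: i32 = (1 << 30) - 1;

/// Adds two 31-bit values. Inputs are assumed to be within range, so the
/// i32 addition itself cannot overflow; the range check is a comparison.
fn add_31(a: i32, b: i32) -> Option<i32> {
    let sum = a + b;
    if sum >= MIN_31 && sum <= MAX_31 {
        Some(sum)
    } else {
        None
    }
}

/// Multiplies two 31-bit values by widening to 64 bits, then range-checks.
fn mul_31(a: i32, b: i32) -> Option<i32> {
    let product = i64::from(a) * i64::from(b);
    if product >= i64::from(MIN_31) && product <= i64::from(MAX_31) {
        Some(product as i32)
    } else {
        None
    }
}
```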
Simple example:
extern mod extra;
use std::unstable::intrinsics::{abort, u32_mul_with_overflow};
use extra::test::BenchHarness;
#[inline(never)]
fn control(xs: &mut [u32]) {
for x in xs.mut_iter() {
*x *= 5;
}
}
#[inline(never)]
fn check(xs: &mut [u32]) {
for x in xs.mut_iter() {
unsafe {
let (y, o) = u32_mul_with_overflow(*x, 5);
if o {
abort()
}
*x = y;
}
}
}
#[inline(never)]
fn check_libstd(xs: &mut [u32]) {
for x in xs.mut_iter() {
*x = x.checked_mul(&5).unwrap();
}
}
#[bench]
fn bench_control(b: &mut BenchHarness) {
b.iter(|| {
let mut xs = [0, ..1000];
control(xs)
});
}
#[bench]
fn bench_check(b: &mut BenchHarness) {
b.iter(|| {
let mut xs = [0, ..1000];
check(xs)
});
}
#[bench]
fn bench_check_libstd(b: &mut BenchHarness) {
b.iter(|| {
let mut xs = [0, ..1000];
check_libstd(xs)
});
}
--opt-level=2
--opt-level=3
Ouch. It becomes a larger slowdown multiplier when you add more operations to the loop too. Since it's increasing the code size a lot, it will bloat the instruction cache too. |
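For readers on current Rust, the three loops in the benchmark above can be approximated as follows. This is a sketch using today's stable integer methods rather than the 2014 intrinsics, it leaves out the benchmark harness, and it makes no claim about the timings.

```rust
/// Unchecked variant: multiplication wraps on overflow.
fn control(xs: &mut [u32]) {
    for x in xs.iter_mut() {
        *x = x.wrapping_mul(5);
    }
}

/// Checked variant that aborts the process on overflow.
fn check(xs: &mut [u32]) {
    for x in xs.iter_mut() {
        let (y, overflowed) = x.overflowing_mul(5);
        if overflowed {
            std::process::abort();
        }
        *x = y;
    }
}

/// Checked variant using the library's `checked_mul`, panicking on overflow.
fn check_libstd(xs: &mut [u32]) {
    for x in xs.iter_mut() {
        *x = x.checked_mul(5).unwrap();
    }
}
```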
@thestinger Thanks for the link. I am aware that the names come from the C/C++ standards, but still find them confusing (to rust newcomers from outside the C/C++ world). Now come to think of it, this is a convention that can be learnt quickly, and C# actually uses But there may be another problem: the names This is to say, we may have dedicated names for container-indexing integer types, while the fact that they are pointer sized is an implementation detail on certain architectures, just like in C/C++. Here are three pairs of possible candidates: Common pros:
Common cons:
Pros and cons specific to each candidate:
On the other hand, So I lean towards Regarding an arbitrarily sized I am not sure about my stance on the "default integer type" issue, but people must make informed choices consciously. Some "rusty guidelines to integer type selection" in the docs would be great. |
On Sun, Jan 12, 2014 at 12:55:36PM -0800, Daniel Micay wrote:
I do not understand how checking for overflow and failing can possibly |
I'm not saying performing a branch on the contained value and then a check for overflow is faster than the check for overflow. I'm just suggesting that it's worth making benchmarks to measure the cost of both. |
There's a lot of interrelated concerns here:
I personally think that The easiest way to make immediate progress (not the best, mind you) might be the following:
This lets us punt on the topics of bigints, bounds checking, and signed vs unsigned for a later date. |
I don't think there should be a default fallback, It means you can't trust the compiler to infer the type or give an error, and you have to watch out for bugs from this. |
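For reference, this is how the question plays out in current Rust, which did end up with a fallback: an unconstrained integer literal is typed as i32, while an annotation or suffix pins the type. The function name literal_sizes is hypothetical.

```rust
use std::mem::size_of_val;

/// Returns the byte sizes of three literals typed in different ways.
fn literal_sizes() -> (usize, usize, usize) {
    let annotated: u64 = 5; // type taken from the annotation
    let suffixed = 5u8; // type taken from the literal suffix
    let unconstrained = 5; // nothing constrains it: falls back to i32
    (
        size_of_val(&annotated),
        size_of_val(&suffixed),
        size_of_val(&unconstrained),
    )
}
```

The third case is the one this comment warns about: the code compiles without the author ever choosing a width.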
Agree with @thestinger and @bstrie. Defaulting literals when there are no type constraints is a mixed bag: on one hand it's great sometimes (mostly when using a REPL); other times it's really unclear or confusing what it can mean. What about a model where literals are treated as "polymorphic" if there are no constraints? (This may not make sense in Rust, granted.) In Haskell/GHC, literals have a generic type until used.
|
Using a fixed-size integer requires carefully considering whether the application enforces bounds on it. Otherwise, you need a big integer instead. A default fallback type removes this thought process in favour of lazy, incorrect code everywhere. Haskell makes the fallback configurable, but the default is a big integer type. |
a wider problem in actual haskell code is users choosing to use Int, and then assuming int is 32 or 64bits always :), but yes, defaulting to integer would be wrong for rust |
|
On Fri, Feb 14, 2014 at 3:45 PM, Daniel Micay notifications@github.com wrote:
|
I agree that the compiler should not automatically choose an arbitrary, potentially dangerous integer type if it can't infer the type from the context. |
An arbitrarily sized integer type would be provided in std under the name Int. I think encouraging use of an arbitrarily sized integer when bounds are unknown is a much better solution than adding failure-throwing overflow checks to fixed-size integers.