What about: "odd" pointer sizes #255

Open
RalfJung opened this issue Nov 7, 2020 · 9 comments

@RalfJung
Member

RalfJung commented Nov 7, 2020

@chorman0773 brings up the issue of platforms where pointers have "strange" sizes, such as 3 bytes.

If Rust wants to support such platforms, interesting questions arise:

  • Is size_of::<*const ()> == 3? That seems like the most obvious choice, but @chorman0773's comment indicates C makes the size 4 and the highest byte is somehow "dead". Not sure why a size of 4 would be reasonable though, if there are only 3 bytes of data.
  • Is usize a magic u24, or is it u32 and usize-to-ptr casts truncate, or something else?

I am probably also missing some aspects of this due to a lack of familiarity with such platforms. :)
This seems somewhat related to #29 in that it is about "strange platforms".
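
To make the second question concrete, here is a minimal sketch of the "usize is u32 and usize-to-ptr casts truncate" option. It is only a model on ordinary integers, not the behavior of any existing Rust target; the names `ADDR_MASK`, `usize_to_addr`, `addr_to_usize`, and `round_trips` are hypothetical.

```rust
/// Hypothetical model: `usize` is a full `u32`, but only the low 24 bits form an address.
const ADDR_MASK: u32 = 0x00FF_FFFF;

/// Models a truncating usize-to-pointer cast: the top byte is discarded.
fn usize_to_addr(value: u32) -> u32 {
    value & ADDR_MASK
}

/// Models a pointer-to-usize cast: the 24-bit address is zero-extended to 32 bits.
fn addr_to_usize(addr: u32) -> u32 {
    addr & ADDR_MASK
}

/// Returns true if an integer survives the integer -> pointer -> integer round trip.
fn round_trips(value: u32) -> bool {
    addr_to_usize(usize_to_addr(value)) == value
}
```

Under this model, any value with a non-zero top byte does not round-trip, which is exactly the kind of behavior the question is asking about.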

@Lokathor
Contributor

Lokathor commented Nov 7, 2020

size is probably 4 for alignment reasons

@chorman0773
Contributor

chorman0773 commented Nov 7, 2020

On the 65816 at least, according to the definition I have used and written, size_of::<*const ()>() (and likewise in C, sizeof(void*)) is 4. This is because the alignment (which needs to be a power of two) is 4, so that a pointer cannot be allocated across a bank boundary (which can break things entirely, since the CPU can access the entire pointer at once and may wrap at bank boundaries). Pointers are zero-extended by the ABI (and the compiler is recommended to do so as well) for various reasons. Among others, this allows pointer arithmetic to be done with the 32-bit functions when the compiler doesn't inline the software addition, which would be trivial, though it requires two 16-bit additions.
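
The alignment argument can be checked with a small sketch. It assumes 64 KiB banks, as on the 65816, and uses a hypothetical `Ptr24` type as a stand-in for a pointer with 3 data bytes; `crosses_bank` and `any_aligned_slot_crosses` are likewise illustrative names. The layout part relies on the fact that in Rust a type's size is always a multiple of its alignment, so 3 bytes of data with alignment 4 yields a size of 4.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical stand-in for a 24-bit pointer: 3 data bytes, 4-byte alignment.
#[allow(dead_code)]
#[repr(C, align(4))]
struct Ptr24([u8; 3]);

/// Bank size assumed for this sketch (64 KiB).
const BANK_SIZE: u32 = 0x1_0000;

/// Returns (size, alignment) of the stand-in pointer type.
fn ptr24_layout() -> (usize, usize) {
    (size_of::<Ptr24>(), align_of::<Ptr24>())
}

/// Returns true if an object of `size` bytes (size >= 1) at `addr` spans two banks.
fn crosses_bank(addr: u32, size: u32) -> bool {
    let last = addr + size - 1;
    addr / BANK_SIZE != last / BANK_SIZE
}

/// Scans every `align`-aligned address in the first two banks and reports
/// whether any placement of a `size`-byte object crosses a bank boundary.
fn any_aligned_slot_crosses(size: u32, align: u32) -> bool {
    (0..2 * BANK_SIZE)
        .step_by(align as usize)
        .any(|addr| crosses_bank(addr, size))
}
```

A 4-byte, 4-aligned slot never straddles a bank because the bank size is a multiple of 4, whereas a 3-byte object with no alignment requirement can start two bytes before a boundary.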

As for usize, ideally it would be a u24 zero-extended to u32, and likewise for isize (but sign-extended). However, I remember it being mentioned that integer types have no padding and no invalid values (except for uninit, but see #71). I interpreted this to mean that every possible bit pattern represents a distinct valid value of the type. If this extends to usize, and likewise to isize, then it would be incompatible with the requirements on pointers, because as mentioned in #76, usize::MAX as *const () is safe (and can be dereferenced soundly*).

(*On lccc, I would prefer that, even for accesses of size 0, it needs to point to an object. However, because lccc has rules that allow it to invent objects, this can be trivially worked around.)
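
The tension described above can be illustrated with a small sketch. The helper name `survives_pointer_round_trip` is hypothetical. On mainstream targets, where pointers and usize have the same power-of-two size, the check below holds for every value; on a target where usize were a 24-bit value stored in 32 bits, it could not hold for values with a non-zero top byte. The sketch only performs integer-to-pointer casts and never dereferences anything.

```rust
/// Returns true if casting `value` to a pointer and back yields `value` again.
/// The pointer is never dereferenced; only the casts are exercised.
fn survives_pointer_round_trip(value: usize) -> bool {
    let ptr = value as *const ();
    ptr as usize == value
}
```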

@sollyucko

What about e.g. eZ80, which AFAIK has 24-bit addresses and no alignment requirements?

@RalfJung
Member Author

RalfJung commented Nov 7, 2020

I guess we could imagine the last byte of pointers and usize to be padding?

> However, I remember it being mentioned that integer types have no padding

They don't on normal platforms, but for weird things like 3-byte integers the best option might be to make an exception...

@chorman0773
Contributor

> They don't on normal platforms

A possible good idea would be to specify that the normal integer types uN (N ∈ {8, 16, 32, 64, 128}) have a size equal to N/8 and that every initialized bit pattern represents a distinct valid value, and to leave this unspecified for usize (and likewise for iN and isize). It would also leave Rust open to adding extended integer types (i.e. uN with N ∈ ℤ⁺ \ {8, 16, 32, 64, 128}), which would have the same issue as usize here. In the case of usize, whether or not any padding bits exist, the value of those padding bits, and whether or not particular padding bits are valid would be unspecified (i.e. they could be sign- or zero-extended, or left indeterminate/uninitialized).

> What about e.g. eZ80, which AFAIK has 24-bit addresses and no alignment requirements?

On the 65816 the alignment requirement is imposed by the ABI (that is, there is no hardware-level alignment requirement there either). It was done because pointers are a "scalar unit" which can be accessed as one value by the CPU, and sometimes those accesses can wrap at the bank boundaries (every 64 KiB), so I have to ensure that no scalar unit can be allocated across a bank. If the same problem arises there, the eZ80 target could act similarly to how I've defined it for the 65816.

@Diggsey

Diggsey commented Nov 7, 2020

One problem with {u,i}size being 24 bits with the top 8 bits being zero is that arithmetic will be slower on those types.

@chorman0773
Contributor

I mean, for usize it's simple masked arithmetic. And for isize it's the same followed by a sign extension.
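
As a rough sketch of what "masked arithmetic" and "followed by a sign extension" could look like, the following models 24-bit values stored in 32-bit integers. The function names and the choice of wrapping semantics are assumptions made for illustration, not part of any ABI described in this thread.

```rust
/// Low 24 bits set: the value bits of the hypothetical 24-bit usize/isize.
const MASK_24: u32 = 0x00FF_FFFF;

/// Wrapping 24-bit unsigned addition: add in 32 bits, then mask off the top byte.
fn add_u24(a: u32, b: u32) -> u32 {
    a.wrapping_add(b) & MASK_24
}

/// Sign-extends the low 24 bits of `x` to a full i32.
/// Shifting left moves bit 23 into the sign position; the arithmetic
/// right shift then copies it into the top byte.
fn sign_extend_24(x: u32) -> i32 {
    ((x << 8) as i32) >> 8
}

/// Wrapping 24-bit signed addition: the unsigned masked add, then sign extension.
fn add_i24(a: i32, b: i32) -> i32 {
    sign_extend_24((a as u32).wrapping_add(b as u32) & MASK_24)
}
```

Each operation costs one extra mask (plus two shifts for the signed case) on top of the 32-bit addition, which is the overhead being discussed.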

@RalfJung
Member Author

RalfJung commented Nov 7, 2020

> In the case of usize, whether or not any padding bits exist, the value of those padding bits, and whether or not particular padding bits are valid, would be unspecified (i.e. they can be sign- or zero-extended, or left indeterminate/uninitialized).

The disadvantage of this idea is that now all code has to carry the cost of supporting exotic platforms. All unsafe code working with isize/usize has to be carefully audited to take into account the possibility of padding. In the past, some lang team members have expressed a preference to avoid such situations. (IIRC that was when people brought up DSPs where the smallest addressable unit is 16 bits in size.) At some point, platforms are so niche that they simply do not warrant the cost this imposes.

This could be implementation-defined though, i.e. rustc could guarantee absence of padding for those targets where the pointer size is a power of two. That makes life easier for unsafe code authors on the majority of platforms, at the cost of people working on exotic platforms (they need to have special audits for unsafe code). Frankly, this is likely what would happen anyway, even if we officially left things unspecified -- I doubt most unsafe code authors are even aware that such platforms exist.
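
If the guarantee were implementation-defined, unsafe code that relies on it could state the assumption explicitly. The following is a minimal sketch using only existing standard-library items (`size_of`, `usize::BITS`, `is_power_of_two`); the helper name `pointer_layout_is_conventional` is hypothetical, and whether `usize::BITS` would report 24 on an exotic target is itself an open question in this discussion.

```rust
// Compile-time guard: the build fails on a target whose pointer size
// is not a power of two.
const _: () = assert!(std::mem::size_of::<*const ()>().is_power_of_two());

// Compile-time guard: every bit of the storage of `usize` is a value bit,
// i.e. there is no room for padding.
const _: () = assert!(std::mem::size_of::<usize>() * 8 == usize::BITS as usize);

/// Runtime form of the same assumptions, for code that prefers to branch
/// instead of failing the build.
fn pointer_layout_is_conventional() -> bool {
    std::mem::size_of::<*const ()>() == std::mem::size_of::<usize>()
        && std::mem::size_of::<usize>().is_power_of_two()
}
```

Guards like these would keep the audit burden on the exotic targets: code written for mainstream platforms documents its assumption once, and a port to an unusual target gets a build error instead of silent misbehavior.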

@chorman0773
Contributor

chorman0773 commented Nov 7, 2020

I wouldn't be opposed to the increased difficulty, provided it's not disproportionate to the cost of shifting the burden to unsafe developers, and Rust can be written on these platforms. In particular I would really benefit from being able to use Rust as a frontend language for SNES-Dev, a toolchain I am writing for compiling SNES homebrew, which is a 65816 target (and the origin of the ABI in question, though it can be generalized to any 65816 platform).

5 participants