int: reduce allocation by representing small ints as pointers #280

Merged
merged 2 commits into master from int-opt
Jun 17, 2020

alandonovan
Contributor

No description provided.

This change defines low-level accessors for the small and big
arms of the int union so that the representation can be easily
changed in a follow-up.

Change-Id: I7c4ae279a6d2e7b76e102ba5d01a3cd1c56fb368
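To make "low-level accessors for the small and big arms" concrete, here is a minimal sketch of a portable two-field union in which all other code reaches the fields only through one accessor. The struct layout and the names `get`, `makeSmallInt`, `makeBigInt`, and `MakeInt64` are assumptions for illustration and need not match the code in this PR.

```go
package main

import (
	"fmt"
	"math"
	"math/big"
)

// Int is a hypothetical portable union: when bigArm is nil the value lives
// in smallArm (which always fits in int32); otherwise bigArm holds it.
type Int struct {
	smallArm int64
	bigArm   *big.Int
}

// get is the single low-level accessor for both arms. Nothing else reads
// the fields, so the representation can later be swapped out.
func (i Int) get() (int64, *big.Int) { return i.smallArm, i.bigArm }

// makeSmallInt and makeBigInt are the only places that build an Int.
func makeSmallInt(x int64) Int {
	return Int{smallArm: x}
}

func makeBigInt(x *big.Int) Int {
	return Int{bigArm: x}
}

// MakeInt64 picks the small arm for int32-range values, else the big arm.
func MakeInt64(x int64) Int {
	if x >= math.MinInt32 && x <= math.MaxInt32 {
		return makeSmallInt(x)
	}
	return makeBigInt(new(big.Int).SetInt64(x))
}

// String is written purely in terms of get.
func (i Int) String() string {
	small, bigVal := i.get()
	if bigVal != nil {
		return bigVal.String()
	}
	return fmt.Sprint(small)
}

func main() {
	fmt.Println(MakeInt64(42), MakeInt64(1<<40)) // prints: 42 1099511627776
}
```

Because `String` never touches the fields directly, a follow-up could replace the struct with a single pointer and only `get` and the two constructors would need to change.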
@alandonovan alandonovan requested a review from jayconrod June 16, 2020 21:01
@alandonovan alandonovan changed the title "Int opt" to "int: reduce allocation by representing small ints as pointers" Jun 16, 2020
@@ -0,0 +1,33 @@
//+build !linux,!darwin

This pair of lines should be one line:

// +build !linux,!darwin !amd64,!arm64,!mips64x,!ppc64x

Contributor Author

Good catch. I always forget whether space means AND or OR.
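In a legacy `// +build` line, space-separated options are ORed and comma-separated terms within an option are ANDed. The sketch below evaluates the suggested constraint against a set of tags; it is a simplification that sets only the GOOS and GOARCH tags (the real go tool sets more, and treats some GOOS values as implying others, e.g. android implies linux), and the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// satisfied reports whether a legacy "// +build" expression holds for a tag
// set: space-separated options are ORed, comma-separated terms within an
// option are ANDed, and a leading "!" negates a term.
func satisfied(expr string, tags map[string]bool) bool {
	for _, option := range strings.Fields(expr) {
		ok := true
		for _, term := range strings.Split(option, ",") {
			want := true
			if strings.HasPrefix(term, "!") {
				want, term = false, term[1:]
			}
			if tags[term] != want {
				ok = false
				break
			}
		}
		if ok {
			return true // one satisfied option is enough
		}
	}
	return false
}

// fallbackExpr is the constraint suggested in review for the portable file.
const fallbackExpr = "!linux,!darwin !amd64,!arm64,!mips64x,!ppc64x"

// usesFallback sets only the GOOS and GOARCH tags (a simplification).
func usesFallback(goos, goarch string) bool {
	return satisfied(fallbackExpr, map[string]bool{goos: true, goarch: true})
}

func main() {
	for _, p := range [][2]string{{"linux", "amd64"}, {"darwin", "arm64"}, {"windows", "amd64"}, {"linux", "386"}} {
		fmt.Printf("%s/%s uses fallback: %v\n", p[0], p[1], usesFallback(p[0], p[1]))
	}
}
```

Since Go 1.17 the same constraint can be written with explicit operators, `//go:build (!linux && !darwin) || (!amd64 && !arm64 && !mips64x && !ppc64x)`, which removes the space-versus-comma question altogether.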

package starlark

// Optimized Int implementation for 64-bit machines running POSIX (for mmap).
// It reserves a portion of the address space to represent int32 values.
Collaborator

Consider expanding this:

It reserves a 4GB range of the virtual address space using mmap and represents int32 values
as addresses within that range. This disambiguates int32 values from big.Int pointers.

Contributor Author

Done.
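The representation described above reduces to address arithmetic. A minimal sketch, assuming a 64-bit `uintptr` and using a made-up base address in place of the one `mmap` would return; biasing by `math.MinInt32` so the full int32 range fits in the 4GB window is one possible mapping, not necessarily the PR's, and `encodeSmall` and `decodeSmall` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// regionSize is the size of the reserved region: one address per int32 value.
const regionSize = 1 << 32

// encodeSmall maps x, which must lie in int32 range, to an address in
// [base, base+regionSize). Biasing by MinInt32 sends MinInt32 to base and
// MaxInt32 to base+regionSize-1. Assumes a 64-bit uintptr.
func encodeSmall(base uintptr, x int64) uintptr {
	return base + uintptr(x-math.MinInt32)
}

// decodeSmall reports whether p falls inside the region and, if so, which
// int32 it encodes. In the real scheme any other non-nil pointer would be
// a *big.Int.
func decodeSmall(base, p uintptr) (int64, bool) {
	if p < base || uint64(p-base) >= regionSize {
		return 0, false
	}
	return math.MinInt32 + int64(p-base), true
}

func main() {
	base := uintptr(0x7f0000000000) // hypothetical start of the reserved region
	for _, x := range []int64{math.MinInt32, -1, 0, 42, math.MaxInt32} {
		p := encodeSmall(base, x)
		y, ok := decodeSmall(base, p)
		fmt.Printf("%d -> %#x -> %d %v\n", x, p, y, ok)
	}
}
```

What makes the real scheme sound is that the region stays reserved, so no `*big.Int` can ever be allocated at one of these addresses, and a single range check tells the two arms apart.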

var smallints = reserveAddresses(1 << 32)

func reserveAddresses(len int) uintptr {
	b, err := unix.Mmap(-1, 0, len, unix.PROT_READ, unix.MAP_PRIVATE|unix.MAP_ANONYMOUS)
Collaborator

Would it be possible to use something similar to NaN boxing instead? That would let you avoid the mmap.

amd64 processors only support 48-bit virtual addresses, which are sign-extended to 64 bits. There's a big range in the middle that could be used to store other values. I believe arm64 processors have a similar limitation. I don't know about mips.

Contributor Author

Isn't NaN-boxing the trick by which JS implementations embed 53-bit ints, 48-bit pointers, and other tagged value types into each of the many kinds of NaN? We're trying to hide numbers inside pointers, but NaN-boxing goes in the opposite direction: hiding pointers inside numbers, which is problematic for garbage collection. (big.Ints would falsely appear unreachable.)

The use of mmap is not such a problem: it is relatively portable (it's available as VirtualAlloc on Windows), and the current approach doesn't depend on features of the hardware other than word size.
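For contrast with the pointer approach, a minimal sketch of NaN-boxing in the sense described here: an int32 is hidden in the payload of a quiet float64 NaN. The tag layout (top 16 bits `0x7FF9`) is an assumption for illustration; real engines choose their own, and they canonicalize NaNs produced by arithmetic so ordinary results cannot be mistaken for boxed values.

```go
package main

import (
	"fmt"
	"math"
)

// intTag fills the top 16 bits of a boxed int32: sign 0, exponent all ones,
// quiet bit set, plus payload bit 48 as a hypothetical "int" tag. Any
// float64 carrying these bits is a NaN.
const intTag = uint64(0x7FF9) << 48

// boxInt32 hides x in the low 32 bits of a NaN payload.
func boxInt32(x int32) float64 {
	return math.Float64frombits(intTag | uint64(uint32(x)))
}

// unboxInt32 recovers x if f carries the int tag.
func unboxInt32(f float64) (int32, bool) {
	bits := math.Float64bits(f)
	if bits>>48 != 0x7FF9 {
		return 0, false
	}
	return int32(uint32(bits)), true
}

func main() {
	f := boxInt32(-42)
	x, ok := unboxInt32(f)
	fmt.Println(math.IsNaN(f), x, ok)
}
```

Boxing a pointer the same way would leave only an integer in the float64's bits, which Go's garbage collector does not treat as a reference; that is the reachability problem for big.Ints described above.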

Collaborator

I just meant NaN boxing as an analogy: using the dead bits of a large type (in this case unsafe.Pointer) to store a smaller numeric value (int32).

Fair enough on mmap though. I'm still stuck in the dark ages of 32-bit where virtual address space was a scarce resource. :)

If you were willing to understand what "valid" pointers are on the host system, you could take a space from the "invalid" pointers by fiat, without asking mmap for an address range. On amd64 the valid pointers start with 17 0s or 17 1s; if you pick a different prefix, like "01", then you could store up to 62-bit numbers in the remaining bits. But on non-amd64 that specific trick wouldn't work. On some 64-bit systems Go supports (I forget which one) the entire 64-bit range is technically valid. The mmap trick is more portable for sure.
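The prefix trick can be sketched with plain `uint64` arithmetic. This assumes amd64 with 48-bit virtual addresses (canonical addresses have bits 63 through 47 all equal), and `encodeInt62`, `decodeInt62`, and `isCanonical` are hypothetical helpers.

```go
package main

import "fmt"

const (
	tagShift = 62
	tagBits  = uint64(1) << tagShift // top two bits 01: never canonical on amd64
	payload  = tagBits - 1           // low 62 bits hold the value
	minInt62 = -(1 << 61)
	maxInt62 = 1<<61 - 1
)

// isCanonical reports whether w is a canonical 48-bit amd64 address,
// i.e. bits 63..47 are all zeros or all ones.
func isCanonical(w uint64) bool {
	top := w >> 47 // 17 bits
	return top == 0 || top == 1<<17-1
}

// encodeInt62 tags a 62-bit signed integer with the 01 prefix.
func encodeInt62(x int64) (uint64, bool) {
	if x < minInt62 || x > maxInt62 {
		return 0, false
	}
	return tagBits | uint64(x)&payload, true
}

// decodeInt62 recovers the integer from a tagged word.
func decodeInt62(w uint64) (int64, bool) {
	if w>>tagShift != 1 {
		return 0, false
	}
	// Shift left to drop the tag, then arithmetic-shift right to sign-extend.
	return int64(w<<2) >> 2, true
}

func main() {
	w, _ := encodeInt62(-5)
	x, ok := decodeInt62(w)
	fmt.Printf("%#x canonical=%v decoded=%d ok=%v\n", w, isCanonical(w), x, ok)
}
```

This needs no reservation at all, but it bakes in knowledge of the host's address layout, which is the fragility weighed in the reply that follows.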

Contributor Author

I'd like to stick with the mmap version for now because it seems less fragile overall, but I will remember these ideas for when I implement my next proper compiler to object code. :)

This change defines a new representation for Int on 64-bit machines
running a POSIX operating system, by reserving a 4GB portion of the
address space.

Pointers to addresses in this region represent int32 values,
and are disjoint from all *big.Int pointers,
allowing us to represent the union in a single pointer.
This means the conversion from Int to Value does not allocate.

The gauss benchmark (added in this CL) shows -44% ns, -48% bytes, -63% allocs:
Benchmark/bench_gauss-12  	      55	  24283703 ns/op	 6119844 B/op	  289862 allocs/op (before)
Benchmark/bench_gauss-12  	      84	  13648744 ns/op	 3175816 B/op	  105862 allocs/op (after)

On 32-bit machines, or those running a non-POSIX system,
we continue to use the old representation.

Change-Id: I2915a8f8abff18ab2eba2891c352700c3ab0d4c4
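The percentages can be checked from the per-op figures, taking the slower, more-allocating run as the baseline (which is what the stated reductions imply). `pctChange` is a throwaway helper.

```go
package main

import (
	"fmt"
	"math"
)

// pctChange returns the relative change from before to after, in percent.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Per-op figures from the gauss benchmark rows above.
	rows := []struct {
		unit          string
		before, after float64
	}{
		{"ns/op", 24283703, 13648744},
		{"B/op", 6119844, 3175816},
		{"allocs/op", 289862, 105862},
	}
	for _, r := range rows {
		fmt.Printf("%-9s %+.0f%%\n", r.unit, math.Round(pctChange(r.before, r.after)))
	}
}
```

Rounded, the rows give about -44% ns/op, -48% B/op, and -63% allocs/op.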

@alandonovan
Contributor Author

I'm still stuck in the dark ages of 32-bit where virtual address space was a scarce resource. :)

It turns out there's quite a lot of spare address space just sitting there awaiting one's creativity. This change only squandered 1 part in 2^32 of it. :)

@alandonovan alandonovan merged commit c6daab6 into master Jun 17, 2020
@alandonovan alandonovan deleted the int-opt branch June 17, 2020 18:28
@benhoyt
benhoyt commented Jun 30, 2020

@alandonovan This is fun! Out of interest, do you have performance numbers on how much this helped on the targeted platforms? (at least for the microbenchmarks you wrote)

@alandonovan
Contributor Author

do you have performance numbers on how much this helped

Take a look at the commit message: 29194bd

@benhoyt
benhoyt commented Jul 1, 2020

Aha, of course -- I was being too PR-centric. Thanks!

5 participants