proposal: Go 2: for index, rune, runelen = range for strings #28599
Comments
Wouldn't it be better if the size were computed explicitly, as it's fixed and does not change? Also, I don't think |
size here is the size of the rune that was decoded; it's not fixed and changes from rune to rune. |
I'd rather fix this by doing the optimization in the compiler. |
Agreed. However, unless the functions are very simple, the compiler can't apply this optimization: any write to a byte slice, or to a byte through a pointer, forces it to assume that the byte slice being iterated over was changed, unless it can prove the two can't overlap. As far as I understand, the compiler can't, for example, optimize the bytes.Map example, as the mapping function could change the bytes of the byte slice being iterated over during the range loop. The other simplification this proposal provides is making the already computed rune size (the step to the next index) usable by code in the loop without calling utf8.DecodeRune(InString). Updated the proposal to reflect those points better. |
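To illustrate the last point, here is a minimal sketch of what code in a range loop has to do today to learn the width of the current rune: decode it a second time. The helper name runeWidths is made up for this example and is not taken from the proposal or the standard library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeWidths (hypothetical helper) returns the encoded width in bytes of
// each rune in s. The range loop already decodes every rune internally,
// but its width is not exposed, so the body decodes again to obtain it.
func runeWidths(s string) []int {
	var widths []int
	for i := range s {
		// Second decode of the same rune, only to get its size.
		_, size := utf8.DecodeRuneInString(s[i:])
		widths = append(widths, size)
	}
	return widths
}

func main() {
	// 'a' takes 1 byte, U+00E9 takes 2 bytes, U+4E16 takes 3 bytes.
	fmt.Println(runeWidths("a\u00e9\u4e16"))
}
```

For valid input, utf8.RuneLen(r) on the rune delivered by range gives the same width without a second decode, but it returns 3 for utf8.RuneError, which does not match the single byte the loop actually skips on invalid input.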
This should definitely not be done for edit - To add on to this, this would mean that doing |
I don't think this is much of an absolute counterpoint, given the existence of utf8.DecodeRune and other utf8 functions operating on byte slices, and the general symmetry of byte slices and strings in various packages like utf8, which shows byte slices frequently being interpreted as UTF-8 encoded runes.
Fair point, that will indeed be confusing. Maybe something like |
That's because And of course the The point I'm trying to make is, bytes aren't special. They are just a uint8, and should be treated as such. If we want to introduce a concept of
Or |
The same is true for strings. https://golang.org/ref/spec#String_types "A string type represents the set of string values. A string value is a (possibly empty) sequence of bytes." but currently range UTF8 decodes runes from the bytes of a string. The range keyword introduces the way the string or byteslice is decoded even if both have no knowledge of this on their own.
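A small sketch of the distinction drawn here, assuming nothing beyond current Go semantics: len reports the bytes a string holds, while range decodes runes from those bytes. The function name countBytesAndRunes is made up for this example.

```go
package main

import "fmt"

// countBytesAndRunes (hypothetical helper) returns the number of bytes in s
// and the number of iterations a range loop over s performs.
func countBytesAndRunes(s string) (nbytes, nrunes int) {
	for range s {
		nrunes++ // one iteration per decoded rune, not per byte
	}
	return len(s), nrunes
}

func main() {
	// U+00E9 occupies two bytes, so the byte and rune counts differ.
	b, r := countBytesAndRunes("h\u00e9llo")
	fmt.Println(b, r)
}
```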
Which does not have the same semantics, since changes to the byte slice are not taken into account during iteration, and therefore a copy is often needed. |
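The difference can be sketched as follows, on the assumption that the elided suggestion was to range over a string conversion of the byte slice. Both function names are made up for this example. The first loop iterates over the converted value and never sees the write; the second decodes from the slice itself and does.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// viaStringConversion ranges over string(b). The range expression is
// evaluated once and the conversion yields an independent value, so the
// write to b inside the loop is not observed by the iteration.
func viaStringConversion(b []byte) string {
	var seen []rune
	for _, r := range string(b) {
		b[len(b)-1] = '!' // mutate the original slice mid-iteration
		seen = append(seen, r)
	}
	return string(seen)
}

// viaDecodeRune walks b directly with utf8.DecodeRune, so the same write
// is visible when the last byte is decoded.
func viaDecodeRune(b []byte) string {
	var seen []rune
	for i := 0; i < len(b); {
		b[len(b)-1] = '!'
		r, size := utf8.DecodeRune(b[i:])
		seen = append(seen, r)
		i += size
	}
	return string(seen)
}

func main() {
	fmt.Println(viaStringConversion([]byte("abc"))) // abc
	fmt.Println(viaDecodeRune([]byte("abc")))       // ab!
}
```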
I thought this would be brought up. I wasn't going to do it, but I was thinking that if you did I would mention that both ranging over a string and converting it to
There's not a single thing built-in to the language that assumes that a
This is a fair assessment, although I feel like in the case where you want it to be available to be changed, the
Anyway, the point I'm trying to make is that there's not a single feature in the language that assumes that a |
updated the proposal to the strings version only. |
Or if we do the bug where rune32 and int32 are separate types (#29012), then we could define But in general we're moving away from hard-coding UTF-8 assumptions from the language. |
If performance of this kind of operation is essential, we should teach the compiler to treat the relevant functions from the utf8 package as intrinsics. Ranging over a string is already somewhat non-orthogonal. Let's not make it even more different. |
UPDATED proposal to only apply to strings.
The proposal is to allow a new 3 value range for strings:
for index, rune, runelen = range string
where index and rune are the same as in
for index, rune = range string
and runelen is the number of bytes that was needed to encode the rune in UTF-8. If a rune cannot be decoded from the string due to an invalid UTF-8 byte sequence, runelen is set to 1 and rune to utf8.RuneError.
for i, r = range string loop for the 3 value loop since internally the loop already computes the UTF-8 byte sequence width of the rune to correctly advance the index on the next iteration. This will, however, allow user-supplied code in the loop to use this hidden variable.
For example, the current strings.Map first range loop:
could become:
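Separately from the strings.Map example, the three values described in the proposal can be obtained in current Go by decoding manually. The following is only a sketch under that reading of the proposal: the step type and the steps function are made-up names, and the proposed three-value syntax itself is not valid Go, so it appears only in a comment.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// step (hypothetical type) holds the three values the proposed loop
//	for index, rune, runelen = range string
// would yield on each iteration.
type step struct {
	index   int
	r       rune
	runeLen int
}

// steps (hypothetical helper) emulates the proposed loop with
// utf8.DecodeRuneInString. For an invalid byte sequence the decoder
// returns utf8.RuneError with a width of 1, matching the proposal text.
func steps(s string) []step {
	var out []step
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		out = append(out, step{index: i, r: r, runeLen: size})
		i += size // advance by the width that range keeps hidden today
	}
	return out
}

func main() {
	// "\xff" is not valid UTF-8; U+00E9 is encoded in two bytes.
	for _, st := range steps("a\xff\u00e9") {
		fmt.Println(st.index, st.r == utf8.RuneError, st.runeLen)
	}
}
```

Note that a correctly encoded U+FFFD in the input also decodes to utf8.RuneError, but with a width of 3, so the width is what distinguishes it from an invalid byte.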