Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Editorial: Rename Unicode encoding/decoding operations #2014

Merged
merged 1 commit into from
Aug 11, 2020

Conversation

bakkot
Copy link
Contributor

@bakkot bakkot commented May 25, 2020

Fixes #1863 following @domenic's suggestion there. I've also renamed UTF16DecodeSurrogatePair to UTF16SurrogatePairToCodePoint and UTF16DecodeString to UTF16CodeUnitsToCodePoints. I'm open to other names; these seemed the clearest.

After just renaming, some of the wording became a little strange: "the UTF16CodePointToCodeUnits of a Unicode code point" reads more awkwardly to me than "the UTF16Encoding of a Unicode code point" did. So I rephrased those usages.

cc @jmdyck

EDIT: names changed.

@bakkot bakkot changed the title Rename unicode ops Editorial: Rename Unicode encoding/decoding operations May 25, 2020
@jmdyck
Copy link
Collaborator

jmdyck commented May 26, 2020

Hm. The names UTF16CodePoint[s]ToCodeUnits suggest that there's such a thing as a "UTF16 code point", which there isn't. Maybe CodePoint[s]ToUTF16CodeUnits.

@bakkot
Copy link
Contributor Author

bakkot commented May 26, 2020

Ah, I'd intended it to read as "the UTF-16 version of code point to code unit translation", but I can see the ambiguity there.

@jmdyck
Copy link
Collaborator

jmdyck commented May 26, 2020

(It's tempting to moreover suggest replacing UTF16CodeUnits with just String:

  • CodePointToString
  • CodePointsToString
  • StringToCodePoints

but I haven't actually determined if that's a good suggestion. It does seem more readable.)

@mathiasbynens
Copy link
Member

Came here to say what @jmdyck already said: UTF16CodePointToCodeUnitsCodePoint[s]ToUTF16CodeUnits, at a minimum. That said, I like the suggestion to just use String even better.

@bakkot
Copy link
Contributor Author

bakkot commented May 26, 2020

I am concerned about saying String instead of CodeUnits because the spec currently makes a distinction between those: StringValue of StringLiteral, for example, says

Return the String value whose code units are the SV of this StringLiteral.


Actually, I guess that means UTF16CodePointsToCodeUnits and UTF16CodeUnitsToCodePoints are bad names, because those do return and take a String value respectively.

So, thoughts on

?

@jmdyck
Copy link
Collaborator

jmdyck commented May 26, 2020

I am concerned about saying String instead of CodeUnits because the spec currently makes a distinction between those: StringValue of StringLiteral, for example, says

Return the String value whose code units are the SV of this StringLiteral.

Yeah, but I think that's a pointless distinction. See Issue #828.

(I started a PR to dissolve that distinction, but something stopped me. I'll have another look.)

@bakkot bakkot force-pushed the rename-unicode-ops branch from 5059472 to 3512416 Compare May 27, 2020 03:48
@bakkot
Copy link
Contributor Author

bakkot commented May 27, 2020

Force-pushed commits using the names above.

@jmdyck

Yeah, but I think that's a pointless distinction.

Arguably, but if we do away with it we should do so holistically. We can rename CodePointToUTF16CodeUnits to CodePointToString at that point.

 - rename UTF16Encoding to CodePointToUTF16CodeUnits
 - reword some usages of CodePointToUTF16CodeUnits
 - rename UTF16Encode to CodePointsToString
 - rename UTF16DecodeSurrogatePair to UTF16SurrogatePairToCodePoint
 - rename UTF16DecodeString to StringToCodePoints
@ljharb ljharb force-pushed the rename-unicode-ops branch from 3512416 to 29fedc2 Compare August 11, 2020 03:47
@ljharb ljharb merged commit 29fedc2 into master Aug 11, 2020
@ljharb ljharb deleted the rename-unicode-ops branch August 11, 2020 03:51
h2oche pushed a commit to h2oche/ecma262 that referenced this pull request Aug 16, 2020
 - rename UTF16Encoding to CodePointToUTF16CodeUnits
 - reword some usages of CodePointToUTF16CodeUnits
 - rename UTF16Encode to CodePointsToString
 - rename UTF16DecodeSurrogatePair to UTF16SurrogatePairToCodePoint
 - rename UTF16DecodeString to StringToCodePoints
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

editorial nit: having both UTF16Encode and UTF16Encoding is weird
6 participants