Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Markdown blocks have trouble rendering unicode emoji #749

Closed
jameschensmith opened this issue Feb 2, 2023 · 13 comments · Fixed by #817
Closed

Markdown blocks have trouble rendering unicode emoji #749

jameschensmith opened this issue Feb 2, 2023 · 13 comments · Fixed by #817

Comments

@jameschensmith
Copy link

jameschensmith commented Feb 2, 2023

Summary

When using unicode emoji in a markdown block, the output appears to be getting trimmed. I tried to see if this was an issue with goldmark, but I don't think it is. I see that there is goldmark-emoji for :joy:-style emojis, but it didn't look like d2 was using this.

Issue appears to be for all layout engines.

Open to working on this if provided some guidance (would be my foray into the codebase).

Resources

D2 test file
Single emoji: |md
  🐵
|

Single emoji with text at beginning: |md
  This is a monkey emoji: 🐵
|

Single emoji with text at end: |md
  🐵 That was a monkey emoji.
|

Single emoji with text around: |md
  This is a monkey emoji: 🐵 It's pretty cool.
|

Multiple emojis: |md
  🐵🐵🐵🐵
|

Multiple emojis with text at beginning: |md
  These are monkey emojis: 🐵🐵🐵🐵
|

Multiple emojis with text at end: |md
  🐵🐵🐵🐵 Those were monkey emojis.
|

Multiple emojis with text around: |md
  These are monkey emojis: 🐵🐵🐵🐵 They are pretty cool.
|

Playground Link

Exported SVG & PNG

Exported SVG
Exported PNG

@cyborg-ts cyborg-ts added this to D2 Feb 2, 2023
@bo-ku-ra
Copy link
Contributor

bo-ku-ra commented Feb 3, 2023

is the following relevant?
#517
#513

@alixander
Copy link
Collaborator

@jameschensmith if you're open to looking into this, I would look here to start:

func MeasureMarkdown(mdText string, ruler *Ruler, fontFamily *d2fonts.FontFamily) (width, height int, err error) {

For Markdown to work, D2 has to manually measure everything and match it with CSS. For example, H1 tags have x padding, y line height, z margin.

LineHeight_h = 1.25

The way fonts work, we measure each glyph:

func (t *Ruler) MeasurePrecise(font d2fonts.Font, s string) (width, height float64) {

If you can reproduce this with labels, then it's not a markdown problem. But that might not be ascertainable, since we pad labels, so it won't have that noticeable cutoff.

I suspect the emoji unicode gets measured incorrectly. I'm not really sure what the correct measurement should be, whether it's constant or some unit size of the font-size.

Good luck and please let me know if you get stuck, happy to dig into it more with you!

@jameschensmith
Copy link
Author

Thanks, @alixander. Did a little more debugging this morning, and wanted to leave some more notes here. I did some more tests with other emojis, and the results are very strange. For instance, the cloud emoji (☁️) renders with proper width, as does a very complex emoji like "couple with heart: woman, man, medium-light skin tone" (👩🏼‍❤️‍👨🏼) which consists of multiple runes. Then you have the monkey emoji (along with others), and it gets cropped. Here's a simple playground showing two emojis measured differently. Also, I believe that this playground example shows that it may not be just markdown.

Basically, It doesn't look like it has to do with the amount of runes an emoji requires.

Unicode emoji resources:

@alixander
Copy link
Collaborator

oh interesting. maybe the font we use, Source Sans Pro, actually just has a subset of emojis supported.

adobe-fonts/source-sans#85

Maybe we need to package an emoji font like https://github.com/samuelngs/apple-emoji-linux, detect if the user used any emojis, and then inject that font into their SVG if so.

for _, t := range triggers {

@jameschensmith
Copy link
Author

jameschensmith commented Feb 5, 2023

Another update (things are getting clearer the more tests I run). The cloud emoji used in the last example is actually two runes. So basically, the emojis that are only one rune are the ones that get cut, whereas the emojis that are multiple runes can actually render too wide. I've provided an example below which shows emojis using 1-8 runes (ordered from smaller to larger). This may actually be a good regression test to add in the future.

Playground link

@alixander
Copy link
Collaborator

hah, wow interesting, til about emojis

@jameschensmith
Copy link
Author

This is pretty complicated 😅 I've read in some places that emojis are double-width characters, but also that not all emojis have the same width (I think flags are one of those exceptions). I also read here that best solution might be to check the font for the glyph width. In any case, I think one improvement to what exists now would at least be to treat characters that consist of sequences of runes as a single glyph. Then, treating the emoji glyphs as double-width, or some other solution could come after that.

@jameschensmith
Copy link
Author

If my understanding of the code is correct, atlas contains a mapping of ASCII codepoints (32 to 127 inclusive), and uses that mapping to draw the bounding box:

// ASCII is a set of all ASCII runes. These runes are codepoints from 32 to 127 inclusive.
var ASCII []rune
func init() {
ASCII = make([]rune, unicode.MaxASCII-32)
for i := range ASCII {
ASCII[i] = rune(32 + i)
}
}

And DrawRune replaces any non-ASCII codepoints with the unicode replacement character:

d2/lib/textmeasure/atlas.go

Lines 121 to 123 in 0da4e15

if !a.contains(r) {
r = unicode.ReplacementChar
}

So all non-ASCII codepoints (including emojis) have the same bounding box as the unicode replacement character. What we could do is render any unsupported character as the unicode replacement character, until those other characters are supported. This sounds like a bad idea, I know 🙈 There's quite a bit of problems involved here, as I've noted in previous comments.

@alixander
Copy link
Collaborator

ohh that makes sense. I don't think the shim is necessary though, better to overflow into the padding a bit imo.

My fear is just that even if we measure this "right", if it's not part of the font we use, and users on different machines will just fallback to their OS's emoji font, which will over/underflow anyway. I don't think there's an option other than having the TTF support a large corpus of emojis

@alixander
Copy link
Collaborator

Might have to implement this #638 first. Emoji packs might be huge to embed just for a 🙈

@bo-ku-ra
Copy link
Contributor

bo-ku-ra commented Feb 6, 2023

@alixander
Copy link
Collaborator

@bo-ku-ra oh wow, good find. this looks incredibly relevant

@bo-ku-ra
Copy link
Contributor

bo-ku-ra commented Feb 7, 2023

japanese uses double-byte characters.
so there seems to be a lot of information about it on japanese websites.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants