Support Chinese Character in $slug #979
-
Hi there, I don't know much about PHP. I'm using website code that generates a $slug, and when I use Chinese characters in a file title or album title, the resulting names come out empty or invalid. I went through all the code and found your library in the vendor folder. After checking /core/vendor/league/commonmark/src/Normalizer/SlugNormalizer.php, I suspect your PHP code doesn't support Chinese characters, because other functions on the website, such as search, work fine with them. I'm not sure whether what I described makes sense. Thanks anyway.
Replies: 4 comments
-
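Hi there! Unfortunately I'm not very familiar with CJK scripts and best practices for "sluggifying" them. The default slug normalizer in this project relies on Unicode data to determine which characters should be kept or removed: commonmark/src/Normalizer/SlugNormalizer.php, lines 47 to 48 in c493585. Specifically, we only keep characters that fall into one of the following character classes:
- \p{L} (letters)
- \p{Nd} (decimal numbers)
- \p{Nl} (letter numbers)
- \p{M} (marks)
- - (the literal - character)
It would seem that the CJK characters you're using don't fall into any of those categories :-/ I'd be open to changing this to include CJK characters somehow, so long as:
- The regular expression doesn't become too complex for maintainers like me (who lack the familiarity with CJK in Unicode)
- It follows best practices for users of those languages and produces similar results as other sluggifiers
- At least two people can help verify that the updated implementation looks correct
In the meantime, you can always replace the built-in slug normalizer with your own <https://commonmark.thephpleague.com/2.3/customization/slug-normalizer/> :)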
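For anyone who wants to check how a particular title fares against that filter, here is a minimal sketch. It assumes the filter is a single preg_replace() call that keeps only \p{L}, \p{Nd}, \p{Nl}, \p{M} and the literal - character; the function name keepSlugCharacters is hypothetical and is not part of the library.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of league/commonmark): removes every
 * character outside the classes \p{L}, \p{Nd}, \p{Nl}, \p{M} and "-".
 */
function keepSlugCharacters(string $text): string
{
    // The "u" modifier makes PCRE treat both pattern and subject as UTF-8.
    $result = preg_replace('/[^\p{L}\p{Nd}\p{Nl}\p{M}-]+/u', '', $text);

    // preg_replace() returns null on failure (for example, invalid UTF-8),
    // so fall back to the untouched input in that case.
    return $result ?? $text;
}

// Print what survives the filter for a few sample titles.
foreach (['Hello, World!', 'my-title-2', '中文标题'] as $sample) {
    printf("%s => %s\n", $sample, keepSlugCharacters($sample));
}
```

Unicode assigns Han ideographs the general category Lo, which is part of \p{L}, so with the u modifier they would be expected to pass through this pattern. If a title still comes out empty, the cause may lie elsewhere, for example input that is not valid UTF-8 or another layer of the application rewriting the slug.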
-
Thanks very much for your reply.
I also use Flarum. Its recent updates support converting Chinese characters
to pinyin in the URL, which is more SEO-friendly too. As I said, I don't
know much about code, but I mention it in case their work is helpful to you.
You mentioned "At least two people can help verify that the updated
implementation looks correct". I will be happy to help in any way, and I
believe I can easily find another person to test things out, as long as no
coding skills are required. :)
I will take a look at replacing the built-in slug normalizer with my own
<https://commonmark.thephpleague.com/2.3/customization/slug-normalizer/>.
I tried changing the file SlugNormalizer.php
<https://github.com/thephpleague/commonmark/blob/c493585c130544c4e91d2e0e131e6d35cb0cbc47/src/Normalizer/SlugNormalizer.php#L47-L48>
using other code I found online that should work for Chinese
characters, but it failed. I will check your docs today.
Thanks, Ashly
colinodell ***@***.***> wrote on Tuesday, January 24, 2023 at 23:55:
… Hi there!
Unfortunately I'm not very familiar with CJK scripts and best practices
for "sluggifying" them. The default slug normalizer in this project relies
on Unicode data to determine which characters should be kept or removed:
<https://github.com/thephpleague/commonmark/blob/c493585c130544c4e91d2e0e131e6d35cb0cbc47/src/Normalizer/SlugNormalizer.php#L47-L48>
Specifically, we only keep characters that fall into one of the following character
classes
<https://www.pcre.org/original/doc/html/pcrepattern.html#:~:text=The%20following%20general%20category%20property%20codes%20are%20supported%3A>:
- \p{L} (letters)
- \p{Nd} (decimal numbers)
- \p{Nl} (letter numbers)
- \p{M} (marks)
- - (the literal - character)
It would seem that the CJK characters you're using don't fall into any of
those categories :-/
I'd be open to changing this to include CJK characters somehow, so long as:
- The regular expression doesn't become too complex for maintainers
like me (who lack the familiarity with CJK in Unicode)
- It follows best practices for users of those languages and produces
similar results as other sluggifiers
- At least two people can help verify that the updated implementation
looks correct
In the meantime, you can always replace the built-in slug normalizer with
your own
<https://commonmark.thephpleague.com/2.3/customization/slug-normalizer/>
:)
-
I have a Japanese website where I use Commonmark and I had a workaround for this for Commonmark version 1. However, after updating to version 2 and using PHP 8.2, I find that Japanese characters are evaluated properly and I don't need the workaround anymore. For example,
is now working with no special config. So my guess is that PHP 8's handling of CJK improved at some point.
-
It looks like https://github.com/cocur/slugify has support for Chinese characters (Pinyin), so I'd recommend using that with our library (https://commonmark.thephpleague.com/2.3/customization/slug-normalizer/).
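A rough sketch of what such a replacement could look like is below. It is an illustration under assumptions rather than tested integration code: the class name CallableSlugNormalizer is hypothetical, and the interface is a local stand-in declared with the same normalize() signature as League\CommonMark\Normalizer\TextNormalizerInterface so that the snippet runs without Composer. The closure is a placeholder for a real slugifier.

```php
<?php
declare(strict_types=1);

// Local stand-in so the sketch runs on its own. In a real project the class
// below would implement League\CommonMark\Normalizer\TextNormalizerInterface.
interface TextNormalizerInterface
{
    public function normalize(string $text, array $context = []): string;
}

/** Hypothetical adapter: delegates slug generation to any callable. */
final class CallableSlugNormalizer implements TextNormalizerInterface
{
    /** @var callable(string): string */
    private $slugify;

    public function __construct(callable $slugify)
    {
        $this->slugify = $slugify;
    }

    public function normalize(string $text, array $context = []): string
    {
        // Hand the heading text to the wrapped slugifier and return its result.
        return ($this->slugify)($text);
    }
}

// Placeholder slugifier for demonstration only: collapse whitespace to "-",
// trim stray dashes, lowercase ASCII letters.
$normalizer = new CallableSlugNormalizer(
    static fn (string $text): string =>
        strtolower(trim((string) preg_replace('/\s+/u', '-', $text), '-'))
);

echo $normalizer->normalize('Hello World'), "\n";
```

In a real setup, the callable could be the slugify() method of a configured Cocur\Slugify\Slugify instance, and the normalizer would be registered through the slug_normalizer configuration described on the customization page linked above. The exact options for enabling pinyin output are best taken from the cocur/slugify README rather than from this sketch.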