Sort with ignore case #347

Closed
Aldjinn opened this issue Jun 17, 2024 · 8 comments

@Aldjinn

Aldjinn commented Jun 17, 2024

Is your feature request related to a problem? Please describe.
I would like to be able to sort A-Z or Z-A while ignoring case. I have entries starting with both UPPER and lower case letters, so (for me) it would be nice to ignore case when sorting.

Describe the solution you'd like
A checkbox to ignore the case during sort right next to A-Z and Z-A.

[Screenshot]

@Bubka
Owner

Bubka commented Jun 18, 2024

Hi,
I recently realized that sorting is case-sensitive; it's already on my to-do list 👍🏻

@Bubka Bubka moved this from Todo to Done in 2FAuth backlog Jun 20, 2024
@Bubka Bubka added this to the v5.3.0 milestone Jun 21, 2024
@yheuhtozr

The current implementation is not even "case-sensitive"; it is just plain code point sorting (strictly, UTF-16 code unit order, which is what the JavaScript > operator compares on strings). Both case-sensitive and case-insensitive orders are language-dependent, so I recommend using a.localeCompare(b, <app's display language>).

const list = ["SASU", "säsä", "SISU", "sisi", "TATA", "ZÖZU", "zyzy"];

list.toSorted((a, b) => a > b ? 1 : -1); // current
// => ["SASU","SISU","TATA","ZÖZU","sisi","säsä","zyzy"]
list.toSorted((a, b) => a.toLowerCase() > b.toLowerCase() ? 1 : -1); // simple lowercase
// => ["SASU","sisi","SISU","säsä","TATA","zyzy","ZÖZU"]

const englishName = new Intl.DisplayNames(["en"], {type: "language"});
Object.fromEntries(["en", "fr", "sv", "et", "tr"].map((ln) => [englishName.of(ln), list.toSorted((a, b) => a.localeCompare(b, ln))]));
// {
//   "English":  ["säsä","SASU","sisi","SISU","TATA","ZÖZU","zyzy"],
//   "French":   ["säsä","SASU","sisi","SISU","TATA","ZÖZU","zyzy"],
//   "Swedish":  ["SASU","sisi","SISU","säsä","TATA","zyzy","ZÖZU"],
//   "Estonian": ["SASU","sisi","SISU","säsä","ZÖZU","zyzy","TATA"],
//   "Turkish":  ["säsä","SASU","SISU","sisi","TATA","ZÖZU","zyzy"]
// }
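
The same recommendation can also be written with a reusable Intl.Collator (the object localeCompare uses under the hood), which avoids re-resolving the locale on every comparison when sorting long lists. A minimal sketch; displayLanguage is a hypothetical stand-in for the app's display language, hard-coded to Swedish here:

```javascript
// Hypothetical: the app's current display language (e.g. read from user settings).
const displayLanguage = "sv";

// Build the collator once; collator.compare can be passed straight to sort().
const collator = new Intl.Collator(displayLanguage);

const list = ["SASU", "säsä", "SISU", "sisi", "TATA", "ZÖZU", "zyzy"];
// Copy first so the original array is left untouched.
const sorted = [...list].sort(collator.compare);

console.log(sorted); // same order as the "Swedish" line above
```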

@Bubka
Owner

Bubka commented Jul 4, 2024

Thx for the suggestion 👍🏻

@Bubka Bubka moved this from Done to In Progress in 2FAuth backlog Jul 29, 2024
Bubka added a commit that referenced this issue Aug 2, 2024
@Bubka Bubka moved this from In Progress to Done in 2FAuth backlog Aug 2, 2024
@yheuhtozr

yheuhtozr commented Aug 13, 2024

Hi, thank you for your work, but I was a little confused when I checked the implementation. Do you want, for example, ["ac","Ab","AB","Ac","ab","AC"] to be sorted as ["Ab","AB","Ac","AC","ab","ac"] (when case-sensitive sort is on)?

Repository owner deleted a comment from muhammedtur Sep 9, 2024
Bubka added a commit that referenced this issue Sep 11, 2024
@Bubka
Owner

Bubka commented Sep 11, 2024

@yheuhtozr thx for your attention.

I just pushed another version of the sort function: the original implementation is back, but accented characters are now treated as their non-accented equivalents. It's not locale-specific sorting, but it's better than having them listed after the non-accented uppercase items.
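
For illustration, "code point sorting with accented characters treated as non-accented" could look roughly like the sketch below, assuming accents are stripped via Unicode NFD decomposition. This is not the actual 2FAuth code; foldAccents and accentInsensitiveCompare are hypothetical names:

```javascript
// Hypothetical helper: NFD turns "é" into "e" + U+0301 (combining acute),
// then we drop all nonspacing combining marks (general category Mn).
// Letters that do not decompose, such as "ø" or "ß", are left unchanged.
function foldAccents(s) {
  return s.normalize("NFD").replace(/\p{Mn}/gu, "");
}

// Plain code-unit comparison of the folded strings: still locale-independent
// and still case-separated ("Z" < "a"), but "é" now sorts like "e".
function accentInsensitiveCompare(a, b) {
  const fa = foldAccents(a);
  const fb = foldAccents(b);
  return fa < fb ? -1 : fa > fb ? 1 : 0;
}

const names = ["éclair", "Zebra", "apple", "Écho", "eagle"];
console.log([...names].sort(accentInsensitiveCompare));
// ["Écho", "Zebra", "apple", "eagle", "éclair"]
```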

The case-insensitive sorting remains untouched; it uses localeCompare, so the sorting is OK. Any opinion on uppercase or lowercase first?
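
On the uppercase-or-lowercase-first question: with Intl.Collator (the object behind localeCompare) this is the caseFirst option, and it only decides ties between strings that are otherwise equal. A short sketch, assuming an English locale:

```javascript
const words = ["ab", "Ab", "AB", "b"];

// caseFirst only affects strings that differ solely by case;
// "b" still sorts after every variant of "ab".
const upperFirst = new Intl.Collator("en", { caseFirst: "upper" });
const lowerFirst = new Intl.Collator("en", { caseFirst: "lower" });

console.log([...words].sort(upperFirst.compare)); // ["AB", "Ab", "ab", "b"]
console.log([...words].sort(lowerFirst.compare)); // ["ab", "Ab", "AB", "b"]
```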

@Bubka Bubka closed this as completed in 2f05f49 Sep 27, 2024
@Bubka Bubka moved this from Done to Released in 2FAuth backlog Sep 27, 2024
@yheuhtozr

yheuhtozr commented Oct 2, 2024

@Bubka Sorry for being away, and for adding to the confusion myself, but it looks like a lot of concepts have been mixed up under the name "case sensitivity", so I am not exactly sure what goal you are actually aiming for.

Let's call them by different names:

  • Case axis
    • case-merged: e.g. [A=a, B=b, ...] (so AsC < ASZ < AZa)
    • case-unmerged: e.g. [A, a, B, b, ...] (so ASZ < AsC < AZa)
    • case-separated: e.g. [A, B, ..., a, b, ...] (so ASZ < AZa < AsC)
    • case-unsorted: not having exact rules for casing
  • Variant axis
    • variant-merged: e.g. [a=á, e=é, ...]
    • variant-unmerged: e.g. [a, á, e, é, ...]
    • variant-separated: e.g. [a, e, ..., á, é, ...] (← but few people want this one, I guess)
    • variant-unsorted: not having exact rules for variants
  • Locale axis
    • locale-independent: order not changed by locale
    • locale-dependent: order can change by locale
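
To make the case axis concrete, here is an ASCII-only sketch (no locale, no accents) of the three sortable case orders applied to the example strings above. The comparator names are hypothetical, chosen to match the terms in the list:

```javascript
// Plain code-unit comparison returning -1, 0 or 1.
const cmp = (x, y) => (x < y ? -1 : x > y ? 1 : 0);

// case-merged [A=a, B=b, ...]: compare the lowercased strings only.
const caseMerged = (a, b) => cmp(a.toLowerCase(), b.toLowerCase());

// case-unmerged [A, a, B, b, ...]: per character, the letter decides first,
// and only if the letters match does uppercase go before lowercase.
function caseUnmerged(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    const byLetter = cmp(a[i].toLowerCase(), b[i].toLowerCase());
    if (byLetter !== 0) return byLetter;
    const byCase = cmp(a[i], b[i]); // "A" (0x41) < "a" (0x61) in ASCII
    if (byCase !== 0) return byCase;
  }
  return a.length - b.length;
}

// case-separated [A, B, ..., a, b, ...]: plain ASCII code-unit order already does this.
const caseSeparated = cmp;

const words = ["AZa", "AsC", "ASZ"];
console.log([...words].sort(caseMerged));    // ["AsC", "ASZ", "AZa"]
console.log([...words].sort(caseUnmerged));  // ["ASZ", "AsC", "AZa"]
console.log([...words].sort(caseSeparated)); // ["ASZ", "AZa", "AsC"]
```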

So there are tons of possible options, but first of all, we were never clear which one "case-sensitive" means: case-unmerged or case-separated, as I named them.

  • The original implementation (simple code point sorting) was:

    • locale-independent, and
    • looks case-separated in the ASCII range only, otherwise mostly case-unsorted, and
    • variant-unsorted (because it never affects ASCII)
  • The "case-insensitive" branch in 5d3a1be, which is what I suggested, was:

    • locale-dependent, and
    • mostly case-merged: localeCompare actually does something smart: it first compares the strings case-merged, then uses a case-unmerged comparison only among those that came out equal
    • mostly variant-merged: same as above
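
The tie-breaking behaviour described in the last two bullets can be observed through the sensitivity option of Intl.Collator (the object behind localeCompare): each level ignores the differences below it. A small sketch, assuming an English locale:

```javascript
// Each sensitivity level decides which differences count:
//   "base"    - base letters only:      a = A, a = á
//   "accent"  - base letters + accents: a = A, a ≠ á
//   "case"    - base letters + case:    a ≠ A, a = á
//   "variant" - everything (the default when sorting): a ≠ A, a ≠ á
for (const sensitivity of ["base", "accent", "case", "variant"]) {
  const collator = new Intl.Collator("en", { sensitivity });
  const show = (x, y) => `${x}${collator.compare(x, y) === 0 ? "=" : "≠"}${y}`;
  console.log(sensitivity.padEnd(8), show("a", "A"), show("a", "á"));
}

// With the default sensitivity, letter differences decide first and
// case only breaks ties, which yields the case-merged order:
const merged = ["AZa", "AsC", "ASZ"].sort(new Intl.Collator("en").compare);
console.log(merged); // ["AsC", "ASZ", "AZa"]
```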

And I'm not sure what you were trying to do with the "case-sensitive" branches in 5d3a1be and d90ffd5.

  • Is 5d3a1be perhaps trying to achieve something like a locale-dependent, variant-merged version that is case-separated for Latin letters only and case-merged otherwise? If so, it does not seem to work correctly.
  • Is d90ffd5 perhaps aiming for a locale-independent version that is case-separated and variant-merged for Latin letters?
    • but that would give the wrong alphabetical order for most non-Russian Cyrillic languages, and would still be variant-unsorted for most non-European languages
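
A quick illustration of the Cyrillic point: in the Ukrainian alphabet "ї" comes before "я", but its code point (U+0457) is higher than that of "я" (U+044F), so any locale-independent code-point order puts it after "я". A sketch comparing both orders; the word list is just an example:

```javascript
const words = ["яблуко", "їжак", "білка"];

// Locale-independent code-unit order: "ї" (U+0457) lands after "я" (U+044F).
const byCodeUnit = [...words].sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));

// Ukrainian collation follows the alphabet (... і, ї, й, ... я), so "ї" precedes "я".
const byUkrainian = [...words].sort(new Intl.Collator("uk").compare);

console.log(byCodeUnit);  // ["білка", "яблуко", "їжак"]
console.log(byUkrainian); // ["білка", "їжак", "яблуко"]
```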

In general, I don't really know which aspects of the old implementation's "case-sensitive" behavior, and to what degree, you want to retain.

Do you want to combine the old behavior with locale support in some way? Do you actually want more than two sorting options? Or do you need some help creating another "case-sensitive" version using localeCompare?

@Bubka
Owner

Bubka commented Oct 25, 2024

Thx very much @yheuhtozr for the axis decomposition; it's very useful. And I must admit, I hadn't thought of all those combinations 🤯

What I was aiming for with the case-sensitive branch is case-separated + variant-merged + locale-dependent. I haven't found a way to achieve this with a single localeCompare call (is it even possible/relevant for each language?!), so I gave up on the locale-dependent axis. For me, as a native French (Latin-script) speaker, the Case axis should be case-separated. Locale-dependence should be applied too, to stick as closely as possible to each user's culture. I don't mind about the Variant axis as long as it's not variant-separated. On top of that, I want to keep it simple, so no additional sorting options, and it would be nice to have an algorithm that is not too complicated 😇 😄

Any help would be appreciated.

@yheuhtozr

yheuhtozr commented Nov 10, 2024

What I was aiming for with the case-sensitive branch is case-separated + variant-merged + locale-dependent. I haven't found a way to achieve this with a single localeCompare call (is it even possible/relevant for each language?!), so I gave up on the locale-dependent axis.

Yes... in the bigger picture, there is no idiomatic way to write a case-separated sort in general (you can only loop over the characters manually), which means you have to handle part of the low-level Unicode work yourself if you want it. Sample code that satisfies your description would look like this:

let locale; // <- assume this is the current display language
let namesToSort; // <- assume this is an array that contains names

// sorter for reuse
// you could create it inside the function if sorting only happens once per page
const caseSensitiveSorter = new Intl.Collator(locale, {
  sensitivity: "case", // <- compare base letters and case, ignore accents
  caseFirst: "upper", // <- uppercase before lowercase ('A' sorts before 'a')
});

// locale-dependent grapheme segmenter
// you could also create it inside the function
// Firefox supports Intl.Segmenter only from version 125, so fall back to a simple [...string] if that is a concern
const segmenter = new Intl.Segmenter(locale, {granularity: "grapheme"});

// sort function
// not necessarily optimal, but it returns a new sorted list and leaves the input list untouched
function caseSensitiveSort(list, sorter, segmenter) {
  // split each name into user-perceived characters (grapheme clusters)
  const segList = list.map((e) => [...segmenter.segment(e)].map((s) => s.segment));
  segList.sort(function(a, b) {
    // compare grapheme by grapheme; a missing grapheme ('') sorts first
    for (let i = 0; true; i++) {
      const result = sorter.compare(a[i] || '', b[i] || '');
      if (result !== 0 || !a[i] || !b[i]) { return result }
    }
  });
  return segList.map((e) => e.join(''));
}

const sortedNames = caseSensitiveSort(namesToSort, caseSensitiveSorter, segmenter); // execute sorting
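
One caveat about the sample above: comparing grapheme by grapheme with a sensitivity: "case" collator puts "A" before "a", but "a" still sorts before "B", which as far as I can tell is the case-unmerged order ([A, a, B, b, ...]) rather than case-separated. A sketch of a strictly case-separated, variant-merged, locale-dependent comparator follows. It assumes a grapheme counts as uppercase when lowercasing changes it; caseClass and caseSeparatedCompare are hypothetical names, and placing caseless graphemes (digits, punctuation, CJK) first is an arbitrary choice:

```javascript
// Assumption: the current display language; any BCP 47 tag works.
const locale = "fr";

// Compares letters only (case and accents ignored); case is handled separately.
const baseSorter = new Intl.Collator(locale, { sensitivity: "base" });
const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });

// Hypothetical helper: 0 = caseless, 1 = uppercase, 2 = lowercase.
function caseClass(g) {
  if (g !== g.toLocaleLowerCase(locale)) return 1; // changes when lowercased
  if (g !== g.toLocaleUpperCase(locale)) return 2; // changes when uppercased
  return 0;
}

// Split a string into user-perceived characters (grapheme clusters).
const graphemes = (s) => [...segmenter.segment(s)].map((x) => x.segment);

// Hypothetical comparator: case-separated + variant-merged + locale-dependent.
function caseSeparatedCompare(a, b) {
  const ga = graphemes(a);
  const gb = graphemes(b);
  for (let i = 0; i < Math.min(ga.length, gb.length); i++) {
    // At each position, uppercase graphemes sort before lowercase ones...
    const byCase = caseClass(ga[i]) - caseClass(gb[i]);
    if (byCase !== 0) return byCase;
    // ...then graphemes of the same case follow the locale's alphabet, accents ignored.
    const byLetter = baseSorter.compare(ga[i], gb[i]);
    if (byLetter !== 0) return byLetter;
  }
  return ga.length - gb.length; // if all shared positions tie, the shorter string sorts first
}

console.log(["AZa", "AsC", "ASZ"].sort(caseSeparatedCompare)); // ["ASZ", "AZa", "AsC"]
console.log(["été", "Zoé", "ete", "Éric"].sort(caseSeparatedCompare)); // ["Éric", "Zoé", "été", "ete"]
```

Since "été" and "ete" compare equal here (accents merged), they keep their input order; Array.prototype.sort is stable in current engines.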

Projects
Status: Released
Development

No branches or pull requests

3 participants