Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String.is_subsequence_of is either broken or has a very misleading name #37063

Closed
distractedmosfet opened this issue Mar 15, 2020 · 5 comments · Fixed by #68838 · May be fixed by #37066
Closed

String.is_subsequence_of is either broken or has a very misleading name #37063

distractedmosfet opened this issue Mar 15, 2020 · 5 comments · Fixed by #68838 · May be fixed by #37066

Comments

@distractedmosfet
Copy link

distractedmosfet commented Mar 15, 2020

Godot version:
3.2

OS/device including version:
All versions

Issue description:
is_subsequence_of returns true is all characters from string A appear in string B, in the order in string A, but it doesn't care if characters not in A are interwoven. So:

"save5".is_subsequence_of("save35")

returns true.

This seemed like such an odd behaviour it feels like a bug? I suspect there's a chance this is actually intended behaviour, used as sort of a fuzzy search within the engine itself. I feel like if that's the case this should probably be renamed, or at least documented.

Steps to reproduce:
Above line of code should do the trick.

@dreamsComeTrue
Copy link
Contributor

Instead of renaming or documenting - I've combined both of those suggestions, and overloaded that method with additional parameter, which allow you to specify, if you want to go without any gaps - check attached PR.

@yvie-k
Copy link
Contributor

yvie-k commented Mar 15, 2020

The name is not misleading, it's the correct behaviour for a subsequence. What you expected is a substring.
The docs should rather clarify the difference between substring and subsequence than "patching" it with an additional parameter.
IMO, creating a String.is_substring_of would be more clear than having this magic parameter

dreamsComeTrue added a commit to dreamsComeTrue/godot that referenced this issue Mar 15, 2020
@dreamsComeTrue
Copy link
Contributor

IMO, creating a String.is_substring_of would be more clear than having this magic parameter

I've modified my PR according to your request, I agree it's nicer to separate both of them.
BTW: didn't have ANY idea that 'substring', and 'subsequence' should be different in implementation (now I wonder, how it's being handled in various programming languages 😄 )

@distractedmosfet
Copy link
Author

A separate is_substring_of method and a clarification on subsequence is a pretty good improvement. I will point out that this will still almost certainly end up with people misusing it, who simply didn't read the fine print. Maybe mathematicians use the word subsequence to mean a specific thing, but that doesn't define how language is used and understood by people outside of the field, which is the vast majority of Godot users. It's a question of UX not academics.

However, I certainly don't care about this small quirk enough to seriously champion further change, but I feel compelled to point out "That is still bad UX, but it's at least better than what it was." What the Godot dev team does with that is up to you guys.

@follower
Copy link
Contributor

This is an interesting case study in feature discoverability and (rather appropriately) fuzzy matching...

The four six(?) existing is_substring_of() methods

Funnily enough, currently for the String class in GDScript (without a is_substring_of() ) there are no less than four ways to determine if one string is a substring of another (not including regular expressions):

    print("bar" in "foobarbaz") # True
    print("foobarbaz".find("bar") > -1) # True
    print("foobarbaz".match("*bar*")) # True
    print("bar".is_subsequence_of("foobarbaz")) # True

[Update: Oh, I missed the two count() variants. :D ]

And, three of them work as one might expect for non-continuous matches:

    print("orb" in "foobarbaz") # False
    print("foobarbaz".find("orb") > -1) # False
    print("foobarbaz".match("*orb*")) # False
    print("orb".is_subsequence_of("foobarbaz")) # True

And, for completeness:

    print("rbo" in "foobarbaz") # False
    print("foobarbaz".find("rbo") > -1) # False
    print("foobarbaz".match("*rbo*")) # False
    print("rbo".is_subsequence_of("foobarbaz")) # False

But which method is "most Godotic"? :)

Given GDScript's Python influence I suspect that the in form might be considered "most Godotic" but it's not even mentioned in the current String class documentation nor the GDScript basics . (Aside: in is also a difficult search term to use. :D )

Given that the only other method with sub in the name is substr() it's probably not surprising is_subsequence_of() gets used by mistake.

Suggestions

  • Documentation for in could be added to String class docs & GDScript basics tutorial.

  • It could be made more clear in is_subsequence_of() documentation that "subsequence" is not the same as "substring".

  • FWIW I found the phrase "Gaps or interwoven characters" somewhat unclear (mostly the use of "interwoven" which isn't a particularly common term) and wonder if an alternate wording might be clearer.

    The Wikipedia "Subsequence" page linked above expresses it as:

    [A subsequence] can be derived from another sequence by deleting some or no elements without changing the order of the remaining elements

    Which is a bit formal and not ideal...but made me wonder about something like (or more about "exact match" vs "inexact match"):

    String A is a subsequence of String B if all the letters in String A are in String B and in the same order but not necessarily all next to each other.

    However, I think it's less important to describe what a subsequence is than what it isn't (in relation to a substring) so something more like this might be appropriate (I think the mention of "fuzzy" searches in Add String.is_substring_of to check for consecutive match #37066 as a use case was a good idea):

    Returns [code]true[/code] if this string is a subsequence of the given string.

    Note: You probably want to find a substring so use [insert substring method/approach here].

    A subsequence is not the same as a substring as it allows for non-exact matches which can be useful for "fuzzy" searches.

  • I'm undecided whether adding an additional is_substring_of() method is warranted but given code auto-complete, it enables discovery of the correct method more than documentation can.

Related links

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants