Partial SKU search #1536

Open
LiamKarlMitchell opened this issue Sep 9, 2019 · 8 comments

Comments

@LiamKarlMitchell

LiamKarlMitchell commented Sep 9, 2019

Partial SKU search has so far been ruled out for performance reasons. Previous tickets about it appear to have been closed, which is unsatisfactory.

Could the stance on this please be reconsidered?

Currently, searching for combinations of letters and numbers in sequence does not return correct results in all circumstances.

The problem seems to be related to tokenizing: alphanumeric sequences are split rather than kept together.

Describe the solution you'd like
Searching for a partial SKU should return results regardless of dashes or alphanumeric combinations.

Example query with wildcard.

curl -s -XPOST 'localhost:9200/magento2_default_catalog_product/_search?pretty&size=10000' \
     -H 'Content-Type: application/json' -d '
{
    "query": {
        "wildcard" : { "sku.untouched" : "*1234*" }
    }
}' | jq '.hits.hits[]._source.sku'

Products indexed with SKU

MT-1023
MT102425
MT1022535AB
MT1022435-AB

Searching for 102, MT102, or T102 should return these products.
Searching for 35AB should also work.
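
As a rough illustration of the behaviour requested above, the matching can be sketched in plain Python, outside of Elasticsearch. The helper names `normalize` and `partial_sku_search` are made up for this sketch; it only shows which of the listed SKUs each term is expected to hit once dashes are ignored.

```python
import re

# The four SKUs listed in the issue.
SKUS = ["MT-1023", "MT102425", "MT1022535AB", "MT1022435-AB"]


def normalize(value):
    """Drop everything except letters and digits, then lowercase."""
    return re.sub(r"[^A-Za-z0-9]", "", value).lower()


def partial_sku_search(term, skus=SKUS):
    """Return the SKUs whose normalized form contains the normalized term."""
    needle = normalize(term)
    if not needle:
        return []
    return [sku for sku in skus if needle in normalize(sku)]


print(partial_sku_search("102"))   # all four SKUs
print(partial_sku_search("35AB"))  # the two SKUs ending in 35AB / 35-AB
```

This is a substring scan, so it says nothing about how the search engine should index the data; it only pins down the expected results.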

The SKU wildcard could be applied only to the first part of the search term, delimited by a space.
So searching for "1234 Watch" would, for example, return products whose SKU contains 1234 (wildcard *1234*) and whose name contains Watch.
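
A minimal sketch of that idea, building the Elasticsearch request body in Python. The field `sku.untouched` comes from the curl example above; the `name` field and the function `build_query` are assumptions for illustration, and wildcard characters typed by the user are not escaped here.

```python
def build_query(user_input):
    """Wildcard on the SKU for the first term, plain match on the name for the rest."""
    terms = user_input.split()
    if not terms:
        return {"match_all": {}}

    # First space-delimited term: partial match against the untouched SKU.
    must = [{"wildcard": {"sku.untouched": "*" + terms[0] + "*"}}]

    # Remaining terms: regular full-text match on the (assumed) name field.
    if len(terms) > 1:
        must.append({"match": {"name": " ".join(terms[1:])}})

    return {"bool": {"must": must}}


print(build_query("1234 Watch"))
```

The returned dictionary would go under the `"query"` key of the request body.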

Describe alternatives you've considered
Mirasvit has been considered, but with it we can't do custom numeric range sliders in layered navigation product search.

A mapping to a clean version of the SKU, with special characters removed and only alphanumeric characters kept, would be ideal.

Further direction on how to go about adding this would be greatly appreciated; it could be a custom extension.

Related
#710
#797

@romainruaud
Collaborator

Hello @LiamKarlMitchell

If you have not read it yet, I suggest having a look at the Holy Bible of searching by SKU, written by @rbayet, here: https://github.com/Smile-SA/elasticsuite/wiki/SearchingBySkuBasics

Jokes aside, your implementation could be problematic depending on the store's business. Say I'm running a video store and someone searches for the film "a dog's life": you'd match any SKU containing the letter A.

We have given this topic a lot of thought, and recently I tend to think that searching by SKU is not a common use case for B2C websites, but rather for B2B ones (though I might be wrong).

If that's the case, it could be a good idea to ship this as an optional extension.

The wiki also describes the starting point for adding new query types to the engine: https://github.com/Smile-SA/elasticsuite/wiki/Querying#extending-the-query-and-aggregation-factory

I'd be happy if you managed to propose a PR adding support for the wildcard query. With that query supported in Elasticsuite, you'd be able to use it for your own needs.

Regards

@southerncomputer
Contributor

I have found that disabling spellcheck and phonetic search increases the hit rate on SKUs, though I'm not sure why!

@LiamKarlMitchell
Author

LiamKarlMitchell commented Sep 24, 2019

Yes, it's mostly B2B that wants it, although one client sells parts to anyone who wants them, and the parts are often just looked up by SKU. People might not always have an exact match, e.g. when the barcode or marking on the original part got damaged somehow, which would make fuzzy searching or spellcheck useful too.

Only do a SKU search when given a SKU: if the input contains spaces, don't do a SKU search.
A SKU would be letters and numbers, possibly delimited by - or . or () (just filter those out of the SKU)?
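
A hedged sketch of that rule in Python. The function name `sku_search_term` and the exact set of allowed delimiters are assumptions taken from the comment above, not anything Elasticsuite provides.

```python
import re

# Input counts as "SKU-like" if it is only letters, digits, and the delimiters - . ( )
SKU_LIKE = re.compile(r"[A-Za-z0-9.\-()]+")


def sku_search_term(user_input):
    """Return the cleaned SKU term, or None if the input should not trigger a SKU search."""
    text = user_input.strip()
    if not SKU_LIKE.fullmatch(text):
        return None  # contains spaces or other characters: treat as normal search
    cleaned = re.sub(r"[-.()]", "", text)  # filter the delimiters out
    return cleaned or None


print(sku_search_term("MT-1023"))     # MT1023
print(sku_search_term("1234 Watch"))  # None
```

Input made up only of delimiters, such as `---`, also returns `None`, so it falls back to the regular search.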

Interesting thanks.

@bernd-reindl

bernd-reindl commented Dec 19, 2019

Why not add a configuration flag per attribute?
Search method "is": split the search term and perform the search as currently implemented.
Search method "like": do not split the search term; search for the whole term with wildcards, e.g. *SKU-123-AB*

That way you can configure the sku attribute with "like" and the description attribute with "is".

This is not only a problem for SKUs, but also for ISBN or EAN codes.

@southerncomputer
Contributor

southerncomputer commented Dec 19, 2019

The autocomplete search suggestion accurately returns hits on my SKU. I just need a way to forward those results and tack them onto the main search results, or something like this: https://amasty.com/docs/lib/exe/fetch.php?media=magento_2:elastic_search:wildcard-spell.mp4 from https://amasty.com/docs/doku.php?id=magento_2:elastic_search#advanced_query_settings

@sedax90

sedax90 commented Jan 29, 2020

Just for completeness, I managed to solve the problem like this:

  1. Create a custom module
  2. Create a file: etc/elasticsuite_indices.xml with:
<?xml version="1.0"?>
<indices xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="urn:magento:module:Smile_ElasticsuiteCore:etc/elasticsuite_indices.xsd">

    <index identifier="catalog_product" defaultSearchType="product">
        <type name="product" idFieldName="entity_id">
            <mapping>
                <!--<field name="sku" type="text">
                    <isSearchable>1</isSearchable>
                    <isUsedInSpellcheck>1</isUsedInSpellcheck>
                    <isFilterable>1</isFilterable>
                    <defaultSearchAnalyzer>partial_custom_analyzer</defaultSearchAnalyzer>
                </field>-->

                <field name="search" type="text">
                    <isSearchable>1</isSearchable>
                    <isUsedInSpellcheck>1</isUsedInSpellcheck>
                    <isFilterable>1</isFilterable>
                    <defaultSearchAnalyzer>partial_custom_analyzer</defaultSearchAnalyzer>
                </field>
            </mapping>
        </type>
    </index>
</indices>
  3. Create a file: etc/elasticsuite_analysis.xml with:
<?xml version="1.0"?>
<analysis xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="urn:magento:module:Smile_ElasticsuiteCore:etc/elasticsuite_analysis.xsd">
    <filters>
        <filter name="ngram_filter_custom" type="edge_ngram" language="default">
            <min_gram>3</min_gram>
            <max_gram>20</max_gram>
        </filter>
    </filters>

    <analyzers>
        <analyzer name="partial_custom_analyzer" tokenizer="standard" language="default">
            <filters>
                <filter ref="ascii_folding" />
                <filter ref="trim" />
                <filter ref="word_delimiter" />
                <filter ref="lowercase" />
                <filter ref="elision" />
                <filter ref="standard" />
                <filter ref="ngram_filter_custom"/>
            </filters>
            <char_filters>
                <char_filter ref="html_strip"/>
            </char_filters>
        </analyzer>
    </analyzers>
</analysis>
  4. Clear cache and reindex

My problem was with the name field, but I've also left a commented-out sku field config for future use. I hope this helps.
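
To make the effect of the `edge_ngram` filter above more concrete, here is a small Python sketch of the tokens it would emit for a single, already lowercased token with `min_gram` 3 and `max_gram` 20. The function `edge_ngrams` is a stand-in written for this illustration, not part of Elasticsearch or Elasticsuite.

```python
def edge_ngrams(token, min_gram=3, max_gram=20):
    """Return the prefixes of `token` whose length is between min_gram and max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


print(edge_ngrams("mt1023"))  # ['mt1', 'mt10', 'mt102', 'mt1023']
print(edge_ngrams("ab"))      # [] (shorter than min_gram)
```

Note that only prefixes are produced, so a term such as `35ab` would match a token only if an earlier filter (for example `word_delimiter`) has already split the SKU so that a token starts with it.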

@brucemead

I've been having a similar issue. Example: searching for EBX39 when the product name is EPSON EB-X39 Projector.

The only way I've been able to bring back these results without impacting accuracy elsewhere is by adding a new char_filter to my etc/elasticsuite_analysis.xml, like so:

    <char_filters>
        <char_filter name="special_characters" type="pattern_replace" language="default">
            <pattern>[^A-Za-z0-9 ]</pattern>
            <replacement></replacement>
        </char_filter>
    </char_filters>

Credit to https://www.javacodegeeks.com/2018/03/elasticsearch-ignore-special-characters-query-pattern-replace-filter-custom-analyzer.html
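
The effect of that `pattern_replace` rule can be checked outside Elasticsearch with the same regular expression. The Python below is only an emulation for illustration (`strip_special` is a made-up name); the real filter runs inside the analyzer before tokenization.

```python
import re

# Same pattern as in the char_filter: anything that is not an ASCII letter, digit or space.
SPECIAL_CHARACTERS = re.compile(r"[^A-Za-z0-9 ]")


def strip_special(text):
    """Remove every character matched by the pattern (empty replacement)."""
    return SPECIAL_CHARACTERS.sub("", text)


print(strip_special("EPSON EB-X39 Projector"))  # EPSON EBX39 Projector
```

One caveat of this pattern: it also removes accented letters (`café` becomes `caf`), which may matter for non-English catalogs.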

@LiamKarlMitchell
Author

We ended up making a sku_search attribute with the special characters removed.
Maybe a char_filter is nicer; thanks for sharing.

6 participants