Partial SKU search #797

Closed
PieterCappelle opened this issue Mar 5, 2018 · 17 comments


@PieterCappelle

A lot of issues in this great repository are about searching the SKU field. I'm currently working on Magento 2.2.3 & v2.5.4 of Elasticsuite and also have some questions about it. I think the Wiki or the Docs should be expanded with more information about this issue.

In 2.5.4, searching by SKU works, but I can't find out exactly how it is supposed to behave. My current config in elasticsuite_indices.xml is the standard one:

<field name="sku" type="string">
    <isSearchable>1</isSearchable>
    <isUsedForSortBy>1</isUsedForSortBy>
    <isUsedInSpellcheck>1</isUsedInSpellcheck>
    <defaultSearchAnalyzer>whitespace</defaultSearchAnalyzer>
</field>

Searching by full SKU works correctly, but searching by partial SKU does not. Here is a simple example: a product with SKU M13A and a completely unrelated name like TEST WITH SKU. I have rebuilt the cache and the indexes.

  • If you search for 13A you will not find the product.
  • If you search for M13 you will find the product.
  • If you search for M13A you will find the product.
  • If you search for TEST you will find the product.
  • If you search for SKU you will find the product.

The actual problem is the first case. Searching from the beginning of the SKU works correctly (I tested this with multiple products), but searching for a part of the SKU that does not start at the beginning returns nothing.

Any ideas? Can you reproduce this?
Thanks in advance.

@southerncomputer
Contributor

southerncomputer commented Mar 5, 2018

What I did, because I could not work out how to set the search analyzer separately from the indexer, was to load partial SKU combinations into META_KEYWORDS and use the standard analyzer against that field.

My SKUs are a 3/4-character prefix, a dash, then the manufacturer part number, e.g. HEW-CB509A#ABA. I simply load HEW-CB, HEW-CB5, HEW-CB509, HEW-CB509A ... into meta_keywords (and the same without the prefix), comma-separated so the analyzer can tell the parts apart.

Try it! You can manually create a couple of SKUs with meta_keyword set and see if it does what you want.

This just worked for both types of searches (auto-complete and normal query), and if you want you can simply write a SQL query to inject the values into this field!

It may not be the most elegant way, but it has worked for me for over a year with 2.5 million SKUs!

Basically the reference indexer (with EDGE-NGRAM-FRONT) could do the same thing automatically, and you could then use the whitespace or spellcheck analyzer against those fields, but I never got Elasticsuite to work in that asymmetric way!

EDGE-NGRAM-FRONT breaks a field down into pieces the same way I manually inject my SKU into meta_keyword, except I'm not sure it's smart enough to work around the prefix-dash scheme I've created. See this link for more info: https://qbox.io/blog/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams
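
The manual expansion described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the script used in the thread: the function names, the minimum length of 2 and the assumption that the SKU has the form PREFIX-PARTNUMBER are all hypothetical. Unlike the hand-picked list in the comment, it emits every leading substring.

```python
def edge_ngrams(text, min_len=2):
    """Return every leading substring of `text` with at least `min_len` characters."""
    return [text[:end] for end in range(min_len, len(text) + 1)]


def sku_keywords(sku, separator="-", min_len=2):
    """Build a comma-separated keyword string for a SKU shaped like PREFIX-PARTNUMBER."""
    prefix, found, part_number = sku.partition(separator)
    if not found:
        # No vendor prefix: expand the SKU itself.
        variants = edge_ngrams(sku, min_len)
    else:
        grams = edge_ngrams(part_number, min_len)
        # Variants with the prefix first, then the same variants without it.
        variants = [prefix + separator + gram for gram in grams] + grams
    # dict.fromkeys drops duplicates while keeping the original order.
    return ", ".join(dict.fromkeys(variants))


print(sku_keywords("HEW-CB509A"))
# HEW-CB, HEW-CB5, HEW-CB50, HEW-CB509, HEW-CB509A, CB, CB5, CB50, CB509, CB509A
```

Note that this only produces prefixes, so a query such as 13A against M13A would still not match with this workaround alone.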

@PieterCappelle
Author

Sounds good @southerncomputer, but it also sounds so wrong. I'll wait for an answer from someone at Smile; maybe they have a better option. This should be possible by default. If they point me in the right direction, I'll create a PR to fix it. I don't think I'm asking for something strange: difficult maybe, but it is logical user behaviour.

@sambolek
Contributor

sambolek commented Mar 6, 2018

After seeing this comment, it was clear to me why partial search isn't working: #710 (comment)

Following what's written in that comment, @southerncomputer's suggestion (or something like it, but done programmatically) seems like the right way to handle this.

@afoucret
Contributor

afoucret commented Mar 7, 2018

I suspect the spellchecking detection can be problematic here.

Can you try to change the following line in the Smile\ElasticsuiteCore\Search\Adapter\Elasticsuite\Spellchecker class :

$positionKey = sprintf("%s_%s", $token['start_offset'], $token['end_offset']);

to:

$positionKey = $token['position'];

If the test is OK, can you submit a PR? Then I will try to merge it for the next maintenance release.

I agree, you should not have to hack the engine as described by @southerncomputer; the engine should be able to take care of it for you.
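
To see what the proposed change does, here is a small Python sketch (not Elasticsuite code) that groups tokens by the two candidate keys. The token list is made up for illustration; only the field names position, start_offset and end_offset come from the PHP lines above.

```python
def group_terms(tokens, key_fn):
    """Group token terms under the key computed by `key_fn`."""
    groups = {}
    for token in tokens:
        groups.setdefault(key_fn(token), []).append(token["term"])
    return groups


# Hypothetical tokens for the query "13A": the whole word, plus two sub-words
# such as a word-delimiter style filter might emit.
tokens = [
    {"term": "13a", "position": 0, "start_offset": 0, "end_offset": 3},
    {"term": "13", "position": 0, "start_offset": 0, "end_offset": 2},
    {"term": "a", "position": 1, "start_offset": 2, "end_offset": 3},
]

by_offsets = group_terms(tokens, lambda t: "%s_%s" % (t["start_offset"], t["end_offset"]))
by_position = group_terms(tokens, lambda t: t["position"])

print(by_offsets)   # {'0_3': ['13a'], '0_2': ['13'], '2_3': ['a']}
print(by_position)  # {0: ['13a', '13'], 1: ['a']}
```

With the offset key every token ends up in its own bucket, while the position key merges tokens that occupy the same slot in the query. Whether that is enough to fix the spellchecker depends on the tokens Elasticsearch actually returns.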

@PieterCappelle
Author

PieterCappelle commented Mar 7, 2018

@afoucret I tried your change but it's not working. Below is the result of statByPosition in function extractTermStatsByPoisition when I search for 13A (the SKU is M13A). The final response is an empty result [].

Array
(
    [0] => Array
        (
            [term] => 13
            [doc_freq] => 0
        )
    [1] => Array
        (
            [term] => a
            [doc_freq] => 4
            [analyzers] => Array
                (
                    [0] => standard
                    [1] => whitespace
                )

        )

)

When I check the result of function loadSpellingType, it is SPELLING_TYPE_EXACT. I tried changing this to SPELLING_TYPE_FUZZY, but that does not work either.

@afoucret
Contributor

afoucret commented Mar 9, 2018

Update : found how to fix it.

I need more time to :

  • Clean up the code
  • Check whether there is any impact on other parts

Most likely we will have to wait for 2.6.x to solve this one.

@PieterCappelle
Author

Can I help?

@afoucret
Contributor

afoucret commented Mar 9, 2018

The code is in PR #810 and will be merged into the master branch quickly.
You can test it, but it requires switching to the master branch first (and having your env migrated to Elasticsearch 5.x).

I am targeting release 2.6.0, since we need to test it very carefully if we do not want to miss side effects on relevance, precision and recall for non-SKU searches.

@afoucret
Contributor

afoucret commented Mar 9, 2018

The PR has been merged into the master branch.
@PieterCappelle, waiting for your feedback if you want to test it.

@PieterCappelle
Author

Hi @afoucret, great work. I did the following:

First, I updated Elasticsearch to 6.x:

sudo service elasticsearch stop
sudo apt-get --purge autoremove elasticsearch
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
sudo apt-get install apt-transport-https
echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list
sudo apt-get update && sudo apt-get install elasticsearch
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install analysis-phonetic
sudo service elasticsearch start

curl localhost:9200 now reports "number" : "6.2.2"

  1. Changed the version constraint in composer.json from ^2.5.0 to dev-master, then ran composer update
  2. php bin/magento cache:clean
  3. php bin/magento index:reindex
  4. Searched for 13A and got correct results :)

Then I created a new product with the SKU TEST-M13A, but when I search for it I get an error:

Exception #0 (Exception): Warning: call_user_func_array() expects parameter 1 to be a valid callback, first array member is not a valid class name or object in /var/www/xxx/vendor/smile/elasticsuite/src/module-elasticsuite-virtual-category/Model/Layer/Filter/Category.php on line 153

Thanks!

@afoucret
Contributor

afoucret commented Mar 9, 2018

You should run :

  • php bin/magento cache:clean
  • php bin/magento setup:upgrade

And maybe clean the generated folder.

Updating to ES 6.x is not mandatory. Only ES 2.x support has been dropped; ES 5.x is here to stay.

@PieterCappelle
Author

PieterCappelle commented Mar 9, 2018

I did that; the error is still thrown.

Exception #0 (Exception): Warning: call_user_func_array() expects parameter 1 to be a valid callback, first array member is not a valid class name or object in /var/www/xxx/vendor/smile/elasticsuite/src/module-elasticsuite-virtual-category/Model/Layer/Filter/Category.php on line 153
#0 [internal function]: Magento\Framework\App\ErrorHandler->handler(2, 'call_user_func_...', '/var/www/xxx...', 153, Array)
#1 /var/www/xxx/vendor/smile/elasticsuite/src/module-elasticsuite-virtual-category/Model/Layer/Filter/Category.php(153): call_user_func_array(Array, Array)
#2 /var/www/xxx/vendor/smile/elasticsuite/src/module-elasticsuite-virtual-category/Model/Layer/Filter/Category.php(120): Smile\ElasticsuiteVirtualCategory\Model\Layer\Filter\Category->loadUsingCache('getSearchQuerie...')
#3 /var/www/xxx/vendor/smile/elasticsuite/src/module-elasticsuite-virtual-category/Model/Layer/Filter/Category.php(79): Smile\ElasticsuiteVirtualCategory\Model\Layer\Filter\Category->getFacetQueries()
#4 /var/www/xxx/vendor/smile/elasticsuite/src/module-elasticsuite-catalog/Block/Navigation.php(187): Smile\ElasticsuiteVirtualCategory\Model\Layer\Filter\Category->addFacetToCollection()
#5 /var/www/xxx/vendor/smile/elasticsuite/src/module-elasticsuite-catalog/Block/Navigation.php(174): Smile\ElasticsuiteCatalog\Block\Navigation->addFacets()
#6 /var/www/xxx/vendor/magento/framework/View/Element/AbstractBlock.php(272): Smile\ElasticsuiteCatalog\Block\Navigation->_prepareLayout()
#7 /var/www/xxx/vendor/magento/framework/View/Layout/Generator/Block.php(150): Magento\Framework\View\Element\AbstractBlock->setLayout(Object(Magento\Framework\View\Layout\Interceptor))
#8 /var/www/xxx/generated/code/Magento/Framework/View/Layout/Generator/Block/Interceptor.php(37): Magento\Framework\View\Layout\Generator\Block->process(Object(Magento\Framework\View\Layout\Reader\Context), Object(Magento\Framework\View\Layout\Generator\Context))
#9 /var/www/xxx/vendor/magento/framework/View/Layout/GeneratorPool.php(80): Magento\Framework\View\Layout\Generator\Block\Interceptor->process(Object(Magento\Framework\View\Layout\Reader\Context), Object(Magento\Framework\View\Layout\Generator\Context))
#10 /var/www/xxx/vendor/magento/framework/View/Layout.php(344): Magento\Framework\View\Layout\GeneratorPool->process(Object(Magento\Framework\View\Layout\Reader\Context), Object(Magento\Framework\View\Layout\Generator\Context))
#11 /var/www/xxx/generated/code/Magento/Framework/View/Layout/Interceptor.php(89): Magento\Framework\View\Layout->generateElements()
#12 /var/www/xxx/vendor/magento/framework/View/Layout/Builder.php(129): Magento\Framework\View\Layout\Interceptor->generateElements()
#13 /var/www/xxx/vendor/magento/framework/View/Page/Builder.php(55): Magento\Framework\View\Layout\Builder->generateLayoutBlocks()
#14 /var/www/xxx/vendor/magento/framework/View/Layout/Builder.php(65): Magento\Framework\View\Page\Builder->generateLayoutBlocks()
#15 /var/www/xxx/vendor/magento/framework/View/Page/Config.php(197): Magento\Framework\View\Layout\Builder->build()
#16 /var/www/xxx/vendor/magento/framework/View/Page/Config.php(207): Magento\Framework\View\Page\Config->build()
#17 /var/www/xxx/vendor/magento/framework/App/View.php(170): Magento\Framework\View\Page\Config->publicBuild()
#18 /var/www/xxx/vendor/magento/framework/App/View.php(114): Magento\Framework\App\View->loadLayoutUpdates()
#19 /var/www/xxx/vendor/magento/module-catalog-search/Controller/Result/Index.php(91): Magento\Framework\App\View->loadLayout()
#20 /var/www/xxx/vendor/magento/framework/Interception/Interceptor.php(58): Magento\CatalogSearch\Controller\Result\Index->execute()
#21 /var/www/xxx/vendor/magento/framework/Interception/Interceptor.php(138): Magento\CatalogSearch\Controller\Result\Index\Interceptor->___callParent('execute', Array)
#22 /var/www/xxx/vendor/smile/elasticsuite/src/module-elasticsuite-catalog/Plugin/CatalogSearch/ResultPlugin.php(98): Magento\CatalogSearch\Controller\Result\Index\Interceptor->Magento\Framework\Interception\{closure}()
#23 /var/www/xxx/vendor/magento/framework/Interception/Interceptor.php(135): Smile\ElasticsuiteCatalog\Plugin\CatalogSearch\ResultPlugin->aroundExecute(Object(Magento\CatalogSearch\Controller\Result\Index\Interceptor), Object(Closure))
#24 /var/www/xxx/vendor/magento/framework/Interception/Interceptor.php(153): Magento\CatalogSearch\Controller\Result\Index\Interceptor->Magento\Framework\Interception\{closure}()
#25 /var/www/xxx/generated/code/Magento/CatalogSearch/Controller/Result/Index/Interceptor.php(26): Magento\CatalogSearch\Controller\Result\Index\Interceptor->___callPlugins('execute', Array, NULL)
#26 /var/www/xxx/vendor/magento/framework/App/Action/Action.php(107): Magento\CatalogSearch\Controller\Result\Index\Interceptor->execute()
#27 /var/www/xxx/vendor/magento/framework/Interception/Interceptor.php(58): Magento\Framework\App\Action\Action->dispatch(Object(Magento\Framework\App\Request\Http))
#28 /var/www/xxx/vendor/magento/framework/Interception/Interceptor.php(138): Magento\CatalogSearch\Controller\Result\Index\Interceptor->___callParent('dispatch', Array)
#29 /var/www/xxx/vendor/magento/framework/Interception/Interceptor.php(153): Magento\CatalogSearch\Controller\Result\Index\Interceptor->Magento\Framework\Interception\{closure}(Object(Magento\Framework\App\Request\Http))
#30 /var/www/xxx/generated/code/Magento/CatalogSearch/Controller/Result/Index/Interceptor.php(39): Magento\CatalogSearch\Controller\Result\Index\Interceptor->___callPlugins('dispatch', Array, Array)
#31 /var/www/xxx/vendor/magento/framework/App/FrontController.php(55): Magento\CatalogSearch\Controller\Result\Index\Interceptor->dispatch(Object(Magento\Framework\App\Request\Http))
#32 /var/www/xxx/vendor/magento/framework/Interception/Interceptor.php(58): Magento\Framework\App\FrontController->dispatch(Object(Magento\Framework\App\Request\Http))
#33 /var/www/xxx/vendor/magento/framework/Interception/Interceptor.php(138): Magento\Framework\App\FrontController\Interceptor->___callParent('dispatch', Array)
#34 /var/www/xxx/vendor/magento/module-store/App/FrontController/Plugin/RequestPreprocessor.php(94): Magento\Framework\App\FrontController\Interceptor->Magento\Framework\Interception\{closure}(Object(Magento\Framework\App\Request\Http))
#35 /var/www/xxx/vendor/magento/framework/Interception/Interceptor.php(135): Magento\Store\App\FrontController\Plugin\RequestPreprocessor->aroundDispatch(Object(Magento\Framework\App\FrontController\Interceptor), Object(Closure), Object(Magento\Framework\App\Request\Http))
#36 /var/www/xxx/vendor/magento/module-page-cache/Model/App/FrontController/BuiltinPlugin.php(73): Magento\Framework\App\FrontController\Interceptor->Magento\Framework\Interception\{closure}(Object(Magento\Framework\App\Request\Http))
#37 /var/www/xxx/vendor/magento/framework/Interception/Interceptor.php(135): Magento\PageCache\Model\App\FrontController\BuiltinPlugin->aroundDispatch(Object(Magento\Framework\App\FrontController\Interceptor), Object(Closure), Object(Magento\Framework\App\Request\Http))
#38 /var/www/xxx/vendor/magento/framework/Interception/Interceptor.php(153): Magento\Framework\App\FrontController\Interceptor->Magento\Framework\Interception\{closure}(Object(Magento\Framework\App\Request\Http))
#39 /var/www/xxx/generated/code/Magento/Framework/App/FrontController/Interceptor.php(26): Magento\Framework\App\FrontController\Interceptor->___callPlugins('dispatch', Array, NULL)
#40 /var/www/xxx/vendor/magento/framework/App/Http.php(135): Magento\Framework\App\FrontController\Interceptor->dispatch(Object(Magento\Framework\App\Request\Http))
#41 /var/www/xxx/generated/code/Magento/Framework/App/Http/Interceptor.php(24): Magento\Framework\App\Http->launch()
#42 /var/www/xxx/vendor/magento/framework/App/Bootstrap.php(256): Magento\Framework\App\Http\Interceptor->launch()
#43 /var/www/xxx/index.php(39): Magento\Framework\App\Bootstrap->run(Object(Magento\Framework\App\Http\Interceptor))
#44 {main}

@afoucret
Contributor

afoucret commented Mar 9, 2018

I don't think this is related to the partial SKU search.
As you are using the master branch (which is unstable until 2.6.0), I think there is another problem here.

I will dig in to ensure the problem will not be in the release.

@afoucret
Contributor

afoucret commented Mar 9, 2018

For me the main issue is solved.
Now you will have to wait for 2.6.0 for this to be released.

BR,

@facundocapua

I'm not sure if I'm doing this right, but I have installed version 2.5.5 of Elasticsuite and made a patch from your PR. This is the patch:

From 6a41e0cec8a0b61f395ba1c5b1574751804c2fd2 Mon Sep 17 00:00:00 2001
From: Facundo Capua <fcapua@summasolutions.net>
Date: Tue, 20 Mar 2018 14:00:51 -0300
Subject: [PATCH] SKU partial match fix

--- vendor/smile/elasticsuite/src/module-elasticsuite-catalog/etc/elasticsuite_indices.xml
+++ vendor/smile/elasticsuite//src/module-elasticsuite-catalog/etc/elasticsuite_indices.xml
@@ -41,7 +41,7 @@
                     <isSearchable>1</isSearchable>
                     <isUsedForSortBy>1</isUsedForSortBy>
                     <isUsedInSpellcheck>1</isUsedInSpellcheck>
-                    <defaultSearchAnalyzer>whitespace</defaultSearchAnalyzer>
+                    <defaultSearchAnalyzer>reference</defaultSearchAnalyzer>
                 </field>
                 <field name="visibility" type="integer" />
                 <field name="children_ids" type="integer" />
--- vendor/smile/elasticsuite/src/module-elasticsuite-core/Search/Adapter/Elasticsuite/Spellchecker.php
+++ vendor/smile/elasticsuite/src/module-elasticsuite-core/Search/Adapter/Elasticsuite/Spellchecker.php
@@ -206,7 +206,7 @@ class Spellchecker implements SpellcheckerInterface
             if (in_array($analyzer, $analyzers)) {
                 foreach ($fieldData['terms'] as $term => $termStats) {
                     foreach ($termStats['tokens'] as $token) {
-                        $positionKey = sprintf("%s_%s", $token['start_offset'], $token['end_offset']);
+                        $positionKey = $token['position'];
 
                         if (!isset($termStats['doc_freq'])) {
                             $termStats['doc_freq'] = 0;
--- vendor/smile/elasticsuite/src/module-elasticsuite-core/Search/Request/Query/Fulltext/QueryBuilder.php
+++ vendor/smile/elasticsuite/src/module-elasticsuite-core/Search/Request/Query/Fulltext/QueryBuilder.php
@@ -107,13 +107,13 @@ class QueryBuilder
         $relevanceConfig = $containerConfig->getRelevanceConfig();
 
         $queryParams = [
-            'field'              => MappingInterface::DEFAULT_SEARCH_FIELD,
+            'fields'             => array_fill_keys([MappingInterface::DEFAULT_SEARCH_FIELD, 'sku'], 1),
             'queryText'          => $queryText,
             'cutoffFrequency'    => $relevanceConfig->getCutOffFrequency(),
             'minimumShouldMatch' => $relevanceConfig->getMinimumShouldMatch(),
         ];
 
-        return $this->queryFactory->create(QueryInterface::TYPE_COMMON, $queryParams);
+        return $this->queryFactory->create(QueryInterface::TYPE_MULTIMATCH, $queryParams);
     }
 
     /**
--- vendor/smile/elasticsuite/src/module-elasticsuite-core/etc/elasticsuite_analysis.xml
+++ vendor/smile/elasticsuite/src/module-elasticsuite-core/etc/elasticsuite_analysis.xml
@@ -38,7 +38,22 @@
         <filter name="shingle" type="shingle" language="default">
             <min_shingle_size>2</min_shingle_size>
             <max_shingle_size>2</max_shingle_size>
-            <output_unigrams>0</output_unigrams>
+            <output_unigrams>true</output_unigrams>
+        </filter>
+        <filter name="reference_shingle" type="shingle" language="default">
+            <min_shingle_size>2</min_shingle_size>
+            <max_shingle_size>10</max_shingle_size>
+            <output_unigrams>true</output_unigrams>
+            <token_separator></token_separator>
+        </filter>
+        <filter name="reference_word_delimiter" type="word_delimiter" language="default">
+            <generate_word_parts>true</generate_word_parts>
+            <catenate_words>false</catenate_words>
+            <catenate_numbers>false</catenate_numbers>
+            <catenate_all>false</catenate_all>
+            <split_on_case_change>true</split_on_case_change>
+            <split_on_numerics>true</split_on_numerics>
+            <preserve_original>false</preserve_original>
         </filter>
         <filter name="ascii_folding" type="asciifolding" language="default">
             <preserve_original>0</preserve_original>
@@ -146,11 +161,11 @@
     <analyzers>
         <analyzer name="standard" tokenizer="whitespace" language="default">
             <filters>
-                <filter ref="lowercase" />
                 <filter ref="ascii_folding" />
                 <filter ref="trim" />
-                <filter ref="elision" />
                 <filter ref="word_delimiter" />
+                <filter ref="lowercase" />
+                <filter ref="elision" />
                 <filter ref="standard" />
             </filters>
             <char_filters>
@@ -159,11 +174,24 @@
         </analyzer>
         <analyzer name="whitespace" tokenizer="whitespace" language="default">
             <filters>
+                <filter ref="ascii_folding" />
+                <filter ref="trim" />
+                <filter ref="word_delimiter" />
                 <filter ref="lowercase" />
+                <filter ref="elision" />
+            </filters>
+            <char_filters>
+                <char_filter ref="html_strip" />
+            </char_filters>
+        </analyzer>
+        <analyzer name="reference" tokenizer="standard" language="default">
+            <filters>
                 <filter ref="ascii_folding" />
                 <filter ref="trim" />
+                <filter ref="reference_word_delimiter" />
+                <filter ref="lowercase" />
                 <filter ref="elision" />
-                <filter ref="word_delimiter" />
+                <filter ref="reference_shingle" />
             </filters>
             <char_filters>
                 <char_filter ref="html_strip" />
@@ -171,11 +199,11 @@
         </analyzer>
         <analyzer name="shingle" tokenizer="whitespace" language="default">
             <filters>
-                <filter ref="lowercase" />
                 <filter ref="ascii_folding" />
                 <filter ref="trim" />
-                <filter ref="elision" />
                 <filter ref="word_delimiter" />
+                <filter ref="lowercase" />
+                <filter ref="elision" />
                 <filter ref="shingle" />
             </filters>
             <char_filters>
@@ -184,9 +212,9 @@
         </analyzer>
         <analyzer name="sortable" tokenizer="keyword" language="default">
             <filters>
-                <filter ref="lowercase" />
                 <filter ref="ascii_folding" />
                 <filter ref="trim" />
+                <filter ref="lowercase" />
             </filters>
             <char_filters>
                 <char_filter ref="html_strip" />
@@ -194,11 +222,11 @@
         </analyzer>
         <analyzer name="phonetic" tokenizer="whitespace" language="default">
             <filters>
-                <filter ref="lowercase" />
                 <filter ref="ascii_folding" />
                 <filter ref="trim" />
-                <filter ref="elision" />
                 <filter ref="word_delimiter" />
+                <filter ref="lowercase" />
+                <filter ref="elision" />
                 <filter ref="phonetic" />
             </filters>
             <char_filters>
-- 
2.14.1

In my case, the SKU I'm using is 412084300004:

  • Searching for 412084300004 works
  • Searching for 12084300004 does not work

I'll keep debugging and let you know my findings.

Thanks!
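
To get a feel for what the reference analyzer in the patch is meant to produce, here is a rough Python simulation of its filter chain (word delimiter, lowercase, then shingles joined without a separator). It is only an approximation written for this thread, not Elasticsearch code: the regex-based tokenizer and word_delimiter are simplified stand-ins that handle ASCII only, and the function names are hypothetical.

```python
import re


def word_delimiter(token):
    """Split a token at letter/digit boundaries and lower-to-upper case changes."""
    parts = []
    for chunk in re.findall(r"[A-Za-z]+|[0-9]+", token):
        if chunk.isdigit():
            parts.append(chunk)
        else:
            # "fooBar" -> "foo", "Bar"
            parts.extend(re.findall(r"[A-Z]+[a-z]*|[a-z]+", chunk))
    return parts


def shingles(tokens, min_size=2, max_size=10):
    """Emit each token (unigram) plus every run of min_size..max_size adjacent tokens, joined with no separator."""
    terms = []
    for start in range(len(tokens)):
        terms.append(tokens[start])
        for size in range(min_size, max_size + 1):
            if start + size > len(tokens):
                break
            terms.append("".join(tokens[start:start + size]))
    return terms


def reference_terms(text):
    """Approximate the terms the patched reference analyzer would emit for `text`."""
    words = re.findall(r"[A-Za-z0-9]+", text)  # crude stand-in for the standard tokenizer
    parts = [part.lower() for word in words for part in word_delimiter(word)]
    return shingles(parts)


print(reference_terms("M13A"))                            # ['m', 'm13', 'm13a', '13', '13a', 'a']
print("13a" in reference_terms("M13A"))                   # True
print("12084300004" in reference_terms("412084300004"))   # False
```

If this approximation holds, it would explain both observations in the thread: 13A is one of the shingles of M13A, while an all-digit SKU such as 412084300004 is never split, so 12084300004 is not an indexed term.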

@facundocapua

As stated by @PieterCappelle, the problem seems to be in the extractTermStatsByPoisition function.

Here is the result I'm getting for the examples above:

"spelling.phonetic": {
      "field_statistics": {
        "sum_doc_freq": 257997,
        "doc_count": 3819,
        "sum_ttf": 537369
      },
      "terms": {
        "12084300004": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 11
            }
          ]
        },
        "412084300004": {
          "doc_freq": 1,
          "ttf": 1,
          "term_freq": 1,
          "tokens": [
            {
              "position": 101,
              "start_offset": 12,
              "end_offset": 24
            }
          ]
        }
      }
    },
    "spelling": {
      "field_statistics": {
        "sum_doc_freq": 312175,
        "doc_count": 3819,
        "sum_ttf": 537369
      },
      "terms": {
        "12084300004": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 11
            }
          ]
        },
        "412084300004": {
          "doc_freq": 1,
          "ttf": 1,
          "term_freq": 1,
          "tokens": [
            {
              "position": 101,
              "start_offset": 12,
              "end_offset": 24
            }
          ]
        }
      }
    },
    "spelling.shingle": {
      "field_statistics": {
        "sum_doc_freq": 697775,
        "doc_count": 3819,
        "sum_ttf": 976939
      },
      "terms": {
        "12084300004": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 11
            }
          ]
        },
        "412084300004": {
          "doc_freq": 1,
          "ttf": 1,
          "term_freq": 1,
          "tokens": [
            {
              "position": 101,
              "start_offset": 12,
              "end_offset": 24
            }
          ]
        }
      }
    },
    "spelling.whitespace": {
      "field_statistics": {
        "sum_doc_freq": 319601,
        "doc_count": 3819,
        "sum_ttf": 537369
      },
      "terms": {
        "12084300004": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 11
            }
          ]
        },
        "412084300004": {
          "doc_freq": 1,
          "ttf": 1,
          "term_freq": 1,
          "tokens": [
            {
              "position": 101,
              "start_offset": 12,
              "end_offset": 24
            }
          ]
        }
      }
    }
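
A short Python sketch of how this response can be read. It assumes, as the Spellchecker hunk in the patch above does, that a term without a doc_freq key counts as 0. The function name is hypothetical and the sample data is a trimmed copy of two of the fields shown above.

```python
def doc_freq_by_term(term_vectors):
    """Return the highest doc_freq seen for each term across all fields (missing doc_freq counts as 0)."""
    freqs = {}
    for field_data in term_vectors.values():
        for term, stats in field_data["terms"].items():
            freqs[term] = max(freqs.get(term, 0), stats.get("doc_freq", 0))
    return freqs


# Trimmed copy of the "spelling" and "spelling.whitespace" entries above.
term_vectors = {
    "spelling": {
        "terms": {
            "12084300004": {"term_freq": 1},
            "412084300004": {"doc_freq": 1, "ttf": 1, "term_freq": 1},
        }
    },
    "spelling.whitespace": {
        "terms": {
            "12084300004": {"term_freq": 1},
            "412084300004": {"doc_freq": 1, "ttf": 1, "term_freq": 1},
        }
    },
}

print(doc_freq_by_term(term_vectors))  # {'12084300004': 0, '412084300004': 1}
```

Read this way, the partial SKU 12084300004 does not exist as a term in any of the spelling fields, while the full SKU does.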

@LiamKarlMitchell

So I've got something like this.

curl -s -XPOST 'localhost:9200/magento2_default_catalog_product/_search?pretty&size=10000' -H 'Content-Type: application/json' -d '
{
    "query": {
        "wildcard" : { "sku.untouched" : "*123*" }
    }
}' | jq '.hits.hits[]._source.sku'

Ideally I would map a sku.clean field containing only alphanumeric characters, not analyzed/tokenized (or whatever it's called).

Then wildcard search would work even if the SKU starts or ends with letters. (Currently it gives weird results.)

<?xml version="1.0"?>
<indices xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="urn:magento:module:Smile_ElasticsuiteCore:etc/elasticsuite_indices.xsd">

    <index identifier="catalog_product" defaultSearchType="product">
        <type name="product" idFieldName="entity_id">
            <mapping>
                <field name="sku.clean" type="text">
                    <isSearchable>1</isSearchable>
                    <isUsedInSpellcheck>0</isUsedInSpellcheck>
                    <isFilterable>0</isFilterable>
                    <defaultSearchAnalyzer>cleansku</defaultSearchAnalyzer>
                </field>
            </mapping>
        </type>
    </index>
</indices>

<analyzers>
	<analyzer name="cleansku" tokenizer="cleansku" language="default">
	    <filters>
	        <filter ref="lowercase" />
	        <filter ref="ascii_folding" />
	        <filter ref="trim" />
	        <filter ref="elision" />
	    </filters>
	    <char_filters>
	        <char_filter ref="html_strip" />
	        <char_filter ref="pattern_replace" pattern="[^a-zA-Z0-9 ]" replacement=""/>
	    </char_filters>
	</analyzer>
</analyzers>

I don't want it to be split up into ngrams, as that seems to break partial search when mixing letters and numbers.

The hard part is implementing it: which files and plugin/method should I extend in my own module to add this to the catalog product search?
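
The intended behaviour of such a sku.clean field can be sketched outside Elasticsearch. The Python below is only an illustration under assumptions: clean_sku mimics the pattern_replace char filter plus the trim and lowercase filters from the config above, and fnmatchcase stands in for the wildcard query. Neither function is part of Elasticsuite.

```python
import re
from fnmatch import fnmatchcase


def clean_sku(sku):
    """Keep only ASCII letters, digits and spaces, then trim and lowercase."""
    return re.sub(r"[^a-zA-Z0-9 ]", "", sku).strip().lower()


def wildcard_match(sku, pattern):
    """Check a wildcard pattern (using * and ?) against the cleaned, untokenized SKU."""
    return fnmatchcase(clean_sku(sku), pattern.lower())


print(clean_sku("HEW-CB509A#ABA"))                   # hewcb509aaba
print(wildcard_match("M13A", "*13A*"))               # True
print(wildcard_match("HEW-CB509A#ABA", "*cb509*"))   # True
print(wildcard_match("M13A", "*14*"))                # False
```

Because the cleaned SKU stays a single term, a pattern can match anywhere inside it regardless of where letters and digits alternate. Leading wildcards can be slow on large indexes, so this would need testing on a catalog of realistic size.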
