Partial SKU search #797
You know what I did, because I could not work out how to set the search analyzer separately from the indexing analyzer, was simply to load a set of partial SKU combinations into META_KEYWORDS and use the standard analyzer against that field. With a 3/4-digit prefix, DASH, mfgpartnumber as my SKU, i.e. HEW-CB509A#ABA, I simply load into meta_keywords HEW-CB, HEW-CB5, HEW-CB509, HEW-CB509A .. (and the same without the prefix), comma separated so the analyzer can tell the parts apart. Try it! You can manually create a couple of SKUs with meta_keywords set and see if it does what you seek. This just worked for both types of searches (auto-complete and normal query), and if you want you can simply create an SQL query to inject into this field. It may not be the most elegant way, but it has worked for me for over a year with 2.5 million SKUs! Basically the reference indexer (with EDGE-NGRAM-FRONT) could do the same thing automagically for you, and then you could use the whitespace or spellcheck analyzer against those fields, but I never got Elasticsuite to work in that asymmetric way. EDGE-NGRAM-FRONT breaks a field down into pieces the same way I manually inject my SKU into meta_keywords, except I'm not sure whether it's smart enough to work around the prefix-dash scheme I've created. See this link for more info: https://qbox.io/blog/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams
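For illustration, the prefix list described above could be generated by a small script before injecting it into meta_keywords. This is a sketch: `sku_prefix_keywords` is a hypothetical helper, not part of Elasticsuite, and it assumes the vendor-DASH-partnumber scheme described in the comment.

```python
def sku_prefix_keywords(sku, min_chars=2):
    """Build the comma-separated prefix list to inject into meta_keywords.

    Mimics what an edge-ngram (front) filter would produce: for
    HEW-CB509A#ABA it yields HEW-CB, HEW-CB5, ..., HEW-CB509A, plus the
    same prefixes without the vendor prefix (CB, CB5, ...).
    """
    base = sku.split("#", 1)[0]            # strip the trailing variant, e.g. "#ABA"
    vendor, _, part = base.partition("-")  # "HEW" / "CB509A"
    prefixes = [part[:i] for i in range(min_chars, len(part) + 1)]
    keywords = [f"{vendor}-{p}" for p in prefixes] + prefixes
    return ", ".join(keywords)

print(sku_prefix_keywords("HEW-CB509A#ABA"))
# HEW-CB, HEW-CB5, HEW-CB50, HEW-CB509, HEW-CB509A, CB, CB5, CB50, CB509, CB509A
```

The comma separation matters because the standard analyzer then indexes each prefix as its own term, so a query for any of them matches the product.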
Sounds good @southerncomputer, but it also sounds so wrong. I'll wait until someone from Smile answers; maybe they have a better option. This should be possible by default. If they point me in the right direction I'll create a PR to fix this. I don't think I'm asking for something strange; difficult maybe, but it is logical user behaviour.
After seeing this comment, it was clear to me why partial search isn't working: #710 (comment). Following what's written in that comment, @southerncomputer's suggestion (or something of the like, done programmatically) seems like the right way to handle this.
I suspect the spellchecking detection can be problematic here. Can you try to change the following line in Spellchecker.php: $positionKey = sprintf("%s_%s", $token['start_offset'], $token['end_offset']); by: $positionKey = $token['position']; If the test is OK, can you submit a PR? Then I will try to merge it for the next maintenance release. I agree, you should not have to hack the engine as described by @southerncomputer; the engine should be able to take care of it for you.
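A rough sketch of why that key change matters (simulated token data, not Elasticsuite's actual PHP): when an analyzer emits several tokens for the same source word, they can share a position while having different offsets, so keying by offsets counts them as distinct terms while keying by position collapses them into one.

```python
# Simulated tokens from an ES _termvectors response: two analyses of the
# same source word share position 0 but have different offsets.
tokens = [
    {"term": "cb509",  "position": 0, "start_offset": 4, "end_offset": 9},
    {"term": "cb509a", "position": 0, "start_offset": 4, "end_offset": 10},
]

# Old key: "start_end" offsets -> each variant gets its own slot.
by_offset = {"%s_%s" % (t["start_offset"], t["end_offset"]): t for t in tokens}
# Proposed key: position -> the variants collapse into one slot.
by_position = {t["position"]: t for t in tokens}

print(len(by_offset), len(by_position))  # 2 1
```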
@afoucret I tried your case but it's not working. The result of
When I check the response in function
Update: found how to fix it. I need more time to:
Most probably we will have to wait for 2.6.x to solve this one.
Can I help?
The code is in PR #810 and will be merged into the master branch quickly. I am targeting release 2.6.0, since we need to test it very carefully if we do not want to miss side effects on relevance, precision and recall for non-SKU searches.
The PR has been merged into the master branch.
Hi @afoucret, great work. I did the following: first I updated Elasticsearch to 6.x:
curl localhost:9200 >> "number" : "6.2.2"
Then I created a new product with the SKU
Thanks!
You should run:
And maybe clean the generated folder. Updating to ES 6.x is not mandatory. Only ES 2.x support has been dropped; ES 5.x is here to stay.
I did that, but the error is still being thrown.
I don't think this is related to the partial SKU search. I will dig in to ensure the problem will not be in the release. |
For me the main issue is solved. BR,
I'm not sure if I'm doing this right, but I have installed version 2.5.5 of Elasticsuite and made a patch from your PR. This is the patch:
From 6a41e0cec8a0b61f395ba1c5b1574751804c2fd2 Mon Sep 17 00:00:00 2001
From: Facundo Capua <fcapua@summasolutions.net>
Date: Tue, 20 Mar 2018 14:00:51 -0300
Subject: [PATCH] SKU partial match fix
--- vendor/smile/elasticsuite/src/module-elasticsuite-catalog/etc/elasticsuite_indices.xml
+++ vendor/smile/elasticsuite/src/module-elasticsuite-catalog/etc/elasticsuite_indices.xml
@@ -41,7 +41,7 @@
<isSearchable>1</isSearchable>
<isUsedForSortBy>1</isUsedForSortBy>
<isUsedInSpellcheck>1</isUsedInSpellcheck>
- <defaultSearchAnalyzer>whitespace</defaultSearchAnalyzer>
+ <defaultSearchAnalyzer>reference</defaultSearchAnalyzer>
</field>
<field name="visibility" type="integer" />
<field name="children_ids" type="integer" />
--- vendor/smile/elasticsuite/src/module-elasticsuite-core/Search/Adapter/Elasticsuite/Spellchecker.php
+++ vendor/smile/elasticsuite/src/module-elasticsuite-core/Search/Adapter/Elasticsuite/Spellchecker.php
@@ -206,7 +206,7 @@ class Spellchecker implements SpellcheckerInterface
if (in_array($analyzer, $analyzers)) {
foreach ($fieldData['terms'] as $term => $termStats) {
foreach ($termStats['tokens'] as $token) {
- $positionKey = sprintf("%s_%s", $token['start_offset'], $token['end_offset']);
+ $positionKey = $token['position'];
if (!isset($termStats['doc_freq'])) {
$termStats['doc_freq'] = 0;
--- vendor/smile/elasticsuite/src/module-elasticsuite-core/Search/Request/Query/Fulltext/QueryBuilder.php
+++ vendor/smile/elasticsuite/src/module-elasticsuite-core/Search/Request/Query/Fulltext/QueryBuilder.php
@@ -107,13 +107,13 @@ class QueryBuilder
$relevanceConfig = $containerConfig->getRelevanceConfig();
$queryParams = [
- 'field' => MappingInterface::DEFAULT_SEARCH_FIELD,
+ 'fields' => array_fill_keys([MappingInterface::DEFAULT_SEARCH_FIELD, 'sku'], 1),
'queryText' => $queryText,
'cutoffFrequency' => $relevanceConfig->getCutOffFrequency(),
'minimumShouldMatch' => $relevanceConfig->getMinimumShouldMatch(),
];
- return $this->queryFactory->create(QueryInterface::TYPE_COMMON, $queryParams);
+ return $this->queryFactory->create(QueryInterface::TYPE_MULTIMATCH, $queryParams);
}
/**
--- vendor/smile/elasticsuite/src/module-elasticsuite-core/etc/elasticsuite_analysis.xml
+++ vendor/smile/elasticsuite/src/module-elasticsuite-core/etc/elasticsuite_analysis.xml
@@ -38,7 +38,22 @@
<filter name="shingle" type="shingle" language="default">
<min_shingle_size>2</min_shingle_size>
<max_shingle_size>2</max_shingle_size>
- <output_unigrams>0</output_unigrams>
+ <output_unigrams>true</output_unigrams>
+ </filter>
+ <filter name="reference_shingle" type="shingle" language="default">
+ <min_shingle_size>2</min_shingle_size>
+ <max_shingle_size>10</max_shingle_size>
+ <output_unigrams>true</output_unigrams>
+ <token_separator></token_separator>
+ </filter>
+ <filter name="reference_word_delimiter" type="word_delimiter" language="default">
+ <generate_word_parts>true</generate_word_parts>
+ <catenate_words>false</catenate_words>
+ <catenate_numbers>false</catenate_numbers>
+ <catenate_all>false</catenate_all>
+ <split_on_case_change>true</split_on_case_change>
+ <split_on_numerics>true</split_on_numerics>
+ <preserve_original>false</preserve_original>
</filter>
<filter name="ascii_folding" type="asciifolding" language="default">
<preserve_original>0</preserve_original>
@@ -146,11 +161,11 @@
<analyzers>
<analyzer name="standard" tokenizer="whitespace" language="default">
<filters>
- <filter ref="lowercase" />
<filter ref="ascii_folding" />
<filter ref="trim" />
- <filter ref="elision" />
<filter ref="word_delimiter" />
+ <filter ref="lowercase" />
+ <filter ref="elision" />
<filter ref="standard" />
</filters>
<char_filters>
@@ -159,11 +174,24 @@
</analyzer>
<analyzer name="whitespace" tokenizer="whitespace" language="default">
<filters>
+ <filter ref="ascii_folding" />
+ <filter ref="trim" />
+ <filter ref="word_delimiter" />
<filter ref="lowercase" />
+ <filter ref="elision" />
+ </filters>
+ <char_filters>
+ <char_filter ref="html_strip" />
+ </char_filters>
+ </analyzer>
+ <analyzer name="reference" tokenizer="standard" language="default">
+ <filters>
<filter ref="ascii_folding" />
<filter ref="trim" />
+ <filter ref="reference_word_delimiter" />
+ <filter ref="lowercase" />
<filter ref="elision" />
- <filter ref="word_delimiter" />
+ <filter ref="reference_shingle" />
</filters>
<char_filters>
<char_filter ref="html_strip" />
@@ -171,11 +199,11 @@
</analyzer>
<analyzer name="shingle" tokenizer="whitespace" language="default">
<filters>
- <filter ref="lowercase" />
<filter ref="ascii_folding" />
<filter ref="trim" />
- <filter ref="elision" />
<filter ref="word_delimiter" />
+ <filter ref="lowercase" />
+ <filter ref="elision" />
<filter ref="shingle" />
</filters>
<char_filters>
@@ -184,9 +212,9 @@
</analyzer>
<analyzer name="sortable" tokenizer="keyword" language="default">
<filters>
- <filter ref="lowercase" />
<filter ref="ascii_folding" />
<filter ref="trim" />
+ <filter ref="lowercase" />
</filters>
<char_filters>
<char_filter ref="html_strip" />
@@ -194,11 +222,11 @@
</analyzer>
<analyzer name="phonetic" tokenizer="whitespace" language="default">
<filters>
- <filter ref="lowercase" />
<filter ref="ascii_folding" />
<filter ref="trim" />
- <filter ref="elision" />
<filter ref="word_delimiter" />
+ <filter ref="lowercase" />
+ <filter ref="elision" />
<filter ref="phonetic" />
</filters>
<char_filters>
--
2.14.1
In my case, the SKU I'm using is 412084300004:
I'll keep debugging and let you know my findings. Thanks!
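To see why the `reference` analyzer in the patch helps with partial SKUs, here is a rough plain-Python simulation (an approximation, not the ES implementation) of `reference_word_delimiter` splitting on dashes and case/numeric boundaries, followed by `reference_shingle` with an empty `token_separator` gluing adjacent parts back together:

```python
import re

def word_delimiter(token):
    """Crude stand-in for the reference_word_delimiter filter: split on
    '-' plus letter/digit boundaries (split_on_numerics, split_on_case_change)."""
    return [p for chunk in token.split("-")
            for p in re.findall(r"[A-Za-z]+|[0-9]+", chunk)]

def shingles(parts, min_size=2, max_size=10, sep=""):
    """Shingle filter with output_unigrams=true and token_separator=""."""
    out = list(parts)
    for size in range(min_size, min(max_size, len(parts)) + 1):
        for i in range(len(parts) - size + 1):
            out.append(sep.join(parts[i:i + size]))
    return out

tokens = shingles(word_delimiter("HEW-CB509A"))
print(tokens)
# ['HEW', 'CB', '509', 'A', 'HEWCB', 'CB509', '509A', 'HEWCB509', 'CB509A', 'HEWCB509A']
```

A query for the partial term CB509 now matches one of the stored shingles, which is what makes mid-SKU search possible (the lowercase filter that runs before the shingle filter in the actual patch is omitted here for brevity).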
As stated by @PieterCappelle, I'm seeing that the problem is with the term vectors response. Here is the result I'm getting for the examples stated above:
"spelling.phonetic": {
"field_statistics": {
"sum_doc_freq": 257997,
"doc_count": 3819,
"sum_ttf": 537369
},
"terms": {
"12084300004": {
"term_freq": 1,
"tokens": [
{
"position": 0,
"start_offset": 0,
"end_offset": 11
}
]
},
"412084300004": {
"doc_freq": 1,
"ttf": 1,
"term_freq": 1,
"tokens": [
{
"position": 101,
"start_offset": 12,
"end_offset": 24
}
]
}
}
},
"spelling": {
"field_statistics": {
"sum_doc_freq": 312175,
"doc_count": 3819,
"sum_ttf": 537369
},
"terms": {
"12084300004": {
"term_freq": 1,
"tokens": [
{
"position": 0,
"start_offset": 0,
"end_offset": 11
}
]
},
"412084300004": {
"doc_freq": 1,
"ttf": 1,
"term_freq": 1,
"tokens": [
{
"position": 101,
"start_offset": 12,
"end_offset": 24
}
]
}
}
},
"spelling.shingle": {
"field_statistics": {
"sum_doc_freq": 697775,
"doc_count": 3819,
"sum_ttf": 976939
},
"terms": {
"12084300004": {
"term_freq": 1,
"tokens": [
{
"position": 0,
"start_offset": 0,
"end_offset": 11
}
]
},
"412084300004": {
"doc_freq": 1,
"ttf": 1,
"term_freq": 1,
"tokens": [
{
"position": 101,
"start_offset": 12,
"end_offset": 24
}
]
}
}
},
"spelling.whitespace": {
"field_statistics": {
"sum_doc_freq": 319601,
"doc_count": 3819,
"sum_ttf": 537369
},
"terms": {
"12084300004": {
"term_freq": 1,
"tokens": [
{
"position": 0,
"start_offset": 0,
"end_offset": 11
}
]
},
"412084300004": {
"doc_freq": 1,
"ttf": 1,
"term_freq": 1,
"tokens": [
{
"position": 101,
"start_offset": 12,
"end_offset": 24
}
]
}
}
}
So I've got something like this. Ideally I would map a sku.clean field holding only the alphanumeric characters, not analyzed/tokenized. Then a wildcard search would work even if the SKU starts or ends with letters (currently that gives weird results):
<?xml version="1.0"?>
<indices xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="urn:magento:module:Smile_ElasticsuiteCore:etc/elasticsuite_indices.xsd">
<index identifier="catalog_product" defaultSearchType="product">
<type name="product" idFieldName="entity_id">
<mapping>
<field name="sku.clean" type="text">
<isSearchable>1</isSearchable>
<isUsedInSpellcheck>0</isUsedInSpellcheck>
<isFilterable>0</isFilterable>
<defaultSearchAnalyzer>cleansku</defaultSearchAnalyzer>
</field>
<analyzers>
<analyzer name="cleansku" tokenizer="cleansku" language="default">
<filters>
<filter ref="lowercase" />
<filter ref="ascii_folding" />
<filter ref="trim" />
<filter ref="elision" />
</filters>
<char_filters>
<char_filter ref="html_strip" />
<char_filter ref="pattern_replace" pattern="[^a-zA-Z0-9 ]" replacement=""/>
</char_filters>
</analyzer>
</analyzers>
I don't want it split up into ngrams, as that seems to mess up the partial search when mixing letters and numbers. The hard part is implementing it: which files and which plugin/method should I extend in my own module to add this to the catalog product search?
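For what it's worth, the proposed cleansku analysis chain boils down to something like this plain-Python approximation (the `cleansku` tokenizer name in the XML above is the poster's own; this sketch only mirrors the pattern_replace char filter plus the lowercase/trim filters):

```python
import re

def cleansku_analyze(text):
    """Approximate the proposed 'cleansku' chain: a pattern_replace char
    filter dropping everything but [a-zA-Z0-9 ], then lowercase + trim.
    (ascii_folding and elision are effectively no-ops for plain ASCII SKUs.)"""
    cleaned = re.sub(r"[^a-zA-Z0-9 ]", "", text)  # pattern_replace char filter
    return cleaned.lower().strip()                # lowercase + trim filters

print(cleansku_analyze("HEW-CB509A#ABA"))  # hewcb509aaba
```

Because the whole value survives as a single normalized token, a wildcard query such as `*cb509*` would match regardless of where the letters sit in the SKU.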
A lot of issues in this great repository are about searching the SKU field. I'm currently working on Magento 2.2.3 and v2.5.4 of Elasticsuite and also have some questions about it. I think the wiki or the docs should be expanded with more information about this issue.
In 2.5.4 searching by SKU is working, but I can't pin down in which cases exactly. Currently my config in elasticsuite_indices.xml is as follows (standard):
If I search by the full SKU it works correctly, but when I try to search by a partial SKU it does not. I created a simple example: a product with SKU M13A and a completely different name like TEST WITH SKU. I have rebuilt the caches and the indexes. The exact problem is the first one: searching from the beginning of the SKU works correctly (I did the test with multiple products), but searching for partial parts of the SKU does not work.
Any ideas? Can you reproduce this?
Thanks in advance.