Multi-words terms with accents not handled by thesaurus #1514

Closed
rbayet opened this issue Aug 27, 2019 · 0 comments
rbayet commented Aug 27, 2019

If a multi-word synonym or expansion rule contains non-ASCII characters (for example, accented letters), indexing the thesaurus fails with the following error:

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"failed to build synonyms"}],"type":"illegal_argument_exception","reason":"failed to build synonyms","caused_by":{"type":"parse_exception","reason":"Invalid synonym rule at line 12","caused_by":{"type":"illegal_argument_exception","reason":"term: [multi word synonym] analyzed to a token (multi_word) with position increment != 1 (got: 0)"}}},"status":400}

Example:

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"failed to build synonyms"}],"type":"illegal_argument_exception","reason":"failed to build synonyms","caused_by":{"type":"parse_exception","reason":"Invalid synonym rule at line 12","caused_by":{"type":"illegal_argument_exception","reason":"term: pâte à modeler analyzed to a token (pâte_à) with position increment != 1 (got: 0)"}}},"status":400}

Possibly related issue: elastic/elasticsearch#27481
Possible way to avoid the error (at the cost of losing the invalid synonym rule): elastic/elasticsearch@88c270d

Preconditions

Magento Version : 2.2.9, EE 2.3.x

ElasticSuite Version : 2.6.9, 2.8.x-dev

Environment : ElasticSearch 6.6 (and 6.8.2)

Third party modules : N/A

Steps to reproduce

  1. Create a new Thesaurus, active on all stores
  2. Add a reference term "fimo"
  3. Add the expansion terms "pâte à modeler"
  4. Manually reindex the Thesaurus (CLI) or reindex all

Expected result

  1. The Thesaurus index should be rebuilt

Actual result

  1. Thesaurus reindexing fails with the error mentioned above

How to fix

In \Smile\ElasticsuiteThesaurus\Model\Indexer\IndexHandler::prepareSynonymFilterData, which rewrites multi-word synonyms by replacing spaces with a _, add the u flag to the regular expression so that it supports all UTF-8 characters and not only US-ASCII:

    /**
     * Prepare the thesaurus data to be saved.
     * Spaces are replaced with "_" into multiwords expression (ex foo bar => foo_bar).
     *
     * @param string[] $rows Original thesaurus text rows.
     *
     * @return string[]
     */
    private function prepareSynonymFilterData($rows)
    {
        $rowMaper = function ($row) {
            return preg_replace('/([\w])[\s-](?=[\w])/u', '\1_', $row);
        };

        return array_map($rowMaper, $rows);
    }
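
To illustrate the effect of the u flag, here is a minimal standalone sketch that applies the same regular expression outside of the module. The function name replaceSpacesInMultiWordTerms is hypothetical and not part of ElasticSuite. The sketch assumes the source file and the input strings are UTF-8 encoded, and that PHP is built against a PCRE library with Unicode property support, so that \w matches accented letters when the u modifier is set.

```php
<?php
/**
 * Hypothetical standalone helper (not part of ElasticSuite) that mirrors the
 * regular expression used in prepareSynonymFilterData.
 *
 * @param string[] $rows Thesaurus rows, assumed to be valid UTF-8.
 *
 * @return string[]
 */
function replaceSpacesInMultiWordTerms(array $rows): array
{
    return array_map(
        function ($row) {
            // With the "u" modifier the pattern and subject are treated as UTF-8,
            // so a whitespace or hyphen between two word characters is replaced
            // even when those characters are accented letters.
            return preg_replace('/([\w])[\s-](?=[\w])/u', '\1_', $row);
        },
        $rows
    );
}

$rows = ['fimo', 'pâte à modeler', 'multi-word synonym'];

// Prints: fimo, pâte_à_modeler, multi_word_synonym (one per line).
echo implode("\n", replaceSpacesInMultiWordTerms($rows)), "\n";
```

Without the u modifier the subject is processed byte by byte, and the bytes of a multi-byte character such as "à" are not reliably recognized as word characters, so the space next to it may be left in place. Note that preg_replace returns null when the subject is not valid UTF-8 and the u modifier is set, so callers may want to check for that case.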

rbayet self-assigned this Aug 27, 2019
rbayet added the bug label Aug 27, 2019
rbayet changed the title from "Words with accents not handled by thesaurus" to "Multi-words terms with accents not handled by thesaurus" Aug 27, 2019
romainruaud added a commit that referenced this issue Aug 27, 2019
…ccents_support

Fixes #1514 Accents support in multi-words synonyms/expansions