[BundlesCMS][SearchBundle][NodeSearchBundle] Remove all deprecations in our ElasticaSearch related code. #1185

Numkil · 2016-06-03T08:56:53Z

Q	A
Bug fix?	no
New feature?	yes
BC breaks?	yes
Deprecations?	no
Fixed tickets	#1117

Remove _analyzer mapping
Create documentation/explanation about removing _analyzer mapping
Rewrite queries to use aggregations instead of facets
Create documentation/explanation about removing facets in favor of aggregations
Remove _boost mapping
Create documentation/explanation about removing _boost mapping
Remove index_name in view_roles mapping
Create documentation/explanation about removing index_name mapping
Extend upgrade.markdown file

Remove _analyzer mapping

Why is it changed?

At Kunstmaan we are looking to also provide support for Elasticsearch 2.* in the near future. This is one of the preliminary cleanups before we can look at that upgrade.

Using an _analyzer field per document had already been flagged as deprecated since 1.5! See elastic/elasticsearch#9279 for an in depth reasoning as to why they did not want to support it any more.
This meant for us that our current way of using Elasticsearch to search in our pages would not work any more if we decided to upgrade to ElasticSearch 2.*.
A secondary reason for rewriting this part of the code is to improve scaling for websites with a very large amount of pages. For example a high traffic blog with 1000 pages each translated in to 4 different languages. Our old query would have to parse all 4000 generated documents and filter afterwards each time you searched for something. This has been improved as well. The new query will only parse the 1000 relevant translations.

What has changed?

How did our code work before?

When invoking the setup and populate command 1 index would be created, in this index all pages of the website would be indexed. This index had many different analyzers for each language. On all these indexed documents a property lang and analayzer was set. When querying the index all documents would be parsed using the analyzer defined in the field per document. Afterwards the results would be filtered based on the lang field on this document.

How does it work now?

When invoking the setup and populate command a different index will be created for each configured locale. Documents will only be added to the relevant index. Each index has 1 index analyzer and 1 suggestion analyzer specific for the language in the index. No extra properties need to be saved on the documents. When querying, the query will only be run on the relevant index using the index specific suggestion_analyzer. Documents we do not care about in that specific query because of the language will not be parsed anymore and thus no extra filtering afterwards is necessary.

Rewrite queries to use aggregations instead of facets.

What has changed

Facets have been disabled in ElasticSearch 2.*, they have been superseded by aggregations. Aggregations are very simply explained a more powerful implementation for facets.

We already used aggregations in some parts of the code but still kept using facets in other parts to avoid BC breaks. These facets have now been completely removed and replaced by aggregations. This means however that upon upgrading your KunstmaanBundles project you will have to fix small parts of the twig files that are created when using app/console kuma:generate:search.

Remove _boost mapping

Why is it changed?

Together with removing the option for assigning a specific analyzer to a specific document, the ElasticSearch team decided to remove the option to assign a specific _boost on a document level. A more thorough explanation on why document boosting has been removed can be found here. http://blog.brusic.com/2014/02/document-boosting-in-elasticsearch.html

It could make the queries and scoring very difficult to understand because some field on the result documents that you can't really see could change the scoring of your query. This could result in people writing their own NodeSearcher with custom queries and not being able to fine tune the scoring.

What has changed?

How did our code work before?

When populating the indices we would map a _boost field on each document. This _boost field would be incremented, if the page for that document implemented the BoostSearchInterface or if a NodeSearch existed for that node in the database. When querying the indices these _boost fields would magically affect the end score.

How does it work now?

The _boost field has been removed. Because of this we can not do anything at population time anymore. Instead for each query that is now done the top 500 results will be "rescored". This means a second query will be run on top of those results that will affect the end score. This second query will apply extra boosts based on the fact if your page implements the BoostSearchInterface or has a NodeSearch in the database. This is more verbose, easier to understand and configurable than our old implementation. Because of how rescoring works the extra load this second query brings should be minimal.
https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-rescore.html

Replace index_name in view_roles mapping

What has changed?

index_name was a really old and obscure option that didn't really do anything anymore. Removing the mapping did not break any of the functionality and makes our mappings more ElasticSearch 2.* compatible. The option has also been removed from our configuration as you really should not be using it anymore. It has been replaced by it's actually useful predecessor "copy_to".

For more information about the index_name mapping check this page from the ElasticSearch wiki:
https://www.elastic.co/guide/en/elasticsearch/reference/1.3/mapping-object-type.html#_include_in_all_2
And it's predecessor:
https://www.elastic.co/guide/en/elasticsearch/guide/current/custom-all.html

BC breaks!

Website with a default SearchPage without changes to the underlying code.

These websites will have to do 2 things to become fully operational again.
First they have to set up their indices again. This can be achieved with the following commands.

 app/console kuma:search:setup && app/console kuma:search:populate full

And the twig files that were generated by the app/console kuma:generate:search have to be fixed. All references to facets in these files have to be rewritten to aggregations. In this commit you can see exactly how to do that. 5e4a7c1

Changed interfaces and implementations

If you have overwritten, implemented, extended any of these interfaces and classes you should check if you need to do some rewriting of your custom code.

_src/Kunstmaan/SearchBundle/Search/AnalysisFactoryInterface_

Method setupLanguages has been removed since you want create 1 analysis per index/language.
Getter and Setter for the variable stopwords have been removed. They are not used in the default implementation and it does not make sense to have getters and setters in an interface. If you want to create your own AnalysisFactory implementation which uses a custom list of stopwords you can inject that list however you want.
Method addStemmerFilter($language) has been added. Usually a better alternative for NGrams.

_src/Kunstmaan/SearchBundle/Search/AnalysisFactory_

Names of the analyzers have been changed so that the index can automatically recognize what to use them for. "index_analyzer" is changed to "default" and suggestion_analyzer has been changed to "default_search".

_src/Kunstmaan/NodeSearchBundle/Search/SearcherInterface_

Parameter $lang has been removed from the method defineSearch. It does not make sense to provide this parameter anymore since the index in the query is already language specific, and thus no need to filter on this in the query.

_src/Kunstmaan/NodeSearchBundle/Search/AbstractElasticaSearcher_

Parameter $lang has been removed from the method defineSearch. It does not make sense to provide this parameter anymore since the index in the query is already language specific, and thus no need to filter on this in the query.
Search function has been altered to search in the language specific index.

_src/Kunstmaan/NodeSearchBundle/Configuration/NodePagesConfiguration_

Default behaviour for createIndex() creates a specific index for each configured locale
Default behaviour for populateIndex() has been altered to reflect the new index structure
Mapping _boost has been removed
Mapping index_analyzer and suggestion_analyzer are gone because the index now has to detect the function of the analyzers based on their name.

_src/Kunstmaan/NodeSearchBundle/Services/SearchService_

The applySearchParams function does not apply facets to the query anymore.

_src/Kunstmaan/NodeSearchBundle/pagerFanta/Adapter/SearcherRequestAdapterInterface_

Method getFacets() has been removed.

_src/Kunstmaan/NodeSearchBundle/PagerFanta/Adapter/SearcherRequestAdapter_

All references to Facets have been removed.

_src/Kunstmaan/NodeSearchBundle/Search/NodeSearcher_

Applying extra boost to the title field should not be hardcoded in the query because it is easily configured in the field mapping and hardcoding this would always override user configuration.
Extra rescoring step added to the queries as an alternative to document boosting.

_Configuration_

Document field 'lang' has been removed from the default mapping configuration.
Document field 'contentanalyzer' has been removed from the default mapping configuration.
Parameter 'index_name' is not available anymore. Has been with the option 'copy_to'

…from elasticsearch code Rewrite Setup and Population of elasticasearch indexes. Instead of 1 shared index with all our pages together we use an index per language. Remove decrepated _analyzer mapping and specify index and suggestion analyzer per index. Rewrite queries -> filtering on language no longer necessary -> better for scaling because less documents to parse Cleanup code interfaces and AnalysisFactory to reflect changes made to indexes

…by aggregations Replace all references to facets by aggregations Remove all facets code

…rameter Add method to nodeRepository to retrieve a list off all the registered refEntityNames Remove all references to _boost in mapping and populate code add rescoring to our query to boost results in the correct way

Remove deprecated and useless index_name mapping Allow copy_to mapping in configuration because it is useful replacement for index_name

…to 3.2

Numkil changed the title ~~[WIP] Feature/remove decrepated analyzer mapping from elasticsearch code~~ Feature/remove decrepated analyzer mapping from elasticsearch code Jun 3, 2016

Numkil changed the title ~~Feature/remove decrepated analyzer mapping from elasticsearch code~~ [WIP] remove all decrepations in our ElasticaSearch related code. Jun 3, 2016

Numkil changed the title ~~[WIP] remove all decrepations in our ElasticaSearch related code.~~ [WIP] remove all deprecations in our ElasticaSearch related code. Jun 6, 2016

Numkil changed the title ~~[WIP] remove all deprecations in our ElasticaSearch related code.~~ [BundlesCMS][SearchBundle][NodeSearchBundle] Remove all deprecations in our ElasticaSearch related code. Jun 7, 2016

Kevin Jossart added 6 commits June 20, 2016 16:23

[NodeSearchBundle][GeneratorBundle] Replace all references to facets …

df8a042

…by aggregations Replace all references to facets by aggregations Remove all facets code

[NodeSearchBundle] Remove deprecated and useless index_name mapping

809f542

Remove deprecated and useless index_name mapping Allow copy_to mapping in configuration because it is useful replacement for index_name

[BundlesCms][NodeSearchBundle][SearchBundle] Upgrade ruflin/elastica …

9ea616b

…to 3.2

[BundlesCMS] Include UPGRADE.md file

9103c13

jockri added this to the 3.6.0 milestone Jun 20, 2016

jockri merged commit 6d000b9 into Kunstmaan:master Jun 20, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BundlesCMS][SearchBundle][NodeSearchBundle] Remove all deprecations in our ElasticaSearch related code. #1185

[BundlesCMS][SearchBundle][NodeSearchBundle] Remove all deprecations in our ElasticaSearch related code. #1185

Numkil commented Jun 3, 2016 •

edited by jockri

Loading

[BundlesCMS][SearchBundle][NodeSearchBundle] Remove all deprecations in our ElasticaSearch related code. #1185

[BundlesCMS][SearchBundle][NodeSearchBundle] Remove all deprecations in our ElasticaSearch related code. #1185

Conversation

Numkil commented Jun 3, 2016 • edited by jockri Loading

Remove _analyzer mapping

Why is it changed?

What has changed?

How did our code work before?

How does it work now?

Rewrite queries to use aggregations instead of facets.

What has changed

Remove _boost mapping

Why is it changed?

What has changed?

How did our code work before?

How does it work now?

Replace index_name in view_roles mapping

What has changed?

BC breaks!

Website with a default SearchPage without changes to the underlying code.

Changed interfaces and implementations

Numkil commented Jun 3, 2016 •

edited by jockri

Loading