[BundlesCMS][SearchBundle][NodeSearchBundle] Remove all deprecations in our ElasticaSearch related code. #1185
Remove _analyzer mapping
Why is it changed?
At Kunstmaan we are looking to also provide support for Elasticsearch 2.* in the near future. This is one of the preliminary cleanups before we can look at that upgrade.
Using an _analyzer field per document has been flagged as deprecated since 1.5. See elastic/elasticsearch#9279 for in-depth reasoning as to why they no longer want to support it.
For us this means that our current way of using Elasticsearch to search in our pages would stop working if we decided to upgrade to ElasticSearch 2.*.
A secondary reason for rewriting this part of the code is to improve scaling for websites with a very large number of pages. Take, for example, a high-traffic blog with 1000 pages, each translated into 4 different languages. Our old query had to parse all 4000 generated documents and filter afterwards each time you searched for something. The new query only parses the 1000 documents of the relevant language.
What has changed?
How did our code work before?
When invoking the setup and populate commands, a single index would be created and all pages of the website would be indexed in it. This index had many different analyzers, one for each language. On each indexed document the properties lang and analyzer were set. When querying the index, all documents would be parsed using the analyzer defined in that per-document field. Afterwards the results would be filtered based on the lang field of each document.
How does it work now?
When invoking the setup and populate commands, a separate index is created for each configured locale. Documents are only added to the relevant index. Each index has 1 index analyzer and 1 suggestion analyzer specific to the language of that index. No extra properties need to be saved on the documents. When querying, the query is only run on the relevant index using the index-specific suggestion_analyzer. Documents in other languages are no longer parsed for that query, so no extra filtering afterwards is necessary.
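The per-locale layout described above can be sketched as plain request bodies. This is a minimal illustration in Python, not the bundle's PHP code: the index prefix `mysite`, the locale-to-language map, the `index_analyzer` name and all helper names are assumptions made for the example. Only `suggestion_analyzer` is a name taken from the description.

```python
# Illustrative sketch only. Index prefix, locale map, "index_analyzer" and the
# helper names are assumptions, not the bundle's actual configuration.
LANGUAGE_ANALYZERS = {"nl": "dutch", "fr": "french", "en": "english", "de": "german"}


def index_name(prefix, locale):
    """One index per configured locale, e.g. 'mysite_nl'."""
    return f"{prefix}_{locale}"


def index_settings(locale):
    """Settings for a single-language index: one index analyzer and one
    suggestion analyzer, both tied to the language of that index."""
    language = LANGUAGE_ANALYZERS[locale]
    return {
        "analysis": {
            "analyzer": {
                "index_analyzer": {"type": language},
                "suggestion_analyzer": {"type": language},
            }
        }
    }


def search_path(prefix, locale):
    """A search for one locale only targets that locale's index."""
    return f"/{index_name(prefix, locale)}/_search"


if __name__ == "__main__":
    for locale in ("nl", "fr"):
        print(index_name("mysite", locale), index_settings(locale))
    print(search_path("mysite", "nl"))
```

Because the language is encoded in the index that is queried, the documents themselves no longer need a lang or analyzer property.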
Rewrite queries to use aggregations instead of facets.
What has changed?
Facets have been removed in ElasticSearch 2.*; they have been superseded by aggregations. Put very simply, aggregations are a more powerful implementation of facets.
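To make the difference concrete, here is a rough sketch of the same terms count expressed first as a facet and then as an aggregation, written as plain Python dicts rather than the bundle's PHP code. The field name `type` and the helper names are hypothetical. The response shapes (`terms`/`term`/`count` for facets, `buckets`/`key`/`doc_count` for aggregations) are what templates typically need to be adapted to.

```python
# Illustrative sketch only. The field name "type" and helper names are assumptions.

def terms_facet_request(name, field):
    """Request body using a terms facet (not accepted by Elasticsearch 2.*)."""
    return {"query": {"match_all": {}}, "facets": {name: {"terms": {"field": field}}}}


def terms_aggregation_request(name, field):
    """Equivalent request body using a terms aggregation."""
    return {"query": {"match_all": {}}, "aggs": {name: {"terms": {"field": field}}}}


def facet_counts(response, name):
    """Read {term: count} from a terms facet response."""
    return {t["term"]: t["count"] for t in response["facets"][name]["terms"]}


def aggregation_counts(response, name):
    """Read {key: doc_count} from a terms aggregation response."""
    buckets = response["aggregations"][name]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}


if __name__ == "__main__":
    # Hand-written sample responses, only meant to show the two shapes.
    old = {"facets": {"type": {"terms": [{"term": "page", "count": 3}]}}}
    new = {"aggregations": {"type": {"buckets": [{"key": "page", "doc_count": 3}]}}}
    print(facet_counts(old, "type") == aggregation_counts(new, "type"))
```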
We already used aggregations in some parts of the code but kept using facets in other parts to avoid BC breaks. These facets have now been completely removed and replaced by aggregations. This does mean that after upgrading your KunstmaanBundles project you will have to fix small parts of the Twig files that were created when using
app/console kuma:generate:search
Remove _boost mapping
Why is it changed?
Together with removing the option for assigning a specific analyzer to a specific document, the ElasticSearch team decided to remove the option to assign a specific _boost on a document level. A more thorough explanation of why document boosting has been removed can be found here: http://blog.brusic.com/2014/02/document-boosting-in-elasticsearch.html
Document-level boosting could make queries and scoring very difficult to understand, because a field on the result documents that you can't really see could change the scoring of your query. This could result in people writing their own NodeSearcher with custom queries and not being able to fine-tune the scoring.
What has changed?
How did our code work before?
When populating the indices we would map a _boost field on each document. This _boost field would be incremented if the page for that document implemented the BoostSearchInterface or if a NodeSearch existed for that node in the database. When querying the indices these _boost fields would magically affect the end score.
How does it work now?
The _boost field has been removed. Because of this we cannot do anything at population time anymore. Instead, for each query the top 500 results are now "rescored". This means a second query is run on top of those results and affects the end score. This second query applies extra boosts based on whether your page implements the BoostSearchInterface or has a NodeSearch in the database. This is more verbose, but easier to understand and more configurable than our old implementation. Because of how rescoring works, the extra load this second query brings should be minimal.
https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-rescore.html
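As a rough sketch of what such a rescored request looks like, the snippet below wraps a main query with a rescore section limited to the top 500 hits. It is written as plain Python dicts and is not the bundle's actual query: the helper name, the `title` and `boosted` fields, and the weights are assumptions chosen for illustration.

```python
# Illustrative sketch only. Field names, weights and the helper name are assumptions.

def with_rescore(main_query, boost_query, window_size=500):
    """Wrap a main query so that only its top `window_size` hits are rescored
    by `boost_query`."""
    return {
        "query": main_query,
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": boost_query,
                "query_weight": 1.0,
                "rescore_query_weight": 1.0,
            },
        },
    }


if __name__ == "__main__":
    main_query = {"match": {"title": "elasticsearch"}}
    # Hypothetical boost: documents flagged as boosted get extra score.
    boost_query = {"term": {"boosted": {"value": True, "boost": 2.0}}}
    print(with_rescore(main_query, boost_query))
```

Since the rescore query only runs on the window of top results instead of on every matching document, its cost stays bounded regardless of index size.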
Replace index_name in view_roles mapping
What has changed?
index_name was a really old and obscure option that didn't really do anything anymore. Removing the mapping did not break any of the functionality and makes our mappings more ElasticSearch 2.* compatible. The option has also been removed from our configuration as you really should not be using it anymore. It has been replaced by its actually useful successor, "copy_to".
For more information about the index_name mapping, check this page from the ElasticSearch documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/1.3/mapping-object-type.html#_include_in_all_2
And its successor:
https://www.elastic.co/guide/en/elasticsearch/guide/current/custom-all.html
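For readers unfamiliar with copy_to, a field mapping that uses it might look roughly like the sketch below, again as a plain Python dict. The target field name `all_roles`, the field type and the `not_analyzed` setting are assumptions for illustration, not the mapping the bundle actually generates.

```python
# Illustrative sketch only. Target field, type and index settings are assumptions.

def view_roles_mapping(copy_target):
    """Mapping fragment in which values of view_roles are also copied into
    `copy_target`, so they can be queried through that field as well."""
    return {
        "properties": {
            "view_roles": {
                "type": "string",
                "index": "not_analyzed",
                "copy_to": copy_target,
            }
        }
    }


if __name__ == "__main__":
    print(view_roles_mapping("all_roles"))
```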
BC breaks!
Website with a default SearchPage without changes to the underlying code.
These websites will have to do 2 things to become fully operational again.
First they have to set up their indices again. This can be achieved with the following commands.
app/console kuma:search:setup && app/console kuma:search:populate full
Second, the Twig files that were generated by
app/console kuma:generate:search
have to be fixed. All references to facets in these files have to be rewritten to aggregations. Commit 5e4a7c1 shows exactly how to do that.
Changed interfaces and implementations
If you have overridden, implemented or extended any of these interfaces and classes, you should check whether your custom code needs some rewriting.
_src/Kunstmaan/SearchBundle/Search/AnalysisFactoryInterface_
_src/Kunstmaan/SearchBundle/Search/AnalysisFactory_
_src/Kunstmaan/NodeSearchBundle/Search/SearcherInterface_
_src/Kunstmaan/NodeSearchBundle/Search/AbstractElasticaSearcher_
_src/Kunstmaan/NodeSearchBundle/Configuration/NodePagesConfiguration_
_src/Kunstmaan/NodeSearchBundle/Services/SearchService_
_src/Kunstmaan/NodeSearchBundle/PagerFanta/Adapter/SearcherRequestAdapterInterface_
_src/Kunstmaan/NodeSearchBundle/PagerFanta/Adapter/SearcherRequestAdapter_
_src/Kunstmaan/NodeSearchBundle/Search/NodeSearcher_
_Configuration_