Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catalog Category Indexing takes very long on MariaDB 10.3 with many products #25199

Closed
amenk opened this issue Oct 21, 2019 · 32 comments · Fixed by #27129
Closed

Catalog Category Indexing takes very long on MariaDB 10.3 with many products #25199

amenk opened this issue Oct 21, 2019 · 32 comments · Fixed by #27129
Assignees
Labels
Auto-Tests: Not Required Changes in Pull Request does not require coverage by auto-tests Component: Indexer Fixed in 2.4.x The issue has been fixed in 2.4-develop branch Issue: Confirmed Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed Issue: Format is valid Gate 1 Passed. Automatic verification of issue format passed Issue: Ready for Work Gate 4. Acknowledged. Issue is added to backlog and ready for development Priority: P2 A defect with this priority could have functionality issues which are not to expectations. Progress: ready for dev Reproduced on 2.4.x The issue has been reproduced on latest 2.4-develop branch Severity: S1 Affects critical data or functionality and forces users to employ a workaround.

Comments

@amenk
Copy link
Contributor

amenk commented Oct 21, 2019

Description (*)

I know MariaDB 10.3 is not yet supported, as per dev docs, so I add this as feature requests, while it will manifest as a bug as soon as we support MariaDB 10.3

Magento should remove the quotes in the SELECT entity_id IN statement when creating indexes. The entity_ids are not strings, and this seems to confuse the optimizer.

Steps to reproduce:

  1. Change value for simple products to 50000 for example (setup/performance-toolkit/profiles/ce/small.xml);
  2. Run in console: bin/magento setup:perf:generate-fixtures /var/www/html/magento24/setup/performance-toolkit/profiles/ce/small.xml for example

Expected behavior (*)

Catalog Category Indexing should run quickly, on MariaDB 10.1 and on 10.3.18

Benefits

Shop ownes can use MariaDB 10.3.18

Additional information

See this answer on StackExchange: https://magento.stackexchange.com/a/293345/81

I took the full query which is run several times (for each store view) on the catalog category products indexer and extracted the select part

SELECT 2 AS `category_id`, `cp`.`entity_id` AS `product_id`, IF(ccp.product_id IS NOT NULL, ccp.position, 0) AS `position`, IF(ccp.product_id IS NOT NULL, 1, 0) AS `is_parent`, 2 AS `store_id`, IFNULL(cpvs.value, cpvd.value) AS `visibility` FROM `catalog_product_entity` AS `cp`
 INNER JOIN `catalog_product_website` AS `cpw` ON cpw.product_id = cp.entity_id
 INNER JOIN `catalog_product_entity_int` AS `cpsd` ON cpsd.entity_id = cp.entity_id AND cpsd.store_id = 0 AND cpsd.attribute_id = 97
 LEFT JOIN `catalog_product_entity_int` AS `cpss` ON cpss.entity_id = cp.entity_id AND cpss.attribute_id = cpsd.attribute_id AND cpss.store_id = 2
 INNER JOIN `catalog_product_entity_int` AS `cpvd` ON cpvd.entity_id = cp.entity_id AND cpvd.store_id = 0 AND cpvd.attribute_id = 99
 LEFT JOIN `catalog_product_entity_int` AS `cpvs` ON cpvs.entity_id = cp.entity_id AND cpvs.attribute_id = cpvd.attribute_id  AND cpvs.store_id = 2
 LEFT JOIN `catalog_category_product` AS `ccp` ON ccp.product_id = cp.entity_id WHERE (cpw.website_id = '1') AND (IFNULL(cpss.value, cpsd.value) = 1) AND           (IFNULL(cpvs.value, cpvd.value) IN (2, 3, 4)) AND (cp.entity_id IN ('107280', '107281', '107282', '107283', '107284'
    [... around 70 K entity IDs ...]
    187965', '187966')) GROUP BY `cp`.`entity_id`;

This query alone runs 8 minutes on Maria DB 10.3

When I simply remove all the quotes via s/'//g in vi in the line with the "IN" statement I get

SELECT 2 AS `category_id`, `cp`.`entity_id` AS `product_id`, IF(ccp.product_id IS NOT NULL, ccp.position, 0) AS `position`, IF(ccp.product_id IS NOT NULL, 1, 0) AS `is_parent`, 2 AS `store_id`, IFNULL(cpvs.value, cpvd.value) AS `visibility` FROM `catalog_product_entity` AS `cp`
 INNER JOIN `catalog_product_website` AS `cpw` ON cpw.product_id = cp.entity_id
 INNER JOIN `catalog_product_entity_int` AS `cpsd` ON cpsd.entity_id = cp.entity_id AND cpsd.store_id = 0 AND cpsd.attribute_id = 97
 LEFT JOIN `catalog_product_entity_int` AS `cpss` ON cpss.entity_id = cp.entity_id AND cpss.attribute_id = cpsd.attribute_id AND cpss.store_id = 2
 INNER JOIN `catalog_product_entity_int` AS `cpvd` ON cpvd.entity_id = cp.entity_id AND cpvd.store_id = 0 AND cpvd.attribute_id = 99
 LEFT JOIN `catalog_product_entity_int` AS `cpvs` ON cpvs.entity_id = cp.entity_id AND cpvs.attribute_id = cpvd.attribute_id  AND cpvs.store_id = 2
 LEFT JOIN `catalog_category_product` AS `ccp` ON ccp.product_id = cp.entity_id WHERE (cpw.website_id = '1') AND (IFNULL(cpss.value, cpsd.value) = 1) AND           (IFNULL(cpvs.value, cpvd.value) IN (2, 3, 4)) AND (cp.entity_id IN (107280, 107281, 107282, 107283, 107284
    [... around 70 K entity IDs ...]
    187965, 187966)) GROUP BY `cp`.`entity_id`;

Next steps

I will try to provide a patch.

@m2-assistant
Copy link

m2-assistant bot commented Oct 21, 2019

Hi @amenk. Thank you for your report.
To help us process this issue please make sure that you provided the following information:

  • Summary of the issue
  • Information on your environment
  • Steps to reproduce
  • Expected and actual results

Please make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce. To deploy vanilla Magento instance on our environment, please, add a comment to the issue:

@magento give me 2.3-develop instance - upcoming 2.3.x release

For more details, please, review the Magento Contributor Assistant documentation.

@amenk do you confirm that you were able to reproduce the issue on vanilla Magento instance following steps to reproduce?

  • yes
  • no

@magento-engcom-team magento-engcom-team added the Issue: Format is valid Gate 1 Passed. Automatic verification of issue format passed label Oct 21, 2019
@amenk
Copy link
Contributor Author

amenk commented Oct 21, 2019

Not able to reproduce, because it's a feature requests for a recent MariaDB 10.3 which is not yet supported by Magento 2.3

@amenk
Copy link
Contributor Author

amenk commented Oct 21, 2019

I also created an issue in the MariaDB tracker: https://jira.mariadb.org/browse/MDEV-20871

@amenk
Copy link
Contributor Author

amenk commented Oct 22, 2019

vendor/magento/module-catalog/Model/Indexer/Category/Product/Action/Full.php:287

    foreach ($batchQueries as $query) {
        $this->connection->delete($this->tableMaintainer->getMainTmpTable((int)$store->getId()));
        $entityIds = $this->connection->fetchCol($query);
        foreach($entityIds as &$val) { // patch
            $val = (integer) $val;     // patch
        }                              // patch
        $resultSelect = clone $basicSelect;
        $resultSelect->where($whereCondition, $entityIds);

Or should we put it into \Magento\Framework\DB\Select::where and convert is_numeric values to numbers ?

Digging more down into the function leads to Zend Framework 1 code :(

@amenk
Copy link
Contributor Author

amenk commented Oct 22, 2019

I tried to patch
\Magento\Framework\DB\Select::where

public function where($cond, $value = null, $type = null)
{
    // ensure that integers are passed as integer,
    // to avoid negative effects on performance
    if (is_array($value)) {
        foreach($value as &$val) {
            $val = $this->castNumeric($val);
        }
    } else {
        $value = $this->castNumeric($value);
    }

    if ($value === null && $type === null) {
        $value = '';
    } elseif ($type == self::TYPE_CONDITION) {
        $type = null;
    }
    if (is_array($value)) {
        $cond = $this->getConnection()->quoteInto($cond, $value);
        $value = null;
    }
    return parent::where($cond, $value, $type);
}

private function castNumeric($val) {
    if (is_numeric($val)) {
        return $val + 0;
    }
    return $val;
}

But this does not catch the situation when you edit a categories anchor property -> the update of the url_rewrite index still uses quotes in strings

@amenk
Copy link
Contributor Author

amenk commented Oct 22, 2019

I will inject the patch into \Magento\Framework\DB\Adapter\Pdo\Mysql::quoteInto

amenk added a commit to iMi-digital/magento2 that referenced this issue Oct 22, 2019
@ghost ghost assigned amenk Oct 22, 2019
@fritzmg
Copy link

fritzmg commented Oct 24, 2019

We are noticing this problem too with MariaDB 10.3.18 in Magento 2.3.2. We notice that most INSERT queries of the indexer take waaaay longer than expected and especially the following query takes up to 8 hours!

INSERT INTO `catalog_category_product_index_store1_tmp` (`category_id`, `product_id`, `position`, ...

We cannot even do a indexer:reindex because once it arrives at the categories index, the following error will happen after 8 hours:

Product Categories indexer process unknown error:
SQLSTATE[42S02]: Base table or view not found: 1146 Table 'catalog_category_product_index_store1_tmp' doesn't exist, query was: DELETE FROM `catalog_category_product_index_store1_tmp`

So apparently it takes so long, that the temporary table "vanishes" in between (?).

@amenk
Copy link
Contributor Author

amenk commented Oct 24, 2019

@fritzmg Feel free to try my patch #25212

@amenk
Copy link
Contributor Author

amenk commented Oct 24, 2019

@fritzmg Which MariaDB version were you on before it became so slow?

@fritzmg
Copy link

fritzmg commented Oct 24, 2019

Before it was actually MySQL 5.7.27. Since this is happening in a live system (hoster had to switch DB server due to other troubles) we can't unfortunately test your fix (yet) and thus our hoster is in the process of downgrading the MariaDB version again.

@amenk
Copy link
Contributor Author

amenk commented Oct 24, 2019

What we did was, to locally setup 10.3.18 in docker to reproduce the bug and test the fix.

@fritzmg
Copy link

fritzmg commented Oct 24, 2019

Yes, I've seen that on Stackoverflow :). When we get around to it, we will try the same.

@mpingu
Copy link

mpingu commented Oct 28, 2019

@amenk can you try it with MariaDB 10.3.17 also, i got a case where it was in 10.3.18 slow and in 10.3.17 fast. My currently suspect is a regession because of https://jira.mariadb.org/browse/MDEV-20109, see the other Reports linked there.

@fritzmg
Copy link

fritzmg commented Oct 29, 2019

We are noticing similar problems with MariaDB 10.2.27 - just not with the queries of the indexer. Certain selects take up to 60 seconds while they should only last milliseconds.

@ghost
Copy link

ghost commented Oct 30, 2019

I'm also running mariadb 10.3.18 and having these issues. Indexers just never stop being triggered now even after tripling our hardware speed.

Attempting to clone a server then upgrade to 10.4 on it and test indexing on it.

@ghost
Copy link

ghost commented Oct 30, 2019

10.4 latest made zero difference.
@amenk patch reduced it from 45 mins to 40 seconds for the query he posted.
Removing IN for entity_id altogether reduces query time to 0.0013 seconds

M2 mview and indexer system are once again trash and once again costing my client money and me time.

@amenk
Copy link
Contributor Author

amenk commented Oct 30, 2019

How did you remove "IN" for entity_id altogether? Did you still select the same rows? Via a JOIN?

Please maintain a constructive tone ;-)

@ghost
Copy link

ghost commented Oct 30, 2019

I just manually changed the query for testing, I'm researching now if the entity_id's are do to a batch or handed off from a prior query. If it's just the mview batching, then it's 30,000 faster to not batch than to do it.

@ghost
Copy link

ghost commented Oct 30, 2019

It does select all products and then batch them for a root category full reindex.
Here's my result after removing batching entirely for that:
Category Products index has been rebuilt successfully in 00:00:14

Before it was 5 hours.

Will test some more then release a module to override default m2 behavior. Honestly, should probably just write a module to overwrite all indexers at some point they are terrible I have worked on this project nearly 3 years and this is a constant source of suffering. Many other ecommerce systems don't have something like this for good reason, just more examples of overengineering to make stuff worse and more costly to develop, run, and maintain. I can't keep a positive tone because I have nothing positive to say about it, and I doubt you would either if your quality of life and that of your clients was worsened by poorly designed magento2. If we could switch off it we would but sunk cost is a lot at this point and it's as I said a few years work.

@ghost
Copy link

ghost commented Nov 3, 2019

https://github.com/SoloDevAtrix/snippets/blob/master/indexer_catalog_category_products.php
runs in :02 seconds on the same database, my frontend functions perfect with it.

My original glance over that it selected all products for root was wrong they actually do quite a bit of specific join conditions on all types. I have my database matching a full run pretty closely. Only difference is a few positions are adjusted in my view, for the better, as well as don't have weird one off where it'll put in entries into the index for things like
A product with cat path 1/2/4/142/160
normally it'd put in the 160 the 2 and 4
but 142 would be skipped by m2.
but in some cases it'd put it in for no reason.
i have the option in my code to put it in for all or none. was the only diff I could find in the results of the code output into the database outside slight position stuff as I said.

Enabling putting in all the non parent non leaf options doubles the size as well as seems to achieve nothing. So left it off.

If you use it power to you, I offer zero warranty, even implied, and you're on your own to get it to work. It's also alpha code at best so if you run into a bug then you ran into a bug cuz it's just finished with it's first test run.

I just think it's really sour how badly m2 code runs this. You're filling ONE TABLE.

Here's the reasons why m2's queries are SLOW SLOW SLOW:

  1. Use of IN to do batching. This changes mysql joins from better join types to a range type which is slower.
  2. Redoing tasks because of batching. Stop batching your shit unless it's necessary and if it is then don't use IN statements.
  3. Using temporary tables, almost always bad for performance. Avoid using them ever.
  4. Tons of self joins, you had way way way too many and some of them served almost no purpose.
  5. Forcing all the work on mysql. Sometimes this is efficient. Sometimes it's better to write a loop or do work in PHP. Some things are so complex the tradeoff for EVERYONE that has to work on it is just better to chunk it up into code sections in php. This seems to use mysql or be damned and it be damned.
  6. way too much data from all those joins. simple is better. Adding one bad join can spike usage from .0001 seconds to some query that can take 3+mins to run. Use EXPLAIN and understand what it says.

Last note, my working example code, doesn't follow m2 ideology. At all. If you guys cared as much about pragmatic efficient solutions as you did your cursed ideology then maybe m2 wouldn't be a garbage heap.

This is a serious problem for my client, it's actually maybe costing him his entire business.

The indexers lock up the database, adding hardware barely helps. Locked up database means slow loads on any page not already cached in varnish. Cache warming helps but can't do every page on the site. Leads to lower conversions. He can't fix or edit products because again, it's so slow. And when he does, he gets locked out because hours of indexers are running. So if my language is harsh, that's why.

I've worked for this client oh, 9 years now. The switch to magento2 is by far the worst decision that's ever been made for the company.

@amenk
Copy link
Contributor Author

amenk commented Nov 23, 2019

set in_predicate_conversion_threshold = 4294967295;
set global in_predicate_conversion_threshold = 4294967295;

seems also to help, as a workaround.

See: https://jira.mariadb.org/browse/MDEV-20871?focusedCommentId=138523&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-138523

There might be a fix in MariaDB 10.3.21

@bve-9384
Copy link

bve-9384 commented Feb 3, 2020

Migrated to MariaDB 10.4.12 on my M2.1.7 install with 6000 products (centos 8, php7.0)
Indexing seems to be OK, plan to migrate to M2.3 within the next few weeks

magento indexer:reindex
Design Config Grid index has been rebuilt successfully in 00:00:00
Customer Grid index has been rebuilt successfully in 00:00:00
Product Flat Data index has been rebuilt successfully in 00:00:01
Category Flat Data index has been rebuilt successfully in 00:00:00
Category Products index has been rebuilt successfully in 00:00:01
Product Categories index has been rebuilt successfully in 00:00:00
Product Price index has been rebuilt successfully in 00:00:02
Product EAV index has been rebuilt successfully in 00:00:03
Stock index has been rebuilt successfully in 00:00:00
Catalog Rule Product index has been rebuilt successfully in 00:00:00
Catalog Product Rule index has been rebuilt successfully in 00:00:00
Catalog Search index has been rebuilt successfully in 00:01:01

@nobodyfamouz
Copy link

set in_predicate_conversion_threshold = 4294967295;
set global in_predicate_conversion_threshold = 4294967295;

seems also to help, as a workaround.

See: https://jira.mariadb.org/browse/MDEV-20871?focusedCommentId=138523&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-138523

There might be a fix in MariaDB 10.3.21

Thanks to this indexing of category products went from impossible (7+ hours) to 10 seconds! Thanks a lot!

@ihor-sviziev ihor-sviziev added Severity: S2 Major restrictions or short-term circumventions are required until a fix is available. Severity: S1 Affects critical data or functionality and forces users to employ a workaround. and removed Severity: S2 Major restrictions or short-term circumventions are required until a fix is available. labels May 7, 2020
@ghost ghost assigned kandy May 7, 2020
@ihor-sviziev ihor-sviziev added the Auto-Tests: Not Required Changes in Pull Request does not require coverage by auto-tests label May 8, 2020
@engcom-Alfa engcom-Alfa added the Triage: Dev.Experience Issue related to Developer Experience and needs help with Triage to Confirm or Reject it label Aug 4, 2020
@ghost ghost added the Priority: P2 A defect with this priority could have functionality issues which are not to expectations. label Aug 17, 2020
@engcom-Alfa engcom-Alfa added Component: Indexer Reproduced on 2.4.x The issue has been reproduced on latest 2.4-develop branch Issue: Confirmed Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed labels Sep 2, 2020
@ghost ghost removed the Triage: Dev.Experience Issue related to Developer Experience and needs help with Triage to Confirm or Reject it label Sep 2, 2020
@magento-engcom-team magento-engcom-team added the Issue: Ready for Work Gate 4. Acknowledged. Issue is added to backlog and ready for development label Sep 2, 2020
@magento-engcom-team
Copy link
Contributor

✅ Confirmed by @engcom-Alfa
Thank you for verifying the issue. Based on the provided information internal tickets MC-37309 were created

Issue Available: @engcom-Alfa, You will be automatically unassigned. Contributors/Maintainers can claim this issue to continue. To reclaim and continue work, reassign the ticket to yourself.

@harrigo
Copy link

harrigo commented Jul 19, 2021

I seem to be getting a similar problem, show processlist shows lots of things randomly sending data for what seems like forever. Originaly seemed to start happening with related products but my query for that was slightly modified and thought it was my fault however changed it and it went away.

Now however am noticing queries hanging indefinitely when running schema recurring for Magento_Indexer section of setup upgrade...

I'm on 2.3.7 and commits here are for 2.4.2 will there be a similar fix available for 2.3.x as hoping it is a similar issue?

@xantek-imc
Copy link

I had this issue with 2.3.7-p3 and MariaDB 10.4.24
Using MariaDB 10.3.34 solved the issue.

@realadityayadav
Copy link
Member

realadityayadav commented Nov 18, 2022

For anybody on Maria db 10.4 and experiencing long reindexing times please do
https://experienceleague.adobe.com/docs/commerce-operations/performance-best-practices/configuration.html

optimizer_switch='rowid_filter=off'
optimizer_use_condition_selectivity = 1

@vasilii-b
Copy link

What @realadityayadav told helped a bit in the situation I've been.

Besides that, a huge impact had the changes for in_predicate_conversion_threshold as suggested by @nobodyfamouz .

The combo I had to deal with:
MariaDb 10.4.27
Magento 2.3.7-p3

@Reno-Axel
Copy link

Reno-Axel commented May 31, 2023

Hi, I was struggling with the same problem, catalog_category_product index take too long, I resolve it by reducing batchRowsCount:

<type name="Magento\Catalog\Model\Indexer\Category\Product\Action\Full"> <arguments> <argument name="batchRowsCount" xsi:type="number">500</argument> </arguments> </type>

MariaDB: 10.4.29
Magento: 2.4.2-p2

@RakeshJesadiya
Copy link
Contributor

I have improved the Magento category indexer performance by adding a given parameter to the Warden configuration yaml file by given link.
https://www.rakeshjesadiya.com/improve-category-indexer-performance-local-setup-by-warden-magento/

@erikhansen
Copy link
Contributor

@RakeshJesadiya Thanks for the tip. My DB instance gave me this error:

mysqld: Error while setting value '"rowid_filter=off"' to 'optimizer_switch'

So instead in the.warden/warden-env.yml file I added this:

services:
  db:
    volumes:
      - ./.warden/mysql/mysql_configuration.cnf:/etc/mysql/mariadb.conf.d/99-mysql_configuration.cnf

And then in .warden/mysql/mysql_configuration.cnf:

[mysqld]
max_allowed_packet=1024M
explicit_defaults_for_timestamp=on
optimizer_search_depth=4
optimizer_use_condition_selectivity=1
optimizer_switch='rowid_filter=off'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Auto-Tests: Not Required Changes in Pull Request does not require coverage by auto-tests Component: Indexer Fixed in 2.4.x The issue has been fixed in 2.4-develop branch Issue: Confirmed Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed Issue: Format is valid Gate 1 Passed. Automatic verification of issue format passed Issue: Ready for Work Gate 4. Acknowledged. Issue is added to backlog and ready for development Priority: P2 A defect with this priority could have functionality issues which are not to expectations. Progress: ready for dev Reproduced on 2.4.x The issue has been reproduced on latest 2.4-develop branch Severity: S1 Affects critical data or functionality and forces users to employ a workaround.
Projects
None yet