Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Properly compute module load order #10649

Closed
wants to merge 5 commits into from
Closed

Properly compute module load order #10649

wants to merge 5 commits into from

Conversation

t-richards
Copy link
Contributor

@t-richards t-richards commented Aug 24, 2017

Description

This PR replaces the ... interesting ... choice of bubble sort for determining the order in which modules should be loaded.

The sortBySequence method now uses topological sorting to correctly linearize the sequence of modules. The topological sort algorithm implemented in the Magento\Framework\Data\Graph data structure is Kahn's algorithm.

Fixed Issues (if relevant)

  1. Module names sorting issue? #10112: Module names sorting issue?

Manual testing scenarios

  1. Create a module which contains frontend layout XML directives, and has a sequence dependency on Magento_Customer
  2. Before, with bubble sort, the incorrect load order would cause those frontend layout directives to be processed before Magento_Catalog, causing all sorts of trouble.
  3. Now, with topological sort, the load order is correct and no such trouble occurs.

Contribution checklist

  • Pull request has a meaningful description of its purpose
  • All commits are accompanied by meaningful commit messages
  • All new or changed code is covered with unit/integration tests (if applicable)
  • All automated tests passed successfully (all builds on Travis CI are green)

All new or changed code is covered with unit/integration tests

The code is covered by existing unit tests. More tests can be added if desired.

Use toposort for properly determining the order in which modules should
be loaded
@orlangur
Copy link
Contributor

Did you compare an actual execution time of old vs new implementation?

Unfortunately proposed implementation is still O(n²) just like bubble sorting and contains a lot of unnecessary array manipulations.

Proper implementation may be based on https://e-maxx-eng.appspot.com/graph/topological-sort.html or simply involve some existing library like https://github.com/marcj/topsort.php.

@t-richards
Copy link
Contributor Author

t-richards commented Aug 25, 2017

@orlangur Thank you for your feedback! Yes, I measured some significant differences between the original bubble sort implementation and my topological sort implementation.

Before getting into the micro-benchmark results, I'd like to briefly mention that the goal of this sorting implementation is to order the modules more accurately, and not necessarily improve the speed in which the dependency graph is sorted.

That said, outperforming the time complexity of bubble sort is trivial; just about any other algorithm can do better. Fortunately for us, Kahn's algorithm is not, in fact, an O(n^2) algorithm. Just like its DFS counterpart, Kahn's algorithm is linear time: O(|V| + |E|), where V is the number of graph vertices and E is the number of graph edges. Despite the appearance of the implementation (i.e. the nested while/for loops), the performance improvement and technical correctness is plain as day.

I measured the unit tests (where there are some very small graphs, 3 and 5 nodes each), and then I measured a real production Magento 2 store (containing 119 nodes). The results below are the average of three runs for each sorting algorithm.

Bubble Sort

Graph size, nodes # inner loop iterations time, s % decrease in time taken
3 5 0.000008 0%
5 14 0.000020 0%
119 7259 0.011596 0%

Toposort

Graph size, nodes # inner loop iterations time, s % decrease in time taken
3 2 0.000006 25%
5 4 0.000011 45%
119 188 0.000625 94.61%

@orlangur
Copy link
Contributor

I know the complexity of Kahn's algorithm, the problem is your implementation is O(n²) which needs to be fixed :)

Thanks for the benchmark, I do agree that the main point here is to get rid of buggy ordering, just wanted to be sure new implementation is not slower. Could you please provide a gist with patch for such measurement?

with bubble sort, the incorrect load order would cause those frontend layout directives to be processed before Magento_Catalog, causing all sorts of trouble

Automated test case needs to be added which reveals such bug in old implementation. Ideally if it will be the unit test.

What do you think about https://github.com/marcj/topsort.php? Not necessary to fully include the library, for example, we can borrow just ArraySort implementation.

@t-richards
Copy link
Contributor Author

the problem is your implementation is O(n²) which needs to be fixed :)

I'm not sure I agree with this, what part of the implementation is n^2? The inner loop only traverses the current node's edges to visit its parents, and does not re-visit every single node in the process.

Automated test case needs to be added which reveals such bug in old implementation. Ideally if it will be the unit test.

I do agree with this; I'm working on trimming down a large production case that can demonstrate improper sorting, and I will add it as a unit test.

What do you think about https://github.com/marcj/topsort.php?

I strayed away from adding this dependency, for two reasons:

  1. In previous discussions involving topological sort, the Magento team requested the addition of the algorithm to the graph data structure: Implemented toplogical sorting for totals #65
  2. If we were to use the library as-is, it would almost certainly throw ElementNotFoundExceptions everywhere, because of the prevalence of module authors declaring sequence dependencies on modules which are not actually installed.

@orlangur
Copy link
Contributor

At least array_shift makes it O(n²), I didn't check for other similar issues as just one is enough (see http://php.net/manual/en/function.array-shift.php#112947).

I strayed away from adding this dependency, for two reasons

Thanks, especially for the old links, I remember reading it before and was surprised there is no topological sorting in Magento Framework yet (I thought it was merged). Will look deeper into it tomorrow. Could you provide a benchmarking gist? Wanna do a quick check of that library speed compared to current PR.

@t-richards
Copy link
Contributor Author

t-richards commented Aug 26, 2017

At least array_shift makes it O(n²)

Thanks for the information about array_shift. At that particular point in the algorithm, the order of node removal is not important so this could be replaced with an array_pop which is O(1).

Could you provide a benchmarking gist? Wanna do a quick check of that library speed compared to current PR.

I will provide a link to a gist with some benchmark patches.

@korostii
Copy link
Contributor

Fixed Issues (if relevant)

I believe you can also add the following issues to the list:
#10331 [2.1.7] Enabling module brakes sorting of modules in config.php
#8479 Sequence of module load order should be deterministic

@orlangur
Copy link
Contributor

@korostii how using topological sort instead of bubble sort will make order deterministic?

@korostii
Copy link
Contributor

korostii commented Aug 31, 2017

@orlangur,
Well, I cannot guarantee that it does, but if really does solve #10112 it has to retain the original order at least to to some degree, when inserting\removing new entries (as opposed to shuffling it seemingly randomly - that's what usually happened before).
I haven't tested it myself though, does it not?

I am sorry if I lead to some confusion on this part. I might've misunderstood the point of changes done here when I skimmed through the PR for the first time.

Here's what I was basing that assumption on: if you take a look at that issue this PR claims to solve, which is #10112, the issue described seems to be the following:
Magento_Checkout and Magento_Tax have an implicit dependency which is not reflected into <sequence> tags of their module.xml files. Adding new modules and dependencies to the system results into shuffling config.php entries, causing Magento_Checkout and Magento_Tax to switch places. In turn, that seems to be breaking some functionality having implicit dependency on their mutual load order.

Now that I've given it some thought: why not make that broken dependency explicit instead (via adding Magento_Checkout into the Magento_Tax's <sequence>)? That should be enough to solve both #10112 and the manual testing scenario given here (whatever sorting algorithm is used, as long as it enforces these dependencies in the final ordered list)

I now suspect that solving #10112 is just a side-effect of changing the algorithm and it might not work consistently (given different conflicting modules). Could maybe @t-richards shine some explanation into how this all is tied together?

@orlangur
Copy link
Contributor

@korostii, see https://en.wikipedia.org/wiki/Sorting_algorithm#Stability. As you can see bubble sort is stable, I believe the order could be random just because modules are randomly read from filesystem.

So, by design we will not get any stability from topological sort usage instead of bubble sort. With something like https://stackoverflow.com/questions/11230881/stable-topological-sort/11236027#11236027 we can get it (if we make the input deterministic, of course), but it will add a logV multiplier to a computational complexity.

why not make that broken dependency explicit instead (via adding Magento_Checkout into the Magento_Tax's <sequence>)?

Modules in sequence must be specified according to business logic only.

I'm not even sure 8479 is even valid, will share my thoughts there.

@korostii
Copy link
Contributor

korostii commented Sep 1, 2017

No, modules in sequence should be specified according to business logic only. In theory.
In practice, Magento_Tax module depends on code from Magento_Checkout module and will crash without it.

A couple examples:
https://github.com/magento/magento2/blob/develop/app/code/Magento/Tax/etc/frontend/di.xml#L17
https://github.com/magento/magento2/blob/develop/app/code/Magento/Tax/view/frontend/web/js/view/checkout/cart/totals/tax.js#L12

So, the Tax module should either learn to live without these or declare these dependencies outright.
Since the former solution will require some refactoring and might turn out to be impossible, the latter is a faster fix, obiously.

@korostii
Copy link
Contributor

korostii commented Sep 1, 2017

Just noticed: Magento_Tax/Magento_Checkout dependency is already added on 2.2.0 branch already;
https://github.com/magento/magento2/blob/2.2.0-preview/app/code/Magento/Tax/etc/module.xml#L12
But not yet backported to 2.1:
https://github.com/magento/magento2/blob/2.1.10-preview/app/code/Magento/Tax/etc/module.xml

I am now sure this PR cannot fix #10112 on branch develop since that issues is already fixed on branch develop.

Additionally, it is my understanding that module sorting is only performed during some CLI commands (module:enable, module:disable, setup:upgrade). As such, performance shouldn't be much of an issue here. I think I would prefer current battle-tested code over new one if the benefits are unclear.

@t-richards, could you please elaborate on\explain the benefits of this change? The PR description seems to be lacking on that part right now.

@orlangur
Copy link
Contributor

orlangur commented Sep 1, 2017

@korostii,

In practice, Magento_Tax module depends on code from Magento_Checkout module and will crash without it.

What I meant is that we should not add unreasonable nodes to <sequence> just for the sake of unrelated issues solving. Like Magento_Eav references Magento_Catalog in fact while it should not and specifying it in <sequence> would produce a cycle.

Too bad eca1579 happened in scope of some random bugfixing and not solved for the overall system yet. As you can see, Magento_Tax extends Magento_Checkout with some additional logic from business perspective, on the other hand only a couple references to Magento_Tax left in Magento_Checkout templates. In opposite situation fix would be incorrect.

the benefits of this change?

Let me quote the PR author:

I'm working on trimming down a large production case that can demonstrate improper sorting, and I will add it as a unit test.

So, looks like in some cases sorting did not correspond to <sequence> configuration.

@t-richards
Copy link
Contributor Author

@orlangur Thanks for pointing out the stable topological sort, I will do some more research on this and see how it compares.

Despite many attempts, I was not able to produce any test cases which demonstrated an incorrect sort of the modules using the original sorting method. This makes me wonder whether there are module dependency issue(s) of a different kind. For now, I suppose I'll have to live with bubble sort.

If I happen to come up with a better set of test cases, I'll make a new PR.

@t-richards t-richards closed this Sep 6, 2017
@t-richards t-richards deleted the bugfix/module-load-toposort branch September 6, 2017 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants