-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
request for comments - pacman: speed up most operations when working with a package list #3907
Conversation
This comment has been minimized.
This comment has been minimized.
hmmm what is the best way to indicate that a variable isn't used if |
Simply use |
- Use a cache (or inventory) to speed up lookups of: - installed packages and groups - available packages and groups - upgradable packages - Call pacman with the list of pkgs instead of one call per package (for installations, upgrades and removals) - Use pacman [--sync|--upgrade] --print-format [...] to gather list of changes. Parsing that instead of the regular output of pacman, which is error prone and can be changed by user configuration. This can introduce a TOCTOU problem but unless something else calls pacman between the invocations, it shouldn't be a concern. - Given the above, "check mode" code is within the function that would carry out the actual operation. This should make it harder for the check code and the "real code" to diverge. - Support for specifying alternate package name formats is a bit more robust. pacman is used to extract the name of the package when the specified package is a file or a URL. The "<repo>/<pkgname>" format is also supported. For "state: latest" with a list of ~35 pkgs, this module is about 5 times faster than the original.
thx, I updated the PR and squashed the WIP history. |
update_cache_extra_args handled as a list like the others moved the module setup to its own function for easier testing update and upgrade have no defaults (None) to let required_one_of() do its job properly
Shift successful exit without name or upgrade under "update_cache". It is an error if name or upgrade isn't specified and update_cache wasn't specified either. (Caught by ansiblemodule required_one_of but still)
Please note that such rewrites are quite a lot of work to review, so don't be surprised if it takes some time before someone actually starts adding review comments here :) |
The test
The test
The test
The test
The test
The test
The test
The test
The test
The test
|
@@ -19,6 +21,7 @@ | |||
- Indrajit Raychaudhuri (@indrajitr) | |||
- Aaron Bull Schaefer (@elasticdog) <aaron@elasticdog.com> | |||
- Maxime de Roucy (@tchernomax) | |||
- Jean Raby (@jraby) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to add yourself also to .github/BOTMETA.yml
's maintainers
list for the $modules/packaging/os/pacman.py
entry.
Co-authored-by: Felix Fontein <felix@fontein.de>
Backport to stable-4: 💚 backport PR created✅ Backport PR branch: Backported as #4176 🤖 @patchback |
…with a package list (#3907) * pacman: rewrite with a cache to speed up execution - Use a cache (or inventory) to speed up lookups of: - installed packages and groups - available packages and groups - upgradable packages - Call pacman with the list of pkgs instead of one call per package (for installations, upgrades and removals) - Use pacman [--sync|--upgrade] --print-format [...] to gather list of changes. Parsing that instead of the regular output of pacman, which is error prone and can be changed by user configuration. This can introduce a TOCTOU problem but unless something else calls pacman between the invocations, it shouldn't be a concern. - Given the above, "check mode" code is within the function that would carry out the actual operation. This should make it harder for the check code and the "real code" to diverge. - Support for specifying alternate package name formats is a bit more robust. pacman is used to extract the name of the package when the specified package is a file or a URL. The "<repo>/<pkgname>" format is also supported. For "state: latest" with a list of ~35 pkgs, this module is about 5 times faster than the original. * Let fail() actually work * all unhappy paths now end up calling fail() * Update copyright * Argument changes update_cache_extra_args handled as a list like the others moved the module setup to its own function for easier testing update and upgrade have no defaults (None) to let required_one_of() do its job properly * update_cache exit path Shift successful exit without name or upgrade under "update_cache". It is an error if name or upgrade isn't specified and update_cache wasn't specified either. (Caught by ansiblemodule required_one_of but still) * Add pkgs to output on success only Also align both format, only pkg name for now * Multiple fixes Move VersionTuple to top level for import from tests Add removed pkgs to the exit json when removing packages fixup list of upgraded pkgs reported on upgrades (was tuple of list for no reason) use list idiom for upgrades, like the rest drop unused expand_package_groups function skip empty lines when building inventory * pacman: add tests * python 2.x compat + pep8 * python 2.x some more * Fix failure when pacman emits warnings Add tests covering that failure case * typo * Whitespace black failed me... * Adjust documentation to fit implicit defaults * fix test failures on older pythons * remove file not intended for commit * Test exception str with e.match * Build inventory after cache update + adjust tests * Apply suggestions from code review Co-authored-by: Felix Fontein <felix@fontein.de> * Update plugins/modules/packaging/os/pacman.py Co-authored-by: Felix Fontein <felix@fontein.de> * changelog * bump copyright year and add my name to authors * Update changelogs/fragments/3907-pacman-speedup.yml Co-authored-by: Felix Fontein <felix@fontein.de> * maintainer entry Co-authored-by: Felix Fontein <felix@fontein.de> (cherry picked from commit 1580f3c)
@jraby thanks for implementing this! |
…with a package list (#3907) (#4176) * pacman: rewrite with a cache to speed up execution - Use a cache (or inventory) to speed up lookups of: - installed packages and groups - available packages and groups - upgradable packages - Call pacman with the list of pkgs instead of one call per package (for installations, upgrades and removals) - Use pacman [--sync|--upgrade] --print-format [...] to gather list of changes. Parsing that instead of the regular output of pacman, which is error prone and can be changed by user configuration. This can introduce a TOCTOU problem but unless something else calls pacman between the invocations, it shouldn't be a concern. - Given the above, "check mode" code is within the function that would carry out the actual operation. This should make it harder for the check code and the "real code" to diverge. - Support for specifying alternate package name formats is a bit more robust. pacman is used to extract the name of the package when the specified package is a file or a URL. The "<repo>/<pkgname>" format is also supported. For "state: latest" with a list of ~35 pkgs, this module is about 5 times faster than the original. * Let fail() actually work * all unhappy paths now end up calling fail() * Update copyright * Argument changes update_cache_extra_args handled as a list like the others moved the module setup to its own function for easier testing update and upgrade have no defaults (None) to let required_one_of() do its job properly * update_cache exit path Shift successful exit without name or upgrade under "update_cache". It is an error if name or upgrade isn't specified and update_cache wasn't specified either. (Caught by ansiblemodule required_one_of but still) * Add pkgs to output on success only Also align both format, only pkg name for now * Multiple fixes Move VersionTuple to top level for import from tests Add removed pkgs to the exit json when removing packages fixup list of upgraded pkgs reported on upgrades (was tuple of list for no reason) use list idiom for upgrades, like the rest drop unused expand_package_groups function skip empty lines when building inventory * pacman: add tests * python 2.x compat + pep8 * python 2.x some more * Fix failure when pacman emits warnings Add tests covering that failure case * typo * Whitespace black failed me... * Adjust documentation to fit implicit defaults * fix test failures on older pythons * remove file not intended for commit * Test exception str with e.match * Build inventory after cache update + adjust tests * Apply suggestions from code review Co-authored-by: Felix Fontein <felix@fontein.de> * Update plugins/modules/packaging/os/pacman.py Co-authored-by: Felix Fontein <felix@fontein.de> * changelog * bump copyright year and add my name to authors * Update changelogs/fragments/3907-pacman-speedup.yml Co-authored-by: Felix Fontein <felix@fontein.de> * maintainer entry Co-authored-by: Felix Fontein <felix@fontein.de> (cherry picked from commit 1580f3c) Co-authored-by: Jean Raby <jean@raby.sh>
In 1580f3c#commitcomment-67899699 someone commented that this PR uses |
wait, why does it even work with have you started doing it or do you want me to prepare a PR? |
Interestingly, |
I haven't started a PR, feel free to create one :) |
alright. I'll do that later today. |
* pacman: don't always return changed w/ update_cache This used to be the behavior before the recent refactoring. [1] Allows the following to return changed only when packages were upgraded: - pacman: update_cache: yes upgrade: yes And the following to return changed only when the foo package wasn't at the latest version: - pacman: name: foo state: latest update_cache: yes [1] #3907 * Update changelogs/fragments/4318-pacman-restore-old-changed-behavior.yml Co-authored-by: Felix Fontein <felix@fontein.de> Co-authored-by: Felix Fontein <felix@fontein.de>
* pacman: don't always return changed w/ update_cache This used to be the behavior before the recent refactoring. [1] Allows the following to return changed only when packages were upgraded: - pacman: update_cache: yes upgrade: yes And the following to return changed only when the foo package wasn't at the latest version: - pacman: name: foo state: latest update_cache: yes [1] #3907 * Update changelogs/fragments/4318-pacman-restore-old-changed-behavior.yml Co-authored-by: Felix Fontein <felix@fontein.de> Co-authored-by: Felix Fontein <felix@fontein.de> (cherry picked from commit cc8151f)
* pacman: don't always return changed w/ update_cache This used to be the behavior before the recent refactoring. [1] Allows the following to return changed only when packages were upgraded: - pacman: update_cache: yes upgrade: yes And the following to return changed only when the foo package wasn't at the latest version: - pacman: name: foo state: latest update_cache: yes [1] #3907 * Update changelogs/fragments/4318-pacman-restore-old-changed-behavior.yml Co-authored-by: Felix Fontein <felix@fontein.de> Co-authored-by: Felix Fontein <felix@fontein.de> (cherry picked from commit cc8151f) Co-authored-by: Evangelos Foutras <evangelos@foutrelis.com>
SUMMARY
This code got a bit out of hand...
I started testing ansible a few weeks ago on arch linux and wondered at why checking if a list of package was installed took so long.
I saw #3606 and somehow thought I was already running this code, so I set out to figure out why it was still slow. (I was not, and I realized this after I finished writing this module... 🤦 )
This is the result of that effort.
It is basically a rewrite of the module to use a cache (or inventory) of installed packages and groups, available packages and groups and upgradable packages in order to avoid all the individual calls to
pacman
that the module was doing.(I could have plugged a cache in the original code and probably should have, but here we are, I got carried away. I do think this version is a bit easier to follow but I just wrote it, so YMMV)
This version of the module is significantly faster when using
state: latest
but only marginally faster when usingstate: present
. (it is also a bit slower when listing a single pkg to install / upgrade since the cache/inventory is built anyway)The performance difference is much more noticeable when a long list of package is used.
I opted to do away with the
re
module and parsing of pacman's installation/upgrade output which can change depending on theVerbosePkgsList
option.Instead the module will first call
--sync/--upgrade --print-format [...]
, parse that with simplesplit
andstrip
operations, and then proceed to do the actual installation.I'm not overly fond of that approach, since it could introduce a time-to-check vs time-of-use problem, but I think this is better than trying to parse the output of pacman which can change depending on the user's settings.
Also, instead of doing each package operation with an individual call to
pacman
, there's now a single call topacman
with the full list of packages.pacman
is good at reporting errors for individual packages, and those are reported back to the user viastderr
.(or directly in
msg
if the package cannot be found)Support for specifying package name in the format
<repo>/<packagename>
is also a little more robust I think.(for example,
community/elixir
is correctly resolved toelixir
but would install usingpacman -S community/elixir
)Finally, I opted to keep the check mode and the "real mode" together instead of in distinct functions as the original module did. It felt like it would be easier to make sure the check mode was really doing what the real mode would be doing.
Anyways, I thought I might as well show the work now that it is done, even though the speedup in
state: present
is much less than when I was comparing to ansible without #3606Oh one more thing, maybe I didn't know where to look since it is my first time looking at ansible code, but I couldn't find any test for the pacman module.
This new version includes as many tests as I could find for the original... :(There are now a bunch of tests for the module.Performance
For a list of 36 pkgs, doing
state: latest
:Old module:
New module
When using
state: present
the difference is much smaller:0m1.671s
vs0m1.563s
Keep in mind however that the performance gets worse when more pkgs are in the list with the original module, whereas it stays mostly the same with this version (dict lookup vs calling out to pacman)
ISSUE TYPE
COMPONENT NAME
pacman