Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

request for comments - pacman: speed up most operations when working with a package list #3907

Merged
merged 25 commits into from
Feb 8, 2022

Conversation

jraby
Copy link
Contributor

@jraby jraby commented Dec 14, 2021

SUMMARY

This code got a bit out of hand...
I started testing ansible a few weeks ago on arch linux and wondered at why checking if a list of package was installed took so long.

I saw #3606 and somehow thought I was already running this code, so I set out to figure out why it was still slow. (I was not, and I realized this after I finished writing this module... 🤦 )

This is the result of that effort.
It is basically a rewrite of the module to use a cache (or inventory) of installed packages and groups, available packages and groups and upgradable packages in order to avoid all the individual calls to pacman that the module was doing.
(I could have plugged a cache in the original code and probably should have, but here we are, I got carried away. I do think this version is a bit easier to follow but I just wrote it, so YMMV)

This version of the module is significantly faster when using state: latest but only marginally faster when using state: present. (it is also a bit slower when listing a single pkg to install / upgrade since the cache/inventory is built anyway)
The performance difference is much more noticeable when a long list of package is used.

I opted to do away with the re module and parsing of pacman's installation/upgrade output which can change depending on the VerbosePkgsList option.
Instead the module will first call --sync/--upgrade --print-format [...], parse that with simple split and strip operations, and then proceed to do the actual installation.
I'm not overly fond of that approach, since it could introduce a time-to-check vs time-of-use problem, but I think this is better than trying to parse the output of pacman which can change depending on the user's settings.

Also, instead of doing each package operation with an individual call to pacman, there's now a single call to pacman with the full list of packages.
pacman is good at reporting errors for individual packages, and those are reported back to the user via stderr.
(or directly in msg if the package cannot be found)

Support for specifying package name in the format <repo>/<packagename> is also a little more robust I think.
(for example, community/elixir is correctly resolved to elixir but would install using pacman -S community/elixir)

Finally, I opted to keep the check mode and the "real mode" together instead of in distinct functions as the original module did. It felt like it would be easier to make sure the check mode was really doing what the real mode would be doing.

Anyways, I thought I might as well show the work now that it is done, even though the speedup in state: present is much less than when I was comparing to ansible without #3606

Oh one more thing, maybe I didn't know where to look since it is my first time looking at ansible code, but I couldn't find any test for the pacman module.
This new version includes as many tests as I could find for the original... :( There are now a bunch of tests for the module.

Performance

For a list of 36 pkgs, doing state: latest:

- name: list of updated pkgs
  become: yes
  pacman:
    state: latest
    name:
      - community/elixir
      - base-devel
      - bc
      - bcc-tools
      - bind
      - bzip2
      - coreutils
      - dstat
      - e2fsprogs
      - file
      - fwupd
      - gdb
      - git
      - gzip
      - htop
      - net-tools
      - openssh
      - perf
      - python
      - python-bcc
      - python-virtualenv
      - rsync
      - sshfs
      - strace
      - sudo
      - sysstat
      - tcpdump
      - throttled
      - tmux
      - unzip
      - util-linux
      - vim
      - wget
      - whois
      - wireguard-tools
      - zip

Old module:

$ time ansible -m include_tasks -a file=test_repo.yml localhost -D  -C
[WARNING]: No inventory was parsed, only implicit localhost is available
localhost | SUCCESS => {
    "changed": false,
    "include": "test_repo.yml",
    "include_args": {}
}
--- before: 
+++ after: installed
@@ -0,0 +1 @@
+community/elixir

localhost | CHANGED => {
    "changed": true,
    "msg": "1 package(s) would be latest"
}

real    0m5.162s
user    0m4.290s
sys     0m1.161s

New module

localhost | SUCCESS => {
    "changed": false,
    "include": "test_repo.yml",
    "include_args": {}
}
localhost | SUCCESS => {
    "changed": false,
    "msg": "package(s) already installed"
}

real    0m1.511s
user    0m1.388s
sys     0m0.184s

When using state: present the difference is much smaller: 0m1.671s vs 0m1.563s

Keep in mind however that the performance gets worse when more pkgs are in the list with the original module, whereas it stays mostly the same with this version (dict lookup vs calling out to pacman)

ISSUE TYPE
  • Feature Pull Request
COMPONENT NAME

pacman

@ansibullbot
Copy link
Collaborator

@ansibullbot ansibullbot added feature This issue/PR relates to a feature request module module new_contributor Help guide this first time contributor os packaging plugins plugin (any type) labels Dec 14, 2021
@ansibullbot

This comment has been minimized.

@ansibullbot ansibullbot added ci_verified Push fixes to PR branch to re-run CI needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR labels Dec 14, 2021
@jraby
Copy link
Contributor Author

jraby commented Dec 14, 2021

hmmm what is the best way to indicate that a variable isn't used if _ isn't allowed?
simply use some name and never refer to it again?

@felixfontein
Copy link
Collaborator

Simply use dummy instead of _. (See https://docs.ansible.com/ansible/latest/dev_guide/testing/sanity/no-underscore-variable.html for more info.)

@ansibullbot ansibullbot removed ci_verified Push fixes to PR branch to re-run CI needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR labels Dec 14, 2021
- Use a cache (or inventory) to speed up lookups of:
  - installed packages and groups
  - available packages and groups
  - upgradable packages
- Call pacman with the list of pkgs instead of one call per package (for
  installations, upgrades and removals)
- Use pacman [--sync|--upgrade] --print-format [...] to gather list of
  changes. Parsing that instead of the regular output of pacman, which
  is error prone and can be changed by user configuration.
  This can introduce a TOCTOU problem but unless something else calls
  pacman between the invocations, it shouldn't be a concern.
- Given the above, "check mode" code is within the function that would
  carry out the actual operation. This should make it harder for the
  check code and the "real code" to diverge.
- Support for specifying alternate package name formats is a bit more
  robust. pacman is used to extract the name of the package when the
  specified package is a file or a URL.
  The "<repo>/<pkgname>" format is also supported.

For "state: latest" with a list of ~35 pkgs, this module is about 5
times faster than the original.
@jraby
Copy link
Contributor Author

jraby commented Dec 14, 2021

thx, I updated the PR and squashed the WIP history.

@ansibullbot ansibullbot added the needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR label Dec 16, 2021
update_cache_extra_args handled as a list like the others
moved the module setup to its own function for easier testing
update and upgrade have no defaults (None) to let required_one_of() do
its job properly
Shift successful exit without name or upgrade under "update_cache".

It is an error if name or upgrade isn't specified and update_cache wasn't specified
either. (Caught by ansiblemodule required_one_of but still)
@felixfontein felixfontein added backport-4 check-before-release PR will be looked at again shortly before release and merged if possible. labels Dec 17, 2021
@felixfontein
Copy link
Collaborator

Please note that such rewrites are quite a lot of work to review, so don't be surprised if it takes some time before someone actually starts adding review comments here :)

@ansibullbot ansibullbot added new_plugin New plugin tests tests unit tests/unit labels Dec 18, 2021
@ansibullbot
Copy link
Collaborator

The test botmeta failed with 1 error:

.github/BOTMETA.yml:0:0: Author jraby not mentioned as active or inactive maintainer for plugins/modules/packaging/os/pacman.py (mentioned are: elasticdog, indrajitr, tchernomax, elasticdog)

The test ansible-test sanity --test yamllint [explain] failed with 2 errors:

changelogs/fragments/3907-pacman-speedup.yml:4:5: error: syntax error: could not find expected ':' (syntax)
changelogs/fragments/3907-pacman-speedup.yml:4:5: unparsable-with-libyaml: while scanning a simple key - could not find expected ':'

The test ansible-test sanity --test changelog [explain] failed with 1 error:

changelogs/fragments/3907-pacman-speedup.yml:0:0: yaml parsing error

The test ansible-test sanity --test yamllint [explain] failed with 2 errors:

changelogs/fragments/3907-pacman-speedup.yml:4:5: error: syntax error: could not find expected ':' (syntax)
changelogs/fragments/3907-pacman-speedup.yml:4:5: unparsable-with-libyaml: while scanning a simple key - could not find expected ':'

The test ansible-test sanity --test changelog [explain] failed with 1 error:

changelogs/fragments/3907-pacman-speedup.yml:0:0: yaml parsing error

The test ansible-test sanity --test yamllint [explain] failed with 2 errors:

changelogs/fragments/3907-pacman-speedup.yml:4:5: error: syntax error: could not find expected ':' (syntax)
changelogs/fragments/3907-pacman-speedup.yml:4:5: unparsable-with-libyaml: while scanning a simple key - could not find expected ':'

The test ansible-test sanity --test changelog [explain] failed with 1 error:

changelogs/fragments/3907-pacman-speedup.yml:0:0: yaml parsing error

The test ansible-test sanity --test yamllint [explain] failed with 2 errors:

changelogs/fragments/3907-pacman-speedup.yml:4:5: error: syntax error: could not find expected ':' (syntax)
changelogs/fragments/3907-pacman-speedup.yml:4:5: unparsable-with-libyaml: while scanning a simple key - could not find expected ':'

The test ansible-test sanity --test changelog [explain] failed with 1 error:

changelogs/fragments/3907-pacman-speedup.yml:0:0: yaml parsing error

The test ansible-test sanity --test yamllint [explain] failed with 1 error:

changelogs/fragments/3907-pacman-speedup.yml:4:5: error: syntax error: could not find expected ':'

click here for bot help

@ansibullbot ansibullbot added the ci_verified Push fixes to PR branch to re-run CI label Feb 7, 2022
@@ -19,6 +21,7 @@
- Indrajit Raychaudhuri (@indrajitr)
- Aaron Bull Schaefer (@elasticdog) <aaron@elasticdog.com>
- Maxime de Roucy (@tchernomax)
- Jean Raby (@jraby)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add yourself also to .github/BOTMETA.yml's maintainers list for the $modules/packaging/os/pacman.py entry.

plugins/modules/packaging/os/pacman.py Show resolved Hide resolved
changelogs/fragments/3907-pacman-speedup.yml Outdated Show resolved Hide resolved
jraby and others added 2 commits February 8, 2022 04:30
@ansibullbot ansibullbot added needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR and removed ci_verified Push fixes to PR branch to re-run CI needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR labels Feb 8, 2022
@felixfontein felixfontein removed the check-before-release PR will be looked at again shortly before release and merged if possible. label Feb 8, 2022
@felixfontein felixfontein merged commit 1580f3c into ansible-collections:main Feb 8, 2022
@patchback
Copy link

patchback bot commented Feb 8, 2022

Backport to stable-4: 💚 backport PR created

✅ Backport PR branch: patchback/backports/stable-4/1580f3c2b4876aeb06a0c785a14c5a4e057e843e/pr-3907

Backported as #4176

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

patchback bot pushed a commit that referenced this pull request Feb 8, 2022
…with a package list (#3907)

* pacman: rewrite with a cache to speed up execution

- Use a cache (or inventory) to speed up lookups of:
  - installed packages and groups
  - available packages and groups
  - upgradable packages
- Call pacman with the list of pkgs instead of one call per package (for
  installations, upgrades and removals)
- Use pacman [--sync|--upgrade] --print-format [...] to gather list of
  changes. Parsing that instead of the regular output of pacman, which
  is error prone and can be changed by user configuration.
  This can introduce a TOCTOU problem but unless something else calls
  pacman between the invocations, it shouldn't be a concern.
- Given the above, "check mode" code is within the function that would
  carry out the actual operation. This should make it harder for the
  check code and the "real code" to diverge.
- Support for specifying alternate package name formats is a bit more
  robust. pacman is used to extract the name of the package when the
  specified package is a file or a URL.
  The "<repo>/<pkgname>" format is also supported.

For "state: latest" with a list of ~35 pkgs, this module is about 5
times faster than the original.

* Let fail() actually work

* all unhappy paths now end up calling fail()

* Update copyright

* Argument changes

update_cache_extra_args handled as a list like the others
moved the module setup to its own function for easier testing
update and upgrade have no defaults (None) to let required_one_of() do
its job properly

* update_cache exit path

Shift successful exit without name or upgrade under "update_cache".

It is an error if name or upgrade isn't specified and update_cache wasn't specified
either. (Caught by ansiblemodule required_one_of but still)

* Add pkgs to output on success only

Also align both format, only pkg name for now

* Multiple fixes

Move VersionTuple to top level for import from tests
Add removed pkgs to the exit json when removing packages
fixup list of upgraded pkgs reported on upgrades (was tuple of list for
no reason)
use list idiom for upgrades, like the rest
drop unused expand_package_groups function
skip empty lines when building inventory

* pacman: add tests

* python 2.x compat + pep8

* python 2.x some more

* Fix failure when pacman emits warnings

Add tests covering that failure case

* typo

* Whitespace

black failed me...

* Adjust documentation to fit implicit defaults

* fix test failures on older pythons

* remove file not intended for commit

* Test exception str with e.match

* Build inventory after cache update + adjust tests

* Apply suggestions from code review

Co-authored-by: Felix Fontein <felix@fontein.de>

* Update plugins/modules/packaging/os/pacman.py

Co-authored-by: Felix Fontein <felix@fontein.de>

* changelog

* bump copyright year and add my name to authors

* Update changelogs/fragments/3907-pacman-speedup.yml

Co-authored-by: Felix Fontein <felix@fontein.de>

* maintainer entry

Co-authored-by: Felix Fontein <felix@fontein.de>
(cherry picked from commit 1580f3c)
@felixfontein
Copy link
Collaborator

@jraby thanks for implementing this!
@jamincollins thanks for reviewing and testing!

felixfontein pushed a commit that referenced this pull request Feb 9, 2022
…with a package list (#3907) (#4176)

* pacman: rewrite with a cache to speed up execution

- Use a cache (or inventory) to speed up lookups of:
  - installed packages and groups
  - available packages and groups
  - upgradable packages
- Call pacman with the list of pkgs instead of one call per package (for
  installations, upgrades and removals)
- Use pacman [--sync|--upgrade] --print-format [...] to gather list of
  changes. Parsing that instead of the regular output of pacman, which
  is error prone and can be changed by user configuration.
  This can introduce a TOCTOU problem but unless something else calls
  pacman between the invocations, it shouldn't be a concern.
- Given the above, "check mode" code is within the function that would
  carry out the actual operation. This should make it harder for the
  check code and the "real code" to diverge.
- Support for specifying alternate package name formats is a bit more
  robust. pacman is used to extract the name of the package when the
  specified package is a file or a URL.
  The "<repo>/<pkgname>" format is also supported.

For "state: latest" with a list of ~35 pkgs, this module is about 5
times faster than the original.

* Let fail() actually work

* all unhappy paths now end up calling fail()

* Update copyright

* Argument changes

update_cache_extra_args handled as a list like the others
moved the module setup to its own function for easier testing
update and upgrade have no defaults (None) to let required_one_of() do
its job properly

* update_cache exit path

Shift successful exit without name or upgrade under "update_cache".

It is an error if name or upgrade isn't specified and update_cache wasn't specified
either. (Caught by ansiblemodule required_one_of but still)

* Add pkgs to output on success only

Also align both format, only pkg name for now

* Multiple fixes

Move VersionTuple to top level for import from tests
Add removed pkgs to the exit json when removing packages
fixup list of upgraded pkgs reported on upgrades (was tuple of list for
no reason)
use list idiom for upgrades, like the rest
drop unused expand_package_groups function
skip empty lines when building inventory

* pacman: add tests

* python 2.x compat + pep8

* python 2.x some more

* Fix failure when pacman emits warnings

Add tests covering that failure case

* typo

* Whitespace

black failed me...

* Adjust documentation to fit implicit defaults

* fix test failures on older pythons

* remove file not intended for commit

* Test exception str with e.match

* Build inventory after cache update + adjust tests

* Apply suggestions from code review

Co-authored-by: Felix Fontein <felix@fontein.de>

* Update plugins/modules/packaging/os/pacman.py

Co-authored-by: Felix Fontein <felix@fontein.de>

* changelog

* bump copyright year and add my name to authors

* Update changelogs/fragments/3907-pacman-speedup.yml

Co-authored-by: Felix Fontein <felix@fontein.de>

* maintainer entry

Co-authored-by: Felix Fontein <felix@fontein.de>
(cherry picked from commit 1580f3c)

Co-authored-by: Jean Raby <jean@raby.sh>
@felixfontein
Copy link
Collaborator

In 1580f3c#commitcomment-67899699 someone commented that this PR uses --query --group instead of --query --groups, breaking some AUR helpers (like yay or paru). Both pacman --query --group and pacman --query --groups seem to return the same results (at least for my system), and pacman --query --help only mentions -g and --groups. I guess changing --group to --groups makes sense. Or is there something I'm missing @jraby?

@jraby
Copy link
Contributor Author

jraby commented Mar 3, 2022

wait, why does it even work with pacman? ....
We can definitely change it to the actual option --groups.

have you started doing it or do you want me to prepare a PR?

@felixfontein
Copy link
Collaborator

Interestingly, --gr, --gro, --grou also work for --groups. --g does not though.

@felixfontein
Copy link
Collaborator

I haven't started a PR, feel free to create one :)

@jraby
Copy link
Contributor Author

jraby commented Mar 3, 2022

alright. I'll do that later today.

felixfontein added a commit that referenced this pull request Mar 6, 2022
* pacman: don't always return changed w/ update_cache

This used to be the behavior before the recent refactoring. [1]

Allows the following to return changed only when packages were upgraded:

  - pacman:
    update_cache: yes
    upgrade: yes

And the following to return changed only when the foo package wasn't at
the latest version:

  - pacman:
    name: foo
    state: latest
    update_cache: yes

[1] #3907

* Update changelogs/fragments/4318-pacman-restore-old-changed-behavior.yml

Co-authored-by: Felix Fontein <felix@fontein.de>

Co-authored-by: Felix Fontein <felix@fontein.de>
patchback bot pushed a commit that referenced this pull request Mar 6, 2022
* pacman: don't always return changed w/ update_cache

This used to be the behavior before the recent refactoring. [1]

Allows the following to return changed only when packages were upgraded:

  - pacman:
    update_cache: yes
    upgrade: yes

And the following to return changed only when the foo package wasn't at
the latest version:

  - pacman:
    name: foo
    state: latest
    update_cache: yes

[1] #3907

* Update changelogs/fragments/4318-pacman-restore-old-changed-behavior.yml

Co-authored-by: Felix Fontein <felix@fontein.de>

Co-authored-by: Felix Fontein <felix@fontein.de>
(cherry picked from commit cc8151f)
felixfontein pushed a commit that referenced this pull request Mar 6, 2022
* pacman: don't always return changed w/ update_cache

This used to be the behavior before the recent refactoring. [1]

Allows the following to return changed only when packages were upgraded:

  - pacman:
    update_cache: yes
    upgrade: yes

And the following to return changed only when the foo package wasn't at
the latest version:

  - pacman:
    name: foo
    state: latest
    update_cache: yes

[1] #3907

* Update changelogs/fragments/4318-pacman-restore-old-changed-behavior.yml

Co-authored-by: Felix Fontein <felix@fontein.de>

Co-authored-by: Felix Fontein <felix@fontein.de>
(cherry picked from commit cc8151f)

Co-authored-by: Evangelos Foutras <evangelos@foutrelis.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature This issue/PR relates to a feature request has_issue module module needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR new_contributor Help guide this first time contributor new_plugin New plugin os packaging plugins plugin (any type) tests tests unit tests/unit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants