Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(service): service salt-master and salt-minion to restart last #482

Merged

Conversation

remichristiaan
Copy link

@remichristiaan remichristiaan commented Jul 27, 2020

When running a high-state on the salt-master to deploy itself, the run fails
with an Authentication error occurred because the master restarts half way though.

PR progress checklist (to be filled in by reviewers)

  • Changes to documentation are appropriate (or tick if not required)
  • Changes to tests are appropriate (or tick if not required)
  • Reviews completed

What type of PR is this?

Primary type

  • [build] Changes related to the build system
  • [chore] Changes to the build process or auxiliary tools and libraries such as documentation generation
  • [ci] Changes to the continuous integration configuration
  • [feat] A new feature
  • [fix] A bug fix
  • [perf] A code change that improves performance
  • [refactor] A code change that neither fixes a bug nor adds a feature
  • [revert] A change used to revert a previous commit
  • [style] Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc.)

Secondary type

  • [docs] Documentation changes
  • [test] Adding missing or correcting existing tests

Does this PR introduce a BREAKING CHANGE?

No.

Related issues and/or pull requests

Describe the changes you're proposing

Pillar / config required to test the proposed changes

Debug log showing how the proposed changes work

Documentation checklist

  • Updated the README (e.g. Available states).
  • Updated pillar.example.

Testing checklist

  • Included in Kitchen (i.e. under state_top).
  • Covered by new/existing tests (e.g. InSpec, Serverspec, etc.).
  • Updated the relevant test pillar.

Additional context

When running a high-state on the salt-master to deploy itself, the run fails
with an Authentication error occurred because the master restarts half way though.
@remichristiaan remichristiaan requested a review from a team as a code owner July 27, 2020 13:12
@pull-assistant
Copy link

Score: 1.00

Best reviewed: commit by commit


Optimal code review plan

     fix(service): service salt-master and salt-minion to restart last

Powered by Pull Assistant. Last update d71cf0c ... d71cf0c. Read the comment docs.

@baby-gnu
Copy link
Contributor

Hello @remichristiaan.

Do you have debug outputs of the problems and debug outputs with the fixes?

I'm wondering if the tests could trigger a master/minion restart to check this…

@n-rodriguez
Copy link
Member

n-rodriguez commented Jul 27, 2020

I'm wondering if the tests could trigger a master/minion restart to check this…

I think you could by using Inspec remote command module (https://docs.chef.io/inspec/resources/command/) ... but it would be very tricky...

@n-rodriguez n-rodriguez requested a review from myii July 27, 2020 15:58
@baby-gnu
Copy link
Contributor

I think you could by using Inspec remote command module ... but it would be very tricky...

I was more thinking about adding some pillars for the formula to do a restart of the service but the configuration files are copied using a file.recurse so instead the kitchen should create/copy some configuration (master.d / minion.d) in the file_roots for the formula to process the service restart.

@remichristiaan
Copy link
Author

I do not have debug outputs unfortunately ... I ran into this problem on the go and tried a fix that seems to help in 8 out of 10 runs which probably gets caused by a requisite declaration that overrides the order option. So this needs a bit more digging, but then again it might be an edge case that I ran into.

Similar problem with the openvpn-formula ... if the vpn restarts before the states could all finish, it stops in mid-way.

@myii
Copy link
Member

myii commented Jul 29, 2020

Thanks for this PR, @remichristiaan. We discussed it briefly in the Formulas' working group meeting yesterday and @baby-gnu mentioned that he will check what can be done with Kitchen first.

@baby-gnu
Copy link
Contributor

Sorry, I could only do manual testing since I don't find a way to run the formula twice in a kitchen run.

  1. run ./bin/kitchen converge v3001-py3-debian-10-3001-py3
  2. modify kitchen.yaml
    --- kitchen.yml.orig	2020-07-31 12:12:45.572817927 +0200
    +++ kitchen.yml	2020-07-31 12:14:45.640441305 +0200
    @@ -148,6 +148,9 @@
       salt_copy_filter:
         - .kitchen
         - .git
    +  init_environment: |
    +    echo 'log_level: debug' > /tmp/kitchen/srv/salt/salt/files/master.d/log-level.conf
    +    echo 'log_level: debug' > /tmp/kitchen/srv/salt/salt/files/minion.d/log-level.conf
     
     verifier:
       # https://www.inspec.io/
  3. run again ./bin/kitchen converge v3001-py3-debian-10-3001-py3

Without this patch, this results in the following:

local:
  Name: deb http://repo.saltstack.com/py3/debian/10/amd64/3001 buster main - Function: pkgrepo.managed - Result: Clean Started: - 10:17:02.405550 Duration: 33.394 ms
  Name: salt-master - Function: pkg.installed - Result: Clean Started: - 10:17:03.390722 Duration: 23.288 ms
----------
          ID: salt-master
    Function: file.recurse
        Name: /etc/salt/master.d
      Result: True
     Comment: Recursively updated /etc/salt/master.d
     Started: 10:17:03.418373
    Duration: 263.417 ms
     Changes:   
       ----------
       /etc/salt/master.d/log-level.conf:
           ----------
           diff:
               New file
           mode:
               0644
  Name: /etc/salt/master.d/_defaults.conf - Function: file.absent - Result: Clean Started: - 10:17:03.695305 Duration: 1.246 ms
----------
          ID: salt-master
    Function: service.running
      Result: True
     Comment: Service restarted
     Started: 10:17:03.731467
    Duration: 1510.131 ms
     Changes:   
       ----------
       salt-master:
           True
  Name: salt-minion - Function: pkg.installed - Result: Clean Started: - 10:17:05.244984 Duration: 10.765 ms
----------
          ID: salt-minion
    Function: file.recurse
        Name: /etc/salt/minion.d
      Result: True
     Comment: Recursively updated /etc/salt/minion.d
     Started: 10:17:05.255942
    Duration: 234.212 ms
     Changes:   
       ----------
       /etc/salt/minion.d/log-level.conf:
           ----------
           diff:
               New file
           mode:
               0644
  Name: /etc/salt/minion.d/_defaults.conf - Function: file.absent - Result: Clean Started: - 10:17:05.503972 Duration: 0.826 ms
  Name: salt-minion - Function: service.running - Result: Clean Started: - 10:17:05.505057 Duration: 31.264 ms
----------
          ID: salt-minion
    Function: cmd.run
        Name: salt-call --local service.restart salt-minion --out-file /dev/null
      Result: True
     Comment: Command "salt-call --local service.restart salt-minion --out-file /dev/null" run
     Started: 10:17:05.539735
    Duration: 5.567 ms
     Changes:   
       ----------
       pid:
           3256
       retcode:
           None
       stderr:
       stdout:

Summary for local
-------------
Succeeded: 10 (changed=4)
Failed:     0
-------------
Total states run:     10
Total run time:    2.114 s
Downloading files from <v3001-py3-debian-10-3001-py3>
Finished converging <v3001-py3-debian-10-3001-py3> (0m9.78s).

So, the master is restarted just after it's configuration is modified and the minion states are done after.

With the patch, the services are restarted after all configuration are done:

local:
  Name: deb http://repo.saltstack.com/py3/debian/10/amd64/3001 buster main - Function: pkgrepo.managed - Result: Clean Started: - 10:38:25.283063 Duration: 32.998 ms
  Name: salt-master - Function: pkg.installed - Result: Clean Started: - 10:38:26.270636 Duration: 23.125 ms
----------
          ID: salt-master
    Function: file.recurse
        Name: /etc/salt/master.d
      Result: True
     Comment: Recursively updated /etc/salt/master.d
     Started: 10:38:26.298228
    Duration: 265.613 ms
     Changes:   
       ----------
       /etc/salt/master.d/log-level.conf:
           ----------
           diff:
               New file
           mode:
               0644
  Name: /etc/salt/master.d/_defaults.conf - Function: file.absent - Result: Clean Started: - 10:38:26.566110 Duration: 0.72 ms
  Name: salt-minion - Function: pkg.installed - Result: Clean Started: - 10:38:26.572119 Duration: 819.962 ms
----------
          ID: salt-minion
    Function: file.recurse
        Name: /etc/salt/minion.d
      Result: True
     Comment: Recursively updated /etc/salt/minion.d
     Started: 10:38:27.392369
    Duration: 217.93 ms
     Changes:   
       ----------
       /etc/salt/minion.d/log-level.conf:
           ----------
           diff:
               New file
           mode:
               0644
  Name: /etc/salt/minion.d/_defaults.conf - Function: file.absent - Result: Clean Started: - 10:38:27.614029 Duration: 0.742 ms
----------
          ID: salt-minion
    Function: cmd.run
        Name: salt-call --local service.restart salt-minion --out-file /dev/null
      Result: True
     Comment: Command "salt-call --local service.restart salt-minion --out-file /dev/null" run
     Started: 10:38:27.615051
    Duration: 7.44 ms
     Changes:   
       ----------
       pid:
           2361
       retcode:
           None
       stderr:
       stdout:
----------
          ID: salt-master
    Function: service.running
      Result: True
     Comment: Service restarted
     Started: 10:38:27.673946
    Duration: 2606.813 ms
     Changes:   
       ----------
       salt-master:
           True
----------
          ID: salt-minion
    Function: service.running
      Result: True
     Comment: Service salt-minion is already enabled, and is running
     Started: 10:38:30.281587
    Duration: 241.509 ms
     Changes:   
       ----------
       salt-minion:
           True

Summary for local
-------------
Succeeded: 10 (changed=5)
Failed:     0
-------------
Total states run:     10
Total run time:    4.217 s
Downloading files from <v3001-py3-debian-10-3001-py3>
Finished converging <v3001-py3-debian-10-3001-py3> (0m11.84s).

@baby-gnu
Copy link
Contributor

So, I can't find a way to automate the testing. Sorry.

@n-rodriguez
Copy link
Member

Sorry, I could only do manual testing since I don't find a way to run the formula twice in a kitchen run.

That why I thought it could be done with Inspec.

@daks
Copy link
Member

daks commented Aug 7, 2020

I don't think it's possible to run something (highstate or formula depending of the specific config) twice if it succeeds. If it fails, it's possible using max_retries.

@myii myii merged commit aecf174 into saltstack-formulas:master Aug 25, 2020
@myii
Copy link
Member

myii commented Aug 25, 2020

Thanks for the patience, @remichristiaan -- merged based on approving reviews (x2).

@saltstack-formulas-travis

🎉 This PR is included in version 1.5.1 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants