[BUG] file.recurse is > 5x slower than file.managed #61998

ManofWax · 2022-04-28T16:12:07Z

Description
file.recurse is > 5x times slower than file.managed even if the directory contains only one file.

Setup

Create this state file:

file_managed_test:
    file.managed:
        - source: salt://mypath/etc/systemd/resolved.conf
        - name: /tmp/test1.txt

file_recurse_test:
    file.recurse:
        - source: salt://sensor/etc/systemd
        - name: /tmp/testfolder

The directory used as source contains only one file:

#~/mypath/etc/systemd$ ls -l
-rw-rw-r-- 1 user1 user1 350 nov 12 11:30 resolved.conf

Steps to Reproduce the behavior

Run the state file:

$ sudo salt myserver.local state.apply sensor.file_recurse_test
myserver.local:
----------
          ID: file_managed_test
    Function: file.managed
        Name: /tmp/test1.txt
      Result: True
     Comment: File /tmp/test1.txt is in the correct state
     Started: 16:00:18.816785
    Duration: 256.539 ms
     Changes:   
----------
          ID: file_recurse_test
    Function: file.recurse
        Name: /tmp/testfolder
      Result: True
     Comment: The directory /tmp/testfolder is in the correct state
     Started: 16:00:19.073618
    Duration: 1564.548 ms
     Changes:   

Summary for myserver.local
------------
Succeeded: 2
Failed:    0
------------
Total states run:     2
Total run time:   1.821 s

file.recurse is 6 times slower than file.managed

Expected behavior

file.recurse speed should be comparable to file.managed if the number of files involved are the same

Versions Report
This is the minion salt version:

# salt-minion --versions
Salt Version:
          Salt: 3004.1
 
Dependency Versions:
          cffi: Not Installed
      cherrypy: Not Installed
      dateutil: 2.7.3
     docker-py: 4.1.0
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 2.10.1
       libgit2: Not Installed
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 0.6.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: Not Installed
      pycrypto: Not Installed
  pycryptodome: 3.6.1
        pygit2: Not Installed
        Python: 3.8.10 (default, Mar 15 2022, 12:22:08)
  python-gnupg: 0.4.5
        PyYAML: 5.3.1
         PyZMQ: 18.1.1
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.2
 
System Versions:
          dist: ubuntu 20.04 focal
        locale: utf-8
       machine: x86_64
       release: 5.4.0-107-generic
        system: Linux
       version: Ubuntu 20.04 focal

This is the master version:

$ salt-master --versions
Salt Version:
          Salt: 3004.1
 
Dependency Versions:
          cffi: Not Installed
      cherrypy: Not Installed
      dateutil: 2.7.3
     docker-py: Not Installed
         gitdb: 2.0.6
     gitpython: 3.0.7
        Jinja2: 2.11.2
       libgit2: Not Installed
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 0.6.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: Not Installed
      pycrypto: Not Installed
  pycryptodome: 3.6.1
        pygit2: Not Installed
        Python: 3.8.10 (default, Mar 15 2022, 12:22:08)
  python-gnupg: 0.4.5
        PyYAML: 5.3.1
         PyZMQ: 18.1.1
         smmap: 2.0.5
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.2
 
System Versions:
          dist: ubuntu 20.04 focal
        locale: utf-8
       machine: x86_64
       release: 5.4.0-107-generic
        system: Linux
       version: Ubuntu 20.04 focal

The text was updated successfully, but these errors were encountered:

welcome · 2022-04-28T16:12:09Z

Hi there! Welcome to the Salt Community! Thank you for making your first contribution. We have a lengthy process for issues and PRs. Someone from the Core Team will follow up as soon as possible. In the meantime, here’s some information that may help as you continue your Salt journey.
Please be sure to review our Code of Conduct. Also, check out some of our community resources including:

There are lots of ways to get involved in our community. Every month, there are around a dozen opportunities to meet with other contributors and the Salt Core team and collaborate in real time. The best way to keep track is by subscribing to the Salt Community Events Calendar.
If you have additional questions, email us at saltproject@vmware.com. We’re glad you’ve joined our community and look forward to doing awesome things with you!

whytewolf · 2022-12-22T22:27:45Z

I am unable to recreate this.

root@salt00:/srv/salt/tests# salt salt00 state.apply tests.file_test
salt00:
----------
          ID: test_file_manage
    Function: file.managed
        Name: /tmp/test.txt
      Result: True
     Comment: File /tmp/test.txt is in the correct state
     Started: 22:25:16.901869
    Duration: 33.125 ms
     Changes:
----------
          ID: test_file_recurse
    Function: file.recurse
        Name: /tmp/testdir
      Result: True
     Comment: The directory /tmp/testdir is in the correct state
     Started: 22:25:16.935185
    Duration: 44.001 ms
     Changes:

Summary for salt00
------------
Succeeded: 2
Failed:    0
------------
Total states run:     2
Total run time:  77.126 ms

as you can see in my test they are similar times. and both much faster than yours. which makes me think something else has to be going on here. do you have a large number of files in your salt file server? or a large number of directories?

ManofWax · 2022-12-23T11:53:52Z

Yes, /srv/ (the salt directory on salt-master) is very large:

/srv (master*) # rsync --stats --dry-run -ax /srv /tmp        

Number of files: 30,949 (reg: 20,900, dir: 10,042, link: 7)

It contains 31,000 files. How does it matter for file.recurse speed?

whytewolf · 2022-12-23T16:12:14Z

the filesystem in salt is not exactly robust. it isn't meant to be large. if it isn't being served by salt it shouldn't be in the fileserver. a large fileserver can cause the master to slowdown as it gets requests. but also file.recurse scans the list of directories looking for the target directory. this normally is pretty fast. however with a large number of directories it can slow it down.

sevdog · 2022-12-23T16:29:40Z

Is this the cause of such issue?

salt/salt/states/file.py

Lines 4477 to 4485 in 31aee29

    
           # Check source path relative to fileserver root, make sure it is a 
        
           # directory 
        
           srcpath, senv = salt.utils.url.parse(source) 
        
           if senv is None: 
        
               senv = __env__ 
        
           master_dirs = __salt__["cp.list_master_dirs"](saltenv=senv) 
        
           if srcpath not in master_dirs and not any( 
        
               x for x in master_dirs if x.startswith(srcpath + "/") 
        
           ):

whytewolf · 2022-12-23T16:38:56Z

that is only half of it. the other half being the transfer of the data needed. and the slowdown on the master as it indexes the filesystem. with the extra calls needed for file.recurse it will still be slower for file.recurse than file.managed.

ManofWax · 2022-12-23T16:47:00Z

Is there another way to implement it? Maybe store file content inside pillars?

whytewolf · 2022-12-23T16:54:56Z

pillars would be worse. pillars are not meant for anything other than secrets.

the recommendation is to stop abusing it. if you need large swaths of files put somewhere use rsync state or compress it and use the archive state.

sevdog · 2022-12-23T17:00:42Z

How about avoid a doubple-loop and use a single any statement here:

-    if srcpath not in master_dirs and not any(
-        x for x in master_dirs if x.startswith(srcpath + "/")
+    if not any(
+        x for x in master_dirs if x.startswith(srcpath + "/") or x == srcpath
    ):

This could at least save some time when the list is big.

whytewolf · 2022-12-23T17:32:07Z

can also save on the size of the loop by adding a prefix to the cp.list_master_dirs but in the end that will only help marginly. a slow master will still slowdown as it builds the file index to send to the state. which is where most of the slowdown happens. and since file.managed doesn't need to know about the files in the file server but file.recurse does file.recurse will ALWAYS be slower than file.managed.

damon-atkins · 2022-12-24T04:34:09Z

Last time I look at this, getting meta data from the master about files is not every efficient. It does cache a little bit.

Ch3LL · 2023-08-16T13:27:55Z

Closed by #64977

ManofWax added Bug broken, incorrect, or confusing behavior needs-triage labels Apr 28, 2022

OrangeDog added the Performance label May 5, 2022

garethgreenaway assigned whytewolf Dec 20, 2022

whytewolf removed the needs-triage label Jun 30, 2023

whytewolf added this to the Sulfur v3006.2 milestone Jun 30, 2023

whytewolf mentioned this issue Aug 14, 2023

[3006.x] attempt file.recurse speed up #64977

Merged

3 tasks

whytewolf linked a pull request Aug 15, 2023 that will close this issue

[3006.x] attempt file.recurse speed up #64977

Merged

3 tasks

Ch3LL closed this as completed Aug 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] file.recurse is > 5x slower than file.managed #61998

[BUG] file.recurse is > 5x slower than file.managed #61998

ManofWax commented Apr 28, 2022

welcome bot commented Apr 28, 2022

whytewolf commented Dec 22, 2022

ManofWax commented Dec 23, 2022

whytewolf commented Dec 23, 2022

sevdog commented Dec 23, 2022

whytewolf commented Dec 23, 2022

ManofWax commented Dec 23, 2022

whytewolf commented Dec 23, 2022

sevdog commented Dec 23, 2022 •

edited

Loading

whytewolf commented Dec 23, 2022

damon-atkins commented Dec 24, 2022

Ch3LL commented Aug 16, 2023

[BUG] file.recurse is > 5x slower than file.managed #61998

[BUG] file.recurse is > 5x slower than file.managed #61998

Comments

ManofWax commented Apr 28, 2022

welcome bot commented Apr 28, 2022

whytewolf commented Dec 22, 2022

ManofWax commented Dec 23, 2022

whytewolf commented Dec 23, 2022

sevdog commented Dec 23, 2022

whytewolf commented Dec 23, 2022

ManofWax commented Dec 23, 2022

whytewolf commented Dec 23, 2022

sevdog commented Dec 23, 2022 • edited Loading

whytewolf commented Dec 23, 2022

damon-atkins commented Dec 24, 2022

Ch3LL commented Aug 16, 2023

sevdog commented Dec 23, 2022 •

edited

Loading