-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Separated sort and head commands to prevent broken pipe issues #19699
Merged
lguohan
merged 3 commits into
sonic-net:master
from
assrinivasan:logrotate-prevent-sigpipe
Jul 26, 2024
Merged
Separated sort and head commands to prevent broken pipe issues #19699
lguohan
merged 3 commits into
sonic-net:master
from
assrinivasan:logrotate-prevent-sigpipe
Jul 26, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
saiarcot895
approved these changes
Jul 25, 2024
prgeor
reviewed
Jul 25, 2024
saiarcot895
approved these changes
Jul 26, 2024
prgeor
approved these changes
Jul 26, 2024
gechiang
approved these changes
Jul 26, 2024
Picked into MSFT repo 202205 branch: |
liushilongbuaa
pushed a commit
to liushilongbuaa/sonic-buildimage
that referenced
this pull request
Aug 1, 2024
…issues (sonic-net#19699) Why I did it Currently the logrotate script is susceptible to SIGPIPE issue because of a race condition between the sort and head commands that leads to the following error: Jun 22 03:20:00.331117 sonic-switch INFO logrotate[1494655]: sort: write failed: 'standard output': Broken pipe Jun 22 03:20:00.331207 sonic-switch INFO logrotate[1494655]: sort: write error This is because if the sort command has not yet finished writing to stdout and head has finished executing and closes the pipe. Subsequent writes by the previously issued sort command would fail because of the closed pipe. Work item tracking Microsoft ADO (number only): 28511764 How I did it I refactored the code in a way that removes the call to head after sort. How to verify it I created a bash script that induces a write failed: Broken pipe error on account of a sort | head combination and verified the SIGPIPE error code. I then removed the head command and replaced it with a call to awk and re-ran the script. This time there was no SIGPIPE. The script: #!/usr/bin/bash var0=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | head -n 20 > /dev/null') echo "" echo "Removed head and modified awk:" echo "" var1=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | awk "NR == 1 {print $2}"') The output execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7ffc34b52f98 /* 17 vars */) = 0 strace: Process 137372 attached strace: Process 137373 attached strace: Process 137374 attached [pid 137373] execve("/usr/bin/sort", ["sort"], 0x55be58b00350 /* 17 vars */ <unfinished ...> [pid 137372] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55be58b00350 /* 17 vars */ <unfinished ...> [pid 137373] <... execve resumed>) = 0 [pid 137372] <... execve resumed>) = 0 [pid 137374] execve("/usr/bin/head", ["head", "-n", "20"], 0x55be58b00350 /* 17 vars */) = 0 [pid 137372] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096 [pid 137372] write(1, " 30960 Jan 20 2022 colrm\n-r"..., 4096) = 4096 [pid 137372] write(1, "-xr-x 1 root root 1866 De"..., 4096) = 4096 [pid 137372] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096 [pid 137372] write(1, " 1 root root 315136 Oct 2 "..., 4096) = 4096 [pid 137372] write(1, "ewgrp\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137372] write(1, "-rwxr-xr-x 1 root root 43"..., 4096) = 4096 [pid 137372] write(1, "eb 27 2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096 [pid 137372] write(1, "672 Jun 18 2023 systemd-ask-pas"..., 4096) = 4096 [pid 137372] write(1, "zselect\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137372] write(1, "r-xr-x 1 root root 2081 A"..., 4096) = 4096 [pid 137372] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533 [pid 137372] +++ exited with 0 +++ [pid 137373] write(1, "\n\ndrwx------ 2 root root 4096 J"..., 4096) = 4096 [pid 137373] write(1, "xrwx 1 root root 12 Jul"..., 4096) = 4096 [pid 137374] write(1, "\n\ndrwx------ 2 root root 4096 J"..., 942 <unfinished ...> [pid 137373] write(1, "Feb 2 2021 ping6 -> ping\nlrwxr"..., 4096 <unfinished ...> [pid 137374] <... write resumed>) = 942 [pid 137373] <... write resumed>) = -1 EPIPE (Broken pipe) [pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=137373, si_uid=1000} --- [pid 137374] +++ exited with 0 +++ [pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_TKILL, si_pid=137373, si_uid=1000} --- [pid 137373] +++ killed by SIGPIPE +++ --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137372, si_uid=1000, si_status=0, si_utime=0, si_stime=4} --- +++ exited with 0 +++ Removed head and modified awk: execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7fff94a24d18 /* 17 vars */) = 0 strace: Process 137379 attached strace: Process 137380 attached strace: Process 137381 attached [pid 137379] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55d437568350 /* 17 vars */ <unfinished ...> [pid 137380] execve("/usr/bin/sort", ["sort"], 0x55d437568350 /* 17 vars */ <unfinished ...> [pid 137379] <... execve resumed>) = 0 [pid 137380] <... execve resumed>) = 0 [pid 137381] execve("/usr/bin/awk", ["awk", "NR == 1 {print }"], 0x55d437568350 /* 17 vars */) = 0 [pid 137379] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096 [pid 137379] write(1, " 30960 Jan 20 2022 colrm\n-r"..., 4096) = 4096 [pid 137379] write(1, "-xr-x 1 root root 1866 De"..., 4096) = 4096 [pid 137379] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096 [pid 137379] write(1, " 1 root root 315136 Oct 2 "..., 4096) = 4096 [pid 137379] write(1, "ewgrp\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137379] write(1, "-rwxr-xr-x 1 root root 43"..., 4096) = 4096 [pid 137379] write(1, "eb 27 2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096 [pid 137379] write(1, "672 Jun 18 2023 systemd-ask-pas"..., 4096) = 4096 [pid 137379] write(1, "zselect\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137379] write(1, "r-xr-x 1 root root 2081 A"..., 4096) = 4096 [pid 137379] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533 [pid 137379] +++ exited with 0 +++ [pid 137380] write(1, "\n\ndrwx------ 2 root root 4096 J"..., 4096) = 4096 [pid 137380] write(1, "xrwx 1 root root 12 Jul"..., 4096) = 4096 [pid 137380] write(1, "Feb 2 2021 ping6 -> ping\nlrwxr"..., 4096) = 4096 [pid 137380] write(1, "t 10803 Sep 24 2021 pod2t"..., 4096) = 4096 [pid 137380] write(1, "22 colcrt\n-rwxr-xr-x 1 root root"..., 4096) = 4096 [pid 137380] write(1, " 2022 zmore\n-rwxr-xr-x 1 root r"..., 4096) = 4096 [pid 137380] write(1, "6 Jun 19 08:47 TS\n-rwxr-xr-x 1 r"..., 4096) = 4096 [pid 137380] write(1, "oot 35048 Jan 20 2022 tas"..., 4096) = 4096 [pid 137380] write(1, " 4392 Sep 24 2021 ptargrep\n"..., 4096) = 4096 [pid 137380] write(1, "-xr-x 1 root root 56192 Se"..., 4096) = 4096 [pid 137380] write(1, " 7285 Nov 24 2016 boot-eos\n-"..., 4096) = 4096 [pid 137380] write(1, "ostname-config.sh\n-rwxr-xr-x 2 r"..., 533) = 533 [pid 137381] write(1, "\n", 1) = 1 [pid 137380] +++ exited with 0 +++ [pid 137381] +++ exited with 0 +++ --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137379, si_uid=1000, si_status=0, si_utime=1, si_stime=5} --- +++ exited with 0 +++ I created 30 dummy archive files in a dev environment and then ran the modified script to verify that the OLDEST_ARCHIVE_FILE is the archive file that was created first: The setup: admin@sonic:~/testpipe/var/log$ pwd /home/admin/testpipe/var/log admin@sonic:~/testpipe/var/log$ ls -alrth total 8.0K drwxr-xr-x 3 admin admin 4.0K Jul 25 17:12 .. -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.1.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.2.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.3.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.5.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.4.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.6.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.7.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.9.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.8.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.10.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.11.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.12.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.13.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.14.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.15.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.16.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.17.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.18.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.19.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.20.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.21.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.22.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.23.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.24.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.25.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.26.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.27.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.28.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.29.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.30.gz drwxr-xr-x 2 admin admin 4.0K Jul 25 17:17 . The script: admin@sonic:~/testpipe$ cat sort_test.sh #!/usr/bin/bash SORTED_ARCHIVE_FILES=$(find ./var/log -type f -printf '%T+ %p\n' | grep -E '.+\.[0-9]+(\.gz)?$' | sort ) OLDEST_ARCHIVE_FILE=$(echo "${SORTED_ARCHIVE_FILES}" | awk 'NR == 1 { print $2; }') echo $OLDEST_ARCHIVE_FILE The output: admin@sonic:~/testpipe$ ./sort_test.sh ./var/log/syslog.1.gz
mssonicbld
pushed a commit
to mssonicbld/sonic-buildimage
that referenced
this pull request
Aug 2, 2024
…issues (sonic-net#19699) Why I did it Currently the logrotate script is susceptible to SIGPIPE issue because of a race condition between the sort and head commands that leads to the following error: Jun 22 03:20:00.331117 sonic-switch INFO logrotate[1494655]: sort: write failed: 'standard output': Broken pipe Jun 22 03:20:00.331207 sonic-switch INFO logrotate[1494655]: sort: write error This is because if the sort command has not yet finished writing to stdout and head has finished executing and closes the pipe. Subsequent writes by the previously issued sort command would fail because of the closed pipe. Work item tracking Microsoft ADO (number only): 28511764 How I did it I refactored the code in a way that removes the call to head after sort. How to verify it I created a bash script that induces a write failed: Broken pipe error on account of a sort | head combination and verified the SIGPIPE error code. I then removed the head command and replaced it with a call to awk and re-ran the script. This time there was no SIGPIPE. The script: #!/usr/bin/bash var0=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | head -n 20 > /dev/null') echo "" echo "Removed head and modified awk:" echo "" var1=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | awk "NR == 1 {print $2}"') The output execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7ffc34b52f98 /* 17 vars */) = 0 strace: Process 137372 attached strace: Process 137373 attached strace: Process 137374 attached [pid 137373] execve("/usr/bin/sort", ["sort"], 0x55be58b00350 /* 17 vars */ <unfinished ...> [pid 137372] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55be58b00350 /* 17 vars */ <unfinished ...> [pid 137373] <... execve resumed>) = 0 [pid 137372] <... execve resumed>) = 0 [pid 137374] execve("/usr/bin/head", ["head", "-n", "20"], 0x55be58b00350 /* 17 vars */) = 0 [pid 137372] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096 [pid 137372] write(1, " 30960 Jan 20 2022 colrm\n-r"..., 4096) = 4096 [pid 137372] write(1, "-xr-x 1 root root 1866 De"..., 4096) = 4096 [pid 137372] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096 [pid 137372] write(1, " 1 root root 315136 Oct 2 "..., 4096) = 4096 [pid 137372] write(1, "ewgrp\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137372] write(1, "-rwxr-xr-x 1 root root 43"..., 4096) = 4096 [pid 137372] write(1, "eb 27 2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096 [pid 137372] write(1, "672 Jun 18 2023 systemd-ask-pas"..., 4096) = 4096 [pid 137372] write(1, "zselect\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137372] write(1, "r-xr-x 1 root root 2081 A"..., 4096) = 4096 [pid 137372] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533 [pid 137372] +++ exited with 0 +++ [pid 137373] write(1, "\n\ndrwx------ 2 root root 4096 J"..., 4096) = 4096 [pid 137373] write(1, "xrwx 1 root root 12 Jul"..., 4096) = 4096 [pid 137374] write(1, "\n\ndrwx------ 2 root root 4096 J"..., 942 <unfinished ...> [pid 137373] write(1, "Feb 2 2021 ping6 -> ping\nlrwxr"..., 4096 <unfinished ...> [pid 137374] <... write resumed>) = 942 [pid 137373] <... write resumed>) = -1 EPIPE (Broken pipe) [pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=137373, si_uid=1000} --- [pid 137374] +++ exited with 0 +++ [pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_TKILL, si_pid=137373, si_uid=1000} --- [pid 137373] +++ killed by SIGPIPE +++ --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137372, si_uid=1000, si_status=0, si_utime=0, si_stime=4} --- +++ exited with 0 +++ Removed head and modified awk: execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7fff94a24d18 /* 17 vars */) = 0 strace: Process 137379 attached strace: Process 137380 attached strace: Process 137381 attached [pid 137379] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55d437568350 /* 17 vars */ <unfinished ...> [pid 137380] execve("/usr/bin/sort", ["sort"], 0x55d437568350 /* 17 vars */ <unfinished ...> [pid 137379] <... execve resumed>) = 0 [pid 137380] <... execve resumed>) = 0 [pid 137381] execve("/usr/bin/awk", ["awk", "NR == 1 {print }"], 0x55d437568350 /* 17 vars */) = 0 [pid 137379] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096 [pid 137379] write(1, " 30960 Jan 20 2022 colrm\n-r"..., 4096) = 4096 [pid 137379] write(1, "-xr-x 1 root root 1866 De"..., 4096) = 4096 [pid 137379] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096 [pid 137379] write(1, " 1 root root 315136 Oct 2 "..., 4096) = 4096 [pid 137379] write(1, "ewgrp\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137379] write(1, "-rwxr-xr-x 1 root root 43"..., 4096) = 4096 [pid 137379] write(1, "eb 27 2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096 [pid 137379] write(1, "672 Jun 18 2023 systemd-ask-pas"..., 4096) = 4096 [pid 137379] write(1, "zselect\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137379] write(1, "r-xr-x 1 root root 2081 A"..., 4096) = 4096 [pid 137379] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533 [pid 137379] +++ exited with 0 +++ [pid 137380] write(1, "\n\ndrwx------ 2 root root 4096 J"..., 4096) = 4096 [pid 137380] write(1, "xrwx 1 root root 12 Jul"..., 4096) = 4096 [pid 137380] write(1, "Feb 2 2021 ping6 -> ping\nlrwxr"..., 4096) = 4096 [pid 137380] write(1, "t 10803 Sep 24 2021 pod2t"..., 4096) = 4096 [pid 137380] write(1, "22 colcrt\n-rwxr-xr-x 1 root root"..., 4096) = 4096 [pid 137380] write(1, " 2022 zmore\n-rwxr-xr-x 1 root r"..., 4096) = 4096 [pid 137380] write(1, "6 Jun 19 08:47 TS\n-rwxr-xr-x 1 r"..., 4096) = 4096 [pid 137380] write(1, "oot 35048 Jan 20 2022 tas"..., 4096) = 4096 [pid 137380] write(1, " 4392 Sep 24 2021 ptargrep\n"..., 4096) = 4096 [pid 137380] write(1, "-xr-x 1 root root 56192 Se"..., 4096) = 4096 [pid 137380] write(1, " 7285 Nov 24 2016 boot-eos\n-"..., 4096) = 4096 [pid 137380] write(1, "ostname-config.sh\n-rwxr-xr-x 2 r"..., 533) = 533 [pid 137381] write(1, "\n", 1) = 1 [pid 137380] +++ exited with 0 +++ [pid 137381] +++ exited with 0 +++ --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137379, si_uid=1000, si_status=0, si_utime=1, si_stime=5} --- +++ exited with 0 +++ I created 30 dummy archive files in a dev environment and then ran the modified script to verify that the OLDEST_ARCHIVE_FILE is the archive file that was created first: The setup: admin@sonic:~/testpipe/var/log$ pwd /home/admin/testpipe/var/log admin@sonic:~/testpipe/var/log$ ls -alrth total 8.0K drwxr-xr-x 3 admin admin 4.0K Jul 25 17:12 .. -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.1.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.2.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.3.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.5.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.4.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.6.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.7.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.9.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.8.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.10.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.11.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.12.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.13.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.14.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.15.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.16.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.17.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.18.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.19.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.20.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.21.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.22.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.23.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.24.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.25.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.26.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.27.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.28.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.29.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.30.gz drwxr-xr-x 2 admin admin 4.0K Jul 25 17:17 . The script: admin@sonic:~/testpipe$ cat sort_test.sh #!/usr/bin/bash SORTED_ARCHIVE_FILES=$(find ./var/log -type f -printf '%T+ %p\n' | grep -E '.+\.[0-9]+(\.gz)?$' | sort ) OLDEST_ARCHIVE_FILE=$(echo "${SORTED_ARCHIVE_FILES}" | awk 'NR == 1 { print $2; }') echo $OLDEST_ARCHIVE_FILE The output: admin@sonic:~/testpipe$ ./sort_test.sh ./var/log/syslog.1.gz
mssonicbld
pushed a commit
to mssonicbld/sonic-buildimage
that referenced
this pull request
Aug 2, 2024
…issues (sonic-net#19699) Why I did it Currently the logrotate script is susceptible to SIGPIPE issue because of a race condition between the sort and head commands that leads to the following error: Jun 22 03:20:00.331117 sonic-switch INFO logrotate[1494655]: sort: write failed: 'standard output': Broken pipe Jun 22 03:20:00.331207 sonic-switch INFO logrotate[1494655]: sort: write error This is because if the sort command has not yet finished writing to stdout and head has finished executing and closes the pipe. Subsequent writes by the previously issued sort command would fail because of the closed pipe. Work item tracking Microsoft ADO (number only): 28511764 How I did it I refactored the code in a way that removes the call to head after sort. How to verify it I created a bash script that induces a write failed: Broken pipe error on account of a sort | head combination and verified the SIGPIPE error code. I then removed the head command and replaced it with a call to awk and re-ran the script. This time there was no SIGPIPE. The script: #!/usr/bin/bash var0=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | head -n 20 > /dev/null') echo "" echo "Removed head and modified awk:" echo "" var1=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | awk "NR == 1 {print $2}"') The output execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7ffc34b52f98 /* 17 vars */) = 0 strace: Process 137372 attached strace: Process 137373 attached strace: Process 137374 attached [pid 137373] execve("/usr/bin/sort", ["sort"], 0x55be58b00350 /* 17 vars */ <unfinished ...> [pid 137372] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55be58b00350 /* 17 vars */ <unfinished ...> [pid 137373] <... execve resumed>) = 0 [pid 137372] <... execve resumed>) = 0 [pid 137374] execve("/usr/bin/head", ["head", "-n", "20"], 0x55be58b00350 /* 17 vars */) = 0 [pid 137372] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096 [pid 137372] write(1, " 30960 Jan 20 2022 colrm\n-r"..., 4096) = 4096 [pid 137372] write(1, "-xr-x 1 root root 1866 De"..., 4096) = 4096 [pid 137372] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096 [pid 137372] write(1, " 1 root root 315136 Oct 2 "..., 4096) = 4096 [pid 137372] write(1, "ewgrp\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137372] write(1, "-rwxr-xr-x 1 root root 43"..., 4096) = 4096 [pid 137372] write(1, "eb 27 2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096 [pid 137372] write(1, "672 Jun 18 2023 systemd-ask-pas"..., 4096) = 4096 [pid 137372] write(1, "zselect\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137372] write(1, "r-xr-x 1 root root 2081 A"..., 4096) = 4096 [pid 137372] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533 [pid 137372] +++ exited with 0 +++ [pid 137373] write(1, "\n\ndrwx------ 2 root root 4096 J"..., 4096) = 4096 [pid 137373] write(1, "xrwx 1 root root 12 Jul"..., 4096) = 4096 [pid 137374] write(1, "\n\ndrwx------ 2 root root 4096 J"..., 942 <unfinished ...> [pid 137373] write(1, "Feb 2 2021 ping6 -> ping\nlrwxr"..., 4096 <unfinished ...> [pid 137374] <... write resumed>) = 942 [pid 137373] <... write resumed>) = -1 EPIPE (Broken pipe) [pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=137373, si_uid=1000} --- [pid 137374] +++ exited with 0 +++ [pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_TKILL, si_pid=137373, si_uid=1000} --- [pid 137373] +++ killed by SIGPIPE +++ --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137372, si_uid=1000, si_status=0, si_utime=0, si_stime=4} --- +++ exited with 0 +++ Removed head and modified awk: execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7fff94a24d18 /* 17 vars */) = 0 strace: Process 137379 attached strace: Process 137380 attached strace: Process 137381 attached [pid 137379] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55d437568350 /* 17 vars */ <unfinished ...> [pid 137380] execve("/usr/bin/sort", ["sort"], 0x55d437568350 /* 17 vars */ <unfinished ...> [pid 137379] <... execve resumed>) = 0 [pid 137380] <... execve resumed>) = 0 [pid 137381] execve("/usr/bin/awk", ["awk", "NR == 1 {print }"], 0x55d437568350 /* 17 vars */) = 0 [pid 137379] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096 [pid 137379] write(1, " 30960 Jan 20 2022 colrm\n-r"..., 4096) = 4096 [pid 137379] write(1, "-xr-x 1 root root 1866 De"..., 4096) = 4096 [pid 137379] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096 [pid 137379] write(1, " 1 root root 315136 Oct 2 "..., 4096) = 4096 [pid 137379] write(1, "ewgrp\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137379] write(1, "-rwxr-xr-x 1 root root 43"..., 4096) = 4096 [pid 137379] write(1, "eb 27 2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096 [pid 137379] write(1, "672 Jun 18 2023 systemd-ask-pas"..., 4096) = 4096 [pid 137379] write(1, "zselect\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137379] write(1, "r-xr-x 1 root root 2081 A"..., 4096) = 4096 [pid 137379] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533 [pid 137379] +++ exited with 0 +++ [pid 137380] write(1, "\n\ndrwx------ 2 root root 4096 J"..., 4096) = 4096 [pid 137380] write(1, "xrwx 1 root root 12 Jul"..., 4096) = 4096 [pid 137380] write(1, "Feb 2 2021 ping6 -> ping\nlrwxr"..., 4096) = 4096 [pid 137380] write(1, "t 10803 Sep 24 2021 pod2t"..., 4096) = 4096 [pid 137380] write(1, "22 colcrt\n-rwxr-xr-x 1 root root"..., 4096) = 4096 [pid 137380] write(1, " 2022 zmore\n-rwxr-xr-x 1 root r"..., 4096) = 4096 [pid 137380] write(1, "6 Jun 19 08:47 TS\n-rwxr-xr-x 1 r"..., 4096) = 4096 [pid 137380] write(1, "oot 35048 Jan 20 2022 tas"..., 4096) = 4096 [pid 137380] write(1, " 4392 Sep 24 2021 ptargrep\n"..., 4096) = 4096 [pid 137380] write(1, "-xr-x 1 root root 56192 Se"..., 4096) = 4096 [pid 137380] write(1, " 7285 Nov 24 2016 boot-eos\n-"..., 4096) = 4096 [pid 137380] write(1, "ostname-config.sh\n-rwxr-xr-x 2 r"..., 533) = 533 [pid 137381] write(1, "\n", 1) = 1 [pid 137380] +++ exited with 0 +++ [pid 137381] +++ exited with 0 +++ --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137379, si_uid=1000, si_status=0, si_utime=1, si_stime=5} --- +++ exited with 0 +++ I created 30 dummy archive files in a dev environment and then ran the modified script to verify that the OLDEST_ARCHIVE_FILE is the archive file that was created first: The setup: admin@sonic:~/testpipe/var/log$ pwd /home/admin/testpipe/var/log admin@sonic:~/testpipe/var/log$ ls -alrth total 8.0K drwxr-xr-x 3 admin admin 4.0K Jul 25 17:12 .. -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.1.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.2.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.3.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.5.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.4.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.6.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.7.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.9.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.8.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.10.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.11.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.12.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.13.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.14.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.15.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.16.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.17.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.18.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.19.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.20.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.21.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.22.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.23.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.24.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.25.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.26.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.27.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.28.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.29.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.30.gz drwxr-xr-x 2 admin admin 4.0K Jul 25 17:17 . The script: admin@sonic:~/testpipe$ cat sort_test.sh #!/usr/bin/bash SORTED_ARCHIVE_FILES=$(find ./var/log -type f -printf '%T+ %p\n' | grep -E '.+\.[0-9]+(\.gz)?$' | sort ) OLDEST_ARCHIVE_FILE=$(echo "${SORTED_ARCHIVE_FILES}" | awk 'NR == 1 { print $2; }') echo $OLDEST_ARCHIVE_FILE The output: admin@sonic:~/testpipe$ ./sort_test.sh ./var/log/syslog.1.gz
Cherry-pick PR to 202311: #19783 |
Cherry-pick PR to 202405: #19784 |
mssonicbld
pushed a commit
to mssonicbld/sonic-buildimage
that referenced
this pull request
Aug 2, 2024
…issues (sonic-net#19699) Why I did it Currently the logrotate script is susceptible to SIGPIPE issue because of a race condition between the sort and head commands that leads to the following error: Jun 22 03:20:00.331117 sonic-switch INFO logrotate[1494655]: sort: write failed: 'standard output': Broken pipe Jun 22 03:20:00.331207 sonic-switch INFO logrotate[1494655]: sort: write error This is because if the sort command has not yet finished writing to stdout and head has finished executing and closes the pipe. Subsequent writes by the previously issued sort command would fail because of the closed pipe. Work item tracking Microsoft ADO (number only): 28511764 How I did it I refactored the code in a way that removes the call to head after sort. How to verify it I created a bash script that induces a write failed: Broken pipe error on account of a sort | head combination and verified the SIGPIPE error code. I then removed the head command and replaced it with a call to awk and re-ran the script. This time there was no SIGPIPE. The script: #!/usr/bin/bash var0=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | head -n 20 > /dev/null') echo "" echo "Removed head and modified awk:" echo "" var1=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | awk "NR == 1 {print $2}"') The output execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7ffc34b52f98 /* 17 vars */) = 0 strace: Process 137372 attached strace: Process 137373 attached strace: Process 137374 attached [pid 137373] execve("/usr/bin/sort", ["sort"], 0x55be58b00350 /* 17 vars */ <unfinished ...> [pid 137372] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55be58b00350 /* 17 vars */ <unfinished ...> [pid 137373] <... execve resumed>) = 0 [pid 137372] <... execve resumed>) = 0 [pid 137374] execve("/usr/bin/head", ["head", "-n", "20"], 0x55be58b00350 /* 17 vars */) = 0 [pid 137372] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096 [pid 137372] write(1, " 30960 Jan 20 2022 colrm\n-r"..., 4096) = 4096 [pid 137372] write(1, "-xr-x 1 root root 1866 De"..., 4096) = 4096 [pid 137372] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096 [pid 137372] write(1, " 1 root root 315136 Oct 2 "..., 4096) = 4096 [pid 137372] write(1, "ewgrp\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137372] write(1, "-rwxr-xr-x 1 root root 43"..., 4096) = 4096 [pid 137372] write(1, "eb 27 2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096 [pid 137372] write(1, "672 Jun 18 2023 systemd-ask-pas"..., 4096) = 4096 [pid 137372] write(1, "zselect\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137372] write(1, "r-xr-x 1 root root 2081 A"..., 4096) = 4096 [pid 137372] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533 [pid 137372] +++ exited with 0 +++ [pid 137373] write(1, "\n\ndrwx------ 2 root root 4096 J"..., 4096) = 4096 [pid 137373] write(1, "xrwx 1 root root 12 Jul"..., 4096) = 4096 [pid 137374] write(1, "\n\ndrwx------ 2 root root 4096 J"..., 942 <unfinished ...> [pid 137373] write(1, "Feb 2 2021 ping6 -> ping\nlrwxr"..., 4096 <unfinished ...> [pid 137374] <... write resumed>) = 942 [pid 137373] <... write resumed>) = -1 EPIPE (Broken pipe) [pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=137373, si_uid=1000} --- [pid 137374] +++ exited with 0 +++ [pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_TKILL, si_pid=137373, si_uid=1000} --- [pid 137373] +++ killed by SIGPIPE +++ --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137372, si_uid=1000, si_status=0, si_utime=0, si_stime=4} --- +++ exited with 0 +++ Removed head and modified awk: execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7fff94a24d18 /* 17 vars */) = 0 strace: Process 137379 attached strace: Process 137380 attached strace: Process 137381 attached [pid 137379] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55d437568350 /* 17 vars */ <unfinished ...> [pid 137380] execve("/usr/bin/sort", ["sort"], 0x55d437568350 /* 17 vars */ <unfinished ...> [pid 137379] <... execve resumed>) = 0 [pid 137380] <... execve resumed>) = 0 [pid 137381] execve("/usr/bin/awk", ["awk", "NR == 1 {print }"], 0x55d437568350 /* 17 vars */) = 0 [pid 137379] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096 [pid 137379] write(1, " 30960 Jan 20 2022 colrm\n-r"..., 4096) = 4096 [pid 137379] write(1, "-xr-x 1 root root 1866 De"..., 4096) = 4096 [pid 137379] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096 [pid 137379] write(1, " 1 root root 315136 Oct 2 "..., 4096) = 4096 [pid 137379] write(1, "ewgrp\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137379] write(1, "-rwxr-xr-x 1 root root 43"..., 4096) = 4096 [pid 137379] write(1, "eb 27 2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096 [pid 137379] write(1, "672 Jun 18 2023 systemd-ask-pas"..., 4096) = 4096 [pid 137379] write(1, "zselect\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137379] write(1, "r-xr-x 1 root root 2081 A"..., 4096) = 4096 [pid 137379] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533 [pid 137379] +++ exited with 0 +++ [pid 137380] write(1, "\n\ndrwx------ 2 root root 4096 J"..., 4096) = 4096 [pid 137380] write(1, "xrwx 1 root root 12 Jul"..., 4096) = 4096 [pid 137380] write(1, "Feb 2 2021 ping6 -> ping\nlrwxr"..., 4096) = 4096 [pid 137380] write(1, "t 10803 Sep 24 2021 pod2t"..., 4096) = 4096 [pid 137380] write(1, "22 colcrt\n-rwxr-xr-x 1 root root"..., 4096) = 4096 [pid 137380] write(1, " 2022 zmore\n-rwxr-xr-x 1 root r"..., 4096) = 4096 [pid 137380] write(1, "6 Jun 19 08:47 TS\n-rwxr-xr-x 1 r"..., 4096) = 4096 [pid 137380] write(1, "oot 35048 Jan 20 2022 tas"..., 4096) = 4096 [pid 137380] write(1, " 4392 Sep 24 2021 ptargrep\n"..., 4096) = 4096 [pid 137380] write(1, "-xr-x 1 root root 56192 Se"..., 4096) = 4096 [pid 137380] write(1, " 7285 Nov 24 2016 boot-eos\n-"..., 4096) = 4096 [pid 137380] write(1, "ostname-config.sh\n-rwxr-xr-x 2 r"..., 533) = 533 [pid 137381] write(1, "\n", 1) = 1 [pid 137380] +++ exited with 0 +++ [pid 137381] +++ exited with 0 +++ --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137379, si_uid=1000, si_status=0, si_utime=1, si_stime=5} --- +++ exited with 0 +++ I created 30 dummy archive files in a dev environment and then ran the modified script to verify that the OLDEST_ARCHIVE_FILE is the archive file that was created first: The setup: admin@sonic:~/testpipe/var/log$ pwd /home/admin/testpipe/var/log admin@sonic:~/testpipe/var/log$ ls -alrth total 8.0K drwxr-xr-x 3 admin admin 4.0K Jul 25 17:12 .. -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.1.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.2.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.3.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.5.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.4.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.6.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.7.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.9.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.8.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.10.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.11.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.12.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.13.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.14.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.15.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.16.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.17.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.18.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.19.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.20.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.21.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.22.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.23.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.24.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.25.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.26.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.27.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.28.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.29.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.30.gz drwxr-xr-x 2 admin admin 4.0K Jul 25 17:17 . The script: admin@sonic:~/testpipe$ cat sort_test.sh #!/usr/bin/bash SORTED_ARCHIVE_FILES=$(find ./var/log -type f -printf '%T+ %p\n' | grep -E '.+\.[0-9]+(\.gz)?$' | sort ) OLDEST_ARCHIVE_FILE=$(echo "${SORTED_ARCHIVE_FILES}" | awk 'NR == 1 { print $2; }') echo $OLDEST_ARCHIVE_FILE The output: admin@sonic:~/testpipe$ ./sort_test.sh ./var/log/syslog.1.gz
mssonicbld
pushed a commit
that referenced
this pull request
Aug 3, 2024
…issues (#19699) Why I did it Currently the logrotate script is susceptible to SIGPIPE issue because of a race condition between the sort and head commands that leads to the following error: Jun 22 03:20:00.331117 sonic-switch INFO logrotate[1494655]: sort: write failed: 'standard output': Broken pipe Jun 22 03:20:00.331207 sonic-switch INFO logrotate[1494655]: sort: write error This is because if the sort command has not yet finished writing to stdout and head has finished executing and closes the pipe. Subsequent writes by the previously issued sort command would fail because of the closed pipe. Work item tracking Microsoft ADO (number only): 28511764 How I did it I refactored the code in a way that removes the call to head after sort. How to verify it I created a bash script that induces a write failed: Broken pipe error on account of a sort | head combination and verified the SIGPIPE error code. I then removed the head command and replaced it with a call to awk and re-ran the script. This time there was no SIGPIPE. The script: #!/usr/bin/bash var0=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | head -n 20 > /dev/null') echo "" echo "Removed head and modified awk:" echo "" var1=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | awk "NR == 1 {print $2}"') The output execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7ffc34b52f98 /* 17 vars */) = 0 strace: Process 137372 attached strace: Process 137373 attached strace: Process 137374 attached [pid 137373] execve("/usr/bin/sort", ["sort"], 0x55be58b00350 /* 17 vars */ <unfinished ...> [pid 137372] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55be58b00350 /* 17 vars */ <unfinished ...> [pid 137373] <... execve resumed>) = 0 [pid 137372] <... execve resumed>) = 0 [pid 137374] execve("/usr/bin/head", ["head", "-n", "20"], 0x55be58b00350 /* 17 vars */) = 0 [pid 137372] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096 [pid 137372] write(1, " 30960 Jan 20 2022 colrm\n-r"..., 4096) = 4096 [pid 137372] write(1, "-xr-x 1 root root 1866 De"..., 4096) = 4096 [pid 137372] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096 [pid 137372] write(1, " 1 root root 315136 Oct 2 "..., 4096) = 4096 [pid 137372] write(1, "ewgrp\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137372] write(1, "-rwxr-xr-x 1 root root 43"..., 4096) = 4096 [pid 137372] write(1, "eb 27 2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096 [pid 137372] write(1, "672 Jun 18 2023 systemd-ask-pas"..., 4096) = 4096 [pid 137372] write(1, "zselect\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137372] write(1, "r-xr-x 1 root root 2081 A"..., 4096) = 4096 [pid 137372] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533 [pid 137372] +++ exited with 0 +++ [pid 137373] write(1, "\n\ndrwx------ 2 root root 4096 J"..., 4096) = 4096 [pid 137373] write(1, "xrwx 1 root root 12 Jul"..., 4096) = 4096 [pid 137374] write(1, "\n\ndrwx------ 2 root root 4096 J"..., 942 <unfinished ...> [pid 137373] write(1, "Feb 2 2021 ping6 -> ping\nlrwxr"..., 4096 <unfinished ...> [pid 137374] <... write resumed>) = 942 [pid 137373] <... write resumed>) = -1 EPIPE (Broken pipe) [pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=137373, si_uid=1000} --- [pid 137374] +++ exited with 0 +++ [pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_TKILL, si_pid=137373, si_uid=1000} --- [pid 137373] +++ killed by SIGPIPE +++ --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137372, si_uid=1000, si_status=0, si_utime=0, si_stime=4} --- +++ exited with 0 +++ Removed head and modified awk: execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7fff94a24d18 /* 17 vars */) = 0 strace: Process 137379 attached strace: Process 137380 attached strace: Process 137381 attached [pid 137379] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55d437568350 /* 17 vars */ <unfinished ...> [pid 137380] execve("/usr/bin/sort", ["sort"], 0x55d437568350 /* 17 vars */ <unfinished ...> [pid 137379] <... execve resumed>) = 0 [pid 137380] <... execve resumed>) = 0 [pid 137381] execve("/usr/bin/awk", ["awk", "NR == 1 {print }"], 0x55d437568350 /* 17 vars */) = 0 [pid 137379] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096 [pid 137379] write(1, " 30960 Jan 20 2022 colrm\n-r"..., 4096) = 4096 [pid 137379] write(1, "-xr-x 1 root root 1866 De"..., 4096) = 4096 [pid 137379] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096 [pid 137379] write(1, " 1 root root 315136 Oct 2 "..., 4096) = 4096 [pid 137379] write(1, "ewgrp\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137379] write(1, "-rwxr-xr-x 1 root root 43"..., 4096) = 4096 [pid 137379] write(1, "eb 27 2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096 [pid 137379] write(1, "672 Jun 18 2023 systemd-ask-pas"..., 4096) = 4096 [pid 137379] write(1, "zselect\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137379] write(1, "r-xr-x 1 root root 2081 A"..., 4096) = 4096 [pid 137379] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533 [pid 137379] +++ exited with 0 +++ [pid 137380] write(1, "\n\ndrwx------ 2 root root 4096 J"..., 4096) = 4096 [pid 137380] write(1, "xrwx 1 root root 12 Jul"..., 4096) = 4096 [pid 137380] write(1, "Feb 2 2021 ping6 -> ping\nlrwxr"..., 4096) = 4096 [pid 137380] write(1, "t 10803 Sep 24 2021 pod2t"..., 4096) = 4096 [pid 137380] write(1, "22 colcrt\n-rwxr-xr-x 1 root root"..., 4096) = 4096 [pid 137380] write(1, " 2022 zmore\n-rwxr-xr-x 1 root r"..., 4096) = 4096 [pid 137380] write(1, "6 Jun 19 08:47 TS\n-rwxr-xr-x 1 r"..., 4096) = 4096 [pid 137380] write(1, "oot 35048 Jan 20 2022 tas"..., 4096) = 4096 [pid 137380] write(1, " 4392 Sep 24 2021 ptargrep\n"..., 4096) = 4096 [pid 137380] write(1, "-xr-x 1 root root 56192 Se"..., 4096) = 4096 [pid 137380] write(1, " 7285 Nov 24 2016 boot-eos\n-"..., 4096) = 4096 [pid 137380] write(1, "ostname-config.sh\n-rwxr-xr-x 2 r"..., 533) = 533 [pid 137381] write(1, "\n", 1) = 1 [pid 137380] +++ exited with 0 +++ [pid 137381] +++ exited with 0 +++ --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137379, si_uid=1000, si_status=0, si_utime=1, si_stime=5} --- +++ exited with 0 +++ I created 30 dummy archive files in a dev environment and then ran the modified script to verify that the OLDEST_ARCHIVE_FILE is the archive file that was created first: The setup: admin@sonic:~/testpipe/var/log$ pwd /home/admin/testpipe/var/log admin@sonic:~/testpipe/var/log$ ls -alrth total 8.0K drwxr-xr-x 3 admin admin 4.0K Jul 25 17:12 .. -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.1.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.2.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.3.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.5.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.4.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.6.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.7.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.9.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.8.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.10.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.11.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.12.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.13.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.14.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.15.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.16.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.17.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.18.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.19.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.20.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.21.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.22.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.23.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.24.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.25.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.26.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.27.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.28.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.29.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.30.gz drwxr-xr-x 2 admin admin 4.0K Jul 25 17:17 . The script: admin@sonic:~/testpipe$ cat sort_test.sh #!/usr/bin/bash SORTED_ARCHIVE_FILES=$(find ./var/log -type f -printf '%T+ %p\n' | grep -E '.+\.[0-9]+(\.gz)?$' | sort ) OLDEST_ARCHIVE_FILE=$(echo "${SORTED_ARCHIVE_FILES}" | awk 'NR == 1 { print $2; }') echo $OLDEST_ARCHIVE_FILE The output: admin@sonic:~/testpipe$ ./sort_test.sh ./var/log/syslog.1.gz
mssonicbld
pushed a commit
that referenced
this pull request
Aug 24, 2024
…issues (#19699) Why I did it Currently the logrotate script is susceptible to SIGPIPE issue because of a race condition between the sort and head commands that leads to the following error: Jun 22 03:20:00.331117 sonic-switch INFO logrotate[1494655]: sort: write failed: 'standard output': Broken pipe Jun 22 03:20:00.331207 sonic-switch INFO logrotate[1494655]: sort: write error This is because if the sort command has not yet finished writing to stdout and head has finished executing and closes the pipe. Subsequent writes by the previously issued sort command would fail because of the closed pipe. Work item tracking Microsoft ADO (number only): 28511764 How I did it I refactored the code in a way that removes the call to head after sort. How to verify it I created a bash script that induces a write failed: Broken pipe error on account of a sort | head combination and verified the SIGPIPE error code. I then removed the head command and replaced it with a call to awk and re-ran the script. This time there was no SIGPIPE. The script: #!/usr/bin/bash var0=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | head -n 20 > /dev/null') echo "" echo "Removed head and modified awk:" echo "" var1=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | awk "NR == 1 {print $2}"') The output execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7ffc34b52f98 /* 17 vars */) = 0 strace: Process 137372 attached strace: Process 137373 attached strace: Process 137374 attached [pid 137373] execve("/usr/bin/sort", ["sort"], 0x55be58b00350 /* 17 vars */ <unfinished ...> [pid 137372] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55be58b00350 /* 17 vars */ <unfinished ...> [pid 137373] <... execve resumed>) = 0 [pid 137372] <... execve resumed>) = 0 [pid 137374] execve("/usr/bin/head", ["head", "-n", "20"], 0x55be58b00350 /* 17 vars */) = 0 [pid 137372] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096 [pid 137372] write(1, " 30960 Jan 20 2022 colrm\n-r"..., 4096) = 4096 [pid 137372] write(1, "-xr-x 1 root root 1866 De"..., 4096) = 4096 [pid 137372] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096 [pid 137372] write(1, " 1 root root 315136 Oct 2 "..., 4096) = 4096 [pid 137372] write(1, "ewgrp\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137372] write(1, "-rwxr-xr-x 1 root root 43"..., 4096) = 4096 [pid 137372] write(1, "eb 27 2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096 [pid 137372] write(1, "672 Jun 18 2023 systemd-ask-pas"..., 4096) = 4096 [pid 137372] write(1, "zselect\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137372] write(1, "r-xr-x 1 root root 2081 A"..., 4096) = 4096 [pid 137372] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533 [pid 137372] +++ exited with 0 +++ [pid 137373] write(1, "\n\ndrwx------ 2 root root 4096 J"..., 4096) = 4096 [pid 137373] write(1, "xrwx 1 root root 12 Jul"..., 4096) = 4096 [pid 137374] write(1, "\n\ndrwx------ 2 root root 4096 J"..., 942 <unfinished ...> [pid 137373] write(1, "Feb 2 2021 ping6 -> ping\nlrwxr"..., 4096 <unfinished ...> [pid 137374] <... write resumed>) = 942 [pid 137373] <... write resumed>) = -1 EPIPE (Broken pipe) [pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=137373, si_uid=1000} --- [pid 137374] +++ exited with 0 +++ [pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_TKILL, si_pid=137373, si_uid=1000} --- [pid 137373] +++ killed by SIGPIPE +++ --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137372, si_uid=1000, si_status=0, si_utime=0, si_stime=4} --- +++ exited with 0 +++ Removed head and modified awk: execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7fff94a24d18 /* 17 vars */) = 0 strace: Process 137379 attached strace: Process 137380 attached strace: Process 137381 attached [pid 137379] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55d437568350 /* 17 vars */ <unfinished ...> [pid 137380] execve("/usr/bin/sort", ["sort"], 0x55d437568350 /* 17 vars */ <unfinished ...> [pid 137379] <... execve resumed>) = 0 [pid 137380] <... execve resumed>) = 0 [pid 137381] execve("/usr/bin/awk", ["awk", "NR == 1 {print }"], 0x55d437568350 /* 17 vars */) = 0 [pid 137379] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096 [pid 137379] write(1, " 30960 Jan 20 2022 colrm\n-r"..., 4096) = 4096 [pid 137379] write(1, "-xr-x 1 root root 1866 De"..., 4096) = 4096 [pid 137379] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096 [pid 137379] write(1, " 1 root root 315136 Oct 2 "..., 4096) = 4096 [pid 137379] write(1, "ewgrp\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137379] write(1, "-rwxr-xr-x 1 root root 43"..., 4096) = 4096 [pid 137379] write(1, "eb 27 2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096 [pid 137379] write(1, "672 Jun 18 2023 systemd-ask-pas"..., 4096) = 4096 [pid 137379] write(1, "zselect\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137379] write(1, "r-xr-x 1 root root 2081 A"..., 4096) = 4096 [pid 137379] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533 [pid 137379] +++ exited with 0 +++ [pid 137380] write(1, "\n\ndrwx------ 2 root root 4096 J"..., 4096) = 4096 [pid 137380] write(1, "xrwx 1 root root 12 Jul"..., 4096) = 4096 [pid 137380] write(1, "Feb 2 2021 ping6 -> ping\nlrwxr"..., 4096) = 4096 [pid 137380] write(1, "t 10803 Sep 24 2021 pod2t"..., 4096) = 4096 [pid 137380] write(1, "22 colcrt\n-rwxr-xr-x 1 root root"..., 4096) = 4096 [pid 137380] write(1, " 2022 zmore\n-rwxr-xr-x 1 root r"..., 4096) = 4096 [pid 137380] write(1, "6 Jun 19 08:47 TS\n-rwxr-xr-x 1 r"..., 4096) = 4096 [pid 137380] write(1, "oot 35048 Jan 20 2022 tas"..., 4096) = 4096 [pid 137380] write(1, " 4392 Sep 24 2021 ptargrep\n"..., 4096) = 4096 [pid 137380] write(1, "-xr-x 1 root root 56192 Se"..., 4096) = 4096 [pid 137380] write(1, " 7285 Nov 24 2016 boot-eos\n-"..., 4096) = 4096 [pid 137380] write(1, "ostname-config.sh\n-rwxr-xr-x 2 r"..., 533) = 533 [pid 137381] write(1, "\n", 1) = 1 [pid 137380] +++ exited with 0 +++ [pid 137381] +++ exited with 0 +++ --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137379, si_uid=1000, si_status=0, si_utime=1, si_stime=5} --- +++ exited with 0 +++ I created 30 dummy archive files in a dev environment and then ran the modified script to verify that the OLDEST_ARCHIVE_FILE is the archive file that was created first: The setup: admin@sonic:~/testpipe/var/log$ pwd /home/admin/testpipe/var/log admin@sonic:~/testpipe/var/log$ ls -alrth total 8.0K drwxr-xr-x 3 admin admin 4.0K Jul 25 17:12 .. -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.1.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.2.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.3.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.5.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.4.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.6.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.7.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.9.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.8.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.10.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.11.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.12.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.13.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.14.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.15.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.16.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.17.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.18.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.19.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.20.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.21.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.22.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.23.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.24.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.25.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.26.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.27.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.28.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.29.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.30.gz drwxr-xr-x 2 admin admin 4.0K Jul 25 17:17 . The script: admin@sonic:~/testpipe$ cat sort_test.sh #!/usr/bin/bash SORTED_ARCHIVE_FILES=$(find ./var/log -type f -printf '%T+ %p\n' | grep -E '.+\.[0-9]+(\.gz)?$' | sort ) OLDEST_ARCHIVE_FILE=$(echo "${SORTED_ARCHIVE_FILES}" | awk 'NR == 1 { print $2; }') echo $OLDEST_ARCHIVE_FILE The output: admin@sonic:~/testpipe$ ./sort_test.sh ./var/log/syslog.1.gz
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Approved for 202311 Branch
Approved for 202405 Branch
Chassis for 202205 branch
PRs needed for 202205 branch in msft repo
Included in Chassis for 202205 Branch
Indicate PR is already in MSFT repo 202205 branch
Included in 202311 Branch
Included in 202405 Branch
Request for 202305 Branch
Request for 202311 Branch
Request for 202405 Branch
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Why I did it
Currently the logrotate script is susceptible to SIGPIPE issue because of a race condition between the
sort
andhead
commands that leads to the following error:This is because if the
sort
command has not yet finished writing tostdout
andhead
has finished executing and closes the pipe. Subsequent writes by the previously issuedsort
command would fail because of the closed pipe.Work item tracking
How I did it
I refactored the code in a way that removes the call to
head
aftersort
.How to verify it
write failed: Broken pipe
error on account of asort | head
combination and verified the SIGPIPE error code. I then removed thehead
command and replaced it with a call toawk
and re-ran the script. This time there was no SIGPIPE.The script:
The output
The setup:
The script:
The output:
Which release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)