Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separated sort and head commands to prevent broken pipe issues #19699

Merged
merged 3 commits into from
Jul 26, 2024

Conversation

assrinivasan
Copy link
Contributor

@assrinivasan assrinivasan commented Jul 25, 2024

Why I did it

Currently the logrotate script is susceptible to SIGPIPE issue because of a race condition between the sort and head commands that leads to the following error:

Jun 22 03:20:00.331117 sonic-switch INFO logrotate[1494655]: sort: write failed: 'standard output': Broken pipe
Jun 22 03:20:00.331207 sonic-switch INFO logrotate[1494655]: sort: write error

This is because if the sort command has not yet finished writing to stdout and head has finished executing and closes the pipe. Subsequent writes by the previously issued sort command would fail because of the closed pipe.

Work item tracking
  • Microsoft ADO (number only): 28511764

How I did it

I refactored the code in a way that removes the call to head after sort.

How to verify it

  1. I created a bash script that induces a write failed: Broken pipe error on account of a sort | head combination and verified the SIGPIPE error code. I then removed the head command and replaced it with a call to awk and re-ran the script. This time there was no SIGPIPE.

The script:

#!/usr/bin/bash

var0=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | head -n 20 > /dev/null')

echo ""
echo "Removed head and modified awk:"
echo ""

var1=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | awk "NR == 1 {print $2}"')

The output

execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7ffc34b52f98 /* 17 vars */) = 0
strace: Process 137372 attached
strace: Process 137373 attached
strace: Process 137374 attached
[pid 137373] execve("/usr/bin/sort", ["sort"], 0x55be58b00350 /* 17 vars */ <unfinished ...>
[pid 137372] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55be58b00350 /* 17 vars */ <unfinished ...>
[pid 137373] <... execve resumed>)      = 0
[pid 137372] <... execve resumed>)      = 0
[pid 137374] execve("/usr/bin/head", ["head", "-n", "20"], 0x55be58b00350 /* 17 vars */) = 0
[pid 137372] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096
[pid 137372] write(1, "     30960 Jan 20  2022 colrm\n-r"..., 4096) = 4096
[pid 137372] write(1, "-xr-x 1 root root        1866 De"..., 4096) = 4096
[pid 137372] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096
[pid 137372] write(1, " 1 root root      315136 Oct  2 "..., 4096) = 4096
[pid 137372] write(1, "ewgrp\n-rwxr-xr-x 1 root root    "..., 4096) = 4096
[pid 137372] write(1, "-rwxr-xr-x 1 root root        43"..., 4096) = 4096
[pid 137372] write(1, "eb 27  2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096
[pid 137372] write(1, "672 Jun 18  2023 systemd-ask-pas"..., 4096) = 4096
[pid 137372] write(1, "zselect\n-rwxr-xr-x 1 root root  "..., 4096) = 4096
[pid 137372] write(1, "r-xr-x 1 root root        2081 A"..., 4096) = 4096
[pid 137372] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533
[pid 137372] +++ exited with 0 +++
[pid 137373] write(1, "\n\ndrwx------  2 root root 4096 J"..., 4096) = 4096
[pid 137373] write(1, "xrwx 1 root root          12 Jul"..., 4096) = 4096
[pid 137374] write(1, "\n\ndrwx------  2 root root 4096 J"..., 942 <unfinished ...>
[pid 137373] write(1, "Feb  2  2021 ping6 -> ping\nlrwxr"..., 4096 <unfinished ...>
[pid 137374] <... write resumed>)       = 942
[pid 137373] <... write resumed>)       = -1 EPIPE (Broken pipe)
[pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=137373, si_uid=1000} ---
[pid 137374] +++ exited with 0 +++
[pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_TKILL, si_pid=137373, si_uid=1000} ---
[pid 137373] +++ killed by SIGPIPE +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137372, si_uid=1000, si_status=0, si_utime=0, si_stime=4} ---
+++ exited with 0 +++

Removed head and modified awk:

execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7fff94a24d18 /* 17 vars */) = 0
strace: Process 137379 attached
strace: Process 137380 attached
strace: Process 137381 attached
[pid 137379] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55d437568350 /* 17 vars */ <unfinished ...>
[pid 137380] execve("/usr/bin/sort", ["sort"], 0x55d437568350 /* 17 vars */ <unfinished ...>
[pid 137379] <... execve resumed>)      = 0
[pid 137380] <... execve resumed>)      = 0
[pid 137381] execve("/usr/bin/awk", ["awk", "NR == 1 {print }"], 0x55d437568350 /* 17 vars */) = 0
[pid 137379] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096
[pid 137379] write(1, "     30960 Jan 20  2022 colrm\n-r"..., 4096) = 4096
[pid 137379] write(1, "-xr-x 1 root root        1866 De"..., 4096) = 4096
[pid 137379] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096
[pid 137379] write(1, " 1 root root      315136 Oct  2 "..., 4096) = 4096
[pid 137379] write(1, "ewgrp\n-rwxr-xr-x 1 root root    "..., 4096) = 4096
[pid 137379] write(1, "-rwxr-xr-x 1 root root        43"..., 4096) = 4096
[pid 137379] write(1, "eb 27  2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096
[pid 137379] write(1, "672 Jun 18  2023 systemd-ask-pas"..., 4096) = 4096
[pid 137379] write(1, "zselect\n-rwxr-xr-x 1 root root  "..., 4096) = 4096
[pid 137379] write(1, "r-xr-x 1 root root        2081 A"..., 4096) = 4096
[pid 137379] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533
[pid 137379] +++ exited with 0 +++
[pid 137380] write(1, "\n\ndrwx------  2 root root 4096 J"..., 4096) = 4096
[pid 137380] write(1, "xrwx 1 root root          12 Jul"..., 4096) = 4096
[pid 137380] write(1, "Feb  2  2021 ping6 -> ping\nlrwxr"..., 4096) = 4096
[pid 137380] write(1, "t       10803 Sep 24  2021 pod2t"..., 4096) = 4096
[pid 137380] write(1, "22 colcrt\n-rwxr-xr-x 1 root root"..., 4096) = 4096
[pid 137380] write(1, "  2022 zmore\n-rwxr-xr-x 1 root r"..., 4096) = 4096
[pid 137380] write(1, "6 Jun 19 08:47 TS\n-rwxr-xr-x 1 r"..., 4096) = 4096
[pid 137380] write(1, "oot       35048 Jan 20  2022 tas"..., 4096) = 4096
[pid 137380] write(1, "     4392 Sep 24  2021 ptargrep\n"..., 4096) = 4096
[pid 137380] write(1, "-xr-x 1 root root       56192 Se"..., 4096) = 4096
[pid 137380] write(1, "    7285 Nov 24  2016 boot-eos\n-"..., 4096) = 4096
[pid 137380] write(1, "ostname-config.sh\n-rwxr-xr-x 2 r"..., 533) = 533
[pid 137381] write(1, "\n", 1)          = 1
[pid 137380] +++ exited with 0 +++
[pid 137381] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137379, si_uid=1000, si_status=0, si_utime=1, si_stime=5} ---
+++ exited with 0 +++

  1. I created 30 dummy archive files in a dev environment and then ran the modified script to verify that the OLDEST_ARCHIVE_FILE is the archive file that was created first:

The setup:

admin@sonic:~/testpipe/var/log$ pwd
/home/admin/testpipe/var/log

admin@sonic:~/testpipe/var/log$ ls -alrth
total 8.0K
drwxr-xr-x 3 admin admin 4.0K Jul 25 17:12 ..
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.1.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.2.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.3.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.5.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.4.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.6.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.7.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.9.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.8.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.10.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.11.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.12.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.13.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.14.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.15.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.16.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.17.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.18.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.19.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.20.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.21.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.22.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.23.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.24.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.25.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.26.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.27.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.28.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.29.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.30.gz
drwxr-xr-x 2 admin admin 4.0K Jul 25 17:17 .

The script:

admin@sonic:~/testpipe$ cat sort_test.sh
#!/usr/bin/bash

SORTED_ARCHIVE_FILES=$(find ./var/log -type f -printf '%T+ %p\n' | grep -E '.+\.[0-9]+(\.gz)?$' | sort )
OLDEST_ARCHIVE_FILE=$(echo "${SORTED_ARCHIVE_FILES}" | awk 'NR == 1 { print $2; }')
echo $OLDEST_ARCHIVE_FILE



The output:

admin@sonic:~/testpipe$ ./sort_test.sh
./var/log/syslog.1.gz

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

  • internal
  • 20230531.30

@gechiang gechiang added Request for 202405 Branch Chassis for 202205 branch PRs needed for 202205 branch in msft repo labels Jul 26, 2024
@lguohan lguohan merged commit 3e3a1b9 into sonic-net:master Jul 26, 2024
22 checks passed
@gechiang
Copy link
Collaborator

Picked into MSFT repo 202205 branch:
Azure/sonic-buildimage-msft#427

@gechiang gechiang added the Included in Chassis for 202205 Branch Indicate PR is already in MSFT repo 202205 branch label Jul 27, 2024
liushilongbuaa pushed a commit to liushilongbuaa/sonic-buildimage that referenced this pull request Aug 1, 2024
…issues (sonic-net#19699)

Why I did it
Currently the logrotate script is susceptible to SIGPIPE issue because of a race condition between the sort and head commands that leads to the following error:

Jun 22 03:20:00.331117 sonic-switch INFO logrotate[1494655]: sort: write failed: 'standard output': Broken pipe
Jun 22 03:20:00.331207 sonic-switch INFO logrotate[1494655]: sort: write error
This is because if the sort command has not yet finished writing to stdout and head has finished executing and closes the pipe. Subsequent writes by the previously issued sort command would fail because of the closed pipe.

Work item tracking
Microsoft ADO (number only): 28511764
How I did it
I refactored the code in a way that removes the call to head after sort.

How to verify it
I created a bash script that induces a write failed: Broken pipe error on account of a sort | head combination and verified the SIGPIPE error code. I then removed the head command and replaced it with a call to awk and re-ran the script. This time there was no SIGPIPE.
The script:

#!/usr/bin/bash

var0=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | head -n 20 > /dev/null')

echo ""
echo "Removed head and modified awk:"
echo ""

var1=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | awk "NR == 1 {print $2}"')

The output

execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7ffc34b52f98 /* 17 vars */) = 0
strace: Process 137372 attached
strace: Process 137373 attached
strace: Process 137374 attached
[pid 137373] execve("/usr/bin/sort", ["sort"], 0x55be58b00350 /* 17 vars */ <unfinished ...>
[pid 137372] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55be58b00350 /* 17 vars */ <unfinished ...>
[pid 137373] <... execve resumed>)      = 0
[pid 137372] <... execve resumed>)      = 0
[pid 137374] execve("/usr/bin/head", ["head", "-n", "20"], 0x55be58b00350 /* 17 vars */) = 0
[pid 137372] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096
[pid 137372] write(1, "     30960 Jan 20  2022 colrm\n-r"..., 4096) = 4096
[pid 137372] write(1, "-xr-x 1 root root        1866 De"..., 4096) = 4096
[pid 137372] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096
[pid 137372] write(1, " 1 root root      315136 Oct  2 "..., 4096) = 4096
[pid 137372] write(1, "ewgrp\n-rwxr-xr-x 1 root root    "..., 4096) = 4096
[pid 137372] write(1, "-rwxr-xr-x 1 root root        43"..., 4096) = 4096
[pid 137372] write(1, "eb 27  2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096
[pid 137372] write(1, "672 Jun 18  2023 systemd-ask-pas"..., 4096) = 4096
[pid 137372] write(1, "zselect\n-rwxr-xr-x 1 root root  "..., 4096) = 4096
[pid 137372] write(1, "r-xr-x 1 root root        2081 A"..., 4096) = 4096
[pid 137372] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533
[pid 137372] +++ exited with 0 +++
[pid 137373] write(1, "\n\ndrwx------  2 root root 4096 J"..., 4096) = 4096
[pid 137373] write(1, "xrwx 1 root root          12 Jul"..., 4096) = 4096
[pid 137374] write(1, "\n\ndrwx------  2 root root 4096 J"..., 942 <unfinished ...>
[pid 137373] write(1, "Feb  2  2021 ping6 -> ping\nlrwxr"..., 4096 <unfinished ...>
[pid 137374] <... write resumed>)       = 942
[pid 137373] <... write resumed>)       = -1 EPIPE (Broken pipe)
[pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=137373, si_uid=1000} ---
[pid 137374] +++ exited with 0 +++
[pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_TKILL, si_pid=137373, si_uid=1000} ---
[pid 137373] +++ killed by SIGPIPE +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137372, si_uid=1000, si_status=0, si_utime=0, si_stime=4} ---
+++ exited with 0 +++

Removed head and modified awk:

execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7fff94a24d18 /* 17 vars */) = 0
strace: Process 137379 attached
strace: Process 137380 attached
strace: Process 137381 attached
[pid 137379] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55d437568350 /* 17 vars */ <unfinished ...>
[pid 137380] execve("/usr/bin/sort", ["sort"], 0x55d437568350 /* 17 vars */ <unfinished ...>
[pid 137379] <... execve resumed>)      = 0
[pid 137380] <... execve resumed>)      = 0
[pid 137381] execve("/usr/bin/awk", ["awk", "NR == 1 {print }"], 0x55d437568350 /* 17 vars */) = 0
[pid 137379] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096
[pid 137379] write(1, "     30960 Jan 20  2022 colrm\n-r"..., 4096) = 4096
[pid 137379] write(1, "-xr-x 1 root root        1866 De"..., 4096) = 4096
[pid 137379] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096
[pid 137379] write(1, " 1 root root      315136 Oct  2 "..., 4096) = 4096
[pid 137379] write(1, "ewgrp\n-rwxr-xr-x 1 root root    "..., 4096) = 4096
[pid 137379] write(1, "-rwxr-xr-x 1 root root        43"..., 4096) = 4096
[pid 137379] write(1, "eb 27  2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096
[pid 137379] write(1, "672 Jun 18  2023 systemd-ask-pas"..., 4096) = 4096
[pid 137379] write(1, "zselect\n-rwxr-xr-x 1 root root  "..., 4096) = 4096
[pid 137379] write(1, "r-xr-x 1 root root        2081 A"..., 4096) = 4096
[pid 137379] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533
[pid 137379] +++ exited with 0 +++
[pid 137380] write(1, "\n\ndrwx------  2 root root 4096 J"..., 4096) = 4096
[pid 137380] write(1, "xrwx 1 root root          12 Jul"..., 4096) = 4096
[pid 137380] write(1, "Feb  2  2021 ping6 -> ping\nlrwxr"..., 4096) = 4096
[pid 137380] write(1, "t       10803 Sep 24  2021 pod2t"..., 4096) = 4096
[pid 137380] write(1, "22 colcrt\n-rwxr-xr-x 1 root root"..., 4096) = 4096
[pid 137380] write(1, "  2022 zmore\n-rwxr-xr-x 1 root r"..., 4096) = 4096
[pid 137380] write(1, "6 Jun 19 08:47 TS\n-rwxr-xr-x 1 r"..., 4096) = 4096
[pid 137380] write(1, "oot       35048 Jan 20  2022 tas"..., 4096) = 4096
[pid 137380] write(1, "     4392 Sep 24  2021 ptargrep\n"..., 4096) = 4096
[pid 137380] write(1, "-xr-x 1 root root       56192 Se"..., 4096) = 4096
[pid 137380] write(1, "    7285 Nov 24  2016 boot-eos\n-"..., 4096) = 4096
[pid 137380] write(1, "ostname-config.sh\n-rwxr-xr-x 2 r"..., 533) = 533
[pid 137381] write(1, "\n", 1)          = 1
[pid 137380] +++ exited with 0 +++
[pid 137381] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137379, si_uid=1000, si_status=0, si_utime=1, si_stime=5} ---
+++ exited with 0 +++

I created 30 dummy archive files in a dev environment and then ran the modified script to verify that the OLDEST_ARCHIVE_FILE is the archive file that was created first:
The setup:

admin@sonic:~/testpipe/var/log$ pwd
/home/admin/testpipe/var/log

admin@sonic:~/testpipe/var/log$ ls -alrth
total 8.0K
drwxr-xr-x 3 admin admin 4.0K Jul 25 17:12 ..
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.1.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.2.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.3.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.5.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.4.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.6.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.7.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.9.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.8.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.10.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.11.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.12.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.13.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.14.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.15.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.16.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.17.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.18.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.19.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.20.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.21.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.22.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.23.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.24.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.25.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.26.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.27.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.28.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.29.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.30.gz
drwxr-xr-x 2 admin admin 4.0K Jul 25 17:17 .

The script:

admin@sonic:~/testpipe$ cat sort_test.sh
#!/usr/bin/bash

SORTED_ARCHIVE_FILES=$(find ./var/log -type f -printf '%T+ %p\n' | grep -E '.+\.[0-9]+(\.gz)?$' | sort )
OLDEST_ARCHIVE_FILE=$(echo "${SORTED_ARCHIVE_FILES}" | awk 'NR == 1 { print $2; }')
echo $OLDEST_ARCHIVE_FILE



The output:

admin@sonic:~/testpipe$ ./sort_test.sh
./var/log/syslog.1.gz
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Aug 2, 2024
…issues (sonic-net#19699)

Why I did it
Currently the logrotate script is susceptible to SIGPIPE issue because of a race condition between the sort and head commands that leads to the following error:

Jun 22 03:20:00.331117 sonic-switch INFO logrotate[1494655]: sort: write failed: 'standard output': Broken pipe
Jun 22 03:20:00.331207 sonic-switch INFO logrotate[1494655]: sort: write error
This is because if the sort command has not yet finished writing to stdout and head has finished executing and closes the pipe. Subsequent writes by the previously issued sort command would fail because of the closed pipe.

Work item tracking
Microsoft ADO (number only): 28511764
How I did it
I refactored the code in a way that removes the call to head after sort.

How to verify it
I created a bash script that induces a write failed: Broken pipe error on account of a sort | head combination and verified the SIGPIPE error code. I then removed the head command and replaced it with a call to awk and re-ran the script. This time there was no SIGPIPE.
The script:

#!/usr/bin/bash

var0=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | head -n 20 > /dev/null')

echo ""
echo "Removed head and modified awk:"
echo ""

var1=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | awk "NR == 1 {print $2}"')

The output

execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7ffc34b52f98 /* 17 vars */) = 0
strace: Process 137372 attached
strace: Process 137373 attached
strace: Process 137374 attached
[pid 137373] execve("/usr/bin/sort", ["sort"], 0x55be58b00350 /* 17 vars */ <unfinished ...>
[pid 137372] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55be58b00350 /* 17 vars */ <unfinished ...>
[pid 137373] <... execve resumed>)      = 0
[pid 137372] <... execve resumed>)      = 0
[pid 137374] execve("/usr/bin/head", ["head", "-n", "20"], 0x55be58b00350 /* 17 vars */) = 0
[pid 137372] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096
[pid 137372] write(1, "     30960 Jan 20  2022 colrm\n-r"..., 4096) = 4096
[pid 137372] write(1, "-xr-x 1 root root        1866 De"..., 4096) = 4096
[pid 137372] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096
[pid 137372] write(1, " 1 root root      315136 Oct  2 "..., 4096) = 4096
[pid 137372] write(1, "ewgrp\n-rwxr-xr-x 1 root root    "..., 4096) = 4096
[pid 137372] write(1, "-rwxr-xr-x 1 root root        43"..., 4096) = 4096
[pid 137372] write(1, "eb 27  2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096
[pid 137372] write(1, "672 Jun 18  2023 systemd-ask-pas"..., 4096) = 4096
[pid 137372] write(1, "zselect\n-rwxr-xr-x 1 root root  "..., 4096) = 4096
[pid 137372] write(1, "r-xr-x 1 root root        2081 A"..., 4096) = 4096
[pid 137372] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533
[pid 137372] +++ exited with 0 +++
[pid 137373] write(1, "\n\ndrwx------  2 root root 4096 J"..., 4096) = 4096
[pid 137373] write(1, "xrwx 1 root root          12 Jul"..., 4096) = 4096
[pid 137374] write(1, "\n\ndrwx------  2 root root 4096 J"..., 942 <unfinished ...>
[pid 137373] write(1, "Feb  2  2021 ping6 -> ping\nlrwxr"..., 4096 <unfinished ...>
[pid 137374] <... write resumed>)       = 942
[pid 137373] <... write resumed>)       = -1 EPIPE (Broken pipe)
[pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=137373, si_uid=1000} ---
[pid 137374] +++ exited with 0 +++
[pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_TKILL, si_pid=137373, si_uid=1000} ---
[pid 137373] +++ killed by SIGPIPE +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137372, si_uid=1000, si_status=0, si_utime=0, si_stime=4} ---
+++ exited with 0 +++

Removed head and modified awk:

execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7fff94a24d18 /* 17 vars */) = 0
strace: Process 137379 attached
strace: Process 137380 attached
strace: Process 137381 attached
[pid 137379] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55d437568350 /* 17 vars */ <unfinished ...>
[pid 137380] execve("/usr/bin/sort", ["sort"], 0x55d437568350 /* 17 vars */ <unfinished ...>
[pid 137379] <... execve resumed>)      = 0
[pid 137380] <... execve resumed>)      = 0
[pid 137381] execve("/usr/bin/awk", ["awk", "NR == 1 {print }"], 0x55d437568350 /* 17 vars */) = 0
[pid 137379] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096
[pid 137379] write(1, "     30960 Jan 20  2022 colrm\n-r"..., 4096) = 4096
[pid 137379] write(1, "-xr-x 1 root root        1866 De"..., 4096) = 4096
[pid 137379] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096
[pid 137379] write(1, " 1 root root      315136 Oct  2 "..., 4096) = 4096
[pid 137379] write(1, "ewgrp\n-rwxr-xr-x 1 root root    "..., 4096) = 4096
[pid 137379] write(1, "-rwxr-xr-x 1 root root        43"..., 4096) = 4096
[pid 137379] write(1, "eb 27  2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096
[pid 137379] write(1, "672 Jun 18  2023 systemd-ask-pas"..., 4096) = 4096
[pid 137379] write(1, "zselect\n-rwxr-xr-x 1 root root  "..., 4096) = 4096
[pid 137379] write(1, "r-xr-x 1 root root        2081 A"..., 4096) = 4096
[pid 137379] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533
[pid 137379] +++ exited with 0 +++
[pid 137380] write(1, "\n\ndrwx------  2 root root 4096 J"..., 4096) = 4096
[pid 137380] write(1, "xrwx 1 root root          12 Jul"..., 4096) = 4096
[pid 137380] write(1, "Feb  2  2021 ping6 -> ping\nlrwxr"..., 4096) = 4096
[pid 137380] write(1, "t       10803 Sep 24  2021 pod2t"..., 4096) = 4096
[pid 137380] write(1, "22 colcrt\n-rwxr-xr-x 1 root root"..., 4096) = 4096
[pid 137380] write(1, "  2022 zmore\n-rwxr-xr-x 1 root r"..., 4096) = 4096
[pid 137380] write(1, "6 Jun 19 08:47 TS\n-rwxr-xr-x 1 r"..., 4096) = 4096
[pid 137380] write(1, "oot       35048 Jan 20  2022 tas"..., 4096) = 4096
[pid 137380] write(1, "     4392 Sep 24  2021 ptargrep\n"..., 4096) = 4096
[pid 137380] write(1, "-xr-x 1 root root       56192 Se"..., 4096) = 4096
[pid 137380] write(1, "    7285 Nov 24  2016 boot-eos\n-"..., 4096) = 4096
[pid 137380] write(1, "ostname-config.sh\n-rwxr-xr-x 2 r"..., 533) = 533
[pid 137381] write(1, "\n", 1)          = 1
[pid 137380] +++ exited with 0 +++
[pid 137381] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137379, si_uid=1000, si_status=0, si_utime=1, si_stime=5} ---
+++ exited with 0 +++

I created 30 dummy archive files in a dev environment and then ran the modified script to verify that the OLDEST_ARCHIVE_FILE is the archive file that was created first:
The setup:

admin@sonic:~/testpipe/var/log$ pwd
/home/admin/testpipe/var/log

admin@sonic:~/testpipe/var/log$ ls -alrth
total 8.0K
drwxr-xr-x 3 admin admin 4.0K Jul 25 17:12 ..
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.1.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.2.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.3.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.5.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.4.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.6.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.7.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.9.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.8.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.10.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.11.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.12.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.13.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.14.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.15.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.16.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.17.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.18.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.19.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.20.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.21.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.22.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.23.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.24.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.25.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.26.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.27.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.28.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.29.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.30.gz
drwxr-xr-x 2 admin admin 4.0K Jul 25 17:17 .

The script:

admin@sonic:~/testpipe$ cat sort_test.sh
#!/usr/bin/bash

SORTED_ARCHIVE_FILES=$(find ./var/log -type f -printf '%T+ %p\n' | grep -E '.+\.[0-9]+(\.gz)?$' | sort )
OLDEST_ARCHIVE_FILE=$(echo "${SORTED_ARCHIVE_FILES}" | awk 'NR == 1 { print $2; }')
echo $OLDEST_ARCHIVE_FILE



The output:

admin@sonic:~/testpipe$ ./sort_test.sh
./var/log/syslog.1.gz
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Aug 2, 2024
…issues (sonic-net#19699)

Why I did it
Currently the logrotate script is susceptible to SIGPIPE issue because of a race condition between the sort and head commands that leads to the following error:

Jun 22 03:20:00.331117 sonic-switch INFO logrotate[1494655]: sort: write failed: 'standard output': Broken pipe
Jun 22 03:20:00.331207 sonic-switch INFO logrotate[1494655]: sort: write error
This is because if the sort command has not yet finished writing to stdout and head has finished executing and closes the pipe. Subsequent writes by the previously issued sort command would fail because of the closed pipe.

Work item tracking
Microsoft ADO (number only): 28511764
How I did it
I refactored the code in a way that removes the call to head after sort.

How to verify it
I created a bash script that induces a write failed: Broken pipe error on account of a sort | head combination and verified the SIGPIPE error code. I then removed the head command and replaced it with a call to awk and re-ran the script. This time there was no SIGPIPE.
The script:

#!/usr/bin/bash

var0=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | head -n 20 > /dev/null')

echo ""
echo "Removed head and modified awk:"
echo ""

var1=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | awk "NR == 1 {print $2}"')

The output

execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7ffc34b52f98 /* 17 vars */) = 0
strace: Process 137372 attached
strace: Process 137373 attached
strace: Process 137374 attached
[pid 137373] execve("/usr/bin/sort", ["sort"], 0x55be58b00350 /* 17 vars */ <unfinished ...>
[pid 137372] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55be58b00350 /* 17 vars */ <unfinished ...>
[pid 137373] <... execve resumed>)      = 0
[pid 137372] <... execve resumed>)      = 0
[pid 137374] execve("/usr/bin/head", ["head", "-n", "20"], 0x55be58b00350 /* 17 vars */) = 0
[pid 137372] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096
[pid 137372] write(1, "     30960 Jan 20  2022 colrm\n-r"..., 4096) = 4096
[pid 137372] write(1, "-xr-x 1 root root        1866 De"..., 4096) = 4096
[pid 137372] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096
[pid 137372] write(1, " 1 root root      315136 Oct  2 "..., 4096) = 4096
[pid 137372] write(1, "ewgrp\n-rwxr-xr-x 1 root root    "..., 4096) = 4096
[pid 137372] write(1, "-rwxr-xr-x 1 root root        43"..., 4096) = 4096
[pid 137372] write(1, "eb 27  2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096
[pid 137372] write(1, "672 Jun 18  2023 systemd-ask-pas"..., 4096) = 4096
[pid 137372] write(1, "zselect\n-rwxr-xr-x 1 root root  "..., 4096) = 4096
[pid 137372] write(1, "r-xr-x 1 root root        2081 A"..., 4096) = 4096
[pid 137372] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533
[pid 137372] +++ exited with 0 +++
[pid 137373] write(1, "\n\ndrwx------  2 root root 4096 J"..., 4096) = 4096
[pid 137373] write(1, "xrwx 1 root root          12 Jul"..., 4096) = 4096
[pid 137374] write(1, "\n\ndrwx------  2 root root 4096 J"..., 942 <unfinished ...>
[pid 137373] write(1, "Feb  2  2021 ping6 -> ping\nlrwxr"..., 4096 <unfinished ...>
[pid 137374] <... write resumed>)       = 942
[pid 137373] <... write resumed>)       = -1 EPIPE (Broken pipe)
[pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=137373, si_uid=1000} ---
[pid 137374] +++ exited with 0 +++
[pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_TKILL, si_pid=137373, si_uid=1000} ---
[pid 137373] +++ killed by SIGPIPE +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137372, si_uid=1000, si_status=0, si_utime=0, si_stime=4} ---
+++ exited with 0 +++

Removed head and modified awk:

execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7fff94a24d18 /* 17 vars */) = 0
strace: Process 137379 attached
strace: Process 137380 attached
strace: Process 137381 attached
[pid 137379] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55d437568350 /* 17 vars */ <unfinished ...>
[pid 137380] execve("/usr/bin/sort", ["sort"], 0x55d437568350 /* 17 vars */ <unfinished ...>
[pid 137379] <... execve resumed>)      = 0
[pid 137380] <... execve resumed>)      = 0
[pid 137381] execve("/usr/bin/awk", ["awk", "NR == 1 {print }"], 0x55d437568350 /* 17 vars */) = 0
[pid 137379] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096
[pid 137379] write(1, "     30960 Jan 20  2022 colrm\n-r"..., 4096) = 4096
[pid 137379] write(1, "-xr-x 1 root root        1866 De"..., 4096) = 4096
[pid 137379] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096
[pid 137379] write(1, " 1 root root      315136 Oct  2 "..., 4096) = 4096
[pid 137379] write(1, "ewgrp\n-rwxr-xr-x 1 root root    "..., 4096) = 4096
[pid 137379] write(1, "-rwxr-xr-x 1 root root        43"..., 4096) = 4096
[pid 137379] write(1, "eb 27  2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096
[pid 137379] write(1, "672 Jun 18  2023 systemd-ask-pas"..., 4096) = 4096
[pid 137379] write(1, "zselect\n-rwxr-xr-x 1 root root  "..., 4096) = 4096
[pid 137379] write(1, "r-xr-x 1 root root        2081 A"..., 4096) = 4096
[pid 137379] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533
[pid 137379] +++ exited with 0 +++
[pid 137380] write(1, "\n\ndrwx------  2 root root 4096 J"..., 4096) = 4096
[pid 137380] write(1, "xrwx 1 root root          12 Jul"..., 4096) = 4096
[pid 137380] write(1, "Feb  2  2021 ping6 -> ping\nlrwxr"..., 4096) = 4096
[pid 137380] write(1, "t       10803 Sep 24  2021 pod2t"..., 4096) = 4096
[pid 137380] write(1, "22 colcrt\n-rwxr-xr-x 1 root root"..., 4096) = 4096
[pid 137380] write(1, "  2022 zmore\n-rwxr-xr-x 1 root r"..., 4096) = 4096
[pid 137380] write(1, "6 Jun 19 08:47 TS\n-rwxr-xr-x 1 r"..., 4096) = 4096
[pid 137380] write(1, "oot       35048 Jan 20  2022 tas"..., 4096) = 4096
[pid 137380] write(1, "     4392 Sep 24  2021 ptargrep\n"..., 4096) = 4096
[pid 137380] write(1, "-xr-x 1 root root       56192 Se"..., 4096) = 4096
[pid 137380] write(1, "    7285 Nov 24  2016 boot-eos\n-"..., 4096) = 4096
[pid 137380] write(1, "ostname-config.sh\n-rwxr-xr-x 2 r"..., 533) = 533
[pid 137381] write(1, "\n", 1)          = 1
[pid 137380] +++ exited with 0 +++
[pid 137381] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137379, si_uid=1000, si_status=0, si_utime=1, si_stime=5} ---
+++ exited with 0 +++

I created 30 dummy archive files in a dev environment and then ran the modified script to verify that the OLDEST_ARCHIVE_FILE is the archive file that was created first:
The setup:

admin@sonic:~/testpipe/var/log$ pwd
/home/admin/testpipe/var/log

admin@sonic:~/testpipe/var/log$ ls -alrth
total 8.0K
drwxr-xr-x 3 admin admin 4.0K Jul 25 17:12 ..
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.1.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.2.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.3.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.5.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.4.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.6.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.7.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.9.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.8.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.10.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.11.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.12.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.13.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.14.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.15.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.16.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.17.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.18.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.19.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.20.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.21.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.22.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.23.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.24.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.25.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.26.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.27.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.28.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.29.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.30.gz
drwxr-xr-x 2 admin admin 4.0K Jul 25 17:17 .

The script:

admin@sonic:~/testpipe$ cat sort_test.sh
#!/usr/bin/bash

SORTED_ARCHIVE_FILES=$(find ./var/log -type f -printf '%T+ %p\n' | grep -E '.+\.[0-9]+(\.gz)?$' | sort )
OLDEST_ARCHIVE_FILE=$(echo "${SORTED_ARCHIVE_FILES}" | awk 'NR == 1 { print $2; }')
echo $OLDEST_ARCHIVE_FILE



The output:

admin@sonic:~/testpipe$ ./sort_test.sh
./var/log/syslog.1.gz
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202311: #19783

@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202405: #19784

mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Aug 2, 2024
…issues (sonic-net#19699)

Why I did it
Currently the logrotate script is susceptible to SIGPIPE issue because of a race condition between the sort and head commands that leads to the following error:

Jun 22 03:20:00.331117 sonic-switch INFO logrotate[1494655]: sort: write failed: 'standard output': Broken pipe
Jun 22 03:20:00.331207 sonic-switch INFO logrotate[1494655]: sort: write error
This is because if the sort command has not yet finished writing to stdout and head has finished executing and closes the pipe. Subsequent writes by the previously issued sort command would fail because of the closed pipe.

Work item tracking
Microsoft ADO (number only): 28511764
How I did it
I refactored the code in a way that removes the call to head after sort.

How to verify it
I created a bash script that induces a write failed: Broken pipe error on account of a sort | head combination and verified the SIGPIPE error code. I then removed the head command and replaced it with a call to awk and re-ran the script. This time there was no SIGPIPE.
The script:

#!/usr/bin/bash

var0=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | head -n 20 > /dev/null')

echo ""
echo "Removed head and modified awk:"
echo ""

var1=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | awk "NR == 1 {print $2}"')

The output

execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7ffc34b52f98 /* 17 vars */) = 0
strace: Process 137372 attached
strace: Process 137373 attached
strace: Process 137374 attached
[pid 137373] execve("/usr/bin/sort", ["sort"], 0x55be58b00350 /* 17 vars */ <unfinished ...>
[pid 137372] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55be58b00350 /* 17 vars */ <unfinished ...>
[pid 137373] <... execve resumed>)      = 0
[pid 137372] <... execve resumed>)      = 0
[pid 137374] execve("/usr/bin/head", ["head", "-n", "20"], 0x55be58b00350 /* 17 vars */) = 0
[pid 137372] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096
[pid 137372] write(1, "     30960 Jan 20  2022 colrm\n-r"..., 4096) = 4096
[pid 137372] write(1, "-xr-x 1 root root        1866 De"..., 4096) = 4096
[pid 137372] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096
[pid 137372] write(1, " 1 root root      315136 Oct  2 "..., 4096) = 4096
[pid 137372] write(1, "ewgrp\n-rwxr-xr-x 1 root root    "..., 4096) = 4096
[pid 137372] write(1, "-rwxr-xr-x 1 root root        43"..., 4096) = 4096
[pid 137372] write(1, "eb 27  2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096
[pid 137372] write(1, "672 Jun 18  2023 systemd-ask-pas"..., 4096) = 4096
[pid 137372] write(1, "zselect\n-rwxr-xr-x 1 root root  "..., 4096) = 4096
[pid 137372] write(1, "r-xr-x 1 root root        2081 A"..., 4096) = 4096
[pid 137372] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533
[pid 137372] +++ exited with 0 +++
[pid 137373] write(1, "\n\ndrwx------  2 root root 4096 J"..., 4096) = 4096
[pid 137373] write(1, "xrwx 1 root root          12 Jul"..., 4096) = 4096
[pid 137374] write(1, "\n\ndrwx------  2 root root 4096 J"..., 942 <unfinished ...>
[pid 137373] write(1, "Feb  2  2021 ping6 -> ping\nlrwxr"..., 4096 <unfinished ...>
[pid 137374] <... write resumed>)       = 942
[pid 137373] <... write resumed>)       = -1 EPIPE (Broken pipe)
[pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=137373, si_uid=1000} ---
[pid 137374] +++ exited with 0 +++
[pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_TKILL, si_pid=137373, si_uid=1000} ---
[pid 137373] +++ killed by SIGPIPE +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137372, si_uid=1000, si_status=0, si_utime=0, si_stime=4} ---
+++ exited with 0 +++

Removed head and modified awk:

execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7fff94a24d18 /* 17 vars */) = 0
strace: Process 137379 attached
strace: Process 137380 attached
strace: Process 137381 attached
[pid 137379] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55d437568350 /* 17 vars */ <unfinished ...>
[pid 137380] execve("/usr/bin/sort", ["sort"], 0x55d437568350 /* 17 vars */ <unfinished ...>
[pid 137379] <... execve resumed>)      = 0
[pid 137380] <... execve resumed>)      = 0
[pid 137381] execve("/usr/bin/awk", ["awk", "NR == 1 {print }"], 0x55d437568350 /* 17 vars */) = 0
[pid 137379] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096
[pid 137379] write(1, "     30960 Jan 20  2022 colrm\n-r"..., 4096) = 4096
[pid 137379] write(1, "-xr-x 1 root root        1866 De"..., 4096) = 4096
[pid 137379] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096
[pid 137379] write(1, " 1 root root      315136 Oct  2 "..., 4096) = 4096
[pid 137379] write(1, "ewgrp\n-rwxr-xr-x 1 root root    "..., 4096) = 4096
[pid 137379] write(1, "-rwxr-xr-x 1 root root        43"..., 4096) = 4096
[pid 137379] write(1, "eb 27  2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096
[pid 137379] write(1, "672 Jun 18  2023 systemd-ask-pas"..., 4096) = 4096
[pid 137379] write(1, "zselect\n-rwxr-xr-x 1 root root  "..., 4096) = 4096
[pid 137379] write(1, "r-xr-x 1 root root        2081 A"..., 4096) = 4096
[pid 137379] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533
[pid 137379] +++ exited with 0 +++
[pid 137380] write(1, "\n\ndrwx------  2 root root 4096 J"..., 4096) = 4096
[pid 137380] write(1, "xrwx 1 root root          12 Jul"..., 4096) = 4096
[pid 137380] write(1, "Feb  2  2021 ping6 -> ping\nlrwxr"..., 4096) = 4096
[pid 137380] write(1, "t       10803 Sep 24  2021 pod2t"..., 4096) = 4096
[pid 137380] write(1, "22 colcrt\n-rwxr-xr-x 1 root root"..., 4096) = 4096
[pid 137380] write(1, "  2022 zmore\n-rwxr-xr-x 1 root r"..., 4096) = 4096
[pid 137380] write(1, "6 Jun 19 08:47 TS\n-rwxr-xr-x 1 r"..., 4096) = 4096
[pid 137380] write(1, "oot       35048 Jan 20  2022 tas"..., 4096) = 4096
[pid 137380] write(1, "     4392 Sep 24  2021 ptargrep\n"..., 4096) = 4096
[pid 137380] write(1, "-xr-x 1 root root       56192 Se"..., 4096) = 4096
[pid 137380] write(1, "    7285 Nov 24  2016 boot-eos\n-"..., 4096) = 4096
[pid 137380] write(1, "ostname-config.sh\n-rwxr-xr-x 2 r"..., 533) = 533
[pid 137381] write(1, "\n", 1)          = 1
[pid 137380] +++ exited with 0 +++
[pid 137381] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137379, si_uid=1000, si_status=0, si_utime=1, si_stime=5} ---
+++ exited with 0 +++

I created 30 dummy archive files in a dev environment and then ran the modified script to verify that the OLDEST_ARCHIVE_FILE is the archive file that was created first:
The setup:

admin@sonic:~/testpipe/var/log$ pwd
/home/admin/testpipe/var/log

admin@sonic:~/testpipe/var/log$ ls -alrth
total 8.0K
drwxr-xr-x 3 admin admin 4.0K Jul 25 17:12 ..
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.1.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.2.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.3.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.5.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.4.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.6.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.7.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.9.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.8.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.10.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.11.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.12.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.13.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.14.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.15.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.16.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.17.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.18.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.19.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.20.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.21.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.22.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.23.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.24.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.25.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.26.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.27.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.28.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.29.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.30.gz
drwxr-xr-x 2 admin admin 4.0K Jul 25 17:17 .

The script:

admin@sonic:~/testpipe$ cat sort_test.sh
#!/usr/bin/bash

SORTED_ARCHIVE_FILES=$(find ./var/log -type f -printf '%T+ %p\n' | grep -E '.+\.[0-9]+(\.gz)?$' | sort )
OLDEST_ARCHIVE_FILE=$(echo "${SORTED_ARCHIVE_FILES}" | awk 'NR == 1 { print $2; }')
echo $OLDEST_ARCHIVE_FILE



The output:

admin@sonic:~/testpipe$ ./sort_test.sh
./var/log/syslog.1.gz
mssonicbld pushed a commit that referenced this pull request Aug 3, 2024
…issues (#19699)

Why I did it
Currently the logrotate script is susceptible to SIGPIPE issue because of a race condition between the sort and head commands that leads to the following error:

Jun 22 03:20:00.331117 sonic-switch INFO logrotate[1494655]: sort: write failed: 'standard output': Broken pipe
Jun 22 03:20:00.331207 sonic-switch INFO logrotate[1494655]: sort: write error
This is because if the sort command has not yet finished writing to stdout and head has finished executing and closes the pipe. Subsequent writes by the previously issued sort command would fail because of the closed pipe.

Work item tracking
Microsoft ADO (number only): 28511764
How I did it
I refactored the code in a way that removes the call to head after sort.

How to verify it
I created a bash script that induces a write failed: Broken pipe error on account of a sort | head combination and verified the SIGPIPE error code. I then removed the head command and replaced it with a call to awk and re-ran the script. This time there was no SIGPIPE.
The script:

#!/usr/bin/bash

var0=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | head -n 20 > /dev/null')

echo ""
echo "Removed head and modified awk:"
echo ""

var1=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | awk "NR == 1 {print $2}"')

The output

execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7ffc34b52f98 /* 17 vars */) = 0
strace: Process 137372 attached
strace: Process 137373 attached
strace: Process 137374 attached
[pid 137373] execve("/usr/bin/sort", ["sort"], 0x55be58b00350 /* 17 vars */ <unfinished ...>
[pid 137372] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55be58b00350 /* 17 vars */ <unfinished ...>
[pid 137373] <... execve resumed>)      = 0
[pid 137372] <... execve resumed>)      = 0
[pid 137374] execve("/usr/bin/head", ["head", "-n", "20"], 0x55be58b00350 /* 17 vars */) = 0
[pid 137372] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096
[pid 137372] write(1, "     30960 Jan 20  2022 colrm\n-r"..., 4096) = 4096
[pid 137372] write(1, "-xr-x 1 root root        1866 De"..., 4096) = 4096
[pid 137372] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096
[pid 137372] write(1, " 1 root root      315136 Oct  2 "..., 4096) = 4096
[pid 137372] write(1, "ewgrp\n-rwxr-xr-x 1 root root    "..., 4096) = 4096
[pid 137372] write(1, "-rwxr-xr-x 1 root root        43"..., 4096) = 4096
[pid 137372] write(1, "eb 27  2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096
[pid 137372] write(1, "672 Jun 18  2023 systemd-ask-pas"..., 4096) = 4096
[pid 137372] write(1, "zselect\n-rwxr-xr-x 1 root root  "..., 4096) = 4096
[pid 137372] write(1, "r-xr-x 1 root root        2081 A"..., 4096) = 4096
[pid 137372] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533
[pid 137372] +++ exited with 0 +++
[pid 137373] write(1, "\n\ndrwx------  2 root root 4096 J"..., 4096) = 4096
[pid 137373] write(1, "xrwx 1 root root          12 Jul"..., 4096) = 4096
[pid 137374] write(1, "\n\ndrwx------  2 root root 4096 J"..., 942 <unfinished ...>
[pid 137373] write(1, "Feb  2  2021 ping6 -> ping\nlrwxr"..., 4096 <unfinished ...>
[pid 137374] <... write resumed>)       = 942
[pid 137373] <... write resumed>)       = -1 EPIPE (Broken pipe)
[pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=137373, si_uid=1000} ---
[pid 137374] +++ exited with 0 +++
[pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_TKILL, si_pid=137373, si_uid=1000} ---
[pid 137373] +++ killed by SIGPIPE +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137372, si_uid=1000, si_status=0, si_utime=0, si_stime=4} ---
+++ exited with 0 +++

Removed head and modified awk:

execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7fff94a24d18 /* 17 vars */) = 0
strace: Process 137379 attached
strace: Process 137380 attached
strace: Process 137381 attached
[pid 137379] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55d437568350 /* 17 vars */ <unfinished ...>
[pid 137380] execve("/usr/bin/sort", ["sort"], 0x55d437568350 /* 17 vars */ <unfinished ...>
[pid 137379] <... execve resumed>)      = 0
[pid 137380] <... execve resumed>)      = 0
[pid 137381] execve("/usr/bin/awk", ["awk", "NR == 1 {print }"], 0x55d437568350 /* 17 vars */) = 0
[pid 137379] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096
[pid 137379] write(1, "     30960 Jan 20  2022 colrm\n-r"..., 4096) = 4096
[pid 137379] write(1, "-xr-x 1 root root        1866 De"..., 4096) = 4096
[pid 137379] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096
[pid 137379] write(1, " 1 root root      315136 Oct  2 "..., 4096) = 4096
[pid 137379] write(1, "ewgrp\n-rwxr-xr-x 1 root root    "..., 4096) = 4096
[pid 137379] write(1, "-rwxr-xr-x 1 root root        43"..., 4096) = 4096
[pid 137379] write(1, "eb 27  2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096
[pid 137379] write(1, "672 Jun 18  2023 systemd-ask-pas"..., 4096) = 4096
[pid 137379] write(1, "zselect\n-rwxr-xr-x 1 root root  "..., 4096) = 4096
[pid 137379] write(1, "r-xr-x 1 root root        2081 A"..., 4096) = 4096
[pid 137379] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533
[pid 137379] +++ exited with 0 +++
[pid 137380] write(1, "\n\ndrwx------  2 root root 4096 J"..., 4096) = 4096
[pid 137380] write(1, "xrwx 1 root root          12 Jul"..., 4096) = 4096
[pid 137380] write(1, "Feb  2  2021 ping6 -> ping\nlrwxr"..., 4096) = 4096
[pid 137380] write(1, "t       10803 Sep 24  2021 pod2t"..., 4096) = 4096
[pid 137380] write(1, "22 colcrt\n-rwxr-xr-x 1 root root"..., 4096) = 4096
[pid 137380] write(1, "  2022 zmore\n-rwxr-xr-x 1 root r"..., 4096) = 4096
[pid 137380] write(1, "6 Jun 19 08:47 TS\n-rwxr-xr-x 1 r"..., 4096) = 4096
[pid 137380] write(1, "oot       35048 Jan 20  2022 tas"..., 4096) = 4096
[pid 137380] write(1, "     4392 Sep 24  2021 ptargrep\n"..., 4096) = 4096
[pid 137380] write(1, "-xr-x 1 root root       56192 Se"..., 4096) = 4096
[pid 137380] write(1, "    7285 Nov 24  2016 boot-eos\n-"..., 4096) = 4096
[pid 137380] write(1, "ostname-config.sh\n-rwxr-xr-x 2 r"..., 533) = 533
[pid 137381] write(1, "\n", 1)          = 1
[pid 137380] +++ exited with 0 +++
[pid 137381] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137379, si_uid=1000, si_status=0, si_utime=1, si_stime=5} ---
+++ exited with 0 +++

I created 30 dummy archive files in a dev environment and then ran the modified script to verify that the OLDEST_ARCHIVE_FILE is the archive file that was created first:
The setup:

admin@sonic:~/testpipe/var/log$ pwd
/home/admin/testpipe/var/log

admin@sonic:~/testpipe/var/log$ ls -alrth
total 8.0K
drwxr-xr-x 3 admin admin 4.0K Jul 25 17:12 ..
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.1.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.2.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.3.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.5.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.4.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.6.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.7.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.9.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.8.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.10.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.11.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.12.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.13.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.14.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.15.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.16.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.17.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.18.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.19.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.20.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.21.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.22.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.23.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.24.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.25.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.26.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.27.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.28.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.29.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.30.gz
drwxr-xr-x 2 admin admin 4.0K Jul 25 17:17 .

The script:

admin@sonic:~/testpipe$ cat sort_test.sh
#!/usr/bin/bash

SORTED_ARCHIVE_FILES=$(find ./var/log -type f -printf '%T+ %p\n' | grep -E '.+\.[0-9]+(\.gz)?$' | sort )
OLDEST_ARCHIVE_FILE=$(echo "${SORTED_ARCHIVE_FILES}" | awk 'NR == 1 { print $2; }')
echo $OLDEST_ARCHIVE_FILE



The output:

admin@sonic:~/testpipe$ ./sort_test.sh
./var/log/syslog.1.gz
mssonicbld pushed a commit that referenced this pull request Aug 24, 2024
…issues (#19699)

Why I did it
Currently the logrotate script is susceptible to SIGPIPE issue because of a race condition between the sort and head commands that leads to the following error:

Jun 22 03:20:00.331117 sonic-switch INFO logrotate[1494655]: sort: write failed: 'standard output': Broken pipe
Jun 22 03:20:00.331207 sonic-switch INFO logrotate[1494655]: sort: write error
This is because if the sort command has not yet finished writing to stdout and head has finished executing and closes the pipe. Subsequent writes by the previously issued sort command would fail because of the closed pipe.

Work item tracking
Microsoft ADO (number only): 28511764
How I did it
I refactored the code in a way that removes the call to head after sort.

How to verify it
I created a bash script that induces a write failed: Broken pipe error on account of a sort | head combination and verified the SIGPIPE error code. I then removed the head command and replaced it with a call to awk and re-ran the script. This time there was no SIGPIPE.
The script:

#!/usr/bin/bash

var0=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | head -n 20 > /dev/null')

echo ""
echo "Removed head and modified awk:"
echo ""

var1=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | awk "NR == 1 {print $2}"')

The output

execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7ffc34b52f98 /* 17 vars */) = 0
strace: Process 137372 attached
strace: Process 137373 attached
strace: Process 137374 attached
[pid 137373] execve("/usr/bin/sort", ["sort"], 0x55be58b00350 /* 17 vars */ <unfinished ...>
[pid 137372] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55be58b00350 /* 17 vars */ <unfinished ...>
[pid 137373] <... execve resumed>)      = 0
[pid 137372] <... execve resumed>)      = 0
[pid 137374] execve("/usr/bin/head", ["head", "-n", "20"], 0x55be58b00350 /* 17 vars */) = 0
[pid 137372] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096
[pid 137372] write(1, "     30960 Jan 20  2022 colrm\n-r"..., 4096) = 4096
[pid 137372] write(1, "-xr-x 1 root root        1866 De"..., 4096) = 4096
[pid 137372] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096
[pid 137372] write(1, " 1 root root      315136 Oct  2 "..., 4096) = 4096
[pid 137372] write(1, "ewgrp\n-rwxr-xr-x 1 root root    "..., 4096) = 4096
[pid 137372] write(1, "-rwxr-xr-x 1 root root        43"..., 4096) = 4096
[pid 137372] write(1, "eb 27  2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096
[pid 137372] write(1, "672 Jun 18  2023 systemd-ask-pas"..., 4096) = 4096
[pid 137372] write(1, "zselect\n-rwxr-xr-x 1 root root  "..., 4096) = 4096
[pid 137372] write(1, "r-xr-x 1 root root        2081 A"..., 4096) = 4096
[pid 137372] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533
[pid 137372] +++ exited with 0 +++
[pid 137373] write(1, "\n\ndrwx------  2 root root 4096 J"..., 4096) = 4096
[pid 137373] write(1, "xrwx 1 root root          12 Jul"..., 4096) = 4096
[pid 137374] write(1, "\n\ndrwx------  2 root root 4096 J"..., 942 <unfinished ...>
[pid 137373] write(1, "Feb  2  2021 ping6 -> ping\nlrwxr"..., 4096 <unfinished ...>
[pid 137374] <... write resumed>)       = 942
[pid 137373] <... write resumed>)       = -1 EPIPE (Broken pipe)
[pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=137373, si_uid=1000} ---
[pid 137374] +++ exited with 0 +++
[pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_TKILL, si_pid=137373, si_uid=1000} ---
[pid 137373] +++ killed by SIGPIPE +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137372, si_uid=1000, si_status=0, si_utime=0, si_stime=4} ---
+++ exited with 0 +++

Removed head and modified awk:

execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7fff94a24d18 /* 17 vars */) = 0
strace: Process 137379 attached
strace: Process 137380 attached
strace: Process 137381 attached
[pid 137379] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55d437568350 /* 17 vars */ <unfinished ...>
[pid 137380] execve("/usr/bin/sort", ["sort"], 0x55d437568350 /* 17 vars */ <unfinished ...>
[pid 137379] <... execve resumed>)      = 0
[pid 137380] <... execve resumed>)      = 0
[pid 137381] execve("/usr/bin/awk", ["awk", "NR == 1 {print }"], 0x55d437568350 /* 17 vars */) = 0
[pid 137379] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096
[pid 137379] write(1, "     30960 Jan 20  2022 colrm\n-r"..., 4096) = 4096
[pid 137379] write(1, "-xr-x 1 root root        1866 De"..., 4096) = 4096
[pid 137379] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096
[pid 137379] write(1, " 1 root root      315136 Oct  2 "..., 4096) = 4096
[pid 137379] write(1, "ewgrp\n-rwxr-xr-x 1 root root    "..., 4096) = 4096
[pid 137379] write(1, "-rwxr-xr-x 1 root root        43"..., 4096) = 4096
[pid 137379] write(1, "eb 27  2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096
[pid 137379] write(1, "672 Jun 18  2023 systemd-ask-pas"..., 4096) = 4096
[pid 137379] write(1, "zselect\n-rwxr-xr-x 1 root root  "..., 4096) = 4096
[pid 137379] write(1, "r-xr-x 1 root root        2081 A"..., 4096) = 4096
[pid 137379] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533
[pid 137379] +++ exited with 0 +++
[pid 137380] write(1, "\n\ndrwx------  2 root root 4096 J"..., 4096) = 4096
[pid 137380] write(1, "xrwx 1 root root          12 Jul"..., 4096) = 4096
[pid 137380] write(1, "Feb  2  2021 ping6 -> ping\nlrwxr"..., 4096) = 4096
[pid 137380] write(1, "t       10803 Sep 24  2021 pod2t"..., 4096) = 4096
[pid 137380] write(1, "22 colcrt\n-rwxr-xr-x 1 root root"..., 4096) = 4096
[pid 137380] write(1, "  2022 zmore\n-rwxr-xr-x 1 root r"..., 4096) = 4096
[pid 137380] write(1, "6 Jun 19 08:47 TS\n-rwxr-xr-x 1 r"..., 4096) = 4096
[pid 137380] write(1, "oot       35048 Jan 20  2022 tas"..., 4096) = 4096
[pid 137380] write(1, "     4392 Sep 24  2021 ptargrep\n"..., 4096) = 4096
[pid 137380] write(1, "-xr-x 1 root root       56192 Se"..., 4096) = 4096
[pid 137380] write(1, "    7285 Nov 24  2016 boot-eos\n-"..., 4096) = 4096
[pid 137380] write(1, "ostname-config.sh\n-rwxr-xr-x 2 r"..., 533) = 533
[pid 137381] write(1, "\n", 1)          = 1
[pid 137380] +++ exited with 0 +++
[pid 137381] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137379, si_uid=1000, si_status=0, si_utime=1, si_stime=5} ---
+++ exited with 0 +++

I created 30 dummy archive files in a dev environment and then ran the modified script to verify that the OLDEST_ARCHIVE_FILE is the archive file that was created first:
The setup:

admin@sonic:~/testpipe/var/log$ pwd
/home/admin/testpipe/var/log

admin@sonic:~/testpipe/var/log$ ls -alrth
total 8.0K
drwxr-xr-x 3 admin admin 4.0K Jul 25 17:12 ..
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.1.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.2.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.3.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.5.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.4.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.6.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.7.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.9.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.8.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.10.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.11.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.12.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.13.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.14.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.15.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.16.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.17.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.18.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.19.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.20.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.21.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.22.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.23.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.24.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.25.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.26.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.27.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.28.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.29.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.30.gz
drwxr-xr-x 2 admin admin 4.0K Jul 25 17:17 .

The script:

admin@sonic:~/testpipe$ cat sort_test.sh
#!/usr/bin/bash

SORTED_ARCHIVE_FILES=$(find ./var/log -type f -printf '%T+ %p\n' | grep -E '.+\.[0-9]+(\.gz)?$' | sort )
OLDEST_ARCHIVE_FILE=$(echo "${SORTED_ARCHIVE_FILES}" | awk 'NR == 1 { print $2; }')
echo $OLDEST_ARCHIVE_FILE



The output:

admin@sonic:~/testpipe$ ./sort_test.sh
./var/log/syslog.1.gz
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants