Calling clixon_backend -z does not (always) stop clixon_restconf #420

troglobit · 2023-02-14T14:24:21Z

While debugging issues with running the regression tests in PR #418, we noticed that test_restconf_internal.sh often failed, and always at the same step:

Test 41(41) [11. Get restconf status rpc]
Test 42(42) [send rpc status]
grep: warning: stray \ before -
grep: warning: stray \ before -
Test 43(43) [Kill backend]
Feb 10 12:27:07: Killing old daemon with pid: 60371
Test 44(44) [Restart backend -s none]
Test 45(45) [kill old backend]
Test 46(46) [start backend -s none -f /var/tmp/./test_restconf_internal.sh/conf.xml]
Test 47(47) [wait backend]
Test 48(48) [wait restconf]
speed 38400 baud; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>;
eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R;
werase = ^W; lnext = ^V; flush = ^O; min = 1; time = 0;
-brkint -imaxbel
Test 49(49) [12. Get restconf (running) after restart]
Test 50(50) [send rpc status]
grep: warning: stray \ before -
grep: warning: stray \ before -

Error in Test50 [send rpc status]:
Expected
^<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="42"><active xmlns="http://clicon.org/lib">true</active><description xmlns="http://clicon.org/lib">Clixon RESTCONF process</description><command xmlns="http://clicon.org/lib">/.*/clixon_restconf -f /var/tmp/./test_restconf_internal.sh/conf.xml -D [0-9] .*</command><status xmlns="http://clicon.org/lib">running</status><starttime xmlns="http://clicon.org/lib">20[0-9][0-9]\-[0-9][0-9]\-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]*Z</starttime><pid xmlns="http://clicon.org/lib">0</pid></rpc-reply>$

Received: 
#889
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="42"><active xmlns="http://clicon.org/lib">false</active><description xmlns="http://clicon.org/lib">Clixon RESTCONF process</description><command xmlns="http://clicon.org/lib">/usr/local/sbin/clixon_restconf -f /var/tmp/./test_restconf_internal.sh/conf.xml -D 0 -l s -R &lt;restconf xmlns="http://clicon.org/restconf"&gt;&lt;enable&gt;true&lt;/enable&gt;&lt;auth-type&gt;none&lt;/auth-type&gt;&lt;debug&gt;0&lt;/debug&gt;&lt;log-destination&gt;syslog&lt;/log-destination&gt;&lt;enable-core-dump&gt;false&lt;/enable-core-dump&gt;&lt;pretty&gt;false&lt;/pretty&gt;&lt;socket&gt;&lt;namespace&gt;default&lt;/namespace&gt;&lt;address&gt;0.0.0.0&lt;/address&gt;&lt;port&gt;80&lt;/port&gt;&lt;ssl&gt;false&lt;/ssl&gt;&lt;/socket&gt;&lt;/restconf&gt;</command><status xmlns="http://clicon.org/lib">stopped</status></rpc-reply>
##

We've reproduced this on three different machines, all running a Ubuntu 18.04 or 22.04 derivative, despite the tests running in an Alpine Linux docker container. (On an old Fedora 33 server, however, all tests pass ... unclear why, but worth mentioning for reference.)

Turns out that at the end of step "11. Get restconf status rpc", when stop_backend -f $cfg is called, the clixon_backend -z -f $cfg does not stop the clixon_restconf child. So in step "12. Get restconf (running) after restart" the rpcstatus call fails.

Adding a killall "fixes" the problem:

@@ -320,6 +320,7 @@ if [ $BE -ne 0 ]; then
     fi
     # kill backend
     stop_backend -f $cfg
+    killall clixon_restconf
 fi
 
 # Restconf is enabled and restconf was running but was killed by stop ^.

Obviously this is not the proper fix. Question is why clixon_backend doesn't kill the child it started, or why a new clixon_backend cannot reconnect with an already running clixon_restconf process? I'm not up to speed how that dynamic works, but maybe someone else in the wider Clixon community do.

There were a few other warnings as well, for instance the "stray \ before -" grep warning. This seems to just be a minor regexp mistake (- not needed outside of regexp ranges):

@@ -115,7 +115,7 @@ EOF
         err "No pid return value" "$retx"
     fi
     if $active; then
-        expect="^<rpc-reply $DEFAULTNS><active $LIBNS>$active</active><description $LIBNS>Clixon RESTCONF process</description><command $LIBNS>/.*/clixon_restconf -f $cfg -D [0-9] .*</command><status $LIBNS>$status</status><starttime $LIBNS>20[0-9][0-9]\-[0-9][0-9]\-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]*Z</starttime><pid $LIBNS>$pid</pid></rpc-reply>$"
+        expect="^<rpc-reply $DEFAULTNS><active $LIBNS>$active</active><description $LIBNS>Clixon RESTCONF process</description><command $LIBNS>/.*/clixon_restconf -f $cfg -D [0-9] .*</command><status $LIBNS>$status</status><starttime $LIBNS>20[0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]*Z</starttime><pid $LIBNS>$pid</pid></rpc-reply>$"
     else
         # inactive, no startime or pid
         expect="^<rpc-reply $DEFAULTNS><active $LIBNS>$active</active><description $LIBNS>Clixon RESTCONF process</description><command $LIBNS>/.*/clixon_restconf -f $cfg -D [0-9] .*</command><status $LIBNS>$status</status></rpc-reply>$"

Compared to the Fedora machine we also noticed that restoring the tty with stty also gives some noise. Here's a lib.sh patch:

@@ -638,7 +638,7 @@ function wait_restconf(){
         sleep 2 # some problems with valgrind
     fi
 
-  stty $STTYSETTINGS
+  stty $STTYSETTINGS >/dev/null

Finally, in lib.sh, there's an additional pkill that seems a bit malplaced:

@@ -554,7 +554,7 @@ function stop_backend(){
         sleep 1
         checkvalgrind
     fi
-    sudo pkill -f clixon_backend # extra ($BUSER?)
+#    sudo pkill -f clixon_backend # extra ($BUSER?)
 }
 
 # Wait for restconf to stop sending  502 Bad Gateway

The text was updated successfully, but these errors were encountered:

olofhagsand · 2023-02-14T19:46:29Z

I think that test_restconf_internal.sh suffers from some quality problems, ie the test itself, it was somewhat problematic writing it because of timing and multile processes.
Thank you for your effort.

troglobit · 2023-02-14T20:53:38Z

No worries. Do you want us to look into the shutting down of clixon_restconf when the backend receives SIGTERM?

olofhagsand · 2023-02-15T12:46:36Z

The "internal" restconf is one of three ways to start the restconf daemon, as described here:
https://clixon-docs.readthedocs.io/en/latest/restconf.html#datastore
ie by forking from the backend. it is probably not the most common scenario (such as starting the restconf daemon separately) but there are products using this method.
test_restconf_internal.sh tests this.
But as you point out there are some issues.
If you want you are welcome to debug this, at least the symtom: what more specifically is wrong, and how to repeat it deterministically.
There is some docs here on starting sub-processes.

troglobit · 2023-02-15T19:22:11Z

Alright, we'll look into it. I'll keep you posted, cheers!

troglobit mentioned this issue Feb 14, 2023

Misc. build fixes encountered when cross-compiling #418

Merged

olofhagsand added the bug label Feb 14, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Calling clixon_backend -z does not (always) stop clixon_restconf #420

Calling clixon_backend -z does not (always) stop clixon_restconf #420

troglobit commented Feb 14, 2023 •

edited

Loading

olofhagsand commented Feb 14, 2023

troglobit commented Feb 14, 2023

olofhagsand commented Feb 15, 2023

troglobit commented Feb 15, 2023

Calling clixon_backend -z does not (always) stop clixon_restconf #420

Calling clixon_backend -z does not (always) stop clixon_restconf #420

Comments

troglobit commented Feb 14, 2023 • edited Loading

olofhagsand commented Feb 14, 2023

troglobit commented Feb 14, 2023

olofhagsand commented Feb 15, 2023

troglobit commented Feb 15, 2023

troglobit commented Feb 14, 2023 •

edited

Loading