Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calling clixon_backend -z does not (always) stop clixon_restconf #420

Open
troglobit opened this issue Feb 14, 2023 · 4 comments
Open

Calling clixon_backend -z does not (always) stop clixon_restconf #420

troglobit opened this issue Feb 14, 2023 · 4 comments
Labels

Comments

@troglobit
Copy link
Contributor

troglobit commented Feb 14, 2023

While debugging issues with running the regression tests in PR #418, we noticed that test_restconf_internal.sh often failed, and always at the same step:

Test 41(41) [11. Get restconf status rpc]
Test 42(42) [send rpc status]
grep: warning: stray \ before -
grep: warning: stray \ before -
Test 43(43) [Kill backend]
Feb 10 12:27:07: Killing old daemon with pid: 60371
Test 44(44) [Restart backend -s none]
Test 45(45) [kill old backend]
Test 46(46) [start backend -s none -f /var/tmp/./test_restconf_internal.sh/conf.xml]
Test 47(47) [wait backend]
Test 48(48) [wait restconf]
speed 38400 baud; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>;
eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R;
werase = ^W; lnext = ^V; flush = ^O; min = 1; time = 0;
-brkint -imaxbel
Test 49(49) [12. Get restconf (running) after restart]
Test 50(50) [send rpc status]
grep: warning: stray \ before -
grep: warning: stray \ before -

Error in Test50 [send rpc status]:
Expected
^<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="42"><active xmlns="http://clicon.org/lib">true</active><description xmlns="http://clicon.org/lib">Clixon RESTCONF process</description><command xmlns="http://clicon.org/lib">/.*/clixon_restconf -f /var/tmp/./test_restconf_internal.sh/conf.xml -D [0-9] .*</command><status xmlns="http://clicon.org/lib">running</status><starttime xmlns="http://clicon.org/lib">20[0-9][0-9]\-[0-9][0-9]\-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]*Z</starttime><pid xmlns="http://clicon.org/lib">0</pid></rpc-reply>$

Received: 
#889
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="42"><active xmlns="http://clicon.org/lib">false</active><description xmlns="http://clicon.org/lib">Clixon RESTCONF process</description><command xmlns="http://clicon.org/lib">/usr/local/sbin/clixon_restconf -f /var/tmp/./test_restconf_internal.sh/conf.xml -D 0 -l s -R &lt;restconf xmlns="http://clicon.org/restconf"&gt;&lt;enable&gt;true&lt;/enable&gt;&lt;auth-type&gt;none&lt;/auth-type&gt;&lt;debug&gt;0&lt;/debug&gt;&lt;log-destination&gt;syslog&lt;/log-destination&gt;&lt;enable-core-dump&gt;false&lt;/enable-core-dump&gt;&lt;pretty&gt;false&lt;/pretty&gt;&lt;socket&gt;&lt;namespace&gt;default&lt;/namespace&gt;&lt;address&gt;0.0.0.0&lt;/address&gt;&lt;port&gt;80&lt;/port&gt;&lt;ssl&gt;false&lt;/ssl&gt;&lt;/socket&gt;&lt;/restconf&gt;</command><status xmlns="http://clicon.org/lib">stopped</status></rpc-reply>
##

We've reproduced this on three different machines, all running a Ubuntu 18.04 or 22.04 derivative, despite the tests running in an Alpine Linux docker container. (On an old Fedora 33 server, however, all tests pass ... unclear why, but worth mentioning for reference.)

Turns out that at the end of step "11. Get restconf status rpc", when stop_backend -f $cfg is called, the clixon_backend -z -f $cfg does not stop the clixon_restconf child. So in step "12. Get restconf (running) after restart" the rpcstatus call fails.

Adding a killall "fixes" the problem:

@@ -320,6 +320,7 @@ if [ $BE -ne 0 ]; then
     fi
     # kill backend
     stop_backend -f $cfg
+    killall clixon_restconf
 fi
 
 # Restconf is enabled and restconf was running but was killed by stop ^.

Obviously this is not the proper fix. Question is why clixon_backend doesn't kill the child it started, or why a new clixon_backend cannot reconnect with an already running clixon_restconf process? I'm not up to speed how that dynamic works, but maybe someone else in the wider Clixon community do.

There were a few other warnings as well, for instance the "stray \ before -" grep warning. This seems to just be a minor regexp mistake (- not needed outside of regexp ranges):

@@ -115,7 +115,7 @@ EOF
         err "No pid return value" "$retx"
     fi
     if $active; then
-        expect="^<rpc-reply $DEFAULTNS><active $LIBNS>$active</active><description $LIBNS>Clixon RESTCONF process</description><command $LIBNS>/.*/clixon_restconf -f $cfg -D [0-9] .*</command><status $LIBNS>$status</status><starttime $LIBNS>20[0-9][0-9]\-[0-9][0-9]\-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]*Z</starttime><pid $LIBNS>$pid</pid></rpc-reply>$"
+        expect="^<rpc-reply $DEFAULTNS><active $LIBNS>$active</active><description $LIBNS>Clixon RESTCONF process</description><command $LIBNS>/.*/clixon_restconf -f $cfg -D [0-9] .*</command><status $LIBNS>$status</status><starttime $LIBNS>20[0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]*Z</starttime><pid $LIBNS>$pid</pid></rpc-reply>$"
     else
         # inactive, no startime or pid
         expect="^<rpc-reply $DEFAULTNS><active $LIBNS>$active</active><description $LIBNS>Clixon RESTCONF process</description><command $LIBNS>/.*/clixon_restconf -f $cfg -D [0-9] .*</command><status $LIBNS>$status</status></rpc-reply>$"

Compared to the Fedora machine we also noticed that restoring the tty with stty also gives some noise. Here's a lib.sh patch:

@@ -638,7 +638,7 @@ function wait_restconf(){
         sleep 2 # some problems with valgrind
     fi
 
-  stty $STTYSETTINGS
+  stty $STTYSETTINGS >/dev/null

Finally, in lib.sh, there's an additional pkill that seems a bit malplaced:

@@ -554,7 +554,7 @@ function stop_backend(){
         sleep 1
         checkvalgrind
     fi
-    sudo pkill -f clixon_backend # extra ($BUSER?)
+#    sudo pkill -f clixon_backend # extra ($BUSER?)
 }
 
 # Wait for restconf to stop sending  502 Bad Gateway
@olofhagsand
Copy link
Member

I think that test_restconf_internal.sh suffers from some quality problems, ie the test itself, it was somewhat problematic writing it because of timing and multile processes.
Thank you for your effort.

@troglobit
Copy link
Contributor Author

No worries. Do you want us to look into the shutting down of clixon_restconf when the backend receives SIGTERM?

@olofhagsand
Copy link
Member

The "internal" restconf is one of three ways to start the restconf daemon, as described here:
https://clixon-docs.readthedocs.io/en/latest/restconf.html#datastore
ie by forking from the backend. it is probably not the most common scenario (such as starting the restconf daemon separately) but there are products using this method.
test_restconf_internal.sh tests this.
But as you point out there are some issues.
If you want you are welcome to debug this, at least the symtom: what more specifically is wrong, and how to repeat it deterministically.
There is some docs here on starting sub-processes.

@troglobit
Copy link
Contributor Author

Alright, we'll look into it. I'll keep you posted, cheers!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants