services-alarm-server : increase STABILIZATION_SECS when exporting #2596 #2597

Merged

Conversation

lcaouen
Contributor

@lcaouen lcaouen commented Mar 16, 2023

Added an argument export_wait to change the STABILIZATION_SECS default value of 4s.
My server is slow, and 4s is too short to get a correct export.
I also added the possibility to set it in settings.ini (org.phoebus.applications.alarm/export_wait).
I'm not sure the name I chose is the best; feel free to change it if necessary.

@kasemir
Collaborator

kasemir commented Mar 16, 2023

That number "4" was somewhat arbitrary, so it's a good idea to make it configurable.

Still, I wonder if we should change the default, or if there's something else going on with your setup.

The fundamental issue is that the configuration is a stream. There's no way to know whether we have received the "complete" configuration, because it's never complete: an item can be added, removed, or updated at any time, and then you get another configuration message.
So what we do is wait for a pause of 4 seconds. Mind you, that's not simply waiting 4 seconds: we wait for a period in which there are no more changes for at least 4 seconds. So if your setup is slow overall and it takes 30 seconds to stream all the config changes, just a few changes every second, that should be fine, because we then wait until there have been no more changes for 4 seconds.
But do you really see several pauses in the config stream that last more than 4 seconds?
If you run monitor_topic.sh Accelerator (or whatever your config name is), does that also show pauses of more than 4 seconds?
When I run monitor_topic.sh Accelerator, it dumps several thousand config messages within 1-2 seconds, and then there's a pause, so "4" has been a good number until now.
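To make the "pause, not fixed delay" distinction concrete, here is a minimal Python sketch of a quiet-period check. It is not the Phoebus implementation: the function name `time_declared_stable`, the simulated arrival times, and the assumption that the quiet-period clock starts when the client starts are all illustrative.

```python
from typing import Iterable, Tuple


def time_declared_stable(arrivals: Iterable[float],
                         quiet_secs: float = 4.0) -> Tuple[float, int]:
    """Return (time at which the stream counts as stable, messages seen by then).

    `arrivals` holds message arrival times in seconds since client start,
    in ascending order. Assumption for this sketch: the quiet-period clock
    starts at time 0 and restarts on every message.
    """
    last_change = 0.0
    seen = 0
    for t in arrivals:
        if t - last_change >= quiet_secs:
            break  # a full quiet period elapsed before this message arrived
        last_change = t
        seen += 1
    return last_change + quiet_secs, seen


# Slow but steady stream: one message per second for 30 seconds.
slow = [float(s) for s in range(1, 31)]
print(time_declared_stable(slow))  # (34.0, 30)

# Fast burst: 3000 messages within about 1.5 seconds.
burst = [i * 0.0005 for i in range(3000)]
print(time_declared_stable(burst)[1])  # 3000
```

In the slow case the check ends at 34 seconds, not at 4, because every message restarts the clock. Only a gap of at least `quiet_secs` ends the wait.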

@lcaouen
Contributor Author

lcaouen commented Mar 16, 2023

Thanks to your comment, I now understand better how it works.
On my server, I see a new message every 10s.
We have just 2 PVs in our alarm service, which is probably why we get fewer updates.
Moreover, the PV values don't change very often.
image

@kasemir
Collaborator

kasemir commented Mar 16, 2023

Those are the idle "state:".. updates which are indeed happening every 10 seconds, or of course whenever there's a state change. The AlarmConfigMonitor looks for "config:" messages, and then waits for a 4 (by default) second pause in them.

When you run monitor_topic.sh Accelerator, you should see

  1. Maybe a 1 second startup delay for the tool to load
  2. A very brief burst of all the config (and status) messages
  3. Then no more config messages, just a state update every 10 seconds

@lcaouen
Contributor Author

lcaouen commented Mar 16, 2023

Here is what I get with monitor_topic.sh
image
It took around 10s to see the output after running monitor_topic.sh

@kasemir
Collaborator

kasemir commented Mar 16, 2023

If you run monitor_topic .. and then stop it with Control-C after 4 seconds, does that not include all the "config:" messages? Do you actually need 10 seconds to see all the "config:" messages?
With STABILIZATION_SECS being configurable, what value would you use other than 4?

@lcaouen
Contributor Author

lcaouen commented Mar 17, 2023

If I stop it after 4s, I get nothing:
image

If you're OK with my modifications in the PR, it will solve my problem.
I'm already using my fork to get the XML.

@kasemir kasemir merged commit 5d57d5f into ControlSystemStudio:master Mar 17, 2023
@kasemir
Collaborator

kasemir commented Mar 17, 2023

> If I stop it after 4s I get nothing

Well, that's interesting. For me, it takes about 1 second to start up, connect, and get the first value.
Then there can be thousands of rows in just 1-2 seconds, and then nothing unless there's an actual state change, or the 10 second idle status messages.

Is there something in your network setup or Kafka configuration that would explain a longer initial delay?

So in your case it looks like it's mostly about that initial connection time.
As I said before, making the STABILIZATION_SECS configurable is fine, but once more, what value do you now use?

Also, the problem might be handled more generally by waiting for the first config message, and only then enabling the (by default) 4 second stabilization check, because I assume you also get all the config messages in rapid succession once the client is properly connected.

@kasemir
Collaborator

kasemir commented Mar 17, 2023

I'll create an update that adds an initial wait for the first config message, which is basically a connection timeout, changes the parameter name from "-export_wait" because it's actually also used for "-import", and lists that parameter in the "-help" output.

@lcaouen
Contributor Author

lcaouen commented Mar 17, 2023

Thanks for the merge.
To answer your question about my configuration: I'm using a VM (Hyper-V) on my computer (not a server), which may be why it's slower. I have to set STABILIZATION_SECS to 10s to get an answer.
I'll run some more tests on another server (a real one this time) next week and let you know if it's faster.

@kasemir
Collaborator

kasemir commented Mar 17, 2023

OK, there's now a designated -connect_secs time that defaults to 10 seconds, and the -stable_secs that looks for a pause. The latter can likely be set back to 4 seconds (the default), because otherwise the state is never stable with 10 second "idle" state updates.

This applies to import and export as well as alarm server startup.
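The two-phase wait described above can be sketched roughly as follows. This is an illustrative Python model, not the actual Phoebus code: the function name `wait_outcome`, the simulated arrival times, and returning `None` on a connection timeout are assumptions made for the example. Only the `connect_secs` and `stable_secs` names and their defaults come from this thread.

```python
from typing import Optional, Sequence, Tuple


def wait_outcome(arrivals: Sequence[float],
                 connect_secs: float = 10.0,
                 stable_secs: float = 4.0) -> Optional[Tuple[float, int]]:
    """Model a connection wait followed by a quiet-period check.

    `arrivals` holds config message arrival times in seconds since client
    start, in ascending order. Returns (time the wait ends, messages seen),
    or None if no message arrives within `connect_secs` (hypothetical
    timeout handling, chosen for this sketch).
    """
    # Phase 1: wait up to connect_secs for the first config message.
    if not arrivals or arrivals[0] > connect_secs:
        return None

    # Phase 2: only now start looking for a pause of stable_secs.
    last_change = arrivals[0]
    seen = 1
    for t in arrivals[1:]:
        if t - last_change >= stable_secs:
            break  # quiet period elapsed before this message arrived
        last_change = t
        seen += 1
    return last_change + stable_secs, seen


# Slow VM: first message only after 9 seconds, then a short burst.
print(wait_outcome([9.0, 9.25, 9.5]))  # (13.5, 3)

# Fast server: first message after 1 second.
print(wait_outcome([1.0, 1.25, 1.5]))  # (5.5, 3)

# Nothing arrives within the connection timeout.
print(wait_outcome([12.0]))  # None
```

With a single 4 second quiet-period check that starts at client start, the slow VM case would end after 4 seconds with zero messages, which matches the "I get nothing" observation earlier in this thread.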

@kasemir
Collaborator

kasemir commented Mar 17, 2023

Note the change in names. Instead of -export_wait, use -connect_secs to address the longer connection time.
Ideally, the default of 10 sec for that new option should suffice.

@lcaouen
Contributor Author

lcaouen commented Mar 20, 2023

I ran some tests on another server and only needed 1s for connect_secs, instead of 10s on my computer, so the problem was mostly my computer.
Thanks for the new changes, they will help us!

@kasemir
Collaborator

kasemir commented Mar 20, 2023

Thanks for the update! It's certainly good that we now have a configurable delay for the initial connection, because whether it's 1 or 10 seconds, getting that initial message always seems to take longer. From then on, we can usually receive 1000 or more messages per second.
