console_task should repeat command extraction if there are still commands in the buffer #243
I think that, even if there are other possibilities, this simple code change is cheap and the fastest option, so it seems to be the right thing to do... |
On Fri, Mar 16, 2018 at 01:57:20AM +0000, Harald wrote:
> instead of extracting a single command:
[...]
> it should immediately retry if there is still a command in the buffer:
I would not recommend that. It could lead to starvation of other
tasks. In particular, it could (under high load) starve the watchdog
task.
All of the Klipper tasks are careful to issue a sched_wake_tasks() if
they need to be run again. So, going back out and servicing any other
tasks takes no appreciable time before we're back in the command
dispatch code again.
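
The cooperative pattern described above can be illustrated with a minimal, self-contained sketch. It is not the Klipper scheduler: `tasks_pending`, `wake_tasks()`, `console_task_sketch()`, `watchdog_task_sketch()` and `run_until_idle()` are hypothetical names made up for this illustration. The console task handles at most one command per pass and asks for another pass if work remains, so every other task gets a turn between commands.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical stand-ins for illustration; not the actual Klipper scheduler.
static volatile uint8_t tasks_pending; // set when any task wants another pass
static int pending_commands;           // commands waiting in a pretend buffer
static int commands_handled;
static int watchdog_kicks;

static void
wake_tasks(void)
{
    tasks_pending = 1;
}

// Handle at most one command, then request another pass if work remains
static void
console_task_sketch(void)
{
    assert(pending_commands >= 0);
    if (!pending_commands)
        return;
    pending_commands--;
    commands_handled++;
    if (pending_commands)
        wake_tasks();
}

// Stand-in for any other task that must not be starved
static void
watchdog_task_sketch(void)
{
    watchdog_kicks++;
}

// Run passes over all tasks until no task has asked to run again
static int
run_until_idle(void)
{
    int passes = 0;
    while (tasks_pending) {
        tasks_pending = 0;
        console_task_sketch();
        watchdog_task_sketch();
        passes++;
    }
    return passes;
}
```

With three queued commands this takes three passes, and the watchdog stand-in runs once per pass, so it is serviced between every pair of commands.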
[...]
> Apart from that, I have no clue why the AVR doesn't seem to suffer from this, even though it has much less processing power.
> From what I heard, the SAM (Due) works, too, and its code is very similar to the lpc code.
> Maybe my use case is a bit more demanding... but with the lpc1768, it was totally impossible to print without this change.
This isn't the place to report issues on code not in the Klipper
repo.
If you can reproduce an issue in the main Klipper code, I'd be happy
to take a look at it. If it can't be reproduced on the main Klipper
code then it's not appropriate for here. (Feel free to ask questions
about how the main klipper code works - irc or the mailing list is
probably more appropriate than a github issue though.)
…-Kevin
|
I thought of starvation, too. So I also tried other possibilities, e.g.:
at the end of console_task() which should do what you describe. It seems to be a balancing act between reading the receive buffer before it fills up and the rate of periodic_event.
This code is taken from your original sources, e.g. from https://github.com/KevinOConnor/klipper/blob/master/src/sam3x8e/serial.c#L113 :
The same code is in AVR; only rpos is read as an 8-bit value instead of a 32-bit one. I am not sure what exactly happens, maybe this:

I created this issue in your repo because the code is the same; the only difference is in the timing. That means other implementations will probably run into similar problems.

Btw., I tested the LPC USB code used by cruwaller today and it behaves exactly as before (shutdown after a few moves).

FYI: with the UART code using either interrupts + while loop or polling + continuous scheduling, the LPC1768 can reach 1000 mm/s (naturally with steppers only, no mechanical parts). Even at this speed the acceleration still sounds smooth, so I think the timing is still correct. I tested this with a real model and with 100 mm square movements. |
I just tested limiting the loop to a few iterations:
A single retry (i < 2) seems to work, even at high speed. Apart from that, a loop probably doesn't prevent scheduling, because I assume the message receive time is less than the message processing time. |
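
The snippet from this comment is not preserved in the thread. As a rough illustration only, a bounded retry could look like the sketch below. Everything in it is an assumption: the one-byte length framing, `find_block_sketch()`, `pop_input_sketch()` and `console_task_bounded()` are hypothetical and are not Klipper's protocol or functions. Only the limit of two attempts mirrors the `i < 2` mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_TRIES 2 // one normal attempt plus a single retry ("i < 2")

static uint8_t rx_buf[64];
static uint32_t rx_pos; // number of valid bytes in rx_buf
static int dispatched;  // count of messages handed to the dispatcher

// Hypothetical framing: the first byte of each message is its total length.
// Returns 1 and sets *pop_count if a complete message starts the buffer.
static int
find_block_sketch(uint8_t *pop_count)
{
    if (!rx_pos || rx_buf[0] == 0 || rx_buf[0] > rx_pos)
        return 0;
    *pop_count = rx_buf[0];
    return 1;
}

// Remove len bytes from the front of the buffer
static void
pop_input_sketch(uint8_t len)
{
    assert(len <= rx_pos);
    memmove(rx_buf, &rx_buf[len], rx_pos - len);
    rx_pos -= len;
}

// Extract up to MAX_TRIES complete messages, then return to the scheduler
static void
console_task_bounded(void)
{
    for (int i = 0; i < MAX_TRIES; i++) {
        uint8_t pop_count;
        if (!find_block_sketch(&pop_count))
            break;
        dispatched++; // stand-in for dispatching the command
        pop_input_sketch(pop_count);
    }
}
```

The fixed bound keeps the time spent in one invocation limited, which is the starvation concern raised earlier in the thread; any messages beyond the bound wait for the next invocation.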
On Fri, Mar 16, 2018 at 08:53:51AM -0700, Harald wrote:
> All of the Klipper tasks are careful to issue a sched_wake_tasks() if they need to be run again
> This code is taken from your original sources, e.g. from https://github.com/KevinOConnor/klipper/blob/master/src/sam3x8e/serial.c#L113 :
```
// Process any incoming commands
void
console_task(void)
{
    uint8_t pop_count;
    uint32_t rpos = readl(&receive_pos);
    int8_t ret = command_find_block(receive_buf, rpos, &pop_count);
    if (ret > 0)
        command_dispatch(receive_buf, pop_count);
    if (ret)
        console_pop_input(pop_count);
}
DECL_TASK(console_task);
```
> The same code is in AVR. Only rpos is read from 8bit instead of 32bit.
> This does not check if there is another message in the buffer, in which case it should probably call sched_wake_tasks().
It does check - the call to sched_wake_tasks() is in console_pop_input().
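
To make that point concrete, here is a simplified sketch of a pop routine that requests another scheduling pass whenever bytes remain after a command is removed. It is an illustration under stated assumptions, not the code from the Klipper repository: all names carry a `_sketch` or `_stub` suffix to mark them as hypothetical, the wake call is replaced by a counter, and a real driver would also need to guard against the receive interrupt changing the position concurrently, which is left out here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint8_t receive_buf_sketch[32];
static uint32_t receive_pos_sketch; // number of valid bytes in the buffer
static int wake_requests;           // how often a wake-up was requested

// Stub standing in for the scheduler's wake call
static void
sched_wake_tasks_stub(void)
{
    wake_requests++;
}

// Remove len bytes from the front of the buffer. If data remains, move it
// to the front and ask the scheduler to run the tasks again.
static void
console_pop_input_sketch(uint32_t len)
{
    assert(len <= receive_pos_sketch);
    uint32_t remaining = receive_pos_sketch - len;
    if (remaining) {
        memmove(receive_buf_sketch, &receive_buf_sketch[len], remaining);
        sched_wake_tasks_stub();
    }
    receive_pos_sketch = remaining;
}
```

Because the wake-up happens inside the pop routine, a task body that extracts a single command per invocation still ends up being re-run as long as unprocessed bytes remain.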
> I created this issue in your repo, because the code is the same. The difference is only in the timing. That means other implementations will probably get similar problems.
I can see nothing wrong with the code, and you've repeatedly indicated
the main Klipper code does not show the problem. All indicators point
to an issue in external code. I'm really puzzled why you would report
this as a defect here.
…-Kevin
|
I agree that I didn't experience this issue with RAMPS. You are also right that console_pop_input will reschedule the task(s) when a command is removed from the buffer (which is probably the correct place).
But my observations were:
I will recheck these observations, hopefully today or tomorrow... So what to do...
I understand that you tend to keep your issues clean. |
So... I revisited all my tests and found what was going wrong. I basically had these three test situations and tried to draw conclusions from them, but the reasons were different than I thought:
Additionally, I was misinterpreting a part of console_pop_input (I was thinking the wrong way because of the variable name needcopy; you only see what you already think :-) ). I apologize for taking your time and will close this issue. |
I had a lot of problems with the lpc1768 code because of shutdowns.
Every variant I tried failed in one way or another.
After many debugging sessions it boiled down to a very simple change in console_task:
instead of extracting a single command:
it should immediately retry if there is still a command in the buffer:
This solved all my former problems, so all my code variants are working now (ahem... not really, USB is not tested yet).
Reasoning:
A polling interval of 100 ms (periodic_event, which seems to be the rate at which the console buffer is examined) is too long when messages arrive at higher rates.
I think identify data and config data are sent in multiple chunks (without waiting?).
I am not sure why, but I think while printing there are situations when multiple messages add up in the receive buffer.
In both cases a single wait of 100ms can lead to a fatal overrun.
So, processing of bytes in the receive buffer should be as fast as possible.
I guess, some of the other issues with shutdowns and timing could also fade away when changing this code.
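
The overrun argument above can be checked with a little arithmetic. The sketch below only illustrates the reasoning under its own premise, namely that the buffer is drained once per 100 ms. The baud rate (250000), the 8N1 framing (10 bit times per byte) and the buffer size (192 bytes) are assumptions chosen for the example and are not taken from this thread; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Bytes arriving within interval_ms at the given baud rate, assuming 8N1
// framing (10 bit times per byte) and a continuously busy line.
static uint32_t
bytes_in_interval(uint32_t baud, uint32_t interval_ms)
{
    assert(baud > 0);
    return (uint32_t)(((uint64_t)baud * interval_ms) / (10u * 1000u));
}

// Whole milliseconds until a buffer of buf_size bytes fills (rounded down)
static uint32_t
ms_until_full(uint32_t baud, uint32_t buf_size)
{
    assert(baud > 0);
    return (uint32_t)(((uint64_t)buf_size * 10u * 1000u) / baud);
}
```

Under these assumed numbers, `bytes_in_interval(250000, 100)` gives 2500 bytes per 100 ms, while a 192-byte buffer would fill in about 7 ms, so the premise of a 100 ms polling interval would indeed imply overruns on a busy line.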
More background:
While testing, I also solved this either
Both lead to much faster processing of the receive buffer.
So, as an alternative there could be an extra queue for high priority tasks which could be executed at a higher rate. But I think this is not necessary right now and adds more complexity and probably unnecessary processing time.
I am adding this as an issue because it should probably be changed at all occurrences of console_task, and my code base is very different.
Also, maybe you have a better idea of how to do this (as usual :-)).
Apart from that, I have no clue why the AVR doesn't seem to suffer from this, even though it has much less processing power.
From what I heard, the SAM (Due) works, too, and its code is very similar to the lpc code.
Maybe my use case is a bit more demanding... but with the lpc1768, it was totally impossible to print without this change.