Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

samd: Don't rely on RTC interrupt #6202

Merged
merged 1 commit into from
Apr 5, 2022
Merged

Conversation

jepler
Copy link
Member

@jepler jepler commented Mar 25, 2022

I instrumented RTC_Handler and determined that on SAMD51 it was possible for the interrupt to be delivered well before the actual overflow of the RTC COUNT register (e.g., a value as small as 0xffff_fffd could be seen at the time of overflow)

Rather than depending on the overflow interrupt coming in at the same time as COUNT overflows (exactly), rely only on observed values of COUNT in _get_count, overflowing when it wraps around from a high value to a low one.

With this change, PLUS a second change so that it is possible to warp the RTC counter close to an overflow and test in 20ms instead of 3 days, there was no problem detected over 20000+ overflows. Before, a substantial fraction (much greater than 10%) of overflows failed.

If a user program carefully avoids calling any time-related function for over 1.5 days, it's possible that an overflow would get lost with the new formulation.

Fixes #5985

Change to common-hal/rtc/RTC.c for time warping (plus make rtc_old_count non-static):

 void common_hal_rtc_set_calibration(int calibration) {
+    common_hal_mcu_disable_interrupts();
+
+    RTC->MODE0.COUNT.reg = 0xffffff00;
+    rtc_old_count = 0;
+    do {
+        while ((RTC->MODE0.SYNCBUSY.reg & (RTC_MODE0_SYNCBUSY_COUNTSYNC | RTC_MODE0_SYNCBUSY_COUNT)) != 0) { }
+    }
+    while(RTC->MODE0.COUNT.reg < 0xffffff00);
+    common_hal_mcu_enable_interrupts();
+
+    mp_printf(&mp_plat_print, "Warping RTC in calibration setter count=%08x rtc_old_count=%08x\n", RTC->MODE0.COUNT.reg, rtc_old_count);

Test program:

import time
from rtc import RTC

i = 0
while True:
    RTC().calibration = 1 # Warps to ~16ms before overflow, with patch to RTC code
    t0 = time.monotonic_ns()
    et = t0 + 20_000_000 # 20ms
    while (t1 := time.monotonic_ns()) < et: pass
    i += 1
    print(f"{i:6d}: duration {t1-t0}")
    if t1-t0 > 200_000_000: break
    print()

@jepler jepler requested a review from dhalbert March 25, 2022 16:30
I instrumented RTC_Handler and determined that on SAMD51 it was possible
for the interrupt to be delivered well before the actual overflow of the
RTC COUNT register (e.g., a value as small as 0xffff_fffd could be seen
at the time of overflow)

Rather than depending on the overflow interrupt coming in at the same time
as COUNT overflows (exactly), rely only on observed values of COUNT in
_get_count, overflowing when it wraps around from a high value to a low
one.

With this change, PLUS a second change so that it is possible to warp
the RTC counter close to an overflow and test in 20ms instead of 3 days,
there was no problem detected over 20000+ overflows. Before, a substantial
fraction (much greater than 10%) of overflows failed.

Fixes adafruit#5985

Change to common-hal/rtc/RTC.c for time warping (plus make rtc_old_count non-static):
```patch
 void common_hal_rtc_set_calibration(int calibration) {
+
+    common_hal_mcu_disable_interrupts();
+
+        RTC->MODE0.COUNT.reg = 0xffffff00;
+        rtc_old_count = 0;
+        do {
+        while ((RTC->MODE0.SYNCBUSY.reg & (RTC_MODE0_SYNCBUSY_COUNTSYNC | RTC_MODE0_SYNCBUSY_COUNT)) != 0) { }
+    }
+    while(RTC->MODE0.COUNT.reg < 0xffffff00);
+    common_hal_mcu_enable_interrupts();
+
+    mp_printf(&mp_plat_print, "Warping RTC in calibration setter count=%08x rtc_old_count=%08x\n", RTC->MODE0.COUNT.reg, rtc_old_count);
```

Test program:
```python
import time
from rtc import RTC

i = 0
while True:
    RTC().calibration = 1 # Warps to ~16ms before overflow, with patch to RTC code
    t0 = time.monotonic_ns()
    et = t0 + 20_000_000 # 20ms
    while (t1 := time.monotonic_ns()) < et: pass
    i += 1
    print(f"{i:6d}: duration {t1-t0}")
    if t1-t0 > 200_000_000: break
    print()
```
Copy link
Collaborator

@dhalbert dhalbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your excellent sleuthing on this.

@jepler
Copy link
Member Author

jepler commented Mar 25, 2022

@jwfmcclain any chance you would be able to take a look at this possible fix for your issue?

@jwfmcclain
Copy link

@jepler, sure, but I think I need a some help finding the right build artifact (M4 Feather Express).

@dhalbert
Copy link
Collaborator

The artifacts are at the bottom of this page: https://github.com/adafruit/circuitpython/actions/runs/2041944567, but GitHub is having trouble right now. Here is a Feather M4 build; just unzip to get the UF2:
fm46202.zip

@dhalbert
Copy link
Collaborator

@jepler I tested this on a CPX, using your test code above, with the SYNCBUSY part commented out. I stopped it after the last runs below. Looks OK, if I understand correctly. Looks like the problem is not present on SAMD21.

Warping RTC in calibration setter count=ffffff06 rtc_old_count=00000000
 18725: duration 20141596

Warping RTC in calibration setter count=ffffff06 rtc_old_count=00000000
 18726: duration 20080574

Warping RTC in calibration setter count=ffffff06 rtc_old_count=00000000
 18727: duration 22644032

Warping RTC in calibration setter count=ffffff05 rtc_old_count=00000000
 18728: duration 20080560

Warping RTC in calibration setter count=ffffff06 rtc_old_count=00000000
 18729: duration 20141596

@jepler
Copy link
Member Author

jepler commented Mar 26, 2022

@dhalbert that's good, thanks for testing! While I think there's still the very unlikely problem in the case that the time isn't checked for a whole 3-day roll-over (or maybe just a 1.5 day half roll-over, not sure) that an overflow is lost, the fix of adding any calls to time.monotonic() in the program, even at extremely low frequency, would be a good user workaround.

@jepler
Copy link
Member Author

jepler commented Mar 31, 2022

@jwfmcclain It'd be great to have your feedback if you were able to test with this, but if not we'll merge it soon anyway.

@jwfmcclain
Copy link

Sorry, I didn't see Dan's message until this morning. Have both my M4 running the alpha code now

pi@serialpi:~ $ more /media/CIRCUITPY/boot_out.txt 
Adafruit CircuitPython 7.3.0-alpha.0-63-g372306411-dirty on 2022-03-25; Adafruit Feather M4 Express with samd51j19
Board ID:feather_m4_express
pi@serialpi:~ $ 

@dhalbert
Copy link
Collaborator

dhalbert commented Apr 5, 2022

@jwfmcclain Did you see any issues yet? Maybe no news is good news :) .

@dhalbert
Copy link
Collaborator

dhalbert commented Apr 5, 2022

Merging this to get it in. Will revisit if necessary.

@dhalbert dhalbert merged commit 47d3d0d into adafruit:main Apr 5, 2022
@jwfmcclain
Copy link

greps log files

@dhalbert -- no sign of the problem with the 7.3.0-alpha.0-63-g372306411-dirty build

Thanks

@jwfmcclain
Copy link

Still no signs of the problem while using 7.3.0-alpha.0-63-g372306411-dirty ... looking forward to 7.3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

time.monotonic_ns sometimes jumps forward, and then returns the correct value on the next call
3 participants