Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure wakelocks are released on more exception paths. #1738

Closed

Conversation

15characterlimi
Copy link

There are a lot of places around xDrip+ where wakelocks are acquired but not released until the (generally inconsistently picked) timeout runs out. Many of those are deliberate (although probably not a good idea), but others occur by accident, on exception paths. This commit fixes a subset of the latter.

  • There were a few cases where no try/finally block was used but there were multiple return statements that were each accompanied by a release of the wakelock. This is needlessly complex and error prone. I've replaced these with a try/finally block; note that even in case of a return statement being encountered in the try block, the finally block will still run.

  • Many of these occur on background threads; when uncaught exceptions on background threads kill the app, this is okay - but these many threads should not hard code assumptions about the UncaughtExceptionHandler, which is a global setting. This commits introduces a helper class WakeLockThread that takes care of the acquisition/release of the wakelock, and ensures that the thread has a name (which can help during debugging); in some cases, this also allows dropping a try/finally block to avoid indentation in the source code. The presence of the new class should hopefully encourage more mindful handling of wakelocks in future code.

  • There were also a handful of cases where there was a try/finally block, but the wakelock acquisition occurred a few statements before the try block; I've moved the wakelock acquisition to just before try block in order to make sure that it is always released.

Unrelatedly, I've also fixed a handful of needless constructions of Runnable objects for invocations of the form new Thread(new Runnable() { public void run() { ... } } which can instead just override Thread.run() instead (same behavior, but one less object).

I ran all Robolectric unit tests and confirmed that they still pass.

Potential work for follow-up commits:

  • Potentially address the two TODOs and one FIXME that I've left in the code.
  • Add convenience API to specify (TimeUnit timeUnit, long duration) instead
    of (long millis) for more readable / less error-prone code.
  • Add an AbstractWakeLockService class that takes care of acquiring/releasing
    a wakelock during onStartCommand(), since xDrip+ has multiple independent
    implementations of this behavior, some of which were buggy. That said,
    the app will probably be killed during onStartCommand() only in extreme
    cases, so acquiring the wakelock might be unnecessary in the first place.

There are a lot of places around xDrip+ where wakelocks are acquired but not released until the (generally inconsistently picked) timeout runs out. Many of those are deliberate (although probably not a good idea), but others occur by accident, on exception paths. This commit fixes a subset of the latter.

 * There were a few cases where no try/finally block was used but there were multiple return statements that were each accompanied by a release of the wakelock. This is needlessly complex and error prone. I've replaced these with a try/finally block; note that even in case of a return statement being encountered in the try block, the finally block will still run.

 * Many of these occur on background threads; when uncaught exceptions on background threads kill the app, this is okay - but these many threads should not hard code assumptions about the UncaughtExceptionHandler, which is a global setting. This commits introduces a helper class WakeLockThread that takes care of the acquisition/release of the wakelock, and ensures that the thread has a name (which can help during debugging); in some cases, this also allows dropping a try/finally block to avoid indentation in the source code. The presence of the new class should hopefully encourage more mindful handling of wakelocks in future code.

 * There were also a handful of cases where there was a try/finally block, but the wakelock acquisition occurred a few statements before the try block; I've moved the wakelock acquisition to just before try block in order to make sure that it is always released.

Unrelatedly, I've also fixed a handful of needless constructions of Runnable objects for invocations of the form new Thread(new Runnable() { public void run() { ... } } which can instead just override Thread.run() instead (same behavior, but one less object).

I ran all Robolectric unit tests and confirmed that they still pass.

Potential work for follow-up commits:
 - Potentially address the two TODOs and one FIXME that I've left in the code.
 - Add convenience API to specify (TimeUnit timeUnit, long duration) instead
   of (long millis) for more readable / less error-prone code.
 - Add an AbstractWakeLockService class that takes care of acquiring/releasing
   a wakelock during onStartCommand(), since xDrip+ has multiple independent
   implementations of this behavior, some of which were buggy. That said,
   the app will probably be killed during onStartCommand() only in extreme
   cases, so acquiring the wakelock might be unnecessary in the first place.
@tolot27 tolot27 added the code:quality code and repository related label May 25, 2021
@tolot27
Copy link
Collaborator

tolot27 commented May 25, 2021

@15characterlimi Thanks for catching this topic. It will take a wile to review it. What it makes harder is that there is just one commit with many changed files. But it looks like the same change is applied to several files.
I know we have a huge code duplication between app and wear. Shouldn't the same changes be applied to wear sources as well?

@tolot27 tolot27 self-requested a review May 25, 2021 07:10
@15characterlimi
Copy link
Author

Thanks for catching this topic. It will take a wile to review it. What it makes harder is that there is just one commit with many changed files.

Would you prefer if I split this PR up into different "kinds" of changes, ie.

  • Introduce and use WakeLockThread
  • Replace blocks that release the WakeLock separately on multiple return statements -> a single try/finally block
  • ...

?

I know we have a huge code duplication between app and wear. Shouldn't the same changes be applied to wear sources as well?

I only looked through usages of JoH.getWakeLock(). If that class/method is also duplicated for Wear, then that would explain it. Let me look into this some time outside of work time (I will have limited time to engage with this during the week, please bear with me).

If this is all duplicated in Wear then the #1 goal should be to reduce/remove this duplication so that all future work doesn't need to be doubled-up. Would you agree?

@tolot27
Copy link
Collaborator

tolot27 commented May 25, 2021

@15characterlimi No hurry. We all have limited time. BTW: Issue #1522 catches many of the code review aspects and the app/wear code duplication is also mentioned in #1733. The problem is that we cannot touch to many files at once.

As more and more developers contribute to xDrip we have to get more organized. I'll propose structured collaboration, soon.

Regarding to PRs affecting a larger number of files: I recommend creating commits as small as possible, atomic at best, and provide descriptive commit messages. That helps during review and allows cherry-picking.

@jamorham
Copy link
Collaborator

This PR isn't unreasonable in itself and looks like it has good practice behind it, but the main issues I see go like this:

  • Is this solving a particular problem? Does an audit of operational wakelocks of the app running show that they are not being released in good time?
  • Have the modified code paths been tested operationally, for example if NFC reading is modified has that been tested? Wakelock debugging can be tricky because the device can behave differently depending on whether they want to doze or not and how strictly things are being enforced. Some items are tune-able at runtime and different manufacturers tweak these settings in potentially annoying ways. A quick read of the PR doesn't look like it changes any behavior, but actual testing is important to validate that.

@15characterlimi
Copy link
Author

15characterlimi commented May 26, 2021

Is this solving a particular problem? Does an audit of operational wakelocks of the app running show that they are not being released in good time?

The main goal of this commit is to set a precedent for good wakelock hygiene and to make it easy to ensure wakelocks are released in a timely and deterministic manner. Specifically:

  • Releasing wakelocks in a deterministic manner (after completion of some action rather than after a set number of seconds) makes the code behave more deterministically, which makes it more likely that any bugs reproduce reliably and their presence and root causes identified. While the current practice of holding on to wakelocks longer than needed is unlikely to trigger a serious bug, it's hard to know this for sure because any bug triggered in this manner would be hard to reproduce and therefore would have a high chance of remaining undetected, even if its impact was severe (for example by obfuscating the fact that a wakelock is needed somewhere else).
  • Releasing wakelocks once they are no longer needed saves battery. Again the impact per user is probably small, but (a) there are lots of users, and (b) if there are cases where the impact is not small then we'd be unlikely to find out because of (again) the nondeterministic and hard-to-reproduce causal chain.

In summary, the impact of each of the places touched by this commit is probably small, but (a) this commit fixes a lot of them, and (b) we don't know for sure that it isn't large because it's so difficult to track battery use back to a particular piece of sloppy coding.

The idioms introduced by this commit aim to make the code more maintainable in the long run by setting precedent and by making it easier to write correct code. For example,

  • Subclasses of WakeLockThread can't forget to give up the wakelock.
  • Code snippets inside a try/finally block no longer need to remember to release the wakelock on every single return statement, which is error prone (example: DexCollectionService.onServicesDiscoveredWithWakeLockHeld())

Have the modified code paths been tested operationally, for example if NFC reading is modified has that been tested?

Not yet.


I'd like to take a step back and ask you about the general engineering philosophy, because I think this will determine whether me trying to contribute to this project will deliver positive value vs. just be annoying for both of us.

My personal impression is that xDrip+ has a lot of features, but the code supporting those features has become quite brittle and hard to maintain. Examples:

  • There's lots of code duplication, e.g. between Wear and non-Wear code, which doubles the effort of any commits that touch this code.
  • The app still targets Android Marshmallow, which means that:
    • The app would not be accepted on Play Store (I gather that this is a non-goal, for liability reasons)
    • Modern tooling such as the error-prone plugin (static analysis that detects bugs) cannot be used.
    • We're stuck on the JDK 8 toolchain from 2014 - Java 17 LTS is coming out this September
    • Android exhibits a large number of compatibility behaviors; the behavior changes since then have generally been made in the interest of the user (battery saving, privacy, bug fixes, ....) so the user will not get those benefits.
  • There is lots of dead code. For example, CipherUtils.java contains a hard-coded key (which seems like an insecure practice?), but also appears to be nearly completely unused.
  • Sloppy code that is almost correct is all over the place - such as wakelocks that aren't deterministically released or that are even kept around for a seemingly arbitrary amount of time.
  • Test coverage is nowhere near enough to give confidence in code changes (examples: neither this PR nor Fix MMOLL_TO_MGDL conversion constant. #1737 touches any existing tests, doesn't break any existing tests, yet still people are very worried about it breaking stuff)
  • There are lots of "haunted graveyards" that no one dares to touch. For example
    • The comment on the targetSdkVersion vaguely says "increasing target SDK version can cause compatibility issues with Android 7+", rather than being specific about what needs to change in order to target a modern version of Android.
    • No one knows where the constant value MMOLL_TO_MGDL comes from yet still people are afraid to change it to a very similar value that breaks no tests and has a justification.
  • Most commits deal with adding new features, rather improving the maintainability of the existing code. PRs that try to (such as this one here) don't feel very welcomed).

My personal perception is that xDrip+ is stuck in a local optimum that it can't get out of because the code is so brittle that no one dares to make any changes, beyond adding even more features. My personal perception is that xDrip+ has for years been stuck in a place where most effort was invested in more and more features, and very little effort was invested in keeping the existing code healthy (for example, by deleting dead code, refactoring sloppy or poorly abstracted code, or targeting recent versions of Android). My personal perception is that what xDrip+ needs most is not new features, but a higher code quality through static analysis (error-prone), more comprehensive unit tests, deduplication of code that was previously duplicated (Wear), ... Getting there will likely result in less stability in the short term as some things will invariably break as things get refactored, but it will result in much more stability and code quality in the long term. I personally value code quality and maintainability at least equally with feature set, and I think that this means that I could provide value.

Does this match your philosophy of what you would like xDrip+s future to be? If not, then no problem, it's your project. But then I think it'd be good to know, because I think that my attempts to provide value would then just result in frustration for both of us. I used to maintain a codebase that had a lot more users than xDrip+ (actually, xDrip+ was one of those users...) and I'm definitely not a cowboy when it comes to making code changes. At the same time, I'm a strong believer in "No haunted graveyards" (example out of many excellent talks/articles). In my experience, subtle bugs that lurk below the surface can be worse over their lifetime than hard bugs because the hard bugs are found quickly and therefore short lived. From your and others' reactions to my bait in form of two pull requests, it seems to me that this approach to engineering might not be welcome here. If that's the case, then so be it. But it'd be good to know so we can each go our separate ways.

WDYT?

@jamorham
Copy link
Collaborator

The main issue is integration testing. I don't believe there is really any automated way to unit test the interactions with the framework that occur over the (fragmented) Android ecosystem and xDrip commonly is interacting with remote devices which have their own quirks as well.

This means that complete testing needs to be done with real handsets and real devices in my opinion. The existing code has been tested like this and so if its going to be changed then I have to weigh up the pros and cons of doing so and factor in the level of testing that has occurred.

The changes proposed in this PR are relatively minor and I'm not suggesting that there are errors in it (although I haven't meticulously reviewed it either) but taking a strict engineering view when asking the question "does this break anything" the answer is that it hasn't been tested so we don't know for sure. This is why I asked that question.

When someone produces changes for a particular part of the system which they are using every day, it is usually to address a specific need, like something is broken or a new feature is added and there is a greater level of confidence that it has been operationally tested (at least on the handsets they are using) - In my case I'm typically testing over a range of handsets with various chipsets and android versions and often live with multiple different users before merging the change.

Last time I did an audit of wakelocks, what I found was that the only dangling one consuming anything significant was inside of google play services. This is why xDrip patches this at runtime to resolve that issue. The fact that this is necessary when it is google who are hounding developers to reduce background cpu usage tells you a lot about the chaotic nature of working with android and their engineering practices.

One of the reasons for not targeting higher sdk versions is because of restrictions put in place by doing so that can break app behavior. In some situations this can be worked around with code changes and in others there is no replacement functionality (eg calling hidden api's to resolve framework bugs)

Often google will introduce a replacement for something they are removing from a targetsdk level, provide documentation for it that is wrong (or manufacturers will integrate it incorrectly - hard to tell which) only to abandon it on the next major release. Such instability in the framework doesn't give so much incentive to adopt the new mechanisms while the old ones are still working. Eventually changes will be required as backward compatibility wanes in future android versions, a good example of this is package targeted broadcasts which are mandatory whichever android version is targeted.

I do want to raise the minimum sdk level to android 4.4 (and in time 5.0) but last time I increased it there were complaints from users still using those handsets which is why that was reverted. I'm thinking enough time has passed now that such a change should be much safer to make.

@15characterlimi
Copy link
Author

15characterlimi commented May 27, 2021

Thanks, that makes sense. There may be some common ground around better test coverage and/or refactoring code to be more testable that we can agree on. Let me think about this over the weekend and get back to you; perhaps I can think of some test coverage improvements that will make up for any error budget spent in the short term on refactorings/fixes such as in this PR here. Obviously a full integration test will not be possible without having all the other respective systems to integrate with (I don't have much other than my phone and a Dexcom G6, so an integration test in the general case is impossible for me), but it might possible in principle to extract an API boundary / abstraction in some places and then have unit tests that operate at that boundary. I haven't checked, but I wonder for example whether there are sufficient tests at the layer of the HTTP server; I've also been thinking of improving the UI of the appwidget (some extra lines could make it more readable IMO), and while I'm at it perhaps adding some screenshot-based integration tests. We'll see - I'll look into it.

I'll also have to address @tolot27 's comments - I'll have to check for duplicate code that also needs to change, and might want to split this commit up into multiple that separately tackle different parts of the issue.

Please hold on and I'll get back to you, hopefully by the end of the upcoming weekend (I also have a bunch of other things to deal with this weekend so can't 100% promise). I'll get back to you eventually.

@tolot27
Copy link
Collaborator

tolot27 commented Jun 24, 2021

I'm not sure whether this PR would fix the NPEs which can be seen during unit tests, i. e. https://github.com/jamorham/xDrip-plus/pull/115/checks?check_run_id=2908921032#step:5:793:

    Exception in thread "Thread-75" java.lang.NullPointerException
    	at org.robolectric.shadows.ShadowPowerManager.newWakeLock(ShadowPowerManager.java:32)
    	at android.os.PowerManager.newWakeLock(PowerManager.java)
    	at com.eveningoutpost.dexdrip.Models.JoH.getWakeLock(JoH.java:831)
    	at com.eveningoutpost.dexdrip.UtilityModels.BgSendQueue.handleNewBgReading(BgSendQueue.java:115)
    	at com.eveningoutpost.dexdrip.Models.BgReading.notifyAndSync(BgReading.java:1240)
    	at com.eveningoutpost.dexdrip.Models.BgReading.lambda$bgReadingInsertFromG5$0(BgReading.java:1104)
    	at com.eveningoutpost.dexdrip.UtilityModels.Inevitable$Task.poll(Inevitable.java:114)
    	at com.eveningoutpost.dexdrip.UtilityModels.Inevitable.lambda$task$0(Inevitable.java:60)
    	at java.lang.Thread.run(Thread.java:748)

@15characterlimi
Copy link
Author

I'm not sure whether this PR would fix the NPEs which can be seen during unit tests

Probably not, the code still builds on top of JoH.getWakeLock()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code:quality code and repository related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants