Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a background serial queue and run RPC notifications on it #813

Merged

Conversation

joeljfischer
Copy link
Contributor

@joeljfischer joeljfischer commented Dec 7, 2017

Fixes #798

This PR is ready for review.

Risk

This PR makes no API changes.

This would cause differences in behavior on RPC returns. If developers make UI changes (or have code dependent on the main queue), runtime code errors could occur.

Testing Plan

n/a

Summary

Move RPC notifications from the main queue to a background serial queue.

Changelog

Enhancements
  • RPC notifications and responses are run on a background serial queue instead of the main queue.

Tasks Remaining:

  • Smoke tests in various apps
  • SDLProxy usage of dispatch_main

CLA

@NickFromAmazon
Copy link

As a followup to the evolution proposal discussion on this - I have a couple of comments/concerns regarding the implementation of this change:

  1. I disagree that this makes no API change - it's pretty common for iOS developers to use the main queue for application-wide synchronization (it's done in several places in this library,) and so will require synchronization by our clients where it wasn't previously necessary. This is potentially a breaking API change for some clients. This is why I preferred the configurable approach suggested by @kshala-ford, as it would allow clients to do this minor version update without requiring extra code changes.
  2. Internally, this library modifies mutable state as a result of these notifications (and it's very likely clients are calling back in to the library on the callback threads as well,) and as far as I can tell in many cases access isn't appropriately synchronized. SDLResponseDispatcher is a major concern: if multiple threads are seeing an inconsistent view of the backing data, there's potential for dropped requests or unhelpful crashes. I believe this is already an issue with some of the file operations, but this change would likely make things much worse.
  3. If I understand correctly, the main motivation behind this change is to improve responsiveness when clients are doing a lot of work on the main thread. With that in mind, it might be a good idea to perform work internally on a private background queue, as with this approach clients can still block internal execution by A) performing expensive operations synchronously in response to callbacks on this new queue, or B) performing work on the main queue, as internally it looks like all responses are still processed on main (as referenced above.)

@joeljfischer
Copy link
Contributor Author

joeljfischer commented Dec 11, 2017

Hi Nick, thanks for the feedback.

  1. According to the strict definition of "API change" by semantic versioning, this is not an API change, no public APIs were changed. However, runtime differences may occur, and I added that to the risk in the PR text. I am opposed to a configurable approach, as this would be confusing for most developers, and it opens up a lot of additional testing that must be done on the project compared to having a single type of responses. Additionally, this was not what was agreed to on the SDLC call and would need a new round of proposal and review.

  2. I don't share your concerns about SDLResponseDispatcher. There is no chance of multiple threads seeing an inconsistent view of the backing data. Multiple threads can add data to the SDLResponseDispatcher backing mutable stores, but there is no chance of them using a key such that one will be overwritten because the SDLLifecycleManager generates unique keys. There is then a delay between the storing and reading due to the request being sent to the head unit and awaiting the response. The only other question would be if reading and removing the data could cause an issue, but due to using a serial background queue, reads and deletes will be done on one queue.

  3. I'm not sure what internal work that could be on a private background queue you mean or how this would assist with the scenario you describe (developers doing significant work in serial due to RPC responses that could be parallelized). Could you provide an example?
    You are correct that SDLProxy processes responses on the main queue. I'm going to have to look deeper into that. All that proxy and protocol code should be rewritten but can't be until 6.0, unfortunately.

@NickFromAmazon
Copy link

  1. My argument is that the mechanism in which (public) notifications and responses are delivered to clients is part of the framework's public API, as making changes has the potential to break client code.
  2. It is my observation that 'adds' will happen from (at least) this serial queue, the operation queue in the SDLFileManager, the main thread (there's at least one request I see dispatched on main from within the SDLUploadFileOperation,) and potentially whatever thread clients are making requests from that are not made as a result of these callbacks (if not main). I can't comment on Apple's implementation of NSMapTable, but it is called as thread-unsafe, so concurrent writes are likely problematic (e.g. if there are hash collisions, or an add/delete triggers any kind of resize operation.)
    The logic to increment correlation IDs in SDLLifecycleManager doesn't appear to be thread-safe either, which could allow for duplicated IDs if accessed concurrently.
  3. To give a somewhat contrived example, a scenario like:
    3.1. Client's application connects to HMI and connection logic begins.
    3.2. Client receives initial SDLOnHMIStatus status on the rpcNotificationQueue, and initiates some expensive operation.
    3.3 PutFile request succeeds for app icon, but delivery is blocked as task is unable to execute on the rpcNotificationQueue. And so the app icon can't be set in a timely manner.
    If responses were handled internally without depending on the rpcNotificationQueue, clients would not be able to block execution of setup code within the framework.

1 can probably be mitigated through appropriate documentation, but 2 is still a concern. I agree that 3 is probably a larger task and beyond the scope of this PR, I just wanted to make sure it was called out as a potential way to further address what I understand to be the issue motivating this change.

@joeljfischer
Copy link
Contributor Author

joeljfischer commented Dec 12, 2017

@NickFromAmazon

  1. This is true, but has caused zero issues we've heard about in the over 1 year we've had that code in place (since v4.3). This is almost certainly due to the reasons I cited above. If you or anyone else is able to contrive a situation in which the code breaks, I would be more than happy to wrap it in some dispatch_sync calls and harden it, though that would certainly slow the whole system down.

  2. Technically that situation would not occur as you describe, but I see what you're saying. I'm still having trouble contriving a situation that would actually occur. For example, if the developer you describe starts that long-running synchronous operation on the rpcProcessingQueue, it would block additional responses and notifications from coming in, theoretically. Long running operations should obviously occur by dispatching to another queue, and I would hope other devs know that too. But that's the worst of it, it wouldn't prevent things like an app icon being set.

@NickFromAmazon
Copy link

  1. Bugs of this sort are going to be hard to reproduce and pinpoint, but anecdotally we've seen sporadic SEGV_MAPERR crash reports that appear to be due to unexpected de-allocation of SDL objects that could likely be related; there's just never been enough context to attempt a useful bug report. It's definitely not a common occurrence, the reason I'm bringing it up under this PR is that I'm worried this change would aggravate the issue.
    Any reason it would have to done via a dispatch_sync call? If sdl_sendRequest calls, and calls to run response handlers could be done on a shared (preferably private) serial queue, these should be able to dispatched asynchronously, and so likely with negligible impact to the rest of the system.

  2. Why wouldn't this occur? PutFile requests appear to be handled via the same mechanism as other requests, and so the responses that trigger SDLSetAppIcon won't happen if the queue on which responses are delivered is blocked. Try sleeping synchronously in response to a SDLDidChangeHMIStatus notification - at least with the (somewhat cobbled together) sdl_ios version and simulator I've tested with, this will delay the setup process for the duration of the sleep.

__block BOOL presented = NO;
dispatch_sync(dispatch_get_main_queue(), ^{
presented = (self.viewController.isViewLoaded && (self.viewController.view.window || self.viewController.isBeingPresented));
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling dispatch_sync(dispatch_get_main_queue() while on the main thread causes deadlock and the app crashes

Copy link
Contributor

@NicoleYarroch NicoleYarroch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screen touches are much more responsive. The only issue I had was the app crash noted in the comment.

@NicoleYarroch
Copy link
Contributor

I am getting UI API called from background thread Group warnings for the following lines in the SDLLockScreenViewController's sdl_layoutViews function

- (void)sdl_layoutViews {
    ...
    self.sdlIconImageView.image = [self.class sdl_imageWithName:@"sdl_logo_black"];
    self.sdlIconImageView.tintColor = iconColor;
    ...    
    self.lockedLabel.textColor = iconColor;
    ...
}

@joeljfischer
Copy link
Contributor Author

joeljfischer commented Dec 18, 2017

  1. I have not seen that issue, but I think it's important to note that even if we go to do a further auditing and rewrite of threading in sdl_ios, this current proposal is unlikely to make anything worse, we are just moving from the main queue to a background serial queue. Even if this isn't the full checkup and redo we eventually need, I don't see how this could possibly make things worse. Sending requests does get thrown onto an internal background serial queue once it hits the protocol layer.
    Writes / storing could be dispatch_async, but reads would have be dispatch_sync or use a completion handler for full thread-safety. The entire lifecycle manager should probably, as you said, work on a background queue, though I'd worry about too much queue switching and argue that the entire proxy / protocol layer should be re-done (but can't until 6.0 due to the files being public) with better threading done across all of them. I've been waiting to make those layers private before attempting such a re-do.

  2. The SDLUploadFileOperation doesn't check the responses before sending all the requests, so the image upload would succeed even though the responses were delayed. Again, developers have to know that they can't just run synchronous long-running ops on whatever thread willy-nilly....they know that right? 😅 The reason that's probably happening isn't the image upload, but the list files response is probably delayed. Your proposed internal queue wouldn't help this situation due to the internal queue would still have to wait for the response. The only way to mitigate this I could think of would be a concurrent queue, but I think we just have to hope developers don't do really dumb things. They can run long synchronous ops on the main queue as well and Apple doesn't stop them.

BTW, I updated it to make sure no response processing happens on the main queue.

@NickFromAmazon
Copy link

  1. I think that this change may make things worse because an additional background queue is being used to unsafely access mutable data. Prior to this change, reads/writes would primarily be done on main. With this change, it will be effectively impossible for clients to make requests into SDLManager on the same queue as responses are delivered, if these requests aren't sent as a direct result of an sdl_ios callback.
    Regarding the sync requirement, I'm still not sure I'm following - as long as reads and writes are done on the same serial queue, no additional blocking should be necessary - but perhaps I'm missing something by focusing to specifically on the ResponseDispatcher.
  2. Agreed it's not completely possible to protect clients from themselves; I'm just bringing this up here as that seems to be more or less the original motivation behind this PR.
    The reason my example allows clients to block setup is that the putFile response handler is called on the same serial queue that responses/notifications are delivered to clients. As you said, additional re-architecture would probably be necessary to get this right (and maybe this discussion should be moved to a new PR,) but if notifications/responses were delivered to clients on a different background serial queue, it should be possible to prevent clients from directly blocking internal execution.

I like the change to move the processing queue into the proxy, definitely seems like this should help with responsiveness.

@joeljfischer
Copy link
Contributor Author

I think that this change may make things worse because an additional background queue is being used to unsafely access mutable data. Prior to this change, reads/writes would primarily be done on main.

This is only true because callbacks were done on main.

With this change, it will be effectively impossible for clients to make requests into SDLManager on the same queue as responses are delivered, if these requests aren't sent as a direct result of an sdl_ios callback.

This is true, but IMO it would be as likely to have been called on main as on an arbitrary thread. While this technically reduces the odds of it happening all on main, we have tested extensively with a comprehensive SDL media app and have seen no issues with the new behavior.

Regarding the sync requirement, I'm still not sure I'm following - as long as reads and writes are done on the same serial queue, no additional blocking should be necessary - but perhaps I'm missing something by focusing to specifically on the ResponseDispatcher.

This is a very minor point in the whole discussion. For clarity, I'm saying that to write to a thread-safe dictionary would look like:

- (void)writeKey:(id<NSCopying>)key value:(id)value {
    dispatch_async(_dictQueue, ^{
        dict[key] = value;
    });
}

Read:

- (id)readValueForKey:(id<NSCopying>)key {
    __block id value = nil;
    dispatch_sync(_dictQueue, ^{
        value = dict[key];
    });

    return value;
}

Agreed it's not completely possible to protect clients from themselves; I'm just bringing this up here as that seems to be more or less the original motivation behind this PR.

The original motivation was due to CarWindow. Touch events come back very rapidly (before this PR) on the main queue. The dev then has to update their UI based on those touch events on the main queue, while CarWindow screenshots on the main queue. The intent of this PR was to move those touch events to the background, freeing up the main queue for that CarWindow work, and it does this job. The fact that it seems to improve other apps is a secondary, but welcome, change.

I'm not under the illusion that this is a final fix, but from experience with the library and testing, it does appear to be a band-aid.

@joeljfischer joeljfischer merged commit 6306ad8 into develop Dec 19, 2017
@joeljfischer joeljfischer deleted the feature/issue_798_rpc_notifications_serial_queue branch December 19, 2017 21:09
@NickFromAmazon
Copy link

It looks like this was merged, so I'll leave a final summary thoughts on this:

This is only true because callbacks were done on main.

Yes, this is the basis for my argument. The main queue is public, and intentionally or not, the previous implementation dictated how clients could use the library's API in a way that was A) (mostly) safe, B) did not require extra synchronization their side. Following this, neither of these things are possible.

it would be as likely to have been called on main as on an arbitrary thread.

Possibly. I can't speak for other clients, but in our interaction with the library we noticed that calling in on background threads introduced stability issues, and adjusted our call pattern accordingly.

a thread-safe dictionary

I was suggesting wrapping the entire interaction in an async block earlier in the call stack, so making the dictionary itself thread-safe wouldn't be necessary. The reason I'm stressing this point is that (assuming it wouldn't introduce other problems) it looks it may be possible to address my worst safety concerns quickly by dispatching the logic in sdl_sendRequest: on the new notification queue, without requiring any other threads to block.

The intent of this PR was to move those touch events to the background, freeing up the main queue for that CarWindow work

Pushing events to clients on a different queue so they don't interfere with other work is what I'm suggesting. The current approach mitigates the problem by shifting it on to the notification queue, which I think this is fine as a stop-gap (assuming it's done in a way that doesn't introduce other issues,) but a more comprehensive fix is possible. It sounds like we're in agreement on this point at least.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants