Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new overloads for GlobalExclusiveDeviceAccess.CommunicateWithDevice #377

Merged
merged 6 commits into from
Oct 9, 2024

Conversation

frobijn
Copy link
Contributor

@frobijn frobijn commented Sep 28, 2024

Description

Added overloads for GlobalExclusiveDeviceAccess.CommunicateWithDevice:

  • accepting NanoDeviceBase or IPort as argument, in addition to serialPort and NetworkDeviceInformation
  • accepting Func<Task> communicationAsync as argument, in addition to Action communication:

Now you can write:

GlobalExclusiveDeviceAccess.CommunicateWithDevice(
	device, 
	async () => 
	{ 
		await PingAsync (); 
	}
);

Motivation and Context

In nanoff the action to be protected is async code, but the GlobalExclusiveDeviceAccess did not have a method for that.

While trying to add the lock to the Device Explorer (e.g., for the Ping command): the code only knows about NanoDeviceBase, how to get from that to a serial port or network address should not be part of the Device Explorer code.

Types of changes

  • Improvement (non-breaking change that improves a feature, code or algorithm)
  • Bug fix (non-breaking change which fixes an issue with code or algorithm)
  • New feature (non-breaking change which adds functionality to code)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Config and build (change in the configuration and build system, has no impact on code or features)
  • Dependencies (update dependencies and changes associated, has no impact on code or features)
  • Unit Tests (add new Unit Test(s) or improved existing one(s), has no impact on code or features)
  • Documentation (changes or updates in the documentation, has no impact on code or features)

Checklist:

  • My code follows the code style of this project (only if there are changes in source code).
  • My changes require an update to the documentation (there are changes that require the docs website to be updated).
  • I have updated the documentation accordingly (the changes require an update on the docs in this repo).
  • I have read the CONTRIBUTING document.
  • I have tested everything locally and all new and existing tests passed (only if there are changes in source code).
  • I have added new tests to cover my changes.

@nfbot nfbot changed the title Extra overloads for GlobalExclusiveDeviceAccess.CommunicateWithDevice. Extra overloads for GlobalExclusiveDeviceAccess.CommunicateWithDevice Sep 28, 2024
Copy link

coderabbitai bot commented Sep 28, 2024

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (6)
  • nanoFramework.Tools.DebugLibrary.Shared/NFDevice/GlobalExclusiveDeviceAccess.cs is excluded by none and included by none
  • nanoFramework.Tools.DebugLibrary.Shared/PortSerial/DeviceWatcher.cs is excluded by none and included by none
  • nanoFramework.Tools.DebugLibrary.Shared/PortSerial/PortSerial.cs is excluded by none and included by none
  • nanoFramework.Tools.DebugLibrary.Shared/PortTcpIp/DeviceWatcher.cs is excluded by none and included by none
  • nanoFramework.Tools.DebugLibrary.Shared/PortTcpIp/PortTcpIp.cs is excluded by none and included by none
  • nanoFramework.Tools.DebugLibrary.Shared/WireProtocol/ConnectPortResult.cs is excluded by none and included by none

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including **/dist/** will override the default block on the dist directory, by removing the pattern from both the lists.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@Ellerbach
Copy link
Member

Help me understand the exact scenario here as I may have missed something: when connected to nanoff or the debug library, the device is already in exclusive usage. The serial port cannot be shared anyway, so it can only be open by one application.
For network devices it could be different but it could happen on different machines and here that code won't help at all.
So why adding this will prevent a usage by another one?

@frobijn
Copy link
Contributor Author

frobijn commented Sep 30, 2024

Help me understand the exact scenario here as I may have missed something. So why adding this will prevent a usage by another one?

GlobalExclusiveDeviceAccess was introduced in this PR.

GlobalExclusiveDeviceAccess provides an exclusive access mechanism via a system-wide mutex. It can be waited on, and you can set a timeout for gaining access. With the "regular" code that accesses the serial port a failure/exception can mean two things: (1) the port is in use by another application, or (2) the port cannot be used/there is a problem. If all nanoFramework tools use GlobalExclusiveDeviceAccess to get access to a port when they want to communicate with the port, then a failure is indeed: cannot use the port.

The GlobalExclusiveDeviceAccess is a feature I made for the test framework v3. You don't want a failed test because the Device Explorer is checking the port at the wrong moment, it is better that the test host waits until the Device Explorer is finished and then continues with running the tests. Or the other way around: if the Device Explorer cannot access a port for some time (e.g., because a test is running, or the Device Explorer of another Visual Studio instance), it does not show the device. With GlobalExclusiveDeviceAccess, it waits until the test has ended and then shows the device in the list. I think there are also scenarios where Visual Studio starts multiple test hosts, each running tests from a single test assembly, that want to access the same device; GlobalExclusiveDeviceAccess will protect against that.

The GlobalExclusiveDeviceAccess is of limited use if it is only used in the test framework, so I made a PR for it in the library that is used by all tools. I'll add the locks to nanoff (nanoframework/nanoFirmwareFlasher#290) and once this PR has been accepted to the Device Explorer as well. I don't know how to add it to the Visual Studio/nanoFramework debugger, but at least the protection will be part of most tools.

@josesimoes
Copy link
Member

I'm not sure this is being address in the most efficient (usable?) way...

Using GlobalExclusiveDeviceAccess to access the device ensures an exclusive access to it, no doubt about it.
Still, it requires to wrap all the existing code calling the debugger API with this, right?
That's not elegant and requires a serious refactoring.

Shouldn't we use instead a pattern that creates a global mutex in the NanoDevice constructor, for example, ensuring global lock.

@frobijn
Copy link
Contributor Author

frobijn commented Oct 1, 2024

No, the NanoDevice is not the correct place. The way it is used, it represents a device that has been recognized as a nanoDevice, but not one the software is actually communicating with. E.g., the Device Explorer uses the DeviceWatchers to discover devices. The result is a list of NanoDevice instances, but the Device Explorer itself is not yet doing anything with that.

The Device Explorer should obtain a lock if it wants to communicate with the device, e.g, just before doing a ping or obtaining the device information. I think that in most cases this coincides with the creation of a debugger engine. If you are looking for ways to integrate the lock with existing code, the Device.Connect() is a place to consider, as that is always called at the start of a period of communication with a device. But that would require some major refactoring, as the call to Device.Connect() is sometimes indirect (from DebugEngine.Connect()) and the current code is not designed for a graceful exit from that point. The idea is that a timeout should be handled differently than an error, but that is easier at a higher level in the code.

The lock can't be in Device.Connect() only, because it is quite "deep" in the code. E.g., in case of the test host, the first action is to find out if a device exists for a port, via PortSerialManager.AddDevice. Sometime during the execution of that method there is code that runs Device.Connect(), checks that the device is valid, and calls Device.Disconnect(). When the assemblies are deployed to the device, the underlying code executes Device.Connect() again. But you don't want to run the risk that there is another application (potentially another test host) that has obtained exclusive access to the port inbetween the Device.Disconnect() and subsequent Device.Connect() call. The test host should get exclusive access before checking whether there is a device connected to the port, and release the lock after the last test assembly has been executed.

Given the current code base, I still think the easiest way is to have the GlobalExclusiveDeviceAccess as a separate class. In the PR-version of nanoff it protects code before the esp32manager/nanodevice manager start doing anything and releases it after the manager has done its job. This is done at a high level in the code and with a timeout, so it is easy to terminate the program if the waiting timed out. In the Device Explorer (still to do) there are also very distinct lock/release moments in response to an action of the user. For the test framework it is also very clear when the lock should be applied or not. The only other use of the port (as far as I know) is a F5-debug session in Visual Studio, but that code still has too many secrets for me.

@josesimoes
Copy link
Member

Yes, NanoDevice constructor wasn't a good suggestion.
Ideally, this should be done "automatically" and for the API calls that actually need it.

Also, because there are operations that can't be interrupted and others that is OK to use.
For example, following a debug session start, the device can be pinged and be discovered from whatever instances there are. If I have 3 VS instances open, the device should be listed on all of them.
Now, starting a new debug session or erase its memory should not happen until that debug session is terminated.

On the original design of the debugger library this was (kind of) accomplished by the use of locks.

A concern I have with the current proposal is that only the code that adheres to this new global lock approach will be locked. Other callers can still call the various APIs without going through the lock and mess things up.

Please note that I see value on the proposition of a global lock. I'm concerned that it won't be that effective if the code doesn't enforce it "by itself" and by design. Relying on "disciplined" callers is not a good approach and is prone to failure.

What about "moving" this global lock to inside the library and make use of it in the APIs that can't be interrupted?
Or implement a "token" mechanism on which a caller has to create a token to communicate with a device and has to pass the token to use the API?

@frobijn
Copy link
Contributor Author

frobijn commented Oct 1, 2024

At first I thought it would be relatively easy to do, but then reality set in. Changes are:

GlobalExclusiveDeviceAccess

  • GlobalExclusiveDeviceAccess is now a token:
using var token = GlobalExclusiveDeviceAccess.TryGet (...);
if (token is not null)
{
    // This code has exclusive access
}
  • A subsequent call to GlobalExclusiveDeviceAccess.TryGet (before the token is disposed of) is never blocking and always (immediately) returns a token (that also has to be disposed of). It does not matter if the call is made on the same thread/task or on a different async task that is started with await or Task.Run.
  • The exclusive access ends when the main token and all subsequent tokens have been disposed.
  • If in the mean time GlobalExclusiveDeviceAccess.TryGet is called from a different thread, it will wait.

API integration

  • At the lowest level, the implementation of IPort.ConnectDevice, exclusive access is obtained using an infinite timeout. The argument is that there is no elegant way out, and if a timeout is required, that should be done at a higher level.
  • For a serial port there is another public method, OpenDevice. It also obtains exclusive access.
  • For a serial port, ConnectDevice calls OpenDevice multiple times in a WaitAndRetry loop. I presume that part of the goal of that loop was to compensate for an application locking the port. As the total wait time was quite long (10 sec) and access is now done differently, I've changed the wait times so that the total wait time is 0.5 sec. The code wil use the total wait time if a port is unresponsive, e.g., a port reserved for the Virtual Device while the Virtual Device is not running. I left the WaitAndRetry in the code, as it may help for devices that do not respond rightaway (is that a thing? No idea...)

Device watcher (serial ports)

  • The device watcher acquires exclusive access for each port before it investigates whether there is a device connected.
  • It first tries to get access with a timeout (1 sec). If that succeeds and the device watcher has just been started, the port is checked for a device and AllNewDevices event waits until that is completed.
  • If the first attempt fails, there probably is another application that uses the port. The AllNewDevices event will not wait for the port to be checked. The DeviceWatcher tries to get exclusive access a second time, now with an infinite timeout. If eventually the port is checked, it will be reported as added device, just like a device that is connected some time after the device watcher has been started.
  • A cancellation token is passed to all GlobalExclusiveDeviceAccess.TryGet that is cancelled if the device watcher is stopped. That stops the wait for exclusive access, but not the validation of a port - not because that is not possible, but because it is out of scope for this PR.

Implementation notes

For technical reasons the code now uses a Semaphore instead of a Mutex. That is also system-wide on Windows, but apparently not on other platforms. It is cross-process within the same Unix shell (tested that).

@frobijn
Copy link
Contributor Author

frobijn commented Oct 8, 2024

@josesimoes just to verify: is there anything you expect from me at this moment for this PR?

@josesimoes
Copy link
Member

@frobijn thanks for checking, I'll get back to this PR today and resume the review with your recent changes.

Copy link
Member

@josesimoes josesimoes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@frobijn minor code layout changes to improve readability and maintainability.
One last question about a CancellationTokenSource on a separate comment.
Tested it with several devices and plugging, unplugging and it works quite smoothly. Responsiveness to devices arrival/departure has increased.
And discovery process also feels much smooth.
Great work on this!! 💯

@josesimoes
Copy link
Member

The main reason for the retry mechanism is that some devices (like ESP32) can have quite different boot times. Because of the time it takes to bootloader to load the CLR. And then the CLR going through type resolution stage and having Wire Protocol responsive. Others have the internal USB CDC hardware that starts late in the boot process.
On general devices, there could be other applications opening the serial port for some reason. Because of the concurrency we should behave nicely and retry, otherwise we would risk reporting a false negative which is absolutely inconvenient, causes developer frustration and developers can perceive this as a malfunction of the discovery process.
Moreover, when VS requests a boot at debug session start and only the CLR restarts the device can take some time to become responsive.

All this calls for the retry workflow that you see here. Otherwise, it would be a simple: open port -> fails ? -> I'm done with this device.

@frobijn
Copy link
Contributor Author

frobijn commented Oct 9, 2024

The main reason for the retry mechanism is that some devices (like ESP32) can have quite different boot times. Because of the time it takes to bootloader to load the CLR. And then the CLR going through type resolution stage and having Wire Protocol responsive. Others have the internal USB CDC hardware that starts late in the boot process. On general devices, there could be other applications opening the serial port for some reason. Because of the concurrency we should behave nicely and retry, otherwise we would risk reporting a false negative which is absolutely inconvenient, causes developer frustration and developers can perceive this as a malfunction of the discovery process. Moreover, when VS requests a boot at debug session start and only the CLR restarts the device can take some time to become responsive.

All this calls for the retry workflow that you see here. Otherwise, it would be a simple: open port -> fails ? -> I'm done with this device.

Aha, then let's keep the retry-workflow in the code.

Copy link
Member

@josesimoes josesimoes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@josesimoes josesimoes changed the title Extra overloads for GlobalExclusiveDeviceAccess.CommunicateWithDevice Add new overloads for GlobalExclusiveDeviceAccess.CommunicateWithDevice Oct 9, 2024
@josesimoes josesimoes merged commit d84256c into nanoframework:main Oct 9, 2024
4 checks passed
@nfbot
Copy link
Member

nfbot commented Oct 9, 2024

@frobijn thank you again for your contribution! 🙏😄

.NET nanoFramework is all about community involvement, and no contribution is too small.
We would like to invite you to join the project's Contributors list.

Please edit it and add an entry with your GitHub username in the appropriate location (names are sorted alphabetically):

  <tr>
    <td><img src="https://github.com/frobijn.png?size=50" height="50" width="50" ></td>
    <td><a href="https://github.com/frobijn">Frank Robijn</a></td>
  </tr>

(Feel free to adjust your name if it's not correct)

@frobijn frobijn deleted the GlobalExclusiveDeviceAccess branch October 9, 2024 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants