Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vnet] windows service stub #50468

Merged
merged 3 commits into from
Jan 17, 2025
Merged

[vnet] windows service stub #50468

merged 3 commits into from
Jan 17, 2025

Conversation

nklaassen
Copy link
Contributor

@nklaassen nklaassen commented Dec 20, 2024

This commit adds a Windows service for VNet. Currently all the service does is create a TUN interface, it doesn't yet actually handle any networking or establish any IPC, the rest will come in later PRs.

Installation of the service will eventually be handled by the Connect installer.

If you want to test this out on a Windows machine/VM you will need to:

  1. install wintun.dll in the same directory as the tsh exe
  2. install tsh as a Windows service named TeleportVNet service, from an admistrative command prompt:
sc.exe create TeleportVNet binPath="<path-to-tsh> vnet-service" start=demand
# add (A;;RPWPLC;;;AU) to the default service DACL so authenticated users can start/stop/query the service
sc.exe sdset TeleportVNet "D:(A;;CCLCSWRPWPDTLOCRRC;;;SY)(A;;CCDCLCSWRPWPDTLOCRSDRCWDWO;;;BA)(A;;CCLCSWLOCRRC;;;IU)(A;;CCLCSWLOCRRC;;;SU)(A;;RPWPLC;;;AU)S:(AU;FA;CCDCLCSWRPWPDTLOCRSDRCWDWO;;;WD)"
  1. you should now be able to run tsh vnet -d
  2. the service will start sc.exe query TeleportVNet
  3. A TUN interface is created netsh interface show interface
  4. The service writes logs to logs.txt in the directory where tsh is installed (this is temporary until I find a better place for logs).
  5. The service stops and the interface is cleaned up when the user process exits.

@nklaassen nklaassen added the no-changelog Indicates that a PR does not require a changelog entry label Dec 20, 2024
@nklaassen
Copy link
Contributor Author

cc @ravicious, I'm once again refactoring a bunch of stuff

@github-actions github-actions bot added size/md tsh tsh - Teleport's command line tool for logging into nodes running Teleport. labels Dec 20, 2024
@ravicious ravicious self-requested a review December 20, 2024 08:35
@nklaassen nklaassen marked this pull request as draft December 28, 2024 01:18
@nklaassen nklaassen force-pushed the nklaassen/windows-service branch from b97bcdd to 7295ac0 Compare January 7, 2025 01:49
@nklaassen nklaassen marked this pull request as ready for review January 7, 2025 01:53
@nklaassen
Copy link
Contributor Author

@ravicious @atburke @codingllama this is ready for review

Copy link
Member

@ravicious ravicious left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked through the code and it seems to work fine. I'll give it a go tomorrow in a VM and I'll check if the macOS version still works.

lib/vnet/escalate_windows.go Outdated Show resolved Hide resolved
lib/vnet/escalate_windows.go Outdated Show resolved Hide resolved
lib/vnet/escalate_windows.go Outdated Show resolved Hide resolved
Comment on lines 217 to 222
// configureServicePermissions sets the security descriptor DACL on the service
// such that the user who installed the service (identified by userSIDStr) is
// allowed to start, stop, and query the service.
func configureServicePermissions(service *mgr.Service, userSIDStr string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this assume that only a single user on the device can install and use the service?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed everything about installing the service and configuration permissions for now, will solve this later in a way that multiple users can install/use the service

Comment on lines 71 to 74
case <-pm.closed:
// Errors are expected after the process manager has been closed,
// usually due to context cancellation.
return nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What problems does this solve compared to the version on master? I assume it has to do with what errors are returned by Wait, but by just looking at this file alone is hard to tell what changed in the whole process orchestration compared to master.

Do we just want Close() to not make Wait() return a "context cancelled" error? I know we've changed how we approach this specific error a couple of times, so I'd just want to have some clarity on how to approach it in the future. ;)

teleport/lib/vnet/run.go

Lines 177 to 218 in 47bf4ca

func newProcessManager() (*ProcessManager, context.Context) {
ctx, cancel := context.WithCancel(context.Background())
g, ctx := errgroup.WithContext(ctx)
return &ProcessManager{
g: g,
cancel: cancel,
}, ctx
}
// ProcessManager handles background tasks needed to run VNet.
// Its semantics are similar to an error group with a context, but it cancels the context whenever
// any task returns prematurely, that is, a task exits while the context was not canceled.
type ProcessManager struct {
g *errgroup.Group
cancel context.CancelFunc
}
// AddCriticalBackgroundTask adds a function to the error group. [task] is expected to block until
// the context returned by [newProcessManager] gets canceled. The context gets canceled either by
// calling Close on [ProcessManager] or if any task returns.
func (pm *ProcessManager) AddCriticalBackgroundTask(name string, task func() error) {
pm.g.Go(func() error {
err := task()
if err == nil {
// Make sure to always return an error so that the errgroup context is canceled.
err = fmt.Errorf("critical task %q exited prematurely", name)
}
return trace.Wrap(err)
})
}
// Wait blocks and waits for the background tasks to finish, which typically happens when another
// goroutine calls Close on the process manager.
func (pm *ProcessManager) Wait() error {
return trace.Wrap(pm.g.Wait())
}
// Close stops any active background tasks by canceling the underlying context.
func (pm *ProcessManager) Close() {
pm.cancel()
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't feel strongly about this, for some reason while writing i wanted it not to return an error after the processmanager was closed, if you have another opinion i can revert

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No opinion, just wanted to understand the reasoning.

tool/tsh/common/tsh.go Outdated Show resolved Hide resolved
Comment on lines 42 to 44
newVnetCommand(app),
newVnetInstallServiceCommand(app),
newVnetUninstallServiceCommand(app),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Connect will have to perform its own equivalent of those commands, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, the vnet service command will run when tshd re-execs itself, it should be handled by tsh fine

@ravicious ravicious self-requested a review January 8, 2025 15:43
return trace.Wrap(err, "querying admin service")
} else {
if status.State != svc.Running && status.State != svc.StartPending {
return trace.Errorf("service stopped running prematurely, status: %v", status)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tsh vnet works fine on Windows. The macOS daemon works too.

At first I copied the wrong wintun.dll. I copied the arm64 version but my VM is actually amd64. This resulted in tsh vnet returning this:

$ ./build/tsh.exe vnet
VNet is ready.
ERROR: running admin process in the background
        service stopped running prematurely, status: {1 0 0 0 0 0 0}

Is it possible to convert the status to some kind of a text representation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah switched to %+v which gives a better view

Copy link
Contributor

@codingllama codingllama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is pretty dense at times (Windows services are nobody's specialty) and holds a large number of related-but-seemingly unnecessary changes (800+ lines in 25 files is no joke). I would advise, in the future, that you split PRs like this into smaller, more focused parts.

Given the number of comments it's probably not worth splitting now, apart from tool/tsh changes which I do think could still be reasonably split.

lib/teleterm/vnet/service.go Outdated Show resolved Hide resolved
lib/vnet/admin_process_darwin.go Outdated Show resolved Hide resolved
lib/vnet/admin_process_darwin.go Outdated Show resolved Hide resolved
lib/vnet/admin_process_windows.go Outdated Show resolved Hide resolved
lib/vnet/admin_process_windows.go Outdated Show resolved Hide resolved
lib/vnet/process_manager.go Show resolved Hide resolved
lib/vnet/unsupported_os.go Outdated Show resolved Hide resolved
lib/vnet/unsupported_os.go Outdated Show resolved Hide resolved
tool/tsh/common/tsh.go Outdated Show resolved Hide resolved
tool/tsh/common/tsh.go Outdated Show resolved Hide resolved
@nklaassen nklaassen force-pushed the nklaassen/windows-service branch from 7295ac0 to 83cda08 Compare January 9, 2025 22:36
@nklaassen nklaassen marked this pull request as draft January 10, 2025 02:22
@nklaassen
Copy link
Contributor Author

Back to draft for now, i'm taking a stab at chunking this down a bit

@nklaassen nklaassen force-pushed the nklaassen/windows-service branch from 83cda08 to 36bdd77 Compare January 10, 2025 02:23
@nklaassen nklaassen changed the base branch from master to nklaassen/vnet-tsh January 10, 2025 02:26
@nklaassen
Copy link
Contributor Author

Moved the tsh changes in #50935

Base automatically changed from nklaassen/vnet-tsh to master January 14, 2025 01:31
@nklaassen nklaassen force-pushed the nklaassen/windows-service branch 2 times, most recently from 18bf244 to b47a65a Compare January 14, 2025 02:05
@nklaassen nklaassen force-pushed the nklaassen/windows-service branch 4 times, most recently from 9c88cae to 1984e63 Compare January 16, 2025 02:09
@nklaassen nklaassen force-pushed the nklaassen/windows-service branch from 1984e63 to a6f09c8 Compare January 16, 2025 02:15
@nklaassen nklaassen marked this pull request as ready for review January 16, 2025 02:33
@github-actions github-actions bot requested a review from fheinecke January 16, 2025 02:33
@github-actions github-actions bot requested a review from probakowski January 16, 2025 02:33
@nklaassen
Copy link
Contributor Author

@codingllama @ravicious this is ready for review again, i did my best to remove unrelated changes and everything about service installation, it's about half the size now

@nklaassen nklaassen changed the title [vnet] install and run windows service [vnet] windows service stub Jan 16, 2025
Copy link
Contributor

@codingllama codingllama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for refactoring!

lib/vnet/admin_process_darwin.go Outdated Show resolved Hide resolved
lib/vnet/escalate_daemon_darwin.go Show resolved Hide resolved
tool/tsh/common/tsh.go Show resolved Hide resolved
lib/vnet/process_manager.go Show resolved Hide resolved
// runPlatformUserProcess creates a network stack for VNet and runs it in the
// background. To do this, it also needs to launch an admin process in the
// background. It returns a [ProcessManager] which controls the lifecycle of
// both background tasks.
//
// The caller is expected to call Close on the process manager to close the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little strange that the caller has to close the ProcessManager on error, but not on success - typically code goes the other way. Should we make sure the ProcessManager on errors here instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this ended up weird when i kept this part here and moved process manager creation into the platform-specific functions. Now i've moved closing the ProcessManager down into the platform functions.

Copy link
Member

@ravicious ravicious Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true that it's kind of weird, but I believe that's mostly because runPlatformUserProcess in user_process_darwin.go does a bunch of other operations that 1) that can fail independently of ProcessManager 2) depend on ProcessManager to start its tasks. And also 3) the tasks managed by ProcessManager are expected to continue running after runPlatformUserProcess exits.

runPlatformUserProcess creates a "server" (through a task handled by ProcessManager) as well as a "client" which then contacts the server. If the client doesn't get a response from the server, we have no user for the server, hence we stop the server.

Given those properties, I'm not sure if there's a better way to manage that? Looking at some other prior art would be helpful, I do have to admit that it's hard to reason about this stuff!

lib/vnet/escalate_windows.go Outdated Show resolved Hide resolved
@nklaassen nklaassen requested a review from codingllama January 16, 2025 20:35
lib/vnet/admin_process_darwin.go Outdated Show resolved Hide resolved
@nklaassen nklaassen added this pull request to the merge queue Jan 17, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 17, 2025
@nklaassen nklaassen added this pull request to the merge queue Jan 17, 2025
Merged via the queue into master with commit 69188d4 Jan 17, 2025
41 checks passed
@nklaassen nklaassen deleted the nklaassen/windows-service branch January 17, 2025 18:17
nklaassen added a commit that referenced this pull request Jan 17, 2025
mvbrock pushed a commit that referenced this pull request Jan 18, 2025
* [vnet] Windows service

* address alan's comments

* fix typo in comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-changelog Indicates that a PR does not require a changelog entry size/md size/sm tsh tsh - Teleport's command line tool for logging into nodes running Teleport.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants