UI: Change Run Job availability based on ACLs #5944
Conversation
This isn’t a valid solution, but it gets closer to one by ensuring the token is loaded when the application boots. Then I can add another step to load and parse the token’s policies.
The endpoint doesn’t actually support this 😳
This should have been part of dc98403.
It’s cut off by the edge of the viewport for now!
These unrelated tests were failing because they assumed that the first 404 was stable.
Here’s another similar instance.
My placeholder test was incorrect.
The “greatest number of matched characters” part is still forthcoming.
It’s too bad reduce is so unwieldy in JavaScript… maybe this would be better as a for loop ☹️
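For what the “greatest number of matched characters” selection might look like as a plain for loop: a minimal sketch under the assumption that, among globs matching the active namespace, the one with the most literal (non-wildcard) characters wins. All names here (`findMatchingNamespace`, `globToRegExp`) are illustrative, not the PR’s actual code.

```javascript
// Hypothetical sketch, not the PR's implementation: pick the matching glob
// with the most literal (non-*) characters, written as a for loop.
function globToRegExp(glob) {
  // Escape regex metacharacters, then turn each * wildcard into .*
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, '\\$&').replace(/\*/g, '.*');
  return new RegExp(`^${escaped}$`);
}

function findMatchingNamespace(globs, namespace) {
  let best = null;
  let bestLiteralLength = -1;
  for (const glob of globs) {
    if (!globToRegExp(glob).test(namespace)) continue;
    // Approximate "greatest number of matched characters" by counting the
    // glob's literal characters; more literals = a more specific match.
    const literalLength = glob.replace(/\*/g, '').length;
    if (literalLength > bestLiteralLength) {
      best = glob;
      bestLiteralLength = literalLength;
    }
  }
  return best; // null when nothing matches
}
```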
This is written assuming #6017 is merged as-is. It’s trivial to change the property name if needed!
…pect-acl

# Conflicts:
#	ui/mirage/scenarios/default.js
#	ui/package.json
#	ui/yarn.lock
Since the API expands a policy shorthand like “write” into its constituent capabilities, examining the policy is no longer needed.
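With shorthands already expanded server-side, the client-side check reduces to simple membership. A hypothetical sketch (the function name and rule shape are illustrative, not the PR’s code):

```javascript
// Sketch only: "write" has already been expanded by the API into concrete
// capabilities such as "submit-job", so no policy interpretation is needed.
function supportsRunning(rulesForNamespace) {
  const capabilities = (rulesForNamespace && rulesForNamespace.Capabilities) || [];
  return capabilities.includes('submit-job');
}
```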
I’m hesitant to add storage-clearing before every test.
I made this gist with scripts and configuration to help test this out locally. Note that after changing tokens, a refresh is required, due to #6492. Some implementation notes:
Hey @backspace !
I left quite a few comments I think! Most are just me thinking out loud though. The most important one is the CSS selector one; you'll probably want to change that before merging. The others are kind of up to you, but I'd defo consider using a disabled <button> here rather than a <div>, although maybe there's a reason I'm not aware of as to why you can't use a button?
I did a little run of the app, but I don't know how to set things up so I can see things working (the default dev setup just gives me default, namespace-1 and namespace-2), no biggie though as Michael can probably give it a once-over also.
Oh also, is there test coverage here for when you're running Nomad without namespace support?
P.S. Guess who missed a couple of commits off the end of here cos he didn't pull 😆 ! Ignore the CSS selector one!
ui/app/templates/jobs/index.hbs
Outdated
{{#if (can "run job")}}
  {{#link-to "jobs.run" data-test-run-job class="button is-primary"}}Run Job{{/link-to}}
{{else}}
  <div data-test-run-job class="button tooltip is-right-aligned" aria-label="You don’t have permission to run jobs" disabled>Run Job</div>
Noticed the disabled here. I think I've seen this working on form-like elements such as fieldsets; does it work with divs also? Actually, should this be a button rather than a div? I suppose it's a disabled, non-interactive button which may as well be a div 😁, but I'm not sure what accessibility concerns might come into play here. If it were me, personally I'd prefer semantic HTML.
Also, just noticed the curly apostrophe. Super nit, I know; I'm guessing it will come out OK on all platforms, but thought I'd check just in case. Do you use curly punctuation elsewhere in Nomad?
I guess I chose a div over a button because the thing it’s in parallel to is an a, but it’s true that having it be a true button makes the most sense, so I’ve changed it, thanks.
Re the curly apostrophe: maybe it’s the only one that isn’t inside code. I type this way automatically, so it doesn’t occur to me. Maybe @DingoEatingFuzz has a preference, but I like them haha
I am a fan of using proper typographic marks, therefore I am a fan of the curly apostrophe. That said, I haven't been disciplined about ensuring things like curly quotes, ellipses, en dashes, and the like.
@@ -49,6 +49,7 @@
   "d3-transition": "^1.1.0",
   "ember-ajax": "^5.0.0",
   "ember-auto-import": "^1.2.21",
+  "ember-can": "^2.0.0",
I am literally about to work on something very similar to this PR, so thanks for the pointer to ember-can!
ui/app/services/token.js
Outdated
    .catch(() => []);
  }
} catch (e) {
  return null;
Pretty sure you did this because you want things to fail silently here. I also noticed the empty catch up above somewhere; just thought I'd check that you don't want any user-visible errors here, which I'm guessing is what this means.
Also, just curious more than anything, but there is a mix of try/catch and .then()/.catch() code here. Is there any way to write it so that you always use one style? No prob if not, just wondering really.
It’s possible that something should happen if policy-fetching fails… I’m not sure what that would be, it’s maybe somewhat of a design question. But it’s true that it was overly convoluted, I’ve removed the redundancy so it’s more idiomatic, thanks.
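A single-style version using async/await might look roughly like this. This is a sketch with hypothetical names, not the PR’s actual service code; `fetchImpl` is injected only to keep the example self-contained.

```javascript
// Sketch: one way to collapse mixed try/catch and .then()/.catch() styles
// into a single async/await style. Names and shapes here are hypothetical.
async function fetchSelfTokenPolicies(fetchImpl) {
  try {
    const response = await fetchImpl('/v1/acl/token/self');
    const token = await response.json();
    return token.Policies || [];
  } catch (e) {
    // Fail silently, mirroring the PR's behaviour: no token or a failed
    // request simply yields no policies.
    return [];
  }
}
```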
systemService.set('activeNamespace.name', '000-abc-999');
assert.ok(jobAbility.canRun, 'expected to be able to match against more than one wildcard');
});
});
Bit of a nitpick here, but to me these are all integration tests for canRun; they are testing more than a 'unit'. A unit test for canRun would be a test to see if or works (Ember owns or, so it wouldn't be worth unit testing it yourself). A lot of what you are testing here is Ember code and how you integrate with it.
I'd say the best thing to unit test here would be the _findMatchingNamespace method, which seems to be the actual logic for this feature; the rest of the code is pretty much integrating that into the framework/app. If you could make _findMatchingNamespace importable you could even test it without all the mocking code you have here, so your test code would be way shorter.
Testing taxonomy/nomenclature is always subjective, but to me this does qualify as a unit test, because I’m mocking the interfaces with the rest of the application that the unit interacts with; for me, an integration test wouldn’t use mocking. I don’t tend to extract things like the namespace-matching until they’re useful elsewhere in the application (which this likely never will be) unless I need to for testing reasons, which I don’t think I do in this case. But I recognise this is a realm of many opinions and little consensus 🤓
Cool, thanks for the thoughts!
Thanks to @johncowen for pointing out that this is on the way to being deprecated: #5944 (comment) emberjs/rfcs#554
Great work and excellent test coverage!
I'm happy with this as is but I left a bunch of "food for thought" type comments too.
canRun: or('selfTokenIsManagement', 'policiesSupportRunning'),

selfTokenIsManagement: equal('token.selfToken.type', 'management'),
I can see this pattern being repeated a bunch. Might be worth one of us thinking about it as we repeat it for both exec and node drain.
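The repeated "management token OR matching capability" pattern could conceivably be factored into a small helper. A plain-JS sketch with hypothetical names; the real abilities express this with Ember computed macros (or, equal) inside ember-can ability classes:

```javascript
// Hypothetical sketch: one factory for the check repeated across run,
// exec, and node-drain abilities. Plain JS for illustration only.
function makeAbilityCheck(capability) {
  return ({ selfToken, capabilitiesForActiveNamespace }) =>
    // Management tokens can do everything; otherwise require the capability.
    (selfToken && selfToken.type === 'management') ||
    capabilitiesForActiveNamespace.includes(capability);
}

const canRunJob = makeAbilityCheck('submit-job');
```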
} else if (namespaceNames.includes('default')) {
  return 'default';
}
},
After reading through this whole file, I could make some nitpicks about potential performance gains from caching intermediate values and sorting lists before scanning them, but instead I won't 😛 There are unlikely ever to be so many namespaces that this becomes performance-critical code.
However I will point out that there is a lot going on in this ability file. It speaks to the possible need for a better policy primitive that can be used to make authoring abilities easier.
I'm curious what your thoughts here are since you have spent more time working through policy parsing and traversing.
I agree that there’s a lot happening and that it’ll be worth extracting; I just tend to push that kind of thing into the future, to when it’s actually needed, to avoid premature abstraction. There was a time when I’d create elaborate generalised structures in anticipation and end up with something that didn’t work as well when it came time to use it elsewhere, so now I err on the side of solving the immediate problem and generalising once it becomes useful, so the solution can be informed by real needs.
I’m planning to check ACLs for the exec button so that time isn’t far off 😆
  {{#link-to "jobs.run" data-test-run-job class="button is-primary"}}Run Job{{/link-to}}
{{else}}
  <button data-test-run-job class="button tooltip is-right-aligned" aria-label="You don’t have permission to run jobs" disabled>Run Job</button>
{{/if}}
The duplication is a bummer, but it's also not the worst. It's right at the point where it could be dangerous, but at least the two occurrences are co-located.
I also wouldn't be opposed to a job-run-button component.
This must have seemed/been necessary at some point but doesn’t break anything when removed!
I’m going to merge! 😯
* actually always canonicalize alloc.Job alloc.Job may be stale as well and need to migrate it. It does cost extra cycles but should be negligible. * e2e: improve reusability of provisioning scripts (hashicorp#6942) This changeset is part of the work to improve our E2E provisioning process to allow our upgrade tests: * Move more of the setup into the AMI image creation so it's a little more obvious to provisioning config authors which bits are essential to deploying a specific version of Nomad. * Make the service file update do a systemd daemon-reload so that we can update an already-running cluster with the same script we use to deploy it initially. * Avoid unnecessary golang version reference * add a script to update golang version * Update golang to 1.12.15 * Update ecs.html.md * Update configuring-tasks.html.md * ui: Change Run Job availability based on ACLs (hashicorp#5944) This builds on API changes in hashicorp#6017 and hashicorp#6021 to conditionally turn off the “Run Job” button based on the current token’s capabilities, or the capabilities of the anonymous policy if no token is present. If you try to visit the job-run route directly, it redirects to the job list. * Update changelog * e2e: use valid jobspec for group check test (hashicorp#6967) Group service checks cannot interpolate task fields, because the task fields are not available at the time the script check hook is created for the group service. When f31482a was merged this e2e test began failing because we are now correctly matching the script check ID to the service ID, which revealed this jobspec was invalid. * UI: Migrate to Storybook (hashicorp#6507) I originally planned to add component documentation, but as this dragged on and I found that JSDoc-to-Markdown sometimes needed hand-tuning, I decided to skip it and focus on replicating what was already present in Freestyle. Adding documentation is a finite task that can be revisited in the future. 
My goal was to migrate everything from Freestyle with as few changes as possible. Some adaptations that I found necessary: • the DelayedArray and DelayedTruth utilities that delay component rendering until slightly after initial render because without them: ◦ charts were rendering with zero width ◦ the JSON viewer was rendering with empty content • Storybook in Ember renders components in a routerless/controllerless context by default, so some component stories needed changes: ◦ table pagination/sorting stories access to query params, which necessitates some reaching into Ember internals to start routing and dynamically generate a Storybook route/controller to render components into ◦ some stories have a faux controller as part of their Storybook context that hosts setInterval-linked dynamic computed properties • some jiggery-pokery with anchor tags ◦ inert href='#' had to become href='javascript:; ◦ links that are actually meant to navigate need target='_parent' so they don’t navigate inside the Storybook iframe Maybe some of these could be addressed by fixes in ember-cli-storybook but I’m wary of digging around in there any more than I already have, as I’ve lost a lot of time to Storybook confusion and frustrations already 😞 The STORYBOOK=true environment variable tweaks some environment settings to get things working as expected in the Storybook context. I chose to: • use angle bracket invocation within stories rather than have to migrate them soon after having moved to Storybook • keep Freestyle around for now for its palette and typeface components * e2e: update framework to allow deploying Nomad (hashicorp#6969) The e2e framework instantiates clients for Nomad/Consul but the provisioning of the actual Nomad cluster is left to Terraform. The Terraform provisioning process uses `remote-exec` to deploy specific versions of Nomad so that we don't have to bake an AMI every time we want to test a new version. 
But Terraform treats the resulting instances as immutable, so we can't use the same tooling to update the version of Nomad in-place. This is a prerequisite for upgrade testing. This changeset extends the e2e framework to provide the option of deploying Nomad (and, in the future, Consul/Vault) with specific versions to running infrastructure. This initial implementation is focused on deploying to a single cluster via `ssh` (because that's our current need), but provides interfaces to hook the test run at the start of the run, the start of each suite, or the start of a given test case. Terraform work includes: * provides Terraform output that written to JSON used by the framework to configure provisioning via `terraform output provisioning`. * provides Terraform output that can be used by test operators to configure their shell via `$(terraform output environment)` * drops `remote-exec` provisioning steps from Terraform * makes changes to the deployment scripts to ensure they can be run multiple times w/ different versions against the same host. * e2e: ensure group script check tests interpolation (hashicorp#6972) Fixes a bug introduced in 0aa58b9 where we're writing a test file to a taskdir-interpolated location, which works when we `alloc exec` but not in the jobspec for a group script check. This changeset also makes the test safe to run multiple times by namespacing the file with the alloc ID, which has the added bonus of exercising our alloc interpolation code for group script checks. * Return FailedTGAlloc metric instead of no node err If an existing system allocation is running and the node its running on is marked as ineligible, subsequent plan/applys return an RPC error instead of a more helpful plan result. This change logs the error, and appends a failedTGAlloc for the placement. * update changelog * extract leader step function * Handle Nomad leadership flapping Fixes a deadlock in leadership handling if leadership flapped. 
Raft propagates leadership transition to Nomad through a NotifyCh channel. Raft blocks when writing to this channel, so channel must be buffered or aggressively consumed[1]. Otherwise, Raft blocks indefinitely in `raft.runLeader` until the channel is consumed[1] and does not move on to executing follower related logic (in `raft.runFollower`). While Raft `runLeader` defer function blocks, raft cannot process any other raft operations. For example, `run{Leader|Follower}` methods consume `raft.applyCh`, and while runLeader defer is blocked, all raft log applications or config lookup will block indefinitely. Sadly, `leaderLoop` and `establishLeader` makes few Raft calls! `establishLeader` attempts to auto-create autopilot/scheduler config [3]; and `leaderLoop` attempts to check raft configuration [4]. All of these calls occur without a timeout. Thus, if leadership flapped quickly while `leaderLoop/establishLeadership` is invoked and hit any of these Raft calls, Raft handler _deadlock_ forever. Depending on how many times it flapped and where exactly we get stuck, I suspect it's possible to get in the following case: * Agent metrics/stats http and RPC calls hang as they check raft.Configurations * raft.State remains in Leader state, and server attempts to handle RPC calls (e.g. node/alloc updates) and these hang as well As we create goroutines per RPC call, the number of goroutines grow over time and may trigger a out of memory errors in addition to missed updates. 
[1] https://github.com/hashicorp/raft/blob/d90d6d6bdacf1b35d66940b07be515b074d89e88/config.go#L190-L193 [2] https://github.com/hashicorp/raft/blob/d90d6d6bdacf1b35d66940b07be515b074d89e88/raft.go#L425-L436 [3] https://github.com/hashicorp/nomad/blob/2a89e477465adbe6a88987f0dcb9fe80145d7b2f/nomad/leader.go#L198-L202 [4] https://github.com/hashicorp/nomad/blob/2a89e477465adbe6a88987f0dcb9fe80145d7b2f/nomad/leader.go#L877 * e2e: document e2e provisioning process (hashicorp#6976) * Add the digital marketing team as the code owners for the website dir * Mock the eligibility endpoint in mirage * Implement eligibility toggling in the data layer * Add isMigrating property to the allocation model * Mock the drain endpoint * drain and forceDrain adapter methods * Update drain methods to properly wrap DrainSpec params * cancelDrain adapter method * Reformat the client detail page to use the two-row header design * Add tooltip to the eligibility control * Update the underlying node model when toggling eligibility in mirage * Eligibility toggling behavior * PopoverMenu component * Update the dropdown styles to be more similar to button styles * Multiline modifier for tooltips * More form styles as needed for the drain form * Initial layout of the drain options popover * Let dropdowns assume their full width * Add triggerClass support to the popover menu * Factor out the drain popover and implement its behaviors * Extract the duration parsing into a util * Test coverage for the parse duration util * Refactor parseDuration to support multi-character units * Polish for the drain popover * Stub out all the markup for the new drain strategy view * Fill in the drain strategy ribbon values * Fill out the metrics and time since values in the drain summary * Drain complete notification * Drain stop and update and notifications * Modifiers to the two-step-button * Make outline buttons have a solid white background * Force drain button in the drain info box * 
New toggle component * Swap the eligiblity checkbox out for a toggle * Toggle bugs: focus and multiline alignment * Switch drain popover checkboxes for toggles * Clear all notifications when resetting the controller * Model the notification pattern as a page object component * Update the client detail page object * Integration tests for the toggle component * PopoverMenu integration tests * Update existing tests * New test coverage for the drain capabilities * Stack the popover menu under the subnav * Use qunit-dom where applicable * Increase the size and spacing of the toggle component * Remove superfluous information from the client details ribbon * Tweak vertical spacing of headings * Update client detail test given change to the compositeStatus property * Replace custom parse-duration implementation with an existing lib * fix comment * consul: add support for canary meta * website: add canary meta to api docs * docs: add Go versioning policy * consul: fix var name from rebase * docs: reseting bootstrap doesn't invalidate token * consul: fix var name from rebase * Update website/source/guides/security/acl.html.markdown Co-Authored-By: Tim Gross <tim@0x74696d.com> * e2e: packer builds should not be public (hashicorp#6998) * docs: tweaks * include test and address review comments * handle channel close signal Always deliver last value then send close signal. * tweak leadership flapping log messages * tests: defer closing shutdownCh * client: canonicalize alloc.Job on restore There is a case for always canonicalizing alloc.Job field when canonicalizing the alloc. I'm less certain of implications though, and the job canonicalize hasn't changed for a long time. Here, we special case client restore from database as it's probably the most relevant part. When receiving an alloc from RPC, the data should be fresh enough. 
* Support customizing full scheduler config * tests: run_for is already a string * canary_meta will be part of 0.10.3 (not 0.10.2) I assume this is just an oversight. I tried adding the `canary_meta` stanza to an existing v0.10.2 setup (Nomad v0.10.2 (0d2d6e3) and it did show the error message: ``` * group: 'ggg', task: 'tttt', invalid key: canary_meta ``` * use golang 1.12.16 * Allow nomad monitor command to lookup server UUID Allows addressing servers with nomad monitor using the servers name or ID. Also unifies logic for addressing servers for client_agent_endpoint commands and makes addressing logic region aware. rpc getServer test * fix tests, update changelog * e2e: add a -suite flag to e2e.Framework This change allows for providing the -suite=<Name> flag when running the e2e framework. If set, only the matching e2e/Framework.TestSuite.Component will be run, and all ther suites will be skipped. * Document default_scheduler_config option * document docker's disable_log_collection flag * batch mahmood's changelog entries [ci skip] * incorporate review feedback * core: add limits to unauthorized connections Introduce limits to prevent unauthorized users from exhausting all ephemeral ports on agents: * `{https,rpc}_handshake_timeout` * `{http,rpc}_max_conns_per_client` The handshake timeout closes connections that have not completed the TLS handshake by the deadline (5s by default). For RPC connections this timeout also separately applies to first byte being read so RPC connections with TLS enabled have `rpc_handshake_time * 2` as their deadline. The connection limit per client prevents a single remote TCP peer from exhausting all ephemeral ports. The default is 100, but can be lowered to a minimum of 26. Since streaming RPC connections create a new TCP connection (until MultiplexV2 is used), 20 connections are reserved for Raft and non-streaming RPCs to prevent connection exhaustion due to streaming RPCs. 
All limits are configurable and may be disabled by setting them to `0`. This also includes a fix that closes connections that attempt to create TLS RPC connections recursively. While only users with valid mTLS certificates could perform such an operation, it was added as a safeguard to prevent programming errors before they could cause resource exhaustion. * docs: document limits Taken more or less verbatim from Consul. * Merge pull request hashicorp#160 from hashicorp/b-mtls-hostname server: validate role and region for RPC w/ mTLS * docs: bump 0.10.2 -> 0.10.3 * docs: add v0.10.3 release to changelog * Add an ability for client permissions * Refactor ability tests to use a setup hook for ability lookup * Enable the eligibility toggle conditionally based on acls * Refetch all ACL things when the token changes * New disabled buttons story * Disabled button styles * Disable options for popover and drain-popover * hclfmt a test jobspec (hashicorp#7011) * Update disabled 'Run Job' button to use standard disabled style * Add an explanatory tooltip to the unauthorized node drain popover * Fix token referencing from the token controller, as well as resetting * Handle the case where ACLs aren't enabled in abilities * Account for disabled ACLs in ability tests * Acceptance test for disabled node write controls * Use secret ID for NOMAD_TOKEN Use secret ID for NOMAD_TOKEN as the accessor ID doesn't seem to work. I tried with a local micro cluster following the tutorials, and if I do: ```console $ export NOMAD_TOKEN=85310d07-9afa-ef53-0933-0c043cd673c7 ``` Using the accessor ID as in this example, I get an error: ``` Error querying jobs: Unexpected response code: 403 (ACL token not found) ``` But when using the secret ID in that env var it seems to work correctly. * Pass stats interval colleciton to executor This fixes a bug where executor based drivers emit stats every second, regardless of user configuration. 
When serializing the Stats request across grpc, the nomad agent dropped the Interval value, and then executor uses 1s as a default value. * changelog * Some fixes to connection pooling Pick up some fixes from Consul: * If a stream returns an EOF error, clear session from cache/pool and start a new one. * Close the codec when closing StreamClient * Allow for an icon within the node status light * Add an icon inside the node status light * Assign icons to node statuses * New node initializing icon * Redo the node-status-light CSS to be icon-based * Add an animation for the initializing state * Call out the 'down' status too, since it's a pretty bad one * command, docs: create and document consul token configuration for connect acls (hashicorpgh-6716) This change provides an initial pass at setting up the configuration necessary to enable use of Connect with Consul ACLs. Operators will be able to pass in a Consul Token through `-consul-token` or `$CONSUL_TOKEN` in the `job run` and `job revert` commands (similar to Vault tokens). These values are not actually used yet in this changeset. * nomad: ensure a unique ClusterID exists when leader (hashicorpgh-6702) Enable any Server to lookup the unique ClusterID. If one has not been generated, and this node is the leader, generate a UUID and attempt to apply it through raft. The value is not yet used anywhere in this changeset, but is a prerequisite for hashicorpgh-6701. * client: enable nomad client to request and set SI tokens for tasks When a job is configured with Consul Connect aware tasks (i.e. sidecar), the Nomad Client should be able to request from Consul (through Nomad Server) Service Identity tokens specific to those tasks. * nomad: proxy requests for Service Identity tokens between Clients and Consul Nomad jobs may be configured with a TaskGroup which contains a Service definition that is Consul Connect enabled. These service definitions end up establishing a Consul Connect Proxy Task (e.g. envoy, by default). 
In the case where Consul ACLs are enabled, a Service Identity token is required for these tasks to run & connect, etc. This changeset enables the Nomad Server to recieve RPC requests for the derivation of SI tokens on behalf of instances of Consul Connect using Tasks. Those tokens are then relayed back to the requesting Client, which then injects the tokens in the secrets directory of the Task. * client: enable envoy bootstrap hook to set SI token When creating the envoy bootstrap configuration, we should append the "-token=<token>" argument in the case where the sidsHook placed the token in the secrets directory. * nomad: fixup token policy validation * nomad: handle SI token revocations concurrently Be able to revoke SI token accessors concurrently, and also ratelimit the requests being made to Consul for the various ACL API uses. * agent: re-enable the server in dev mode * client: remove unused indirection for referencing consul executable Was thinking about using the testing pattern where you create executable shell scripts as test resources which "mock" the process a bit of code is meant to fork+exec. Turns out that wasn't really necessary in this case. * client: skip task SI token file load failure if testing as root The TestEnvoyBootstrapHook_maybeLoadSIToken test case only works when running as a non-priveleged user, since it deliberately tries to read an un-readable file to simulate a failure loading the SI token file. * comments: cleanup some leftover debug comments and such * nomad,client: apply smaller PR suggestions Apply smaller suggestions like doc strings, variable names, etc. 
Co-Authored-By: Nick Ethier <nethier@hashicorp.com> Co-Authored-By: Michael Schurter <mschurter@hashicorp.com> * nomad,client: apply more comment/style PR tweaks * client: set context timeout around SI token derivation The derivation of an SI token needs to be safegaurded by a context timeout, otherwise an unresponsive Consul could cause the siHook to block forever on Prestart. * client: manage TR kill from parent on SI token derivation failure Re-orient the management of the tr.kill to happen in the parent of the spawned goroutine that is doing the actual token derivation. This makes the code a little more straightforward, making it easier to reason about not leaking the worker goroutine. * nomad: fix leftover missed refactoring in consul policy checking * nomad: make TaskGroup.UsesConnect helper a public helper * client: PR cleanup - shadow context variable * client: PR cleanup - improved logging around kill task in SIDS hook * client: additional test cases around failures in SIDS hook * tests: skip some SIDS hook tests if running tests as root * e2e: e2e test for connect with consul acls Provide script for managing Consul ACLs on a TF provisioned cluster for e2e testing. Script can be used to 'enable' or 'disable' Consul ACLs, and automatically takes care of the bootstrapping process if necessary. The bootstrapping process takes a long time, so we may need to extend the overall e2e timeout (20 minutes seems fine). Introduces basic tests for Consul Connect with ACLs. * e2e: remove forgotten unused field from new struct * e2e: do not use eventually when waiting for allocs This test is causing panics. Unlike the other similar tests, this one is using require.Eventually which is doing something bad, and this change replaces it with a for-loop like the other tests. 
Failure:

```
=== RUN   TestE2E/Connect
=== RUN   TestE2E/Connect/*connect.ConnectE2ETest
=== RUN   TestE2E/Connect/*connect.ConnectE2ETest/TestConnectDemo
=== RUN   TestE2E/Connect/*connect.ConnectE2ETest/TestMultiServiceConnect
=== RUN   TestE2E/Connect/*connect.ConnectClientStateE2ETest
panic: Fail in goroutine after TestE2E/Connect/*connect.ConnectE2ETest has completed

goroutine 38 [running]:
testing.(*common).Fail(0xc000656500)
    /opt/google/go/src/testing/testing.go:565 +0x11e
testing.(*common).Fail(0xc000656100)
    /opt/google/go/src/testing/testing.go:559 +0x96
testing.(*common).FailNow(0xc000656100)
    /opt/google/go/src/testing/testing.go:587 +0x2b
testing.(*common).Fatalf(0xc000656100, 0x1512f90, 0x10, 0xc000675f88, 0x1, 0x1)
    /opt/google/go/src/testing/testing.go:672 +0x91
github.com/hashicorp/nomad/e2e/connect.(*ConnectE2ETest).TestMultiServiceConnect.func1(0x0)
    /home/shoenig/go/src/github.com/hashicorp/nomad/e2e/connect/multi_service.go:72 +0x296
github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert.Eventually.func1(0xc0004962a0, 0xc0002338f0)
    /home/shoenig/go/src/github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert/assertions.go:1494 +0x27
created by github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert.Eventually
    /home/shoenig/go/src/github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert/assertions.go:1493 +0x272
FAIL    github.com/hashicorp/nomad/e2e    21.427s
```

* e2e: uncomment test case that is not broken
* e2e: use hclfmt on consul acls policy config files
* e2e: agent token was only being set for server0
* e2e: remove redundant extra API call for getting allocs
* e2e: set up consul ACLs a little more correctly
* tests: set consul token for nomad client for testing SIDS TR hook
* nomad: min cluster version for connect ACLs is now v0.10.4
* nomad: remove unused default scheduler variable
  This is from a merge conflict resolution that went the wrong direction.
I assumed the block had been added, but really it had been removed. Now, it is removed once again.

* docs: update changelog to mention connect with acls
* nomad/docs: increment version number to 0.10.4
* sentinel: copy jobs to prevent mutation
  It's unclear whether Sentinel code can mutate values passed to the eval, so ensure it cannot by copying the job.
* ignore computed diffs if node is ineligible
  Test flaky; add temp sleeps for debugging. Fix computed class.
* make diffSystemAllocsForNode aware of eligibility
  diffSystemAllocs -> diffSystemAllocsForNode; this function is only used for diffing system allocations, but lacked awareness of eligible nodes and the node ID that the allocation was going to be placed on. This change now ignores a change if its existing allocation is on an ineligible node. For a new allocation, it also checks tainted and ineligible nodes in the same function instead of nil-ing out the diff after computation in diffSystemAllocs.
* add test for node eligibility
* comment for filtering reason
* update changelog
* vagrant: disable audio interference
  Avoid Vagrant/VirtualBox interfering with host audio when the VM boots.
* prehook: fix enterprise repo remote value
* dev: tweaks to cluster dev scripts
  Consolidate all Nomad data dirs in a single root `/tmp/nomad-dev-cluster` to ease cleanup. Allow running the script from any path; don't require devs to cd into the `dev/cluster` directory first. Also, block while nomad processes are running and propagate SIGTERM/SIGINT to nomad processes to shut down.
* e2e: remove leftover debug println statement
* run "make hclfmt"
* make: emit explanation for /api isolation
  Emit a slightly helpful message when /api depends on nomad internal packages.
* pool: clear connection before releasing
  This is to be consistent with other connection clean-up handlers, as well as consul's https://github.com/hashicorp/consul/blob/v1.6.3/agent/pool/pool.go#L468-L479 .
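The eligibility-aware diff described above boils down to skipping an update when its existing allocation sits on a node that is no longer eligible. A much-simplified sketch with hypothetical types (the real `diffSystemAllocsForNode` works on full scheduler structures and also handles tainted nodes and new placements):

```go
package main

import "fmt"

// Alloc is a simplified stand-in for a system-job allocation.
type Alloc struct {
	ID     string
	NodeID string
}

// filterSystemUpdates drops "update" diffs whose existing allocation lives on
// a node that is no longer eligible, mirroring the idea of making the
// per-node diff eligibility-aware instead of nil-ing results afterwards.
func filterSystemUpdates(updates []Alloc, eligible map[string]bool) []Alloc {
	var keep []Alloc
	for _, a := range updates {
		if !eligible[a.NodeID] {
			continue // existing alloc is on an ineligible node: ignore the change
		}
		keep = append(keep, a)
	}
	return keep
}

func main() {
	eligible := map[string]bool{"node-a": true, "node-b": false}
	updates := []Alloc{{ID: "a1", NodeID: "node-a"}, {ID: "a2", NodeID: "node-b"}}
	fmt.Println(filterSystemUpdates(updates, eligible)) // only a1 survives
}
```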
* Fix panic when monitoring a local client node
  Fixes a panic when accessing a.agent.Server() when the agent is a client instead. This PR removes a redundant ACL check since ACLs are validated at the RPC layer. It also nil-checks the agent server and uses Client() when appropriate.
* agent: Profile req nil check
  s.agent.Server() clean-up logic and tests
* update changelog
* docs: fix misspelling
* keep placed canaries aligned with alloc status
* nomad state store must be modified through raft, rm local state change
* add state store test to ensure PlacedCanaries is updated
* docs: add link & reorg hashicorp#6690 in changelog
* docs: fix typo, ordering, & style in changelog
* e2e: turn no-ACLs connect tests back on
  Also clean up more missed debugging things >.>
* e2e: improve provisioning defaults and documentation (hashicorp#7062)
  This changeset improves the ergonomics of running the Nomad e2e test provisioning process by defaulting to a blank `nomad_sha` in the Terraform configuration. By default, a user will now need to pass in one of the Nomad version flags. But they won't have to manually edit the `provisioning.json` file for the common case of deploying a released version of Nomad, and won't need to put dummy values for `nomad_sha`. Includes general documentation improvements.
* e2e: rename linux runner to avoid implicit build tag (hashicorp#7070)
  Go implicitly treats files ending with `_linux.go` as build-tagged for Linux only. This broke the e2e provisioning framework on macOS once we tried importing it into the `e2e/consulacls` module.
* e2e: wait 2m rather than 10s after disabling consul acls
  Pretty sure Consul / Nomad clients are often not ready yet after the ConsulACLs test disables ACLs by the time the next test starts running. Running locally things tend to work, but in TeamCity this seems to be a recurring problem.
However, when running locally I sometimes do see in the "show status" step after disabling ACLs that some nodes are still initializing, suggesting we're right on the border of not waiting long enough:

```
nomad node status
ID        DC   Name              Class   Drain  Eligibility  Status
0e4dfce2  dc1  EC2AMAZ-JB3NF9P   <none>  false  eligible     ready
6b90aa06  dc2  ip-172-31-16-225  <none>  false  eligible     ready
7068558a  dc2  ip-172-31-20-143  <none>  false  eligible     ready
e0ae3c5c  dc1  ip-172-31-25-165  <none>  false  eligible     ready
15b59ed6  dc1  ip-172-31-23-199  <none>  false  eligible     initializing
```

Going to try waiting a full 2 minutes after disabling ACLs; hopefully that will help things Just Work. In the future, we should probably be parsing the output of the status checks and actually confirming all nodes are ready. Even better, maybe that's something shipyard will have built in.

* add e2e test for system sched ineligible nodes
* get test passing, new util func to wait for not pending
* clean up
* rm unused field
* fix check
* simplify job, better error
* docs: hashicorp#6065 shipped in v0.10.0, not v0.9.6
  PR hashicorp#6065 was intended to be backported to v0.9.6 to fix issue hashicorp#6223. However it appears to have not been backported:
  * https://github.com/hashicorp/nomad/blob/v0.9.6/client/allocrunner/taskrunner/task_runner.go#L1349-L1351
  * https://github.com/hashicorp/nomad/blob/v0.9.7/client/allocrunner/taskrunner/task_runner.go#L1349-L1351
  The fix was included in v0.10.0:
  * https://github.com/hashicorp/nomad/blob/v0.10.0/client/allocrunner/taskrunner/task_runner.go#L1363-L1370
* e2e: add --quiet flag to s3 copy to reduce log spam (hashicorp#7085)
* Explicit transparent bg on popover actions
* Override the max-width on mobile to avoid losing space due to non-existent gutter menu
* changelog: windows binaries being signed
  Note that as of 0.10.4, Nomad Windows binaries will be signed.
[ci skip]

* changelog for remote pprof endpoints
* nomad: unset consul token on job register
* nomad: assert consul token is unset on job register in tests
* command: use consistent CONSUL_HTTP_TOKEN name
  The Consul CLI uses CONSUL_HTTP_TOKEN, so Nomad should use the same. Note that consul-template uses CONSUL_TOKEN, which Nomad also uses, so be careful to preserve any reference to that in the consul-template context.
* docs: update changelog mentioning consul token passthrough
* release: prep 0.10.4
* Generate files for 0.10.4 release
* Release v0.10.4

Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Tim Higgison <TimHiggison@users.noreply.github.com>
Co-authored-by: Buck Doyle <buck@hashicorp.com>
Co-authored-by: Drew Bailey <2614075+drewbailey@users.noreply.github.com>
Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
Co-authored-by: Michael Lange <dingoeatingfuzz@gmail.com>
Co-authored-by: Nick Ethier <nethier@hashicorp.com>
Co-authored-by: Tim Gross <tim@0x74696d.com>
Co-authored-by: Shantanu Gadgil <shantanugadgil@users.noreply.github.com>
Co-authored-by: Seth Hoenig <shoenig@hashicorp.com>
Co-authored-by: Sebastián Ramírez <tiangolo@gmail.com>
Co-authored-by: Nomad Release bot <nomad@hashicorp.com>
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
This builds on API changes in #6017 and #6021 to conditionally turn off the
“Run Job” button based on the current token’s capabilities, or the capabilities
of the anonymous policy if no token is present.
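Since the API expands a shorthand like "write" into its constituent capabilities before the UI sees them, the check reduces to looking for `submit-job` in the expanded capability list. A sketch of that decision (the capability name mirrors Nomad's ACL vocabulary, but `canRunJob` itself is a hypothetical helper, not the UI's actual code):

```go
package main

import "fmt"

// canRunJob reports whether a token's expanded namespace capabilities permit
// submitting a job, which is what gates the Run Job button. Because the API
// already expands shorthands like "write" into concrete capabilities, no
// policy parsing is needed here.
func canRunJob(capabilities []string) bool {
	for _, c := range capabilities {
		if c == "submit-job" {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canRunJob([]string{"list-jobs", "read-job"}))   // false: disable Run Job
	fmt.Println(canRunJob([]string{"list-jobs", "submit-job"})) // true: enable it
}
```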
If you try to visit the job-run route directly, it redirects to the job list.
Here’s a GIF to get a sense: