Introduce Kibana task to deploy agentless connectors for 9.0 #203973

Merged
artem-shelkovnikov merged 36 commits into main from artem/add-agentless-connectors-task
Jan 10, 2025

Conversation

@artem-shelkovnikov (Member) commented Dec 12, 2024

Closes https://github.com/elastic/search-team/issues/8508

Closes https://github.com/elastic/search-team/issues/8465

Summary

This PR adds a background task to the search_connectors plugin. The task compares connector records against agentless package policies to detect whether a connector was added or deleted, and then creates or deletes the package policies for those connectors.
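
The comparison described above can be sketched as a pure function. This is only an illustration of the idea, not the code in this PR: the `ConnectorRecord`, `AgentlessPolicy` and `SyncPlan` shapes and the `planSync` name are hypothetical, and the real records carry many more fields.

```typescript
// Hypothetical, simplified shapes (assumptions for this sketch only).
interface ConnectorRecord {
  id: string;
  serviceType: string;
}

interface AgentlessPolicy {
  policyId: string;
  connectorId: string;
}

interface SyncPlan {
  connectorsToDeploy: ConnectorRecord[];
  policiesToRemove: AgentlessPolicy[];
}

// Compare connector records with existing policies and work out
// which policies have to be created and which have to be deleted.
function planSync(connectors: ConnectorRecord[], policies: AgentlessPolicy[]): SyncPlan {
  const connectorIds = new Set(connectors.map((c) => c.id));
  const deployedConnectorIds = new Set(policies.map((p) => p.connectorId));

  return {
    // A connector with no policy yet needs one created.
    connectorsToDeploy: connectors.filter((c) => !deployedConnectorIds.has(c.id)),
    // A policy whose connector is no longer listed should be removed.
    policiesToRemove: policies.filter((p) => !connectorIds.has(p.connectorId)),
  };
}

const plan = planSync(
  [
    { id: 'c1', serviceType: 'github' },
    { id: 'c2', serviceType: 'jira' },
  ],
  [
    { policyId: 'p1', connectorId: 'c1' },
    { policyId: 'p9', connectorId: 'gone' },
  ]
);
console.log(plan.connectorsToDeploy.map((c) => c.id)); // connectors lacking a policy
console.log(plan.policiesToRemove.map((p) => p.policyId)); // policies without a connector
```

Keeping the planning step free of I/O makes it straightforward to unit test separately from the calls that actually create or delete policies.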

Scenario 1: a new connector was added by a user/API call

User creates an Elastic-managed connector:

Screen.Recording.2024-12-25.at.12.59.14.mov

When the user is done, a package policy is created by this background task:

Screen.Recording.2024-12-25.at.13.00.14.mov

Scenario 2: a connector was deleted by a user/API call

User deletes an Elastic-managed connector:

Screen.Recording.2024-12-25.at.13.21.13.mov

Checklist

Check that the PR satisfies the following conditions.

Reviewers should verify this PR satisfies this list as well.

  • Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support
  • Documentation was added for features that require explanation or tutorials
  • Unit or functional tests were updated or added to match the most common scenarios
  • If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the docker list
  • This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The release_note:breaking label should be applied in these situations.
  • Flaky Test Runner was used on any tests changed
  • The PR description includes the appropriate Release Notes section, and the correct release_note:* label is applied per the guidelines

@artem-shelkovnikov artem-shelkovnikov force-pushed the artem/add-agentless-connectors-task branch 2 times, most recently from 0500e73 to 7456f1b December 25, 2024 11:37
@artem-shelkovnikov artem-shelkovnikov changed the title WIP Introduce Kibana task to deploy agentless connectors for 9.0 Dec 25, 2024
@artem-shelkovnikov artem-shelkovnikov added the release_note:skip Skip the PR/issue when compiling release notes label Dec 25, 2024
@artem-shelkovnikov artem-shelkovnikov marked this pull request as ready for review December 25, 2024 12:26
@artem-shelkovnikov artem-shelkovnikov requested review from a team as code owners December 25, 2024 12:26
fetchAllAgentPolicies: agentPolicyService.fetchAllAgentPolicies,
fetchAllAgentPolicyIds: agentPolicyService.fetchAllAgentPolicyIds,
},
agentPolicyService,
Member Author

I did this because I needed a create method that depends on many other private/internal methods.

I had to either make those methods public and add them here, or pass the service itself. There might be another way, but I'm not familiar enough with Kibana development yet to know. Please tell me if there's a better way :)

@botelastic botelastic bot added the Team:Fleet Team label for Observability Data Collection Fleet team label Dec 25, 2024
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@@ -196,7 +196,7 @@ export const bulkGetAgentPoliciesHandler: FleetRequestHandler<
'full query parameter require agent policies read permissions'
);
}
- let items = await agentPolicyService.getByIDs(soClient, ids, {
+ let items = await agentPolicyService.getByIds(soClient, ids, {
Member Author

@artem-shelkovnikov artem-shelkovnikov Dec 25, 2024

A side effect of removing the usage of AgentPolicyServiceInterface: the interface had getByIDs while the implementation has getByIds. I kept the latter, but it's easy to rename the implementation to getByIDs instead. I mostly did this to avoid pinging other code owners that might have used the interface method name.

Comment on lines 103 to 105
if (policy.supports_agentless !== true) {
  this.logger.debug(`Policy ${policy.id} does not support agentless, skipping`);
  continue;
Member Author

For some reason this doesn't work: I never get a policy that has the supports_agentless field.

@artem-shelkovnikov artem-shelkovnikov added backport:skip This commit does not require backporting Team:Search labels Dec 25, 2024
throw new Error(`Connector ${connector.id} service_type is null or empty`);
}

if (NATIVE_CONNECTOR_DEFINITIONS[connector.service_type] == null) {
Member Author

I'm using our regular NATIVE_CONNECTOR_DEFINITIONS as the source of truth for the connectors we support. In theory I could instead list the integrations that are branched off connectors-py. Is that possible, and would it be better?
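
As a rough sketch of the check being discussed, assuming a small lookup table in place of the real NATIVE_CONNECTOR_DEFINITIONS: the `SUPPORTED_SERVICE_TYPES` map, the `ConnectorLike` shape, the `requireSupportedServiceType` name and the wording of the second error message are all hypothetical.

```typescript
// Hypothetical stand-in for NATIVE_CONNECTOR_DEFINITIONS; the real
// definitions contain many more entries and fields.
const SUPPORTED_SERVICE_TYPES = new Map<string, { name: string }>([
  ['github', { name: 'GitHub' }],
  ['jira', { name: 'Jira' }],
]);

// Simplified connector shape (assumption for this sketch).
interface ConnectorLike {
  id: string;
  service_type: string | null;
}

// Return the definition for a connector, or throw if the connector
// cannot be matched to a supported service type.
function requireSupportedServiceType(connector: ConnectorLike): { name: string } {
  if (connector.service_type == null || connector.service_type === '') {
    throw new Error(`Connector ${connector.id} service_type is null or empty`);
  }

  const definition = SUPPORTED_SERVICE_TYPES.get(connector.service_type);
  if (definition == null) {
    // Message text is illustrative, not taken from the PR.
    throw new Error(
      `Connector ${connector.id} has unsupported service_type ${connector.service_type}`
    );
  }
  return definition;
}

console.log(requireSupportedServiceType({ id: 'c1', service_type: 'github' }).name);
```

Whichever list ends up being the source of truth, only the contents of the lookup table would change; the validation itself stays the same.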

const AGENTLESS_CONNECTOR_DEPLOYMENTS_SYNC_TASK_ID = 'search:agentless-connectors-sync-task';
const AGENTLESS_CONNECTOR_DEPLOYMENTS_SYNC_TASK_TYPE = 'search:agentless-connectors-sync';

const SCHEDULE = { interval: '1m' };
Member Author

@elastic/fleet - what's the minimum interval at which we could query Fleet package policies? (We narrow them with a kuery that only returns our package, elastic_connectors.)

Can we do 10 seconds? 30 seconds?

Contributor

If we only query for a certain package, there shouldn't be too many results, so it shouldn't be a problem with scale. I think using 30s sounds fine too, 10s might be too frequent.

description:
'This task periodically checks native connectors, agent policies and syncs them if they are out of sync',
timeout: '1m',
maxAttempts: 3,
Member Author

Do we even need to retry, since we run pretty often?

@artem-shelkovnikov artem-shelkovnikov requested a review from a team as a code owner December 27, 2024 19:08
Contributor

@juliaElastic juliaElastic left a comment

Fleet changes LGTM

Member

@jedrazb jedrazb left a comment

Great stuff! The changes in the search_connectors plugin LGTM. I have a couple of minor comments regarding naming and one question about hardcoding the package version in the task manager logic.

I’ll defer reviewing the changes in the fleet plugin to the fleet team. EDIT: I see they just approved 🚀


const connectorsInputName = 'connectors-py';
const pkgName = 'elastic_connectors';
const pkgVersion = '0.0.4';
Member

Do we need this version hardcoded here? The current (latest) version in the integration registry should definitely be tracked somewhere by Fleet. Can we look it up in the package registry dynamically?

For context, 0.0.4 is already outdated.

Member

Maybe the code edited in this PR will help? https://github.com/elastic/kibana/pull/192081/files There I was able to access package info and adjust permissions dynamically.
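
A minimal sketch of what a dynamic lookup could look like, under stated assumptions: `VersionLookup` is a hypothetical stand-in for whichever Fleet or package registry call returns the available versions (it is not a real Fleet API), and `compareVersions` only handles plain `major.minor.patch` strings, not pre-release tags.

```typescript
// Hypothetical lookup function; a real implementation would call into
// Fleet or the package registry. Not an existing API.
type VersionLookup = (pkgName: string) => Promise<string[]>;

// Numerically compare two "major.minor.patch" strings.
// Returns a negative number, zero, or a positive number.
function compareVersions(a: string, b: string): number {
  const partsA = a.split('.').map(Number);
  const partsB = b.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    const diff = (partsA[i] ?? 0) - (partsB[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Pick the highest version, or the fallback when nothing is available.
function pickLatest(versions: string[], fallback: string): string {
  if (versions.length === 0) return fallback;
  return versions.reduce((latest, v) => (compareVersions(v, latest) > 0 ? v : latest));
}

// Resolve the version to install, keeping a hardcoded value only as a fallback.
async function resolveLatestVersion(
  pkgName: string,
  lookup: VersionLookup,
  fallback: string
): Promise<string> {
  return pickLatest(await lookup(pkgName), fallback);
}

console.log(pickLatest(['0.0.4', '0.0.10', '0.0.9'], '0.0.4')); // 0.0.10
```

Comparing the parts numerically matters here: a plain string comparison would rank 0.0.9 above 0.0.10.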

Member

@jedrazb jedrazb left a comment

🚢

const taskInstance = await taskManager.ensureScheduled({
id: AGENTLESS_CONNECTOR_DEPLOYMENTS_SYNC_TASK_ID,
taskType: AGENTLESS_CONNECTOR_DEPLOYMENTS_SYNC_TASK_TYPE,
schedule: SCHEDULE,
Contributor

Taking a quick look here from Response Ops. I was reading the PR description and was wondering if we need to have this task run every 30s indefinitely or if it would be possible to make it event based so it runs after a user creates or deletes a connector? Or perhaps a combo of the two but the schedule runs less frequently?

Member

was thinking the same ...

Member Author

For now this seemed to us like the best way to move forward:

The task runs and checks if any agentless policies need to be created for our connector records. Connector records can be created in multiple ways:

  1. User creates a connector via UI
  2. Connector is created automatically by already running agentless connector deployment
  3. User creates a connector via API/CLI

Scenario #1 can easily be handled with an event triggered by the Kibana UI. Scenario #2 does not need this logic. Scenario #3 really needs this task: our CLI doesn't have access to Task Manager, our API is hosted in Elasticsearch, and Elasticsearch has no way to influence when this task runs.

That's why we've taken the current approach of polling every 30 seconds (a minute should be fine too). The task itself queries a reasonably small amount of data, so I believe it should not be too problematic.

Member

The GenAI connectors have a similar sort of constraint, where something in Kibana wants to know when connectors get created / updated / deleted. Added in #189027

That PR originally contained some connector logic for the new "hooks", but we extracted that and restructured into a stand-alone PR: #194081 , rather than ship the two pieces together.

So, in theory case 3 can be handled this way.

Looking at those PRs, I'm also wondering if you need to handle the case of connectors being updated / deleted ...

Member Author

@artem-shelkovnikov artem-shelkovnikov Jan 7, 2025

I've skimmed through the change but don't understand how it handles case 3: we have customers calling the Elasticsearch API directly, and Kibana is not involved in this.

So we cannot have hooks attached to this call; all we can do is poll the content of a couple of indices to see if changes were made. Am I missing some detail in the mentioned PR that works around this limitation?

Connector updates are not important for us, but deletion is also handled in this PR.

Member

Oh, these aren't alerting connectors? These are "search" connectors? If so, you're correct: that's a completely different "connector" framework from the one I was talking about (the alerting connectors).

Member

@pmuellr pmuellr left a comment

ResponseOps code LGTM, left a few comments

const AGENTLESS_CONNECTOR_DEPLOYMENTS_SYNC_TASK_ID = 'search:agentless-connectors-manager-task';
const AGENTLESS_CONNECTOR_DEPLOYMENTS_SYNC_TASK_TYPE = 'search:agentless-connectors-manager';

const SCHEDULE = { interval: '30s' };
Member

Setting this to the largest value you are willing to live with will be helpful to Kibana's task throughput :-)

I believe a comment in the PR indicated it could be set to "1m" which would cut down the executions by 50% (useful!)

Member Author

I'll change to "1m" indeed, it should not hurt us much, and we can iterate on this number later if it's gonna be too much/too little!
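
For a sense of scale, the effect of the interval on the number of executions can be worked out with a small helper. This is only an arithmetic sketch: `intervalToSeconds` and `runsPerDay` are hypothetical names, and the parser handles just the `s`, `m`, `h` and `d` suffixes.

```typescript
// Convert an interval string such as '30s' or '1m' to seconds.
// Only the s/m/h/d suffixes are supported in this sketch.
function intervalToSeconds(interval: string): number {
  const match = /^(\d+)([smhd])$/.exec(interval);
  if (!match) {
    throw new Error(`Unsupported interval: ${interval}`);
  }
  const unitSeconds: Record<string, number> = { s: 1, m: 60, h: 3600, d: 86400 };
  return Number(match[1]) * unitSeconds[match[2]];
}

// Number of scheduled executions in 24 hours for a given interval.
function runsPerDay(interval: string): number {
  return 86400 / intervalToSeconds(interval);
}

console.log(runsPerDay('30s')); // 2880
console.log(runsPerDay('1m')); // 1440
```

Going from 30 seconds to 1 minute halves the daily executions, from 2880 to 1440.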

};
}
},
cancel: async () => {
Member

Note that if you want cancel to actually stop the task from running, you'll have to do a bit more. This function is invoked when TM decides the task needs to be cancelled (it has been running longer than its time limit). The basic idea is that you set a local variable indicating you've been cancelled, and then check it in the run() method. Example here:

const createTaskRunnerFactory =
  ({
    logger,
    telemetry,
    executeEnrichPolicy,
    getStoreSize,
  }: {
    logger: Logger;
    telemetry: AnalyticsServiceSetup;
    executeEnrichPolicy: ExecuteEnrichPolicy;
    getStoreSize: GetStoreSize;
  }) =>
  ({ taskInstance }: { taskInstance: ConcreteTaskInstance }) => {
    let cancelled = false;
    const isCancelled = () => cancelled;
    return {
      run: async () =>
        runTask({
          executeEnrichPolicy,
          getStoreSize,
          isCancelled,
          logger,
          taskInstance,
          telemetry,
        }),
      cancel: async () => {
        cancelled = true;
      },
    };
  };
- note that this code doesn't actually seem to use the isCancelled() local function they created - I think it did at one point, must have been removed in another PR ...
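
To make the idea concrete, here is a minimal, framework-free sketch in which run() does consult the cancellation flag between units of work. The `TaskRunner` shape, the `createRunner` name and the `steps` array are hypothetical and are not Task Manager APIs; they only illustrate the flag-checking pattern.

```typescript
// Simplified runner shape (assumption for this sketch).
interface TaskRunner {
  run: () => Promise<{ completedSteps: number }>;
  cancel: () => Promise<void>;
}

// Build a runner that executes steps in order and stops before the
// next step once cancel() has been called.
function createRunner(steps: Array<() => Promise<void>>): TaskRunner {
  let cancelled = false;

  return {
    run: async () => {
      let completedSteps = 0;
      for (const step of steps) {
        // Check the flag before starting each unit of work.
        if (cancelled) break;
        await step();
        completedSteps++;
      }
      return { completedSteps };
    },
    cancel: async () => {
      cancelled = true;
    },
  };
}

// Usage: the first step triggers cancellation, so the second never runs.
const demoRunner: TaskRunner = createRunner([
  async () => {
    await demoRunner.cancel();
  },
  async () => {
    console.log('this step is skipped');
  },
]);
demoRunner.run().then((result) => console.log(result.completedSteps));
```

A step that is already in flight still runs to completion in this sketch; the flag only prevents later steps from starting.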

Member Author

Thanks, I see now - I copied original code from some other place.

The task we have is supposed to be very fast (depending, obviously, on response times from Elasticsearch). At best it makes 2 calls to Elasticsearch; at worst it probably makes tens of calls (realistically, I expect 2 calls 99.99% of the time, and occasionally 4 calls).

IMO adding true cancellation is not gonna add a lot here, but I'm not sure yet. I will merge what is there for now and keep it in mind for when we need it :)

@elasticmachine
Contributor

elasticmachine commented Jan 9, 2025

💔 Build Failed

Failed CI Steps

Test Failures

  • [job] [logs] Fleet Cypress Tests #4 / Add Integration - Automatic Import should create an integration
  • [job] [logs] Fleet Cypress Tests #2 / When the user has All permissions for Integrations and All permissions for actions Create Assistant is not accessible but upload is accessible

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id     before  after  diff
fleet  1309    1319   +10

Unknown metric groups

API count

id     before  after  diff
fleet  1436    1446   +10

History

@artem-shelkovnikov artem-shelkovnikov enabled auto-merge (squash) January 10, 2025 09:54
@artem-shelkovnikov artem-shelkovnikov merged commit c88d519 into main Jan 10, 2025
9 checks passed
@artem-shelkovnikov artem-shelkovnikov deleted the artem/add-agentless-connectors-task branch January 10, 2025 11:22
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this pull request Jan 13, 2025
…#203973)

## Closes elastic/search-team#8508
## Closes elastic/search-team#8465

## Summary

This PR adds a background task for search_connectors plugin. This task
checks connector records and agentless package policies and sees if new
connector was added/old was deleted, and then adds/deletes package
policies for these connectors.

Scenario 1: a new connector was added by a user/API call

User creates an Elastic-managed connector:


https://github.com/user-attachments/assets/38296e48-b281-4b2b-9750-ab0a47334b55

When the user is done, a package policy is created by this background
task:


https://github.com/user-attachments/assets/12dbc33f-32bf-472d-b854-64588fc1e5b1

Scenario 2: a connector was deleted by a user/API call

User deletes an Elastic-managed connector:


https://github.com/user-attachments/assets/5997897e-fb9d-4199-8045-abe163264976

### Checklist

Check the PR satisfies following conditions. 

Reviewers should verify this PR satisfies this list as well.

- [ ] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This was checked for breaking HTTP API changes, and any breaking
changes have been approved by the breaking-change committee. The
`release_note:breaking` label should be applied in these situations.
- [ ] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Jedr Blaszyk <jedrazb@gmail.com>
artem-shelkovnikov added a commit that referenced this pull request Jan 16, 2025
…6606)

## Summary

This PR makes it so that the Agentless Kibana task implemented in
#203973 properly handles
soft-deleted connectors.

This helps with the situation when an integration policy has been
created for an agentless connector but a connector record has not yet
been created by an agentless host.

With current Kibana task implementation it could lead to the Policy
being deleted.

With this change, only policies that refer to soft-deleted connectors
will be cleaned up.
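
The rule described above can be sketched as a small filter. This is an illustration under assumptions, not the code from that change: the `ConnectorState` and `PolicyRef` shapes, the `deleted` flag and the `policiesToCleanUp` name are hypothetical.

```typescript
// Hypothetical, simplified shapes (assumptions for this sketch).
interface ConnectorState {
  id: string;
  deleted: boolean; // true when the connector record is soft-deleted
}

interface PolicyRef {
  policyId: string;
  connectorId: string;
}

// Select only the policies that refer to a soft-deleted connector.
function policiesToCleanUp(policies: PolicyRef[], connectors: ConnectorState[]): PolicyRef[] {
  const softDeletedIds = new Set(connectors.filter((c) => c.deleted).map((c) => c.id));
  // A policy whose connector record is simply missing is kept: the
  // agentless host may not have created the record yet.
  return policies.filter((p) => softDeletedIds.has(p.connectorId));
}

const stale = policiesToCleanUp(
  [
    { policyId: 'p1', connectorId: 'c1' },
    { policyId: 'p2', connectorId: 'c2' },
    { policyId: 'p3', connectorId: 'not-created-yet' },
  ],
  [
    { id: 'c1', deleted: false },
    { id: 'c2', deleted: true },
  ]
);
console.log(stale.map((p) => p.policyId)); // only the policy for the soft-deleted connector
```

The important difference from a plain "connector is missing" check is that a policy with no connector record at all is left alone.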

### Checklist

Check the PR satisfies following conditions. 

Reviewers should verify this PR satisfies this list as well.

- [ ] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This was checked for breaking HTTP API changes, and any breaking
changes have been approved by the breaking-change committee. The
`release_note:breaking` label should be applied in these situations.
- [ ] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [ ] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
viduni94 pushed a commit to viduni94/kibana that referenced this pull request Jan 23, 2025
…#203973)
viduni94 pushed a commit to viduni94/kibana that referenced this pull request Jan 23, 2025
…stic#206606)
Labels
  • backport:skip (This commit does not require backporting)
  • release_note:skip (Skip the PR/issue when compiling release notes)
  • Team:Fleet (Team label for Observability Data Collection Fleet team)
  • Team:Search
  • v9.0.0
8 participants