Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Filestream integration #11332

Merged
merged 14 commits into from
Nov 20, 2024
Merged

Conversation

belimawr
Copy link
Contributor

@belimawr belimawr commented Oct 4, 2024

Proposed commit message

This commit adds the Filestream integration

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

Author's Checklist

  • Use fingerprint as default file identity

How to test this PR locally

  1. Build the integration
  2. Start a stack with Elastic-Package or upload the integration to an existing stack
  3. Go to the "Add integrations page"
  4. Make sure "Display beta integrations" is selected
  5. Add the "Custom Filestream integration", configure paths to read from any text file
  6. Deploy the Elastic-Agent enrolled in this policy
  7. Ensure the data from the file is ingested.

The Custom Filestream integration uses the fingerprint file identity by default, so make sure the file used for testing has a size of at least 1024 bytes or change the fingerprint length in the integration settings.

Related issues

Screenshots

2024-11-06_20-33

@andrewkroh andrewkroh added the New Integration Issue or pull request for creating a new integration package. label Oct 4, 2024
@belimawr belimawr force-pushed the 5370-filestream-integration branch 2 times, most recently from 380f203 to a5e9ef5 Compare October 28, 2024 16:39
@belimawr belimawr force-pushed the 5370-filestream-integration branch from a5e9ef5 to 84cba1d Compare November 7, 2024 01:18
@belimawr belimawr marked this pull request as ready for review November 7, 2024 01:42
@belimawr belimawr added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team [elastic/elastic-agent-data-plane] label Nov 12, 2024
@elasticmachine
Copy link

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

Copy link
Contributor

@leehinman leehinman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good.

Couple of questions in-line. The biggest is on field types. We probably need to double check other integrations to make sure they are defined the same across all of them. We don't want any mismatches.

@belimawr
Copy link
Contributor Author

Looking good.

Couple of questions in-line. The biggest is on field types. We probably need to double check other integrations to make sure they are defined the same across all of them. We don't want any mismatches.

I checked them and they're all keyword in other integrations, thanks for spotting that!

@strawgate
Copy link
Contributor

Can we document that users migrating from a log integration can temporarily use ignore_inactive: since_first_start to set the offset in the registry to the end of discovered files?

@belimawr
Copy link
Contributor Author

Can we document that users migrating from a log integration can temporarily use ignore_inactive: since_first_start to set the offset in the registry to the end of discovered files?

That's a good idea @strawgate ! Where do you think we should add it? To the overview or directly to the configuration option? Or should we have a migration guide (for an ideal migration we'd have to implement some extra fetures)?

@strawgate
Copy link
Contributor

strawgate commented Nov 18, 2024

I think there's some prior art in having migration information in the overview/readme so that makes sense to me.

@belimawr
Copy link
Contributor Author

I think there's some prior art in having migration information in the overview/readme so that makes sense to me.

I like the idea, I'm thinking if we should involve the docs team and questioning what is the best way to communicate it.

I tried to write something, however we're not at a point where we can provide provide a migration guide without data loss, while those options can help with preventing, or reducing, data duplication, there is still the possibility from data loss between the last line ingested by the Log input and when Filestream actually starts. So I fear it could cause confiscation among our users.

We could merge the PR as is and work on a better documentation for migration that will be released when it goes GA or close to it. What do you think?

@strawgate
Copy link
Contributor

When I made a version of this in the past I landed on including language like:

As Filestream configures a new input, configuring it to collect from a file that was previously collected by Custom Logs (Legacy) will result in duplicate data. You may wish to configure ignore_older or temporarily set ignore_inactive: since_first_start to limit the amount of duplicate data collected.

I think we could:

  1. Start with something like that and refine later with docs links, or
  2. At least include a warning about duplicate data

Do we already have an issue created for improving the migration information?

@belimawr
Copy link
Contributor Author

I love the wording @strawgate ! I'll add it to the overview page and create an issue to follow up on improving it.

@belimawr
Copy link
Contributor Author

@strawgate, I've added this bit at the end of the README (89429ce):

As Filestream configures a new input, configuring it to collect data
from a file that was previously collected by Custom Logs integration
will result in duplicate data. You may wish to configure
ignore_older or temporarily set ignore_inactive: since_first_start
to limit the amount of duplicate data collected.

If the Custom Logs integration is removed and the Custom Filestream
Logs is added in the same policy change, there risk of data being
missed between the last entry ingested by the Custom Logs and the
first one ingested by the Custom Filestream Logs.

What do you think?

@elasticmachine
Copy link

💚 Build Succeeded

History

cc @belimawr

@strawgate
Copy link
Contributor

That seems good to me. Though I think we should rename Custom Logs to Custom Logs (Legacy) or something similar -- @kpollich ? If we are going to rename it we should reference the updated name in the readme here too.

@belimawr
Copy link
Contributor Author

My understanding is that we'll rename the Custom Logs integration once the Custom Filestream is GA, this PR releases it in technical preview.

@belimawr belimawr merged commit a40ebee into elastic:main Nov 20, 2024
5 checks passed
@elastic-vault-github-plugin-prod

Package filestream - 0.0.1 containing this change is available at https://epr.elastic.co/package/filestream/0.0.1/

@kpollich
Copy link
Member

+1 on renaming to Custom Logs (Legacy) once the filestream integration is GA

@andrewkroh andrewkroh added the Integration:filestream Custom Filestream Logs label Nov 20, 2024
flexitrev pushed a commit that referenced this pull request Jan 23, 2025
This commit adds the Filestream integration
qcorporation pushed a commit that referenced this pull request Feb 3, 2025
This commit adds the Filestream integration
harnish-elastic pushed a commit to harnish-elastic/integrations that referenced this pull request Feb 4, 2025
This commit adds the Filestream integration
qcorporation pushed a commit that referenced this pull request Feb 4, 2025
This commit adds the Filestream integration
harnish-elastic pushed a commit to harnish-elastic/integrations that referenced this pull request Feb 5, 2025
This commit adds the Filestream integration
@belimawr belimawr deleted the 5370-filestream-integration branch February 6, 2025 20:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Integration:filestream Custom Filestream Logs New Integration Issue or pull request for creating a new integration package. Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team [elastic/elastic-agent-data-plane]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Elastic Agent] Add a new Custom Filestream integration
7 participants