Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Ingest Manager] Align namespace validation rules #75846

Closed
michalpristas opened this issue Aug 25, 2020 · 18 comments · Fixed by #78522
Closed

[Ingest Manager] Align namespace validation rules #75846

michalpristas opened this issue Aug 25, 2020 · 18 comments · Fixed by #78522
Assignees
Labels
Feature:Fleet Fleet team's agent central management project Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@michalpristas
Copy link

With namespace validation introduced here https://github.com/elastic/kibana/pull/75381/files
validation rules are not aligned between kibana and agent (https://github.com/elastic/beats/blob/f06bcc5d401d7b6fc833bc0e516064ace0a68d45/x-pack/elastic-agent/pkg/agent/application/filters/stream_checker.go#L173)

What is missing is

  • namespace is allowed to contain . but is not allowed to be equal to . or ..
  • namespace cannot be longer than 255 chars
  • namespace cannot start with -, _ or +

we should either add it to fleet validation or remove them from agent.
I took the rules from here: https://github.com/elastic/elasticsearch/blob/master/docs/reference/indices/create-index.asciidoc

cc @jen-huang

@michalpristas michalpristas added Feature:Fleet Fleet team's agent central management project Team:Fleet Team label for Observability Data Collection Fleet team labels Aug 25, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/ingest-management (Feature:Fleet)

@elasticmachine
Copy link
Contributor

Pinging @elastic/ingest-management (Team:Ingest Management)

@jen-huang
Copy link
Contributor

Consolidating discussion from https://github.com/elastic/kibana/pull/75381/files#r476228944:

I think the validation is incorrect,

Cannot be longer than 255 bytes (note it is bytes, so multi-byte characters will count towards the 255 limit faster)

This mean, type + dataset + namespace <= 255 bytes.

Originally posted by @ph in #75381 (comment)


what we do in agent is that we put 255 constraint on dataset, namespace and a product of concatenation, so eventaully we will fail when index > 255.
putting some not communicated constraint felt weird (such as 255/3 or some other math) so i used 255 on each part and result as well.
as namespace is created before configuration and can lead to different indexes based on dataset and type it would make sense to agree on some constraint which can be applied at the time of specifying namespace.

Originally posted by @michalpristas in #75381 (comment)

@jen-huang
Copy link
Contributor

jen-huang commented Aug 25, 2020

namespace is allowed to contain . but is not allowed to be equal to . or ..
namespace cannot start with -, _ or +

Namespace is the last part of our indexing strategy. These ES rules only apply to the complete index name. A namespace of . or .. will still generate a valid index name like logs-system.auth-. A namespace of + will generate logs-system.auth-+. Both of these are valid index names. Please correct me if I'm wrong.

namespace cannot be longer than 255 chars
what we do in agent is that we put 255 constraint on dataset, namespace and a product of concatenation, so eventaully we will fail when index > 255.

The 255 total index name size in bytes is the reason why I left out that original constraint in Kibana, as we have no way of knowing the size of the other parts of the indexing strategy at the time user enters the namespace. Also, length of string != number of bytes (English alphabet characters are one byte, but Chinese characters are multibyte). Because of these reasons, I don't think we can enforce a rule for namespace that is guaranteed to work across all cases. We can set up a business rule that will work for most cases, such as limiting namespace to 50 characters length maximum in UI and Agent. I doubt many users will reach the index name limit through natural use anyway (I think even 50 characters is more than enough), but enforcing a strict limit will provide some safety. What do you think of this rule?

@ph
Copy link
Contributor

ph commented Aug 25, 2020

Thanks @jen-huang for moving it

--

Good points, I think we should come up with a limit for every data_stream field, we have a spec, we can enforce that.

data_stream.type is a limited set (traces, logs, metrics, synthetics) 10 is max (20 bytes)
data_stream.dataset 255-50-20 -10 (buffer) = 75 bytes.
data_stream.namespace => 50 bytes seems like a good limit.

The above limits seems goods, assuming that most of theses will be ascii values, maybe namespace could have multibytes chars.

cc @ruflin WDYT?

@ph
Copy link
Contributor

ph commented Aug 25, 2020

Changed the maths above..

@ruflin
Copy link
Contributor

ruflin commented Aug 26, 2020

20 bytes for the prefix should definitively be enough. For the other two I'm torn. I can see potential uses cases where dataset can become long, others where the namespace is long. To keep things simple and have a buffer, what about having a limit of 100 for dataset and namespace. This would max out at 222 bytes in the worst case (we have 2 -).

BTW: I think I fail at your math: 255-50-20 -10 = (1)75 ?

@michalpristas
Copy link
Author

michalpristas commented Sep 2, 2020

i like the proposal with limits per segment.
20/175/50 sounds like enough for everything, if we agree on these numbers or ones proposed by nicolas i will create a PR updating our current rules

@jen-huang
Copy link
Contributor

Just to clarify, the 20/175/50 limits are bytes, not characters?

@ruflin
Copy link
Contributor

ruflin commented Sep 9, 2020

Yes, bytes. My preference is still on 20/100/100 ;-)

@jen-huang
Copy link
Contributor

Hi @michalpristas, I opened #78522 to implement 100 bytes limit on namespace following @ruflin's 20/100/100 proposal. I don't think merging this PR necessarily has to be synced with a corresponding agent PR, so this is just an FYI.

@michalpristas
Copy link
Author

no it does not, thanks for the ping

@michalpristas
Copy link
Author

@jen-huang any plans on putting on restrictions on type and dataset ?
same character set should be restricted, length limits and type should not start with -, _, + (unless we agree on list of accepted types)

@michalpristas
Copy link
Author

agent pr: elastic/beats#21406

@jen-huang
Copy link
Contributor

@michalpristas type and dataset aren't controlled by Kibana, just passed through from packages. @ruflin / @mtojek what do you think about adding validators on the packages side to limit the length of those fields?

@ruflin
Copy link
Contributor

ruflin commented Oct 8, 2020

++ on validating these already in the package-spec. @ycombinator WDYT?

@ycombinator
Copy link
Contributor

Agreed. Just make a new issue in the package-spec repo with the desired validation rules: https://github.com/elastic/package-spec/issues/new.

@ruflin
Copy link
Contributor

ruflin commented Oct 12, 2020

I quickly filed elastic/package-spec#57 Would be good to fill in more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Fleet Fleet team's agent central management project Team:Fleet Team label for Observability Data Collection Fleet team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants