Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Saved Objects] create/bulkCreate: add validation for custom ids #105039

Closed
pgayvallet opened this issue Jul 9, 2021 · 9 comments · Fixed by #187876
Closed

[Saved Objects] create/bulkCreate: add validation for custom ids #105039

pgayvallet opened this issue Jul 9, 2021 · 9 comments · Fixed by #187876
Labels
Feature:Saved Objects Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@pgayvallet
Copy link
Contributor

pgayvallet commented Jul 9, 2021

We're not currently performing any kind of validation when the consumer provides custom ids when invoking the create or bulkCreate APIs. We're just generating a random id if the value was not provided (undefined for create and undefined or null for bulkCreate)

const {
id = SavedObjectsUtils.generateId(),

if (id == null) {
object.id = SavedObjectsUtils.generateId();
}

The SO serializer does accept empty ids when converting the SO to its raw format, but will fail to do the opposite, during the prefix check, as the id is empty.

function checkIdMatchesPrefix(id: string, prefix: string) {
return id.startsWith(prefix) && id.length > prefix.length;
}

This leads to data corruption, as creating an object with an empty id will fail to deserialize the doc, causing errors at runtime when accessing it, and during the SO migration.

So the primary goal is to forbid usage of empty ids. But while we're at it, I think we should add more complete validation against custom ids when provided, to enforce an allowed pattern.

I tried to look at our documentation to see if we already have a pattern for custom ids, but AFAIK we don't. Part of resolving the issue would therefore be to define the pattern we do want to support.

@pgayvallet pgayvallet added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Feature:Saved Objects labels Jul 9, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-core (Team:Core)

@pgayvallet
Copy link
Contributor Author

cc @jportner Do you know what characters should be allowed for custom ids? I just know that it wouldn't be as simple as restricting to alphanum, as if I remember correctly, some special chars such as : are allowed.

Also

  • what should be the max length of custom ids?
  • Do we want to enforce that ids starts with an alpha (or alphanum) char?

@jportner
Copy link
Contributor

jportner commented Jul 9, 2021

So the primary goal is to forbid usage of empty ids.

Whoops! Makes sense 👍

cc @jportner Do you know what characters should be allowed for custom ids? I just know that it wouldn't be as simple as restricting to alphanum, as if I remember correctly, some special chars such as : are allowed.

We don't currently implement any restrictions, and if we did it would potentially be a breaking change. I think the only limiting factor is what the _id field in Elasticsearch supports, and the docs only indicate that it is limited to 512 bytes in size.

I did some rudimentary testing with dev tools, and the only characters that seem to cause problems when creating a single new document are / and \. Perhaps it's worth asking the ES team though.

  • what should be the max length of custom ids?

So, if 512 bytes is our absolute upper limit, and we also potentially have a space ID in the raw ES document (for legacy single-namespace types), perhaps we should limit space IDs and SO IDs each to 224 bytes in UTF-8 encoding (448 total). That gives us plenty of buffer for the other parts of the raw document ID, like the colon delimiters and the SO type.

  • Do we want to enforce that ids starts with an alpha (or alphanum) char?

No, I think that's an arbitrary limitation. Users with other native languages may want to use different characters like 鸡 😄

Edit: asked the ES team and confirmed that the only limitation on the ES side is that _id field must be <= 512 bytes (UTF-8 encoding -- each character can take anywhere from 1 to 4 bytes). Since we have different raw document ID prefixes depending on the object type and what space the user is in, maybe we should just verify that ES throws an appropriate error for exceeding length, and verify the error is surfaced by the SOC, and use that instead of attempting to validate ID length on the Kibana side.

@pgayvallet
Copy link
Contributor Author

pgayvallet commented Jul 12, 2021

After a sync discussion with @jportner (and even if I just hate the idea of allowing the whole UTF-8 range in our ids), we came to the conclusion that ensuring a pattern was going to be an issue for BTW, as it would cause importing docs with 'now invalid' ids to no longer be possible.

The scope of the issue will then only be to forbid empty ids (either null or empty string) and to throw an error in such case, as asserting against a correct length will be performed on the ES side anyway.

@mshustov
Copy link
Contributor

@pgayvallet Should we extend the scope with on read validation as we proposed in the SO-migV2-summary doc?

@pgayvallet
Copy link
Contributor Author

pgayvallet commented Jul 14, 2021

For empty ids, any read attempt currently leads to an error when converting the raw doc.

I would be very in favor of defining an actual allowed character set for ids (allowing the full UTF-8 range in what is supposed to be a technical ID is an aberration in my opinion), and if we do, starting by logging warnings on read would probably be the way to go, however AFAIK, these ids are not only technical, e.g for spaces, this is the actual used-inputed 'name' and path of the space, as @jportner pointed it out...

@mshustov
Copy link
Contributor

however AFAIK, these ids are not only technical, e.g for spaces, this is the actual used-inputed 'name' and path of the space, as @jportner pointed it out

We can encode user-inputted values to make sure only valid UTF-8 characters present.

@pgayvallet
Copy link
Contributor Author

We can encode user-inputted values to make sure only valid UTF-8 characters present

The storage is not really the issue. ES supports that properly. it's more that imho a technical id is plain ascii, and that we discovered that what we call 'id` is effectively used as name or functional id, which seems like quite a poor design

@pgayvallet
Copy link
Contributor Author

Scope of the issue reduced to only ban empty ids (empty strings). Having stricter rules on the id format would be considered a breaking change

pgayvallet added a commit that referenced this issue Jul 11, 2024
…ns (#187876)

## Summary

Fix #123575
Fix #105039

This PR does two things:
- adapt SO ID validation to block empty strings (`""`), we we were
already doing with `undefined`
- add validation of the `attributes` to reject primitives and
`undefined` (only accept objects)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Saved Objects Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants