Adds validations for Saved Object types when calling `create` or `bulkCreate` #118969
Conversation
Force-pushed from 1848c92 to 8833c88
self-review
```ts
 * '2.1.0': (data) => {
 *   if (typeof data.bar !== 'string') {
 *     throw new Error(`[bar]: expected value of type [string] but got [${typeof data.bar}]`);
 *   }
 *   if (typeof data.foo !== 'string' && typeof data.foo !== 'boolean') {
 *     throw new Error(`[foo]: expected value of type [string,boolean] but got [${typeof data.foo}]`);
 *   }
 * }
```
I thought it would be a good idea to allow for custom validation functions, similar to what we do for http routes, because it feels like an inevitability that someone will eventually ask to do something that can't be handled by `@kbn/config-schema`.
However, implementing the custom functions just raised more questions that I'd like to get others' thoughts on:

1. IMO a validation function isn't the right place for folks to actually change the contents of the objects, so for now I structured the validation functions differently from how they are set up for http routes: I didn't include a factory for returning `ok(value)` or `badRequest()`. It's just a function that exists to throw errors, and otherwise doesn't return. It doesn't let you modify or do anything with the data. Is it worth trying to follow the same factory pattern for consistency and keeping safeguards to prevent mutation, or do we prefer this simpler approach?
2. Do we think users need access to anything other than the attributes? Currently the SO repository just passes attributes in. But there may be situations for custom functions where it is useful to have more info about the object.
3. A more future-proof alternative might be providing `({ attributes })` to the validation functions, so that we can add more later if it proves to be necessary. WDYT?
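To make the future-proof option concrete, here is a hypothetical sketch of a throw-only validation function that receives a `({ attributes })` wrapper rather than the bare attributes; `ValidationContext` and `validateFooType` are illustrative names, not Kibana APIs.

```typescript
// Hypothetical sketch: wrapping the attributes in a context object means more
// fields (e.g. references) can be added later without breaking consumers.
interface ValidationContext {
  attributes: Record<string, unknown>;
}

// A throw-only validator: it never returns a value and never mutates input.
type ValidationFunction = (ctx: ValidationContext) => void;

const validateFooType: ValidationFunction = ({ attributes }) => {
  if (typeof attributes.bar !== 'string') {
    throw new Error(
      `[bar]: expected value of type [string] but got [${typeof attributes.bar}]`
    );
  }
};

// Valid input passes silently:
validateFooType({ attributes: { bar: 'hello' } });
```

Extending the signature later only means adding keys to `ValidationContext`, so existing validators keep working unchanged.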
I don't have an opinion on (1) yet.

(2) I think we should rather pass in the entire saved object; would there be any downside? Might not be a strong use case, but theoretically you could want to do validation on the references (e.g. a dashboard can't have references to a `config` saved object type) or on the `updated_at` field (e.g. can't set a date in the future).
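The two examples above could be sketched as a single hypothetical check over the whole object; the interface and function below are simplified stand-ins, not Kibana's actual saved object shapes.

```typescript
// Simplified stand-in for a saved object; the real shape has more fields.
interface SavedObjectLike {
  type: string;
  updated_at?: string;
  references: Array<{ type: string; id: string; name: string }>;
}

// Hypothetical "root property" checks mirroring the reviewer's examples.
function validateRootProperties(so: SavedObjectLike): void {
  if (so.type === 'dashboard' && so.references.some((ref) => ref.type === 'config')) {
    throw new Error('dashboards may not reference config saved objects');
  }
  if (so.updated_at && new Date(so.updated_at).getTime() > Date.now()) {
    throw new Error('[updated_at]: cannot be set in the future');
  }
}

// A dashboard with no offending references and no future timestamp passes:
validateRootProperties({ type: 'dashboard', references: [] });
```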
> Is it worth trying to follow the same factory pattern for consistency and keeping safeguards to prevent mutation, or do we prefer this simpler approach?

IMHO let's KISS for now, and just document that custom validation functions should not be used to mutate the data. If at some point we discover that API consumers are doing so, or planning to, we can then add real safeguards.

> (2) I think we should rather pass in the entire saved object, would there be any downside?

I agree that it would probably be helpful to allow owners to validate more than the attributes, BUT this means that custom validation functions would be more powerful than schema validation, which is an idea I hate to be honest, as ideally config-schema validation should always be favored unless there is a very specific reason to use custom validation instead. This is what we did for route validation, for instance.

So if we think that it makes sense to validate more than the attributes (which I think I agree with), we should find a way to do so also when using the by-schema validation? WDYT?
Yeah, I agree. I think the kind of validation I describe is related to the business domain and should ideally live in a layer one higher than saved objects, e.g. a service exposed to other plugins, or in the http controller. In such a layer you could do async validation, like fetching other saved objects before accepting a create.

So an alternative could be to make sure we have schema validation in core for "root properties" like `references` and `updated_at`, and then keep only validation on attributes. And then if someone needs more powerful validation, the answer is "build your own http API", like we already recommend anyway.

I was thinking that when all saved objects are `hidden: true`, saved object validation might become obsolete, but we would always need import/export validation, and this would skip any business logic in other layers. I still feel a bit uneasy about import: you wouldn't always apply the same validation as on a create, since the import includes other related objects, and it becomes quite messy to enforce the same business logic on an HTTP API as on a batch of imported objects. One potential solution could be to "sign" exports with e.g. a hash of their contents, so that we know they weren't tampered with and can safely import them, but that also removes the usefulness of this API to some extent.
> So an alternative could be to make sure we have schema validation in core for "root properties" like references and updated_at and then keep only validation on attributes. And then if someone needs more powerful validation the answer is "build your own http API" like we anyway already recommend.

That would be aligned with our recommendations, so I'd say it makes sense.

> I was thinking when all saved objects are hidden: true saved object validation might become obsolete, but we would always need import/export validation and this would skip any business logic in other layers

When all types have their own validation layer directly implemented in their services / endpoints, I think import/export would be the only remaining place owned by core where we'll want to perform validation, yes. Once this is the case, I imagine we could still allow owners to register a specific validation function / hook for this with their types, to be used specifically when validating during import or export?

> One potential solution could be to "sign" exports with e.g. a hash of its contents so that we know it wasn't tampered with and can safely import it, but it also removes the usefulness of this API to some extent.

I think quite a few consumers are currently tampering with or manually creating export files, so I'm not sure we could introduce such a thing without it being a breaking change. Plus, given that I think we'll want validation for import/export regardless, I'm not sure this would bring significant added value, at least for validation.
> IMHO let's KISS for now, and just document that custom validation functions should not be used to mutate the data.

Sounds good. I hate the idea of someone abusing this as a hook to rewrite attributes, but I also think a wait-and-see approach is reasonable here.

> So an alternative could be to make sure we have schema validation in core for "root properties" like references and updated_at and then keep only validation on attributes

This sounds like a good middle ground, but to keep us future-proof if we change our minds, I'll still update the custom validation functions to take an object with a single `attributes` key.
I went ahead and added a full schema validation for the "root properties" that gets applied as well, so we are now validating the whole object, not just attributes.
```ts
private validateFunction(data: A, validateFn: SavedObjectsValidationFunction) {
  try {
    validateFn({ ...data }); // shallow clone to prevent mutation
```
As mentioned in my other comment, maybe this should be future-proofed?

```diff
- validateFn({ ...data }); // shallow clone to prevent mutation
+ validateFn({ attributes: { ...data } }); // shallow clone to prevent mutation
```
+1 to future-proofing from the get go
A shallow clone does not really prevent mutation, e.g.:

```js
const data = {
  foo: {
    bar: 12,
  },
};

const shallow = { ...data };
shallow.foo.bar = 7;
data.foo.bar; // <== 7
```

(and having attributes with nested properties is quite common)

So as I said in a previous comment, I think we should either KISS and do nothing for now, or do it properly via a proper deep copy or another mutation-prevention pattern.
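The leak described above, and a deep copy that avoids it, can be demonstrated directly; `structuredClone` is available globally in Node.js 17.0+ and modern browsers.

```typescript
// A shallow clone copies only the top level, so nested objects stay shared
// and mutations through the clone leak back into the original.
const data = { foo: { bar: 12 } };
const shallow = { ...data };
shallow.foo.bar = 7;
// data.foo.bar is now 7: the nested object was shared.

// A deep copy duplicates the whole object graph, so the original is safe.
const original = { foo: { bar: 12 } };
const deep = structuredClone(original);
deep.foo.bar = 7;
// original.foo.bar is still 12.
```

This is the trade-off the thread is weighing: `structuredClone` gives real isolation, but at the cost of copying the full attribute tree on every validation call.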
> shallow clone does not really prevent mutation

Yeah, I know. I was trying to hit a middle ground between the perf hit of a deep clone and still providing some protection against mutation. But point taken about nested properties being common.

For now I'll go with KISS as discussed in the thread above. If we spot anyone abusing this, we can come back and deep clone.
> Yeah I know, I was trying to hit a middle ground between the perf hit of a deep clone while still providing some protection against mutation.

The problem with this approach is that it's implicit. `validateFn({ attributes: Object.freeze(data) });` will throw an error on mutation attempts, so Kibana devs can see that mutations are explicitly forbidden.
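Note that `Object.freeze` is shallow, so nested attributes would still be mutable; a recursive freeze makes the failure explicit all the way down. `deepFreeze` below is an illustrative helper, not a Kibana utility, and the throw-on-write behavior requires strict mode (the default in ES modules).

```typescript
// Recursively freeze an object so any mutation attempt, at any depth,
// throws a TypeError in strict mode instead of silently leaking.
function deepFreeze<T extends object>(obj: T): T {
  for (const value of Object.values(obj)) {
    if (value !== null && typeof value === 'object' && !Object.isFrozen(value)) {
      deepFreeze(value);
    }
  }
  return Object.freeze(obj);
}

const attributes = deepFreeze({ foo: { bar: 12 } });
try {
  attributes.foo.bar = 7; // throws TypeError in strict mode
} catch {
  // mutation was rejected; attributes.foo.bar is still 12
}
```

Compared with cloning, freezing avoids copying the attribute tree, at the cost of walking it once to freeze each level.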
Pinging @elastic/kibana-core (Team:Core)
Overall, it's looking good as an initial implementation. I still wish we could figure out how to add validation against `update`s, though.
```ts
/** Validate a migrated doc against the registered saved object type's schema. */
private validateObjectAttributes<T>(type: string, doc: SavedObjectSanitizedDoc<T>) {
  const savedObjectType = this._registry.getType(type);
  if (!savedObjectType?.schemas) {
```
Shouldn't we log a warning if there isn't a validation schema declared? That way, we'll have some way of monitoring migration failures (or any other failures from malformed SO creation actions) over time and address the critical ones that cause issues.
If we do that, we need to enable it in dev mode only, as it's a developer warning and not a production one.
It could be very verbose though, and I'm not sure it would replace the usual meta issue pinging all the type owners anyway?
> It could be very verbose though, and I'm not sure it would replace the usual meta issue pinging all the type owners anyway?

Verbosity is what we want in debug mode, but you're right, we don't know if it will help in the long run or be more "noise" than we can reasonably sift through.
How are we going to track the SO without validation? How soon do we want to get other teams to cover SO with validation?
Considering that with the latest updates to this PR we are now performing validation on the whole SO regardless of whether any validations have been registered for the attributes, IMO it is okay to do the usual pinging of type owners so that they can decide how to prioritize adding their own validations for the actual attributes. I think for some teams this will likely be a much larger concern than for others, so I don't think we necessarily need to put a timetable on it yet...
```ts
const validator = new SavedObjectsTypeValidator({
  type,
  validationMap: savedObjectType.schemas,
});
```
NIT/optional: do we really need to instantiate this on every call to `validateObjectAttributes`? Can't we instantiate a validator per type when the repository is created?
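The per-type caching suggested in this NIT could look roughly like the sketch below; `TypeValidator` and `getValidator` are simplified stand-ins for `SavedObjectsTypeValidator` and the repository's internals.

```typescript
// Simplified stand-in for SavedObjectsTypeValidator.
class TypeValidator {
  constructor(public readonly type: string) {}
  validate(_doc: unknown): void {
    // schema checks would go here
  }
}

// One validator per type, created lazily and reused across calls,
// instead of constructing a new instance on every validation.
const validatorCache = new Map<string, TypeValidator>();

function getValidator(type: string): TypeValidator {
  let validator = validatorCache.get(type);
  if (!validator) {
    validator = new TypeValidator(type);
    validatorCache.set(type, validator);
  }
  return validator;
}

// Repeated lookups for the same type return the same instance:
const a = getValidator('dashboard');
const b = getValidator('dashboard');
```

Building the map eagerly at repository construction would work too; lazy memoization just avoids paying for types that are never validated.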
Okay, the last week has been crazy, but I was finally able to pick this back up.
@elasticmachine merge upstream
Implementation looking good. A bunch of comments and NITs, but overall LGTM
src/core/server/saved_objects/validation/integration_tests/validator.test.ts (outdated; resolved)
```ts
const validationSchema = createSavedObjectSanitizedDocSchema(validationRule);
validationSchema.validate(data);
```
NIT/optional: are we going to use a `SavedObjectsTypeValidator` to validate multiple objects? If so, we could eventually keep the instantiated per-version schemas to avoid re-instantiating them for each validation? Not sure if the perf increase is very significant, though.
Yeah I think at least caching the validator per-type would be useful. Will do a small follow-up with this
💚 Build Succeeded
cc @lukeelmers
Closes #104088
Summary
This adds a `schemas` option to `SavedObjectType` that will allow consumers to provide a validation schema for the attributes of their saved object type. This schema is provided per-version, similar to migration functions.

This is not a full implementation of SO type validation; there are a few caveats:

- Validation is only applied to `create` and `bulkCreate` ATM, because for `update` operations users are not guaranteed to provide all required fields, which makes strict validation much more difficult. More discussion on this in the original issue.
- On `create` and `bulkCreate`, the SO repository does an on-the-fly migration on the object as a convenience. Since objects could be coming from any version, we decided to only run validations after the migrations are complete in this initial implementation.
- `kibanaVersion`.
.A more complete solution would be to run validations from within the migration algorithm itself so that each document could be validated before it is passed to the migration function for the current version... But this is something that would be much longer term as it would change the current behavior of migrations which are permissive and can selectively ignore invalid data: #104088 (comment)
Plugin API Changes
Saved Objects now support optional validation schemas using `@kbn/config-schema`.

When a validation schema is provided during type registration, it will be used to validate subsequent calls to the `SavedObjectClient`'s `create` and `bulkCreate` APIs. Validation schemas are not currently enforced on `update` operations.

We recommend adding a new validation schema any time attributes are added, removed, or changed in your Saved Object type, as they help protect data integrity by preventing users from inadvertently importing malformed objects.
Usage example:
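The example itself did not survive extraction; below is a hypothetical sketch of what registering a type with per-version `schemas` might look like, based on the description above. The type name, mappings, and version key are illustrative assumptions, and `core` refers to the `CoreSetup` contract available in a plugin's `setup` method.

```typescript
import { schema } from '@kbn/config-schema';

// Illustrative registration sketch (not taken from the PR): the keys of the
// `schemas` map are Kibana versions, and each value validates the type's
// attributes for that version, mirroring how migrations are registered.
core.savedObjects.registerType({
  name: 'my-type', // hypothetical type name
  hidden: false,
  namespaceType: 'single',
  mappings: {
    properties: {
      foo: { type: 'keyword' },
    },
  },
  schemas: {
    '8.1.0': schema.object({
      foo: schema.string(),
    }),
  },
});
```

With this registration in place, `create` and `bulkCreate` calls for `my-type` would fail with a validation error when `foo` is missing or not a string.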