Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC for simple type-aware partial evaluation. #95

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

Conversation

emina
Copy link
Contributor

@emina emina commented Feb 5, 2025

Signed-off-by: Emina Torlak <torlaket@amazon.com>
Signed-off-by: Emina Torlak <torlaket@amazon.com>
Copy link
Contributor

@cdisselkoen cdisselkoen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in support of this change.

Comment on lines +387 to +402
// Entity data for Alice and SuperSecret document.
[
{
"uid": { "type": "User", "id": "Alice" },
"attrs": { },
"parents": [ ]
},
{
"uid": { "type": "Document", "id": "SuperSecret" },
"attrs": {
"isPublic": false,
"owner" : { "type": "User", "id": "Alice" }
},
"parents": [ ]
}
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit/technicality: wouldn't we need an entry for UnknownIP::"AliceIP" in the entity data (with unknown attributes)? Does this RFC propose that completely omitting an entity from the entity data is allowed, and equivalent to specifying all of its data as unknown? Are there footguns that could result?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was thinking we would allow an entity to be fully omitted from the entity store, which would be treated equivalently as specifying that all of the entity’s data is unknown. This would be useful / convenient for evaluating policy conditions have constraints like User::”Alice” in principal.friends.

No footguns immediately come to mind, but I’ll keep thinking about it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also came to mind when I read this proposal, I wasn't sure at first if the omission was intentional or not. Maybe worth calling out explicitly in the proposal (right before the Permissions queries with TPE section) that there might be references to entities that are not given, and they are then treated as "fully unknown".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good; will do!

Co-authored-by: Craig Disselkoen <cdiss@amazon.com>
Signed-off-by: Emina Torlak <torlaket@amazon.com>
Signed-off-by: Emina Torlak <torlaket@amazon.com>

In JSON, we describe partial entities as a list of JSON objects whose known parts are specified using the [Cedar JSON format for entities](https://docs.cedarpolicy.com/auth/entities-syntax.html#entities).

For example, this partial entities list for our [basic schema](#basic-example) specifies the ancestors for the entity `Document::"Manual"` and leaves the attributes unknown:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the tags field is currently optional. Absence of tags means that it's empty. If we want to mark tags optional, we may have to use null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, we don't need null for this in the interface (though we may in the implementation). That's because we can tell from the schema whether an entity type has tags or not. If it is declared to have tags, then omitted tags field means it's unknown---empty tags are just an empty map. If not, then tags must be absent anyway.

Copy link
Contributor

@shaobo-he-aws shaobo-he-aws Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say the schema is like this,

entity A tags Long;

And JSON entities,

Case 1

{ 
  "uid" : ...,
   "tags": []
}

Case 2

{ 
  "uid" : ...
}

Case 1 means tags is known and empty. Case 2 means tags is unknown according to the proposal but the current deserializer treats it like Case 1. So if we want to keep the proposal, it looks like a breaking change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, interesting. I just assumed that the deserializer requires the tags field to be present (and empty) when it's declared in the schema.

Do we allow other kinds of absent fields to be interpreted as empty?

For example, if an entity declares no attributes, can you just omit attrs or are you supposed to provide the empty object?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, the tags field is different from all other fields. In other words, users have to provide fields parents and attrs anyways. @cdisselkoen to double check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdisselkoen I'm thinking that we'd still maintain backward compatibility with schema-based parsing if we require tags to be present if they are declared in the schema. That would be backward compatible since no schemas could have had tags declared before :) Or maybe I'm missing some subtlety with schema-based parsing too?

(I totally agree that tags need to be optional for regular entity parsin.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cedar 4.0.0-4.3.1 allow tags to be absent even if they are declared in the schema; this is equivalent to an empty tags map. Note that an empty tags map is perfectly legal even if the schema declares a type for tags for that entity. We should not break backward compatibility with the behavior of Cedar 4.0.0-4.3.1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking is that TPE would use schema-based parsing because we have and need the schema. The regular parser would be unaffected (it wouldn't need to handle partial inputs).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment applies to schema-bsaed parsing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably missing something but so far, it seems to me that schema-based parsing specifically wouldn't need to be broken. We'll need to change the schema-based parser anyway to support partial inputs. So, when the input is partial (as opposed to fully known), we interpret missing tags as unknown for a type that has tags declared.

All that said, I don't feel strongly about how we surface the TPE interface in JSON :). Open to suggestions for whatever is least invasive.

Copy link
Contributor

@shaobo-he-aws shaobo-he-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

Signed-off-by: Emina Torlak <torlaket@amazon.com>
@luxas
Copy link

luxas commented Feb 26, 2025

Really excited to see this 💯 🎉 !! This would help Kubernetes integration quite a bunch 👍 (see: awslabs/cedar-access-control-for-k8s#11).

In particular, I proposed to @micahhausler we'd use partial eval to be able to use only one unified action (e.g. "just" k8s::Action::"create", not both that and k8s::admission::Action::"create" like now), first with partial data (without any populated object) and later with object populated.

This would rely directly on the special "feature" described in this proposal that instead of mixing unknown and known attributes within one entity, unknown attributes are factored out to its own typed entity to be populated later.

This is enough for this use case (I think), we can embed this semantic directly into the schema at design time, no need for the CPE to TPE conversion layer.

I have one question: Will we remove the unknown function as a result, or make it result in an error in future versions of Cedar, when this is implemented? Today, one can e.g. populate an attribute value with the __extn JSON semantic; I guess we'd want the TPE to error if it saw partially populated attributes like that in the future.

Finally, did you consider already the interaction between this feature and entity slicing, and/or what the practical Rust interface would look like for reauthorizing a residual?

@shaobo-he-aws
Copy link
Contributor

Finally, did you consider already the interaction between this feature and entity slicing, and/or what the practical Rust interface would look like for reauthorizing a residual?

My 2 cents. TPE should be compatible with the existing methods to perform entity slicing (i.e., entity manifest based and level based), since they all require validation.

I can't say for the Rust implementation. The Lean model asks the user to provide fully concrete requests and entities (i.e., those who do not contain any unknowns) to reauthorize a residual. And these requests and entity stores must be consistent with the partial requests and entities, which TPE checks. I imagine that Rust will provide a simpler interface but it will be tricky because there do not exist named unknowns in TPE, preventing us to adopt a similar interface to the existing PE that is based on bindings.

@emina
Copy link
Contributor Author

emina commented Feb 26, 2025

Really excited to see this 💯 🎉 !! This would help Kubernetes integration quite a bunch 👍 (see: awslabs/cedar-access-control-for-k8s#11).

Thank you for reading and commenting! This is super helpful to know :)

I have one question: Will we remove the unknown function as a result, or make it result in an error in future versions of Cedar, when this is implemented? Today, one can e.g. populate an attribute value with the __extn JSON semantic; I guess we'd want the TPE to error if it saw partially populated attributes like that in the future.

That's a good question. Since unknown was part of the current experimental PE feature, I think that it's safe to remove it without officially breaking backward compatibility. But I defer to the Cedar team on that one.

Finally, did you consider already the interaction between this feature and entity slicing, and/or what the practical Rust interface would look like for reauthorizing a residual?

This should work with entity slicing, with minor adjustments. We'd need to adjust the entity slicing algorithm to work on residual expressions as well as typed expressions (its current input). Residual expressions are closely related to typed expressions so this is definitely doable.

…owns.

Signed-off-by: Emina Torlak <torlaket@amazon.com>
@luxas
Copy link

luxas commented Mar 5, 2025

Another thing I came to think about, could we add a concrete example of how to authorize partial ancestors too?

Consider this schema and policy:

namespace partialancestors {
    entity Organization;
    entity Team in [Organization];
    entity User in [Team, Organization];
    entity Document;
    
    action "get" appliesTo {
        principal: User,
        resource: Document,
    };
}
permit (
    principal,
    action == partialancestors::Action::"get",
    resource is partialancestors::Document
) when {
    principal in partialancestors::Organization::"org" &&
    principal in partialancestors::Team::"intermediate" &&
    principal in partialancestors::Team::"toplevel"
};

Then, I perform a request for principal User::"lucas", which is known to have two ancestors: Organization::"org" and Team:"a", but any other team membership is unknown (i.e. whether the user is part of team b)

Given entities

[
    {
        "uid": {
            "type": "partialancestors::User",
            "id": "lucas"
        },
        "attrs": {},
        "parents": [
            {
                "type": "partialancestors::Team",
                "id": "a"
            },
            {
                "type": "partialancestors::Organization",
                "id": "org"
            }
        ]
    },
    {"uid": { "type": "partialancestors::Team", "id": "lucas"}},
    {"uid": { "type": "partialancestors::Organization", "id": "org"}}
]

it is known that the first two conditions are met, but not whether the third one holds. If the requester wants to indicate that parents of lucas are in fact partially unknown, I guess one would perform a schema rewrite such that there is an "intermediate parent layer" in the schema, e.g.

entity UserPartialParents in [Team, Organization];
entity User in [Team, Organization, UserPartialParents];

then still keep lucas' parents known, but refer to an entity that is unknown (not in the entity store), like this:

[
    {
        "uid": {
            "type": "partialancestors::User",
            "id": "lucas"
        },
        "attrs": {},
        "parents": [
            {
                "type": "partialancestors::Team",
                "id": "a"
            },
            {
                "type": "partialancestors::Organization",
                "id": "org"
            },
            {
                "type": "partialancestors::UserPartialParents",
                "id": "lucas"
            }
        ]
    },
    {"uid": { "type": "partialancestors::Team", "id": "lucas"}},
    {"uid": { "type": "partialancestors::Organization", "id": "org"}}
]

Now, Cedar cannot evaluate User::"lucas" in partialancestors::Team::"toplevel", because partialancestors::UserPartialParents::"lucas" does not exist, and could have a parent called partialancestors::Team::"toplevel" (in a transitive closure).

Was this kind of rewrite what you had in mind for ancestors/parents?

@emina
Copy link
Contributor Author

emina commented Mar 5, 2025

Another thing I came to think about, could we add a concrete example of how to authorize partial ancestors too?

Was this kind of rewrite what you had in mind for ancestors/parents?

That is an interesting question. We haven't considered supporting partial ancestors, since this is not supported by the current PE either. The transformation you suggest seems like it should work, but I'll need to think more about it.

@emina
Copy link
Contributor Author

emina commented Mar 7, 2025

The transformation you suggest seems like it should work, but I'll need to think more about it.

Coming back to this idea, the suggested transformation wouldn’t work. That’s because the partial evaluator, like the regular evaluator, represents the entity hierarchy as a map from an entity to its ancestors (rather than parents). That is, the input to the (partial) evaluator is the transitive closure of the hierarchy graph. In JSON, we let you specify just the direct parents for convenience—so you don’t have to compute the transitive closure and pass it in. But we do compute the transitive closure over the JSON graph before passing the final map from entities to ancestors to the (partial) evaluator. This way, the (partial) evaluator can check A in B in constant time, simply by testing if B is a member of A’s ancestors set (and without having to perform a reachability check over the graph).

So, this actually reveals a restriction that we need to put on the JSON partial input format: entities with unknown parents must not appear in the “parent” field of any other entity.

With this in mind, there is a more aggressive transformation that would let us express something like this.

First, change the schema to add UnknownUserAncestors like this:

entity UnknownUserAncestors in [Team, Organization];
entity User in [Team, Organization] {
  unknownAncestors: UnknownUserAncestors
};

Next, transform the policy as follows:


permit (
    principal,
    action == partialancestors::Action::"get",
    resource
) when {
    // We don’t know if Lucas is in Team “foo” or “topLevel”
    principal.unknownAncestors in partialancestors::Team::"foo" && 
    principal in partialancestors::Organization::"org" &&
    principal in partialancestors::Team::"intermediate" &&
    principal.unknownAncestors in partialancestors::Team::"toplevel"
};

Finally, invoke the partial evaluator with the following partial input:

[
    {
        "uid": {
            "type": "partialancestors::User",
            "id": "lucas"
        },
        “attrs”: {
            “unknownAncestors” : { “type” : “UnknownUserAncestors”, “id” : “LucasUnknownAncestors” }
         },
        "parents": [
            {
                "type": "partialancestors::Team",
                "id": "a"
            },
            {
                "type": "partialancestors::Organization",
                "id": "org"
            }
        ]
    },
   // Provide the parents for partialancestors::Organization::”org”, partialancestors::Team::”a” and their parents 
   // too, if any. 
]

Given this input, TPE will return the following residual:

permit (
    principal,
    action,
    resource
) when {
    UnknownUserAncestors::”LucasUnknownAncestors” in partialancestors::Team::"foo" &&
    UnknownUserAncestors::”LucasUnknownAncestors” in partialancestors::Team::"toplevel"
};

The alternative to all this would be for the partial evaluator to work directly on the parent relation, rather than the ancestor relation like the evaluator. Then, it would be ok for an entity to have a parent whose parents themselves are unknown. But it would be very difficult to establish the correctness of this algorithm with respect to the concrete evaluation semantics (which works on the transitive closure).

@shaobo-he-aws Heads up this discussion :). I initially thought this would affect proofs, but actually maybe not, because nothing in the semantics itself assumes that the input ancestor relation is transitive. The evaluator will quite happily work on an ancestor relation that is not transitive. But the above restriction on the JSON input must still be enforced to uphold the implied relation between “parents” in the JSON input and “ancestors” in the (partial) evaluator input.

@luxas
Copy link

luxas commented Mar 7, 2025

But we do compute the transitive closure over the JSON graph before passing the final map from entities to ancestors to the (partial) evaluator.

Oh yeah true, that makes sense from an efficiency perspective 👍

Thanks for the other rewrite example, makes sense to me, and seems to be round-trippable if one made an intermediate transformation layer, so that if the user was unaware of all of this, the return transformation of the above would then be:

permit (
    principal,
    action,
    resource
) when {
    partialancestors::User::”lucas” in partialancestors::Team::"foo" &&
    partialancestors::User::”lucas” in partialancestors::Team::"toplevel"
};

even though the TPE stayed as proposed 👍

(And if this would be required in the beginning, I can most likely just interface with the TPE directly with a generated schema with rewrites applied directly)

Glad that this use-case is supported too then with the given plan 👏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants