Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reapplying changes from tokens branch #4949

Closed
wants to merge 3 commits into from
Closed

Conversation

steiza
Copy link

@steiza steiza commented Oct 27, 2018

Fixes #994.

I accidentally hosed #4599 when attempting to pull upstream changes / fix merge commits, so I'm taking the changes there and re-applying them without the merge commit.

This was referenced Oct 27, 2018
@di di requested review from di, dstufft and ewdurbin October 27, 2018 16:14
Copy link
Member

@dstufft dstufft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through and identified the issues that I currently see.

For what it's worth, I started working on a proof of concept for Macaroons a few months back, but I never got around to finishing it. I pushed that up to my fork and made a PR against master on my fork to make it easy to see how I structured that proof of concept.

You can find that at dstufft#2.

You don't have to use that proof of concept but you're free to use it if you wish. If I remember correctly, it was working just fine, I just never actually got around to implementing caveats (I got side tracked around that point). I included my idea for a few different ways that those first party caveats could be implemented though. I'm including it here just in case it might be helpful!

My PoC did not include the idea of limiting what routes are allowed to authenticate using a Macaroon, I think that is a pretty good way of limiting the impact that a Macaroon can possibly have, so I think either way we should definitely keep that.

if macaroon.identifier != request.registry.settings["account_token.id"]:
return None

if macaroon.location != "pypi.org":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that this check is achieving much here, IIRC the location isn't part of the secure context of the Macaroon, so at best it's just making it so that you can't pass an invalid macaroon in... which isn't already possible since they won't be valid.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right! According to https://ai.google/research/pubs/pub41892 this field is optional and (I just learned) even left out of the integrity checks. I'll remove this.

I've been learning more about macaroons since I first heard about them at PyCon 2018...


# First, check identifier and location
if macaroon.identifier != request.registry.settings["account_token.id"]:
return None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should have a single Macaroon key for all of Warehouse. If for some reason that key ever gets compromised, that makes it really painful to need to roll that key (it will invalidate all of the current Macaroons). It also doesn't allow someone to delete or blacklist a macaroon once produced, making it so that the authority provided by a Macaroon can never be revoked.

A far better idea I think here is to make it so that we generate a key per Macaroon that we produce and store that in the database. Revoking a macaroon is then as simple as deleting a row from the database, users can have multiple macaroons given to different services and can easily revoke them. Warehouse itself can still mass revoke all Macaroons of course, but we should hopefully never require that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can definitely have a per-macaroon secret instead of one for all of Warehouse; that's a good best-practice.

It's a minor point, but in this implementation it is possible for a user to generate multiple macaroons and revoke a particular macaroon by deleting its row from the database. After we verify the macaroon, we use the accounts_token table to map the macaroon's id to a username, so if we remove that row the macaroon will no longer be usable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I noticed that later on, and then just forgot to update this review :)

if request.matched_route.name not in self._routes_allowed:
return None

account_token = request.params.get("account_token")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not support authentication coming from a query string, this should be something that can't easily be accidentally leaked within an URL.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent point!

# Check the macaroon against our configuration
verifier = Verifier()

verifier.third_party_caveat_verifier_delegate = IgnoreThirdPartyCaveats()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we ignoring third party caveats? If a user wants to attach a third party caveat to their Macaroon, we should be more than happy to verify it. If I remember correctly, a third party caveat only adds extra complexity on the client side, for the server side verifying them is pretty easy.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the name I chose here might be confusing. IgnoreThirdPartyCaveats just returns True for verifying any third-party caveat (since we don't know how those caveats are structured). This does allow macaroons to include third-party caveats, which we ignore (in the sense that they will not cause validation to fail).


# If caveats are specified multiple times, only trust the
# first value we encounter.
if caveat_key == "id" and account_token_id is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is wrong I think, we shouldn't only trust the first value we encounter, because a key property of Macaroons is that given a Macaroon m1, you can further restrict the scope of the authority granted by that Macaroon by applying additional caveats and generating a new macroon, m2.

This applies to all caveats, for the id specifically, since I think that specifically the user, providing multiple id should basically be an error, because you're creating a macaroon that has caveats that are mutually exclusive.

For package_list, if we have two caveats that add {Django, requests} and {Django, Pyramid}, then the only package we should allow this macaroon to operate on is {Django}.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this was another instance of me not being that familiar with macaroons back at PyCon 2018. I'll update this.

Like you mention below, we might need to pull some of the view logic (like getting the package name) here so we can properly validate the macaroon.

package_list = caveat_value

if package_list is not None:
request.session["account_token_package_list"] = package_list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a huge fan of stuffing this information into the request session and then manually checking it in the view. It's bypassing the Pyramid authorization framework, and I feel like it will make it easier to either forget to check it (or worse, check only the Macaroon but not the Pyramid authorization framework).

I feel pretty strongly that our Macaroon support should work within the Authentication and the Authorization APIs that Pyramid provides. Unfortunately this isn't the cleanest thing in the world, since the Authorization framework doesn't have any direct access to the request, but you can work around that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, makes sense. I'll try this approach (and hopefully it doesn't punch too many holes in the abstraction between the view and the authorization layer).


return login_service.find_userid_by_account_token(account_token_id)

except (struct.error, MacaroonException):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What can raise a struct.error here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pymacaroons uses struct to do binary deserialization. If you pass a malformed macaroon it can generate a struct.error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, gotcha. That sucks they let that leak out. It'd be nice if there was a comment explaining that so we don't forget (and a test that triggers that behavior if there isn't one already).

self._authenticate = authenticate
self.callback = self._auth_callback

self._routes_allowed = ["forklift.legacy.file_upload"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be passed in as a value to the __init__ function here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do!


def _validate_first_party_caveat(self, caveat):
# Only support 'id' and 'package_list' caveat for now
if caveat.split(": ")[0] not in ["id", "package_list"]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to structure our caveats differently here as well as add some additional ones.

Off the top of my head:

  • Caveats should be able to restrict the projects that a macaroon is good for.
    • This could be further generalized to a caveat should be able to restrict the resources that a macaroon is good for. This could let you upload for specific files, releases, etc.
  • Caveats should be able to be limited in terms of what permissions they grant.
  • Caveats should he able to be limited in validity, ideally with both not before, and not after, and we should have a maximum window that a temporarily limited macaroon can be valid for, to reduce the chance of bugs.

A possibly interesting one, would be to combine the first two so that you could have a single macaroon that granted different permissions to different resources. Although there is a balance to be made in the simplicity of the caveat mini language and the complexity of what we enable people and tools to implement so it's possible that making it possible to have different permissions for different resources is too much complexity for too little gain.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this is the most interesting part of the project! There are so many possibilities.

... but I'm also interested in what the minimum requirements are so we can establish this framework (while supporting the future use cases we might want to implement). And of course, whatever is implemented has to be somewhat easy to document and describe.

With that in mind, how about something like:

  • valid_before: in '%Y-%m-%d %H:%M:%S%z' format?
  • valid_after: same format
    • If both are specified, and the window is more than 2-3 years (taking a cue from TLS certificates), complain? Of course, by default these tokens will not be limited.
  • package_list: this is where it gets complicated. I think the requirements are:
    • We want to be able to specify multiple packages / versions / actions in one caveat, otherwise that caveat won't validate.
    • We don't want version specification and action to be optional (defaulting to all)
    • We want to be able to further restrict a macaroon with additional caveats later
    • So maybe:
      • A package_list is made of a space-separated list of package_items
      • A package_item has a required package name (canonicalized?)
      • A package_item is optionally followed by == and a version string
      • A package_item is optionally followed by a | and package actions (like upload?)
    • For this diff, we'll just implement the package name and skip the version strings and actions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The valid_(before|after) syntax seems reasonable. I'm not sure if we want to support all the way out to 2-3 years or not, but ideally that should be a constant defined somewhere we can tweak, so nailing down the exact answer right now isn't super important I don't think.

For package_list, I'm torn. Stepping back a minute, I think these are the sorts of restrictions we want to be able to place on a macaroon:

  • Limit the projects for which a macaroon is valid for.
  • Limit the releases within that project for which a macaroon is valid for.
  • Limit the files (including a hash of the actual file) for which a macroon is valid for.

In the future, we may end up allowing Macaroons to modify other things, like Users? Etc. So I think it's worthwhile to give a little thought to what that would look like. For example, would we just have a users_list``` alongside the package_list``?

I like the idea of just using the version specifier language to capture the release a thing is valid for, instead of tying that to a separate object. Although I think we should support the full range of version specifiers to make the restrictions syntax more flexible.

One thing to keep in mind with Macaroons, is that one of their key ideas is that given a Macaroon, anyone who possesses it can add additional restrictions going forward before passing that macaroon on to someone else. This makes them very powerful, and can enable a workflow like (just as an example):

  1. User grants TravisCI a non-time limited Macaroon for a specific Project.
  2. TravisCI, in preparation for a release task, restricts that Macaroon to only be valid for the next 15 minutes before passing it into the environment.
  3. Twine takes said macaroon, and further restricts it to only be valid for the specific filename and hash that it's just about to upload.

This is just an example, but it shows off how each layer of the process can further restrict the token, until ultimately the thing that is sent across the wire is only valid for a short period of time, and even if it were stolen, is only usable to upload a specific file anyways.

I lean towards liking the idea of just having a single resource caveat, that has a structure that exposes what kind of resource we're operating on, so my preference would be something like:

resources: Project("foo") Project("bar>=1.0,<3") File("widget-1.0.tar.gz", hash="sha256:...")

That makes it a little more complex to parse, but it shouldn't be too bad using a parsing library. I think the API (since the caveat structure is effectively an API) is a bit nicer that way though.

That still leaves the question of whether to have permissions be tied to a specific resource, or be a global list (or even both). I'm not sure here! If we're tying them to per resource, it'd just be a "kwarg" to the above structure. If we're making them global they'd just be their own list.

Actually, as I'm typing this it just occurred to me, there's no reason that the actual caveat language needs to be human friendly, it can be machine friendly and the tooling can provide the human friendly layer on top of it. So one possible option here is instead of defining a textual serialization, we can just define a structure, like (excuse the typing syntax):

import enum

from typing import List, Mapping, Union, Optional
from mypy_extensions import TypedDict

class ResourceKind(enum.Enum):
    Project = "Project"
    File = "File"

class Resource(TypedDict, total=True):
    kind: ResourceKind

# Due to limitation in MyPy, we have to put mandatory and
# optional keys in separate classes.
class _Project(Resource, total=True):
    name: str

# All of these keys are optional.
class Project(_Project, total=False):
    version_specifier: str

class _File(Resource, total=True):
    filename: str

class File(_File, total=False):
    hashes = TypedDict("Hashes", {"sha256": str}, total=False)

Resource = Union[Project, File]

ResourcesCaveat = List[Resource]

This would mean that basically the caveat would end up looking like

resources = [
    {"kind": "Project", "name": "foo"},
    {"kind": "Project", "name": "bar", "verson_specifier": ">=1.0,<3"},
    {"kind": "File", "filename": "widget-1.0.tar.gz", "hashes": {"sha256": "..."}},
]

caveat = f"resources: {serialize(resources)}"

The the question becomes what serialization language do we support. JSON is an easy one because it's built into Python, ubiquitous, and works the same on 2.x and 3.x. The biggest downside to JSON is that it's not really very compact. Another option I like is msgpack, which is a binary protocol (so we'd have to make sure that Macaroons doesn't choke if you put binary blobs into the caveat language) but it's much more efficient and compact than JSON.

One nice thing about msgpack is that it even supports the idea of extension types, so we could even ditch the kind field, and just register each of the "types" of resources as an extension type in MsgPack. That would make the wire protocol a bit more compact, but would make serializing/deserializing a bit more complex and would tie us to msgpack. I'm not sure if the tradeoff is worth it, but it might be, it would depend on how much more compact it made the wire protocol.

If we did go with a binary that would open up the question of if it would make sense to apply compression to the serialized binary, and if that would make the serialized caveat shorter or not (and if it did, shorter enough to make a difference). One of the main downsides with Macaroons is that all of the caveats are embedded into the macaroon itself, so each one you add, increases the length of the macaroon, so compression could help combat that.

Taking this idea even further... I wonder if it would make sense to just define all of our caveats in terms of a single structure like that, so that instead of having a resources caveat, and a not_before caveat, and a not_after caveat, etc etc, we instead have a single caveat without an identifier (so we don't need to try and do string splitting or whatever) that has a structure like (extending the structure from before):

class Expiration(TypedDict, total=True):
    not_before: datetime.datetime
    not_after: datetime.datetime

class Caveat(TypedDict, total=False):
    expiration: Expiration
    resources: ResourcesCaveat

Which would then ultimately give us a structure like:

caveat = {
    "expiration": {
        "not_before": datetime.datetime(...),
        "not_after": datetime.datetime(...),
    },
    "resources": [
         {"kind": "Project", "name": "foo"},
         {"kind": "Project", "name": "bar", "verson_specifier": ">=1.0,<3"},
         {"kind": "File", "filename": "widget-1.0.tar.gz", "hashes": {"sha256": "..."}},
    ],
}

caveat = serialize(caveat)

Then adding more caveats is just adding more keys to the Caveat type, it would obviously be important that we reject any caveat that has keys that we don't understand.

With that being said, I'm fine with deferring some of the caveat features into a later PR. Ultimately one could argue that we could even defer all of them into a later PR and let the first PR focus on getting the Macaroon framework in place (since without caveats, Macaroons are not worse than the status quo, and are a little bit better in that you can have multiple of them and revoke them independently, etc).

@steiza
Copy link
Author

steiza commented Nov 9, 2018

dstufft#2 was a very helpful reference!

I think it makes sense to hold off on implementing the caveat language as a separate diff.

@brainwane
Copy link
Contributor

Folks interested in this might be interested in #994 (comment) , #6084, and these meeting notes from today.

@brainwane
Copy link
Contributor

@steiza, given that we have now merged #6084 and deployed scoped API tokens for package upload on PyPI, I am going to close this PR. Thank you so much for the work you did, which @woodruffw and other teammates learned from!

I would love for you to consider taking what you've learned/done and working on adding caveats to macaroons for expiration (time) and version, and potentially implementing other caveats, as requested in #6255.

Thanks again.

@brainwane brainwane closed this Aug 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for API keys
3 participants