New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Reapplying changes from tokens branch #4949

Closed

steiza wants to merge 3 commits into pypi:master from steiza:api_tokens

steiza commented Oct 27, 2018 •

edited by di

Loading

Fixes #994.

I accidentally hosed #4599 when attempting to pull upstream changes / fix merge commits, so I'm taking the changes there and re-applying them without the merge commit.

This was referenced Oct 27, 2018

Add support for API keys #4599

Closed

Add support for API keys #994

Closed

di requested review from di, dstufft and ewdurbin

October 27, 2018 16:14

dstufft requested changes

View reviewed changes

Member

dstufft left a comment

I went through and identified the issues that I currently see.

For what it's worth, I started working on a proof of concept for Macaroons a few months back, but I never got around to finishing it. I pushed that up to my fork and made a PR against master on my fork to make it easy to see how I structured that proof of concept.

You can find that at dstufft#2.

You don't have to use that proof of concept but you're free to use it if you wish. If I remember correctly, it was working just fine, I just never actually got around to implementing caveats (I got side tracked around that point). I included my idea for a few different ways that those first party caveats could be implemented though. I'm including it here just in case it might be helpful!

My PoC did not include the idea of limiting what routes are allowed to authenticate using a Macaroon, I think that is a pretty good way of limiting the impact that a Macaroon can possibly have, so I think either way we should definitely keep that.

warehouse/accounts/auth_policy.py Outdated

+                          if macaroon.identifier != request.registry.settings["account_token.id"]:
+                              return None
+                          if macaroon.location != "pypi.org":

Member

dstufft Oct 27, 2018

I'm not sure that this check is achieving much here, IIRC the location isn't part of the secure context of the Macaroon, so at best it's just making it so that you can't pass an invalid macaroon in... which isn't already possible since they won't be valid.

Author

steiza Oct 28, 2018

You're right! According to https://ai.google/research/pubs/pub41892 this field is optional and (I just learned) even left out of the integrity checks. I'll remove this.

I've been learning more about macaroons since I first heard about them at PyCon 2018...

warehouse/accounts/auth_policy.py Outdated

+                          # First, check identifier and location
+                          if macaroon.identifier != request.registry.settings["account_token.id"]:
+                              return None

Member

dstufft Oct 27, 2018

I don't think we should have a single Macaroon key for all of Warehouse. If for some reason that key ever gets compromised, that makes it really painful to need to roll that key (it will invalidate all of the current Macaroons). It also doesn't allow someone to delete or blacklist a macaroon once produced, making it so that the authority provided by a Macaroon can never be revoked.

A far better idea I think here is to make it so that we generate a key per Macaroon that we produce and store that in the database. Revoking a macaroon is then as simple as deleting a row from the database, users can have multiple macaroons given to different services and can easily revoke them. Warehouse itself can still mass revoke all Macaroons of course, but we should hopefully never require that.

Author

steiza Oct 28, 2018

We can definitely have a per-macaroon secret instead of one for all of Warehouse; that's a good best-practice.

It's a minor point, but in this implementation it is possible for a user to generate multiple macaroons and revoke a particular macaroon by deleting its row from the database. After we verify the macaroon, we use the accounts_token table to map the macaroon's id to a username, so if we remove that row the macaroon will no longer be usable.

Member

dstufft Oct 28, 2018

Yea, I noticed that later on, and then just forgot to update this review :)

warehouse/accounts/auth_policy.py Outdated

+                      if request.matched_route.name not in self._routes_allowed:
+                          return None
+                      account_token = request.params.get("account_token")

Member

dstufft Oct 27, 2018

We should not support authentication coming from a query string, this should be something that can't easily be accidentally leaked within an URL.

Author

steiza Oct 28, 2018

Excellent point!

warehouse/accounts/auth_policy.py Outdated

+                          # Check the macaroon against our configuration
+                          verifier = Verifier()
+                          verifier.third_party_caveat_verifier_delegate = IgnoreThirdPartyCaveats()

Member

dstufft Oct 27, 2018

Why are we ignoring third party caveats? If a user wants to attach a third party caveat to their Macaroon, we should be more than happy to verify it. If I remember correctly, a third party caveat only adds extra complexity on the client side, for the server side verifying them is pretty easy.

Author

steiza Oct 28, 2018

Ah, the name I chose here might be confusing. IgnoreThirdPartyCaveats just returns True for verifying any third-party caveat (since we don't know how those caveats are structured). This does allow macaroons to include third-party caveats, which we ignore (in the sense that they will not cause validation to fail).

warehouse/accounts/auth_policy.py Outdated

+                              # If caveats are specified multiple times, only trust the
+                              # first value we encounter.
+                              if caveat_key == "id" and account_token_id is None:

Member

dstufft Oct 27, 2018

This logic is wrong I think, we shouldn't only trust the first value we encounter, because a key property of Macaroons is that given a Macaroon m1, you can further restrict the scope of the authority granted by that Macaroon by applying additional caveats and generating a new macroon, m2.

This applies to all caveats, for the id specifically, since I think that specifically the user, providing multiple id should basically be an error, because you're creating a macaroon that has caveats that are mutually exclusive.

For package_list, if we have two caveats that add {Django, requests} and {Django, Pyramid}, then the only package we should allow this macaroon to operate on is {Django}.

Author

steiza Oct 28, 2018

Yeah, this was another instance of me not being that familiar with macaroons back at PyCon 2018. I'll update this.

Like you mention below, we might need to pull some of the view logic (like getting the package name) here so we can properly validate the macaroon.

warehouse/accounts/auth_policy.py Outdated

+                                  package_list = caveat_value
+                          if package_list is not None:
+                              request.session["account_token_package_list"] = package_list

Member

dstufft Oct 27, 2018

I'm not a huge fan of stuffing this information into the request session and then manually checking it in the view. It's bypassing the Pyramid authorization framework, and I feel like it will make it easier to either forget to check it (or worse, check only the Macaroon but not the Pyramid authorization framework).

I feel pretty strongly that our Macaroon support should work within the Authentication and the Authorization APIs that Pyramid provides. Unfortunately this isn't the cleanest thing in the world, since the Authorization framework doesn't have any direct access to the request, but you can work around that.

Author

steiza Oct 28, 2018

Yeah, makes sense. I'll try this approach (and hopefully it doesn't punch too many holes in the abstraction between the view and the authorization layer).

warehouse/accounts/auth_policy.py Outdated


		return login_service.find_userid_by_account_token(account_token_id)

		except (struct.error, MacaroonException):

Member

dstufft Oct 27, 2018

What can raise a struct.error here?

Author

steiza Oct 28, 2018

pymacaroons uses struct to do binary deserialization. If you pass a malformed macaroon it can generate a struct.error.

Member

dstufft Oct 28, 2018

Ahh, gotcha. That sucks they let that leak out. It'd be nice if there was a comment explaining that so we don't forget (and a test that triggers that behavior if there isn't one already).

warehouse/accounts/auth_policy.py Outdated

+                      self._authenticate = authenticate
+                      self.callback = self._auth_callback
+                      self._routes_allowed = ["forklift.legacy.file_upload"]

Member

dstufft Oct 28, 2018

This should probably be passed in as a value to the __init__ function here.

Author

steiza Oct 28, 2018

Will do!

warehouse/accounts/auth_policy.py Outdated

+                  def _validate_first_party_caveat(self, caveat):
+                      # Only support 'id' and 'package_list' caveat for now
+                      if caveat.split(": ")[0] not in ["id", "package_list"]:

Member

dstufft Oct 28, 2018

We probably want to structure our caveats differently here as well as add some additional ones.

Off the top of my head:

Caveats should be able to restrict the projects that a macaroon is good for.
- This could be further generalized to a caveat should be able to restrict the resources that a macaroon is good for. This could let you upload for specific files, releases, etc.
Caveats should be able to be limited in terms of what permissions they grant.
Caveats should he able to be limited in validity, ideally with both not before, and not after, and we should have a maximum window that a temporarily limited macaroon can be valid for, to reduce the chance of bugs.

A possibly interesting one, would be to combine the first two so that you could have a single macaroon that granted different permissions to different resources. Although there is a balance to be made in the simplicity of the caveat mini language and the complexity of what we enable people and tools to implement so it's possible that making it possible to have different permissions for different resources is too much complexity for too little gain.

Author

steiza Oct 28, 2018

Yeah, I think this is the most interesting part of the project! There are so many possibilities.

... but I'm also interested in what the minimum requirements are so we can establish this framework (while supporting the future use cases we might want to implement). And of course, whatever is implemented has to be somewhat easy to document and describe.

With that in mind, how about something like:

valid_before: in '%Y-%m-%d %H:%M:%S%z' format?
valid_after: same format
- If both are specified, and the window is more than 2-3 years (taking a cue from TLS certificates), complain? Of course, by default these tokens will not be limited.
package_list: this is where it gets complicated. I think the requirements are:
- We want to be able to specify multiple packages / versions / actions in one caveat, otherwise that caveat won't validate.
- We don't want version specification and action to be optional (defaulting to all)
- We want to be able to further restrict a macaroon with additional caveats later
- So maybe:
  - A package_list is made of a space-separated list of package_items
  - A package_item has a required package name (canonicalized?)
  - A package_item is optionally followed by == and a version string
  - A package_item is optionally followed by a | and package actions (like upload?)
- For this diff, we'll just implement the package name and skip the version strings and actions?

Member

dstufft Oct 28, 2018

The valid_(before|after) syntax seems reasonable. I'm not sure if we want to support all the way out to 2-3 years or not, but ideally that should be a constant defined somewhere we can tweak, so nailing down the exact answer right now isn't super important I don't think.

For package_list, I'm torn. Stepping back a minute, I think these are the sorts of restrictions we want to be able to place on a macaroon:

Limit the projects for which a macaroon is valid for.
Limit the releases within that project for which a macaroon is valid for.
Limit the files (including a hash of the actual file) for which a macroon is valid for.

In the future, we may end up allowing Macaroons to modify other things, like Users? Etc. So I think it's worthwhile to give a little thought to what that would look like. For example, would we just have a users_list``` alongside the package_list``?

I like the idea of just using the version specifier language to capture the release a thing is valid for, instead of tying that to a separate object. Although I think we should support the full range of version specifiers to make the restrictions syntax more flexible.

One thing to keep in mind with Macaroons, is that one of their key ideas is that given a Macaroon, anyone who possesses it can add additional restrictions going forward before passing that macaroon on to someone else. This makes them very powerful, and can enable a workflow like (just as an example):

User grants TravisCI a non-time limited Macaroon for a specific Project.
TravisCI, in preparation for a release task, restricts that Macaroon to only be valid for the next 15 minutes before passing it into the environment.
Twine takes said macaroon, and further restricts it to only be valid for the specific filename and hash that it's just about to upload.

This is just an example, but it shows off how each layer of the process can further restrict the token, until ultimately the thing that is sent across the wire is only valid for a short period of time, and even if it were stolen, is only usable to upload a specific file anyways.

I lean towards liking the idea of just having a single resource caveat, that has a structure that exposes what kind of resource we're operating on, so my preference would be something like:

resources: Project("foo") Project("bar>=1.0,<3") File("widget-1.0.tar.gz", hash="sha256:...")

That makes it a little more complex to parse, but it shouldn't be too bad using a parsing library. I think the API (since the caveat structure is effectively an API) is a bit nicer that way though.

That still leaves the question of whether to have permissions be tied to a specific resource, or be a global list (or even both). I'm not sure here! If we're tying them to per resource, it'd just be a "kwarg" to the above structure. If we're making them global they'd just be their own list.

Actually, as I'm typing this it just occurred to me, there's no reason that the actual caveat language needs to be human friendly, it can be machine friendly and the tooling can provide the human friendly layer on top of it. So one possible option here is instead of defining a textual serialization, we can just define a structure, like (excuse the typing syntax):

import enum

from typing import List, Mapping, Union, Optional
from mypy_extensions import TypedDict

class ResourceKind(enum.Enum):
    Project = "Project"
    File = "File"

class Resource(TypedDict, total=True):
    kind: ResourceKind

# Due to limitation in MyPy, we have to put mandatory and
# optional keys in separate classes.
class _Project(Resource, total=True):
    name: str

# All of these keys are optional.
class Project(_Project, total=False):
    version_specifier: str

class _File(Resource, total=True):
    filename: str

class File(_File, total=False):
    hashes = TypedDict("Hashes", {"sha256": str}, total=False)

Resource = Union[Project, File]

ResourcesCaveat = List[Resource]

This would mean that basically the caveat would end up looking like

resources = [
    {"kind": "Project", "name": "foo"},
    {"kind": "Project", "name": "bar", "verson_specifier": ">=1.0,<3"},
    {"kind": "File", "filename": "widget-1.0.tar.gz", "hashes": {"sha256": "..."}},
]

caveat = f"resources: {serialize(resources)}"

The the question becomes what serialization language do we support. JSON is an easy one because it's built into Python, ubiquitous, and works the same on 2.x and 3.x. The biggest downside to JSON is that it's not really very compact. Another option I like is msgpack, which is a binary protocol (so we'd have to make sure that Macaroons doesn't choke if you put binary blobs into the caveat language) but it's much more efficient and compact than JSON.

One nice thing about msgpack is that it even supports the idea of extension types, so we could even ditch the kind field, and just register each of the "types" of resources as an extension type in MsgPack. That would make the wire protocol a bit more compact, but would make serializing/deserializing a bit more complex and would tie us to msgpack. I'm not sure if the tradeoff is worth it, but it might be, it would depend on how much more compact it made the wire protocol.

If we did go with a binary that would open up the question of if it would make sense to apply compression to the serialized binary, and if that would make the serialized caveat shorter or not (and if it did, shorter enough to make a difference). One of the main downsides with Macaroons is that all of the caveats are embedded into the macaroon itself, so each one you add, increases the length of the macaroon, so compression could help combat that.

Taking this idea even further... I wonder if it would make sense to just define all of our caveats in terms of a single structure like that, so that instead of having a resources caveat, and a not_before caveat, and a not_after caveat, etc etc, we instead have a single caveat without an identifier (so we don't need to try and do string splitting or whatever) that has a structure like (extending the structure from before):

class Expiration(TypedDict, total=True):
    not_before: datetime.datetime
    not_after: datetime.datetime

class Caveat(TypedDict, total=False):
    expiration: Expiration
    resources: ResourcesCaveat

Which would then ultimately give us a structure like:

caveat = {
    "expiration": {
        "not_before": datetime.datetime(...),
        "not_after": datetime.datetime(...),
    },
    "resources": [
         {"kind": "Project", "name": "foo"},
         {"kind": "Project", "name": "bar", "verson_specifier": ">=1.0,<3"},
         {"kind": "File", "filename": "widget-1.0.tar.gz", "hashes": {"sha256": "..."}},
    ],
}

caveat = serialize(caveat)

Then adding more caveats is just adding more keys to the Caveat type, it would obviously be important that we reject any caveat that has keys that we don't understand.

With that being said, I'm fine with deferring some of the caveat features into a later PR. Ultimately one could argue that we could even defer all of them into a later PR and let the first PR focus on getting the Macaroon framework in place (since without caveats, Macaroons are not worse than the status quo, and are a little bit better in that you can have multiple of them and revoke them independently, etc).

steiza added 3 commits

November 9, 2018 17:01


          reapplying changes from tokens branch

e259c9e


          fix test, add alembic migration

7fe3604


          Feedback from code review

a115391

steiza force-pushed the api_tokens branch from fe8f05f to a115391 Compare

November 9, 2018 22:07

Author

steiza commented Nov 9, 2018

dstufft#2 was a very helpful reference!

I think it makes sense to hold off on implementing the caveat language as a separate diff.

Contributor

brainwane commented Jun 24, 2019

Folks interested in this might be interested in #994 (comment) , #6084, and these meeting notes from today.

Contributor

brainwane commented Aug 8, 2019

@steiza, given that we have now merged #6084 and deployed scoped API tokens for package upload on PyPI, I am going to close this PR. Thank you so much for the work you did, which @woodruffw and other teammates learned from!

I would love for you to consider taking what you've learned/done and working on adding caveats to macaroons for expiration (time) and version, and potentially implementing other caveats, as requested in #6255.

Thanks again.

brainwane closed this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet