Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Policy on the basic CRUD operations #1722

Closed
jgeewax opened this issue Apr 13, 2016 · 17 comments
Closed

Policy on the basic CRUD operations #1722

jgeewax opened this issue Apr 13, 2016 · 17 comments
Assignees
Labels
api: core priority: p2 Moderately-important priority. Fix may not be included in next release.

Comments

@jgeewax
Copy link
Contributor

jgeewax commented Apr 13, 2016

Seems that we're having some disagreements on what our surfaces should look like. Here's my opinion to start the discussion. I use a "thing API" that deals with "things". You can replace with "storage" and "bucket" if it that's easier.

/cc @daspecster @dhermes @tseaver @jonparrott

Client pattern

A client handles your credentials and acts as the pipe to the API.

from gcloud import thingifier
client = thingifier.Client()

Instantiate a thing

Create the thing object and eventually save it remotely.

thing = client.thing(parameters)
# ...
thing.save()

It's important that the there are no required parameters:

thing = client.thing()

If you want a one-liner to save a new thing remotely.

# Nuance here is that .save() returns the thing.
thing = client.thing(parameters).save()

# Note that we're explicitly NOT providing:
# thing = client.create_thing(parameters)

It's unusual, but perhaps you want to "load" a thing from the API by first creating the thing object with the right ID, and then reloading it.

thing = client.thing()
thing.id = 'identifier'
thing.reload()

Get a thing

Give me a thing from the API service.

thing = client.get_thing(identifier)

What if the thing doesn't exist?

thing = client.get_thing(identifier)
# thing is None

Update a thing

The same pattern as creating a thing, except I start with a loaded thing.

thing = client.get_thing(identifier)
thing.property = 'new value'
thing.save()

Reload a thing

I have a thing, but I suspect it's changed since I loaded it. I want to make sure what i have is fresh...

thing = client.get_thing(identifier)
# ... some time later ...
thing.reload()

List a bunch of things

I have many things stored and I want to go through all of them.

for thing in client.list_things():
  do_something_with(thing)

I have lots of things, but I only want 100 of them. I don't care how many API calls there are.

for thing in client.list_things(limit=100):
  do_something_with(thing)

I have lots of things, and I want to control specifically how many API calls I'm making -- I'm on a budget!

for thing in client.list_things(api_call_limit=5):
  do_something_with(thing)

Delete a thing

I have a thing that I loaded, or paged through, and I want to delete it.

thing = client.get_thing(identifier)
thing.delete()

And to do this in a one-liner, with a single API request:

client.thing(identifier).delete()

We may also want to support a single method, but I'm not 100% sure about this...

client.delete_thing(identifier)

Related

@dhermes
Copy link
Contributor

dhermes commented Apr 13, 2016

FWIW, we already discussed this in #911.

@jgeewax
Copy link
Contributor Author

jgeewax commented Apr 13, 2016

Added #911 to the related list. I think the discussion there got side-tracked (the main one was the debate about whether .create_thing() should exist, which I've specifically left out).

@dhermes
Copy link
Contributor

dhermes commented Apr 13, 2016

The main issue is really the disagreement about the Zen of Python:

There should be one-- and preferably only one --obvious way to do it.

In addition, we've never heard a request from a user for the extra methods.

FWIW, I don't really have much fight in me on this one. I'm happy to acquiesce and implement whatever.

@theacodes
Copy link
Contributor

Devil's advocate here:

Although practicality beats purity.

the main one was the debate about whether .create_thing() should exist

I'm fine with leaving create_thing out in favor of client.thing().save().

I'm personally in favor of @jgeewax's proposal for the API here.

@dhermes
Copy link
Contributor

dhermes commented Apr 13, 2016

Still happy to acquiesce and implement whatever.

@theacodes
Copy link
Contributor

Definitely want to hear @tseaver's view here.

@rimey
Copy link
Contributor

rimey commented Apr 13, 2016

From a user's perspective, this is the natural way to fetch a thing:

thing = client.get_thing(identifier)

Whether additionally offering a reload() method makes sense depends on the API. If the resource does not have attributes that can change, it probably doesn't make sense.

@daspecster
Copy link
Contributor

Is anyone Dutch?

I'm not sure if these methods break the zen. Batteries included almost makes it sound like convince methods are welcome if logical and clear.

I think the most important part is consistency so a user can find the rhythm of the library.
Also I think it might help to keep in mind that in these cases we are working from a client object and interacting outside of a local context.

I'm game like @dhermes for whatever as long as we're consistently consistent. :-D

@jgeewax
Copy link
Contributor Author

jgeewax commented Apr 14, 2016

@rimey : What if we standardized on client.thing().<action>() ?

This would mean:

Option A Option B
thing = client.thing('id').get() thing = client.get_thing('id')
thing = client.thing('id, name='Foo').save() (The same)
thing.name = 'Bar'; thing.save() (The same)
success = client.thing('id').delete() success = client.delete_thing('id')

@jgeewax
Copy link
Contributor Author

jgeewax commented Apr 14, 2016

OK. I updated the body of this issue to summarize things, but the note above suggests ways we could avoid having get_thing and delete_thing as unique actions (replaced by client.thing(id).get() and client.thing(id).delete()).

(re .get() vs .reload(): while they're identical, .reload() feels wrong as "the one way to get a thing". I'd be supportive of .get() or .load() ?)

Regardless of that, some key points are...

  • .create_thing() shouldn't exist
  • thing.<action>() should return the thing when useful (ie, .save()), or a boolean about whether the action is successful if not useful (ie, delete())
  • .thing() should never have required parameters

@theacodes
Copy link
Contributor

@jgeewax I'm okay with that, too. That's a pretty major change for the API surface though. Definitely want to hear @tseaver and @dhermes take on this.

@rimey
Copy link
Contributor

rimey commented Apr 14, 2016

The client.thing(id).get() proposal seems to have come out of the blue. It has many problems:

  • As a user, I would prefer to write client.get_thing(id).
  • Thing.get is naturally a class method. Seeing it be an instance method would surprise me. If the class method were ever desired in the future, the instance method would get in the way.
  • How would list() fit in? Badly, I think.
  • thing.reload() returns None to emphasize that it is side-effectful, like list.reverse(). This consideration typically outweighs enabling method chaining, except where method chaining is a key element of the interface (e.g., in building database queries).
  • client.thing(id).get() forces the client library to tolerate partly constructed objects. If certain attributes need to be defined before save() can be called, making them required parameters of the factory (and the constructor) would make this apparent in the method signature. Informing the user of the usage error through an exception raised by save() is not optimal.

I would be wary of trying too hard to define a one-size-fits-all CRUD pattern. Different APIs are different:

  • Some resources are about named objects offering little more than object identity (e.g., PubSub topics), sometimes with names assigned by the client, and sometimes with names assigned by the service. Some resources have associated second-class properties like ACLs or "metadata" (e.g., Cloud Storage blobs). Other resources have lots of essential attributes (e.g., metric descriptors). Some are nothing more than saved collections of attributes (e.g., Stackdriver groups, which are really just saved filters).
  • Some resources are mutable. Some can only be created and deleted (e.g., metric descriptors). Others are read-only (e.g., managed resource descriptors).

@jgeewax
Copy link
Contributor Author

jgeewax commented Apr 15, 2016

@rimey, just tossing out ideas. Not sure if they're any good.

I would be wary of trying too hard to define a one-size-fits-all CRUD pattern.

This is a guideline for the common case that we keep seeing. Cloud Storage, Pub/Sub, etc all have basic CRUD operations and static resource names. We do different things for these similar scenarios and I don't think we should.

This isn't meant to be commandments stating that all APIs must look exactly like this. If they can look this way, they should. If we have a good reason to deviate, we should.

client.thing(id).get() forces the client library to tolerate partly constructed objects.

I'm fine with dropping the client.thing(id).get() pattern generally in favor of client.get_thing(id), however no matter what we do I think we need to support "partially constructed objects".

The API might require something by the time it arrives, however our client needs to accommodate people who don't have all of the information at init-time. Otherwise, we're just encouraging people to fill in dummy values until they have all the information they need. Sanjay G is a big proponent of "tell users quickly when they supply bad data", and even in Protobuf we don't deal with required fields this way.

In other words, I don't think __init__() methods should use parameters without defaults as a way of conveying that certain fields are required.

Informing the user of the usage error through an exception raised by save() is not optimal.

If a field has to be a number and a user gives us "foo", we should tell them right away (protobuf does this). If they make a Thing object and so far haven't set the id field (which is required), any sort of exception at this point seems premature.

In other words, users say "I'm done!" by calling an API-contacting method (.get(), .save(), etc). At that point, it makes sense to say "hey... you didn't provide an ID... and to save this you have to pick one...".

As I mentioned above, this isn't a commandment. In Datastore, the ID field is "required" to be a persisted entity, but it's fine to omit the ID when saving a new one -- that's how you say "give me an auto-generated ID" -- so this certainly isn't a one-size-fits-all mandate.

@theacodes
Copy link
Contributor

Another point of refrence for this:

It's awkward in the bigquery API to have to do this:

    bigquery_client = bigquery.Client()
    dataset = bigquery_client.dataset(dataset_name)
    table = dataset.table(table_name)

    # Reload the table to get the schema.
    table.reload()

Just to make sure the table has schema populated so data can be added.

@tseaver
Copy link
Contributor

tseaver commented Aug 19, 2016

If the back-end already knows the schema (which it must, or else table.reload() wouldn't fetch it), then it should be willing to accept the uploaded data without having us resend it. If the back-end is so willing, then we need to change upload_from_file to skip populating configuration['schema'] if it isn't set. @jonparrott can you confirm that the upload would succeed even without having configuration['schema'] populated?

@theacodes
Copy link
Contributor

In this case I can't because of another bug.

On Thu, Aug 18, 2016, 6:17 PM Tres Seaver notifications@github.com wrote:

If the back-end already knows the schema (which it must, or else
table.reload() wouldn't fetch it), then it should be willing to accept
the uploaded data without having us resend it. If the back-end is so
willing, then we need to change upload_from_file to skip populating
configuration['schema'] if it isn't set. @jonparrott
https://github.com/jonparrott can you confirm that the upload would
succeed even without having configuration['schema'] populated?


You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub
#1722 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAPUc9vi6zJ_DLiTUZ4kcoHUbUEw-9SGks5qhQQlgaJpZM4IGbgp
.

@lukesneeringer lukesneeringer added the priority: p2 Moderately-important priority. Fix may not be included in next release. label Apr 19, 2017
@lukesneeringer
Copy link
Contributor

I am going to close this because:

  • It is mostly a meta-discussion which is now stale.
  • The ship has sailed at this point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: core priority: p2 Moderately-important priority. Fix may not be included in next release.
Projects
None yet
Development

No branches or pull requests

7 participants