Pub/Sub API Redesign #3859

Merged
merged 6 commits into master from pubsub on Aug 24, 2017

Conversation

lukesneeringer
Contributor

This PR represents an entirely redesigned Pub/Sub library, and is the culmination of #3463 (publisher), #3637 (subscriber and tests), and #3849 (documentation). Merging this PR will merge the rewritten library into master.

This is the final sanity check for a significant change (the library is rewritten from scratch). Once approved, we need to do a final version number bump and then a launch.

@lukesneeringer lukesneeringer added the api: pubsub Issues related to the Pub/Sub API. label Aug 24, 2017
@lukesneeringer lukesneeringer self-assigned this Aug 24, 2017
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Aug 24, 2017
@lukesneeringer
Contributor Author

I really want to see a green build; #3860 is merged, so trying again.
If the build keeps failing for unrelated reasons by, say, noon today, I will just merge anyway.

@lukesneeringer
Contributor Author

lukesneeringer commented Aug 24, 2017

THE BUILD IS GREEN!

Here we go!

@lukesneeringer lukesneeringer merged commit 4a8e155 into master Aug 24, 2017
@lukesneeringer lukesneeringer deleted the pubsub branch August 24, 2017 17:20
@kardamom

kardamom commented Aug 24, 2017

This might have broken AppEngine deployments: #3863

@robhaswell

Thank you for this work to make the Python Pub/Sub client more performant. I would like to suggest that you update the main documentation with more information about how the asynchronous nature of the client is actually implemented. It's clear after reading the source that you are creating and managing your own background executor, but I think that information is something that users would like to understand.

I also think it would be useful for a user to have some information about how this could be combined with an asyncio-based program. I am currently researching this myself.

@lukesneeringer
Contributor Author

Hi @robhaswell!

Thank you for this work to make the Python Pub/Sub client more performant. I would like to suggest that you update the main documentation with more information about how the asynchronous nature of the client is actually implemented. It's clear after reading the source that you are creating and managing your own background executor, but I think that information is something that users would like to understand.

I am totally willing to consider this. :-)

I tried not to go too deeply into implementation details because I was concerned that it would be distracting to people who just need to know how to use it.

Can you explain the use case you are trying to figure out, so that I can understand my audience? I could totally see a case for an "Advanced: Bring your own concurrency" type of page that explains how to do the stuff that I spent so much time making sure was pluggable. :-)

Basically -- happy to oblige, but need some guidance as to what it is you need.

In the meantime, I sincerely hope the in-code documentation is being helpful, and I would appreciate feedback if it is not.

@robhaswell

Thanks @lukesneeringer!

Can you explain the use case you are trying to figure out, so that I can understand my audience? I could totally see a case for an "Advanced: Bring your own concurrency" type of page that explains how to do the stuff that I spent so much time making sure was pluggable. :-)

Sure, the program I'm working on is rather simple. Its purpose is to read objects from our primary database (Postgres) and write denormalised representations of them to our frontend database (Elasticsearch), a process we call indexing. Object UUIDs to be indexed come out of a Pub/Sub queue, which is fed by the program that got those objects into Postgres in the first place.

Now, it takes quite a long time (comparatively) to build these Elastic representations, but Postgres can sustain very high read concurrency, so we pump an asyncio.Queue and use a configurable number of asynchronous workers to process the objects. This keeps performance up and memory usage down. We do this using aiopg, aioes, and an unreleasable partial Pub/Sub client based on aiohttp. The program resembles this:

import asyncio

from .private.indexer import process_to_elastic
from .private.pubsub import Subscription

CONCURRENCY = 8

# Bounded queue provides backpressure: the producer blocks once 10 items are pending.
queue = asyncio.Queue(maxsize=10)


async def work_producer():
    while True:
        messages = Subscription.pull_many()
        for message in messages:
            await queue.put(message.data)


async def index_worker():
    while True:
        work = await queue.get()
        await process_to_elastic(work)


async def begin():
    # Run one producer alongside CONCURRENCY workers.
    await asyncio.gather(work_producer(), *(index_worker() for _ in range(CONCURRENCY)))


if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(begin())
    loop.close()

As a side note, I think the current implementation - "we provide a producer that you attach a callback to" - would be suboptimal for us. Our indexing process can lock up for a variety of reasons, so we need to be able to apply backpressure to the production of messages from the Pub/Sub queue; otherwise memory could grow unbounded and we risk flooding some poor database with millions of requests in the same instant.

I hope this helps! Oh and the in-code documentation has served me fine so far, thanks.

@lukesneeringer
Contributor Author

It does help. A few comments and thoughts:

Flow Control

Our indexing process can lock up for a variety of reasons so we need to be able to apply backpressure to producing messages from the Pub/Sub queue

We sort of have what you want here. Subscriptions have something called "flow control", which is basically a maximum number of unacked messages that they will work on. You could set the max number of messages pretty low and the client library will stop reading messages from Pub/Sub until they are dealt with. (It will not error, though.) I think it would be mostly equivalent to what you have here to simply set the max messages to 10.
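Roughly, and hedging a bit since the exact FlowControl/subscribe surface may shift between releases, that could look something like the sketch below; the project and subscription names are placeholders, and the callback body is a stand-in for your indexing work:

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
# Placeholder project and subscription names.
subscription_path = subscriber.subscription_path("my-project", "my-subscription")


def callback(message):
    # Do the (slow) indexing work here, then ack so the message is not redelivered.
    print(message.data)
    message.ack()


# At most 10 messages will be leased and unacked at any time; the client stops
# pulling more from Pub/Sub until that backlog drains, which gives you backpressure.
flow_control = pubsub_v1.types.FlowControl(max_messages=10)
# Keep a reference to the returned object; in newer versions this is a future you can block on.
streaming_pull = subscriber.subscribe(subscription_path, callback=callback, flow_control=flow_control)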

asyncio

so we pump an asyncio.Queue and use a configurable number of asynchronous workers to process the objects. This keeps performance up and memory usage down. We do this using aiopg, aioes and an unreleasable partial Pub/Sub client based on aiohttp.

something something something envy that you get to write Python 3 only code. 😠 😛

I think this should be possible with a subclass of subscription.Policy that sends said asyncio.Queue object, but it would require some experimentation.
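Short of a full Policy subclass, a rough sketch of bridging the thread-based callback into an asyncio.Queue might look like the following; the subscribe signature and flow-control object are the same assumptions as above, and process() is a placeholder for your indexing coroutine:

import asyncio

from google.cloud import pubsub_v1


async def process(data):
    # Placeholder for the real indexing coroutine (Postgres read + Elastic write).
    await asyncio.sleep(0)


async def consume(subscription_path):
    loop = asyncio.get_event_loop()
    # Deliberately unbounded: the subscriber's flow control (max_messages) is what
    # caps how many messages are outstanding at once.
    queue = asyncio.Queue()

    def callback(message):
        # Runs on the subscriber's background thread; hop back onto the loop thread.
        loop.call_soon_threadsafe(queue.put_nowait, message)

    subscriber = pubsub_v1.SubscriberClient()
    flow_control = pubsub_v1.types.FlowControl(max_messages=10)
    subscriber.subscribe(subscription_path, callback=callback, flow_control=flow_control)

    while True:
        message = await queue.get()
        await process(message.data)
        message.ack()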

@robhaswell

Thanks for getting back to me.

Flow control

Yes, maximum unacked messages has the same effect as our implementation. Sorry, I didn't spot that in the documentation, perhaps it is supposed to be detailed at http://google-cloud-python.readthedocs.io/en/latest/pubsub/subscriber/index.html which is blank for me (I should probably file that... somewhere).

asyncio

Yeah it's pretty nice 😄 I mean I think GCP would benefit from an aio-google-cloud-python project, but I'm not your programme manager! All the cool kids are getting aio-prefixed client libraries these days.

@robhaswell

Hi @lukesneeringer, I made an example of how one would mix this client with an asyncio event loop: https://github.com/cloudfind/google-pubsub-asyncio

Hope this helps.
