Support gRPC + streaming for fetching records directly from Pinot servers in Pinot dbapi client #67

Open
nizarhejazi opened this issue Feb 28, 2023 · 3 comments

Comments

@nizarhejazi

We have a use case that requires fetching 100K-1M filtered records directly from Pinot servers with minimal performance impact. Each record has between 5 and 10 columns. We noticed that fetching 500K records through the default path (Pinot servers -> Pinot broker -> client) is a challenge for the brokers.

One reason is that the Pinot dbapi client uses HTTP/JSON communication, which is inefficient for large result sets. The Pinot connectors for Presto and Spark fetch large result sets directly from Pinot servers using a more efficient communication method: gRPC + streaming. This method has less impact on Pinot servers and allows fetching larger result sets quickly.
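
For context, the broker path we use today looks roughly like this with the pinotdb connect/cursor API; the host, port, table, and column names below are placeholders for our setup:

```python
# Illustrative sketch of the current default path: the query goes to the
# broker, which gathers results from the servers and returns one JSON payload.
# Host, port, and column names are placeholders.
from pinotdb import connect

conn = connect(host="pinot-broker", port=8099, path="/query/sql", scheme="http")
curs = conn.cursor()

# Simple SELECT + WHERE query (no aggregations, group by, or joins), but it can
# match 100K-1M rows, all of which are funneled through the broker as JSON.
curs.execute(
    "SELECT col1, col2, col3, col4, col5 "
    "FROM ApplicationStage "
    "WHERE tenantId = 'some-tenant' "
    "LIMIT 1000000"
)
rows = curs.fetchall()
```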

Can you add gRPC + streaming support to the Pinot Python client?
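
For illustration only, something along these lines is what we are hoping for. Pinot servers expose a streaming gRPC query endpoint that the Presto/Spark connectors use, but no Python stubs ship with pinotdb today, so the module, stub, and message names below (pinot_server_pb2, pinot_server_pb2_grpc, PinotQueryServerStub, ServerRequest) are hypothetical:

```python
# Hypothetical sketch only: no gRPC support exists in pinotdb today.
# The generated modules and stub/message names are assumptions modeled on
# Pinot's server-side gRPC query endpoint; they are not a real pinotdb API.
import grpc

import pinot_server_pb2        # hypothetical: messages generated from Pinot's server proto
import pinot_server_pb2_grpc   # hypothetical: service stubs generated from the same proto


def stream_records(server_address: str, sql: str):
    """Yield result blocks from a single Pinot server over a streaming gRPC call."""
    with grpc.insecure_channel(server_address) as channel:
        stub = pinot_server_pb2_grpc.PinotQueryServerStub(channel)
        request = pinot_server_pb2.ServerRequest(sql=sql)
        # Server-side streaming: blocks of rows arrive incrementally, so the
        # client never holds (and the broker never builds) one huge JSON payload.
        for response_block in stub.Submit(request):
            yield response_block


# Usage sketch: iterate over blocks as they arrive from the server.
# for block in stream_records("pinot-server:8090", "SELECT ... FROM ApplicationStage WHERE ..."):
#     process(block)
```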

[More details]
We noticed high CPU utilization on the Pinot brokers. The following chart shows that the brokers spend most of their time on the Reduce operation. Note that the queries in question are simple SELECT + WHERE queries (no aggregations, no group by, and no joins).

Reduce operation: time spent by the broker combining query results from multiple servers.

Broker avg. P99 reduce operation: [chart omitted]

To summarize the chart above, the broker spends:

  • between 1s and 3.5s combining responses for ApplicationStage queries.
  • between 1s and 4.5s combining responses for ApplicationMilestone queries.
  • up to 1s combining responses for ATSApplicant queries.

💡 The chart explains where the 1s to 3s-4s of ApplicationStage and ApplicationMilestone query latency goes: the broker combining responses from the servers and serializing them into JSON before responding to the Reports Pinot client.

@diogobaeder
Contributor

One thing I'm planning to work on is support for different JSON libraries, like ujson and orjson (the latter being the fastest available for Python). This should allow much faster deserialization, and it should still be easy and quick to implement. Would this be OK as a short-term improvement towards what you need?
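
Roughly what I have in mind is a pluggable decoder along these lines; the `loads` hook and the fallback order are just a sketch, not the final design:

```python
# Sketch of swapping the JSON decoder used for broker responses. orjson parses
# directly from bytes and is typically several times faster than the stdlib
# json module for large payloads; the fallback chain here is illustrative.
try:
    import orjson

    def loads(data):
        return orjson.loads(data)
except ImportError:
    try:
        import ujson

        def loads(data):
            return ujson.loads(data)
    except ImportError:
        import json

        def loads(data):
            return json.loads(data)

# e.g. instead of response.json(), the client could do:
# payload = loads(response.content)
```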

@nizarhejazi
Author

yes @diogobaeder, definitely a step in the right direction.

@diogobaeder
Contributor

Cool, I'll try to implement that ASAP. Cheers!
