Pagination Deployment to Service Handlers and CLI Commands and Domain Collections #237
Comments
When working on pagination in Prism we discovered one UI/UX issue.
With cursor pagination So there has to be a "redesign" of the pagination UX, at least as expected from the library (maybe after a review of CQRS as well); there is still too much logic being done on the client side. More logic needs to be incorporated into the Also possible issues with dealing with the last page/end of the paginated stream should be considered as well.
This should be reviewed with respect to push-pull dataflow and control flow.
Push vs Pull
Push vs pull are 2 paradigms of "reactivity" (https://en.wikipedia.org/wiki/Reactive_programming). These concepts are applicable widely in many scenarios.
You want both depending on the circumstance, and many systems are both push and pull, just in different ways. Push and pull systems when composed together can form a graph. This graph does not need to be acyclic; cycles in the graph can occur. Reactive systems are ultimately something that can be cyclic. But cyclic does not imply unproductive infinite evaluation. Productivity can still occur with infinite evaluation. Complete evaluation of the graph is not possible. Fundamentally systems are lazy and eventually consistent.
Consider 2 agents communicating with each other. Each agent is a state machine. Each state transition on one agent may trigger a state transition on the other agent. The relationship is not one way, but 2 ways. Even in configuration systems, real state forms feedback into desired state. Thus an iterative system occurs as long as the system is "unstable". Stability may never be reached... it's possible divergence can occur. Managing divergence is an exercise in complexity. Think about machine learning systems: convergence and divergence. Stability may be a "process", not an end state, just like security. Perturbations occur in complex systems simply due to change and entropy.
The Origin of Change is an important concept. In a push interaction, the origin of change starts at the system pushing. In a pull interaction, the origin of change still occurs at the system being pulled. It's the change being applied in a configuration management system.
The initiator of the transaction is also important. This dictates which system has knowledge about the other system. This is independent of the origin of change (which indicates the direction of dataflow). The initiator of the transaction implies a "dependency" relationship in terms of integration direction. The direction of dataflow may be opposite to the direction of dependency (data flow vs control flow).
In push based systems, the dataflow goes from the pusher to the pushed. In pull based systems, the dataflow goes from the pulled to the puller.
In push based systems, the control flow goes from the pusher to the pushed. The pusher is aware of the pushed. In pull based systems, the control flow goes from the puller to the pulled. The puller is aware of the pulled.
Primitives in JS in push vs pull:
https://stackoverflow.com/questions/39439653/events-vs-streams-vs-observables-vs-async-iterators
https://github.com/kriskowal/gtor/blob/master/presentation/README.md
So all of computing consists of reactive systems.
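To make the push/pull distinction above concrete, here is a minimal TypeScript sketch. The names `pullSource`, `PushSource`, `collectPulled` and `collectPushed` are hypothetical and exist only for illustration; they are not part of Polykey or any library. In the pull case the consumer drives each step through a generator, while in the push case the producer calls into listeners it has been handed.

```typescript
// Pull: the consumer initiates every step (control flows consumer -> producer),
// while the data flows back from producer to consumer.
function* pullSource(): Generator<number> {
  yield 1;
  yield 2;
  yield 3;
}

// Push: the producer initiates and calls listeners it knows about,
// so both control and data flow from producer to consumer.
type Listener<T> = (value: T) => void;

class PushSource<T> {
  protected listeners: Array<Listener<T>> = [];

  // Registers a listener and returns a function that removes it again.
  public subscribe(listener: Listener<T>): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  public emit(value: T): void {
    for (const listener of this.listeners) listener(value);
  }
}

function collectPulled(): number[] {
  const pulled: number[] = [];
  // The loop decides when to ask for the next value.
  for (const value of pullSource()) pulled.push(value);
  return pulled;
}

function collectPushed(): number[] {
  const pushed: number[] = [];
  const source = new PushSource<number>();
  const unsubscribe = source.subscribe((value) => {
    pushed.push(value);
  });
  source.emit(1);
  source.emit(2);
  unsubscribe();
  source.emit(3); // Not delivered: the listener was removed
  return pushed;
}
```

An async generator, as used for the streaming handlers discussed in this issue, is the asynchronous form of the pull case.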
This is being pulled from #327 to here:
With the transition to JSON RPC, this is still valid. We will still be returning collection data as a stream of individual JSON messages. However we will need to take input parameters to act as a cursor to control where the stream starts. |
For the input JSON request, we can reserve a Of course things like |
When we move from the GRPC to the JSON RPC, we want to have the These will translate directly into the server streaming handlers, which themselves will just hand it over to any async generator method.
On the caller side, these parameters should be passable from the CLI parameters. So for example:
pk vaults list --seek <vaultId> --limit 10 --order asc
So technically our CLI doesn't really do much here. It doesn't become that useful until you get the GUI ready. Normally...
pk vaults list
will just stream the entire collection fully.
For the calling side, if it calls a server streaming method, it should use the output formatter at each iteration; it shouldn't be accumulating all the data and then outputting it. This is what will enable the CLI to also be streamable.
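As a rough sketch of the calling side described above: the flags `--seek`, `--limit` and `--order` come from the example command, but `parsePaginationFlags`, `printStream` and the `PaginationOptions` type are hypothetical names made up for this illustration and are not the actual `pk` CLI code. The point is that each item is formatted and written as soon as it arrives, rather than being accumulated first.

```typescript
type Order = 'asc' | 'desc';
// Hypothetical shape for the pagination parameters
type PaginationOptions = { seek?: string; limit?: number; order?: Order };

// Hypothetical helper: picks the pagination flags out of an argv-style array.
function parsePaginationFlags(argv: string[]): PaginationOptions {
  const options: PaginationOptions = {};
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    if (flag !== '--seek' && flag !== '--limit' && flag !== '--order') continue;
    const value = argv[++i]; // Each flag consumes the following argument
    if (value === undefined) throw new RangeError(`Missing value for ${flag}`);
    if (flag === '--seek') {
      options.seek = value;
    } else if (flag === '--limit') {
      const limit = parseInt(value, 10);
      if (isNaN(limit) || limit < 0) {
        throw new RangeError(`Invalid --limit: ${value}`);
      }
      options.limit = limit;
    } else if (value === 'asc' || value === 'desc') {
      options.order = value;
    } else {
      throw new RangeError(`Invalid --order: ${value}`);
    }
  }
  return options;
}

// Formats and writes each item per iteration; nothing is buffered.
// Returns the number of items written.
async function printStream<T>(
  stream: AsyncIterable<T>,
  format: (item: T) => string,
  write: (line: string) => void,
): Promise<number> {
  let count = 0;
  for await (const item of stream) {
    write(format(item));
    count++;
  }
  return count;
}
```

With this shape, memory use on the CLI side stays constant regardless of how large the collection is, since only one item is held at a time.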
@tegefaulkes I remember we discussed this especially in reference to changes you're doing for deadlines, did you add in the pagination capability to the stream handlers on the server side? And all that needs to be done is to propagate |
I don't think I've made changes for this yet. |
Neither the server nor client side? |
I recall looking it over and seeing the generators implementing the seeking behaviour. Right now I don't recall if it's standard across the board. I don't think all the bin commands where seeking applies have all of the seeking options right now.
Do all bin commands have seeking? But do all client handlers have relevant seeking parameters?
I don't recall at this time. It's something I'll have to check.
Check all CLI commands for
Check all JSON RPC handlers for
Apply them to the generator codes.
@addievo also review this too. |
In the context of the audit domain - this will be important. You start with a DB transactional snapshot iterator. That becomes an AsyncIterable through AsyncGenerator syntax, and then at the client service it becomes a server streaming call.
#599 - dashboard backend may use js-rpc and js-ws to make a server streaming call to the seed cluster agents. When doing so, it will need to provide some parameters to control the result. Normally the result is finite. You can control the finiteness using pagination parameters as expressed above.
@amydevs - there are no client service handlers that currently behave as per the OP spec. We can start with the
If you do a while loop where you are continuously calling the server streaming call to get the new results, while preserving a cursor, that is equivalent to having an infinite iterator. That is one way to get live updates for #599. This is still a pull-based architecture.
Alternatively there could be a server streaming call that would always be alive. It is then the handler side's responsibility to push data into the call, and the client is pulling forever. It would only close if the client decides to close the stream.
The dashboard service singleton could do both. If we do both, there should be a standard for distinguishing between these kinds of server stream calls.
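The cursor-preserving while loop described above can be sketched as follows. This is only an illustration under stated assumptions: `follow`, `fetchAfter` and `getCursor` are hypothetical names, `fetchAfter` stands in for one finite server streaming call that returns items strictly after the given cursor, and the poll interval is an invented parameter. None of this is the js-rpc API.

```typescript
function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Turns repeated finite streaming calls into one unbounded pull-based iterator.
// The consumer ends it by breaking out of its `for await` loop.
async function* follow<T>(
  // Hypothetical: one finite call returning items strictly after `cursor`
  fetchAfter: (cursor: string | undefined, limit: number) => AsyncIterable<T>,
  // Hypothetical: extracts the cursor value from an item
  getCursor: (item: T) => string,
  options: { limit: number; pollIntervalMs: number },
): AsyncGenerator<T> {
  let cursor: string | undefined;
  while (true) {
    let received = 0;
    for await (const item of fetchAfter(cursor, options.limit)) {
      // Remember the position so the next call resumes after this item
      cursor = getCursor(item);
      received++;
      yield item;
    }
    // Nothing new: wait before polling again instead of spinning
    if (received === 0) await sleep(options.pollIntervalMs);
  }
}
```

Because the client initiates every call, this remains pull-based; the always-alive stream variant would instead move the waiting into the handler.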
Another way is to provide a parameter that distinguishes the 2.
Imagine:
Therefore we could do something like:
Also by default - I like to prefer |
Also if you want to limit by a seek, you could add one more parameter that It is mutually exclusive to |
We can start this issue in the audit domain first - but closing this issue will require full adoption in the client service and agent service. |
I'm moving this to |
@tegefaulkes I'm hoping that we can start this now with all the new Unix commands. @aryanjassal should be reviewing this issue. I want to make sure that our pagination system is being put together properly. |
For something like However things like |
Specification
Pagination is the process by which a client program can acquire a subset of data from a larger stream of data.
Polykey maintains potentially large streams of data:
Atm, all of this data is either returned in 1 unary call, which returns an in-memory deserialised array of data, or it is returned with a stream. This creates a problem when the amount of data is large, or when you want to go to a specific point in the stream and not have to stream from the beginning again.
The standard for doing this is "pagination". Pagination uses a "cursor" to index into a larger dataset. The 2 main ways of pagination are:
Of the 2, cursor pagination is the simpler and more flexible form and fits our use case quite well.
In addition to this, one can combine cursors with streaming to return a stream of results based on the cursor. The only difference at this point is whether results are held in memory one at a time while streaming, or accumulated in memory and returned as a single result.
In the case of returning a static result in-memory, you free-up locks but you use up more memory. In the case of returning a stream, you may use less memory but it ends up being more complicated and more locking takes place. Due to our usage of leveldb and its streams making use of leveldb snapshots this may hide some of the complexity.
As a first stage prototype, let's add in pagination to all unary calls, and return static result arrays to be used by the CLI/GUI. Later we can explore using streaming.
We've built a pagination protocol before here: https://github.com/MatrixAI/js-pagination. That library is intended to be used on the client, but it describes what you might expect the server to take. It would mean GRPC methods will need:
direction
seek
limit
seekBefore
seekAfter
The last 2 may not be so necessary as they introduce more flexibility.
We've done this before on the Prism project, so there are some fiddly things here to note that require further discussion.
Another point is that streaming may be more useful for "reactivity". Or observational APIs. PK isn't configured to push events out anywhere. If we intend to do some CQRS to the GUI to maintain eventual consistency, we may need to figure out if we designate streams as "live" events, so that downstream UIs can react to changes in state. See: https://stackoverflow.com/questions/39439653/events-vs-streams-vs-observables-vs-async-iterators/47214496
We want to apply the following parameters to any generator method. This should be the standard we use to allow pagination on streams. This can be applied to any GRPC streams as well.
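A minimal sketch of a generator method that accepts pagination parameters, assuming the parameters are `seek`, `order` and `limit` as named in the CLI section of this issue. The function `paginate` and the `PaginationOptions` type are hypothetical, and treating `seek` as an inclusive starting key is an assumption made here; the source does not say whether it is inclusive or exclusive. An in-memory sorted array stands in for the DB iterator.

```typescript
type Order = 'asc' | 'desc';
// Hypothetical shape for the pagination parameters
type PaginationOptions = { seek?: string; order?: Order; limit?: number };

// `entries` must already be sorted ascending by key.
async function* paginate<T>(
  entries: ReadonlyArray<[string, T]>,
  { seek, order = 'asc', limit }: PaginationOptions = {},
): AsyncGenerator<[string, T]> {
  // Walk the collection in the requested direction
  const ordered = order === 'asc' ? entries : entries.slice().reverse();
  let count = 0;
  for (const entry of ordered) {
    // Stop once `limit` entries have been yielded
    if (limit != null && count >= limit) return;
    const key = entry[0];
    // Assumption: `seek` is inclusive, so skip only keys strictly before it
    // in the direction of travel
    if (seek != null && (order === 'asc' ? key < seek : key > seek)) continue;
    count++;
    yield entry;
  }
}
```

For example, with keys `a` to `d`, `{ seek: 'b', limit: 2 }` yields `b` then `c`, and `{ seek: 'c', order: 'desc', limit: 2 }` yields `c` then `b`.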
Service handlers
GRPC service handlers for RPC calls that provide streaming need to support this pagination. The pagination parameters need to be supplied either as part of the requesting message or the call metadata. These parameters are provided to the generator we are consuming for the stream.
This will need to be applied to every RPC call that returns a stream.
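One possible shape for such a handler, as a sketch only: the pagination parameters are read out of the requesting message, validated, and handed to the generator being consumed. `parsePaginationParams` and `makeStreamHandler` are hypothetical names, and the assumption here is that the parameters arrive as plain fields called `seek`, `order` and `limit`; the real message or metadata layout is not specified in this issue.

```typescript
type Order = 'asc' | 'desc';
// Hypothetical shape for the pagination parameters
type PaginationOptions = { seek?: string; order?: Order; limit?: number };

// Validates untrusted request fields before they reach the generator.
function parsePaginationParams(
  message: Record<string, unknown>,
): PaginationOptions {
  const { seek, order, limit } = message;
  const options: PaginationOptions = {};
  if (seek !== undefined) {
    if (typeof seek !== 'string') throw new TypeError('seek must be a string');
    options.seek = seek;
  }
  if (order !== undefined) {
    if (order === 'asc' || order === 'desc') options.order = order;
    else throw new TypeError('order must be "asc" or "desc"');
  }
  if (limit !== undefined) {
    if (
      typeof limit !== 'number' ||
      !isFinite(limit) ||
      Math.floor(limit) !== limit ||
      limit < 0
    ) {
      throw new TypeError('limit must be a non-negative integer');
    }
    options.limit = limit;
  }
  return options;
}

// Wraps any paginated generator as a server streaming handler.
// Validation runs when the returned stream is first pulled.
function makeStreamHandler<T>(
  generator: (options: PaginationOptions) => AsyncIterable<T>,
): (message: Record<string, unknown>) => AsyncGenerator<T> {
  return async function* (message) {
    yield* generator(parsePaginationParams(message));
  };
}
```

Keeping the validation in one helper would let every streaming handler accept the same parameters in the same way.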
CLI Commands
Some CLI commands output a list as a result. We need to apply pagination here as well. The CLI command will need to have parameters for seek, order and limit as specified above. Using these parameters it should be simple to make the GRPC call with them.
Domain Collections
In a few domains we provide a getXs that will provide a generator for a collection. An example of this is sigchain.getClaims or gestaltGraph.getGestalts. These will need to take the pagination parameters as specified. This will be the basis for the other two sections. Reference the sigchain's implementation for how this is done.
Additional context
Tasks