Skip to content

Latest commit

 

History

History
493 lines (355 loc) · 26.4 KB

service-invocation-protocol.md

File metadata and controls

493 lines (355 loc) · 26.4 KB

Restate Service Invocation Protocol

The following specification describes the protocol used by Restate to invoke remote Restate services.

Architecture

The system is composed of two actors:

  • Restate Runtime
  • Service deployment, which is split into:
    • SDK, which contains the implementation of the Restate Protocol
    • User business logic, which interacts with the SDK to access Restate system calls (or syscalls)

Each invocation is modeled by the protocol as a state machine, where state transitions can be caused either by user code or by Runtime events.

Every state transition is logged in the Invocation journal, used to implement Restate's durable execution model. The journal is also used to suspend an invocation and resume it at a later point in time. The Invocation journal is tracked both by Restate's runtime and the service deployment.

Runtime and service deployment exchange Messages containing the invocation journal and runtime events through an HTTP message stream.

State machine and journal

Every invocation state machine begins when the stream is opened and ends when the stream is closed. In the middle, arbitrary interaction can be performed from the Service deployment to the Runtime and vice versa via well-defined messages.

The state machine is summarized in the following diagram:

sequenceDiagram
    Note over Runtime,SDK: Start
    Runtime->>SDK: HTTP Request to /invoke/{service}/{handler}
    Runtime->>SDK: StartMessage
    Note over Runtime,SDK: Replaying
    Runtime->>SDK: [...]EntryMessage(s)
    Note over Runtime,SDK: Processing
    SDK->>Runtime: HTTP Response headers
    loop
        SDK->>Runtime: [...]EntryMessage
        Runtime->>SDK: CompletionMessage and/or EntryAckMessage
    end
    Note over SDK: Reached close condition
    alt
        SDK->>Runtime: SuspensionMessage
    else
        SDK->>Runtime: ErrorMessage
    else
        SDK->>Runtime: EndMessage
    end
    SDK->>Runtime: Close HTTP Response
    Note over Runtime,SDK: Closed
Loading

Replaying and Processing

Both runtime and SDKs transition the message stream through 2 states:

  • Replaying, that is when there are journal entries to replay before continuing the execution. Described in Suspension behavior.
  • Processing, that is after the replaying state is over.

There are a couple of properties that we enforce through the design of the protocol:

  • Runtime and service deployment both have their view of the journal
  • The source of truth of the journal and its ordering is:
    • The runtime, when the invocation is not in processing state
    • The service deployment, when the invocation is in processing state
  • When in replaying state, the service deployment cannot create new journal entries.
  • When in processing state, only the service deployment can create new journal entries, picking their order. Consequently, it might have newer entries that the runtime is not aware of. It’s also the responsibility of the service deployment to make sure the runtime has the same ordered view of the journal it has.
  • Only in processing state the runtime can send CompletionMessage

Syscalls

Most Restate features, such as interaction with other services, accessing service instance state, and so on, are defined as Restate syscalls and exposed through the service protocol. The user interacts with these syscalls using the SDK APIs, which generate Journal Entry messages that will be handled by the invocation state machine.

Depending on the specific syscall, the Restate runtime generates as response either:

  • A completion, that is the response to the syscall
  • An ack, that is a confirmation the syscall has been persisted and will be executed
  • Nothing

Each syscall defines a priori whether it replies with an ack or a completion, or doesn't reply at all.

Messages

The protocol is composed by messages that are sent back and forth between runtime and the service deployment. The protocol mandates the following messages:

  • StartMessage
  • [..]EntryMessage
  • CompletionMessage
  • SuspensionMessage
  • EntryAckMessage
  • EndMessage

Message stream

In order to execute an invocation, service deployment and restate Runtime open a single stream between the runtime and the service deployment. Given 10 concurrent invocations to a service deployment, there are 10 concurrent streams, each of them mapping to a specific invocation.

Every unit of the stream contains a Message serialized using the Protobuf encoding, using the definitions in protocol.proto, prepended by a message header.

This stream is implemented using HTTP, and depending on the deployment environment and the HTTP version it can operate in two modes:

  • Full duplex (bidirectional) stream: Messages are sent back and forth on the same stream at the same time. This option is supported only when using HTTP/2.
  • Request/Response stream: Messages are sent from runtime to service deployment, and later from service deployment to runtime. Once the service deployment starts sending messages to the runtime, the runtime cannot send messages anymore back to the service deployment.

A message stream MUST start with StartMessage and MUST end with either:

If the message stream does not end with any of these two messages, it will be considered equivalent to sending an ErrorMessage with an unknown failure.

The EndMessage marks the end of the invocation lifecycle, that is the end of the journal.

Initiating the stream

As described above, the runtime opens an HTTP request to the SDK to initiate the message stream.

Method

The request method used is always POST.

Path

The request path has the following format:

/invoke/{serviceName}/{handlerName}

For example:

/invoke/counter.Counter/Add

An arbitrary path MAY prepend the aforementioned path format.

In case the path format is not respected, or serviceName or handlerName is unknown, the SDK MUST close the stream replying back with a 404 status code.

Content type and protocol version

The request contains the content-type application/vnd.restate.invocation.vX where X is the service protocol version chosen by the runtime, e.g.:

content-type: application/vnd.restate.invocation.v1

The service protocol version is defined by ServiceProtocolVersion in protocol.proto.

The SDK MUST return back the same content-type in the successful response case. If the SDK doesn't support the content-type, It SHOULD close the stream replying back with a 415 status code.

Stream ready

To notify that the stream is ready to be used, the SDK MUST reply with 200 status code.

SDK version

The SDK MAY send back the response header x-restate-server:

x-restate-server: <sdk-name> / <sdk-version>

E.g.:

x-restate-server: restate-sdk-java/0.8.0

This header is used for observability purposes by the Restate observability tools.

Message header

Each message is sent together with a message header prepending the serialized message bytes.

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              Type             |            Reserved           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             Length                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The message header is a fixed 64-bit number containing:

  • (MSB) Message type: 16 bit. The type of the message. Used to deserialize the message. The first 6 bits are used as the message namespace, to categorize the different message types.
  • Message reserved bits: 16 bit. These bits can be used to send flags and other information, and are defined per message type/namespace.
  • Message length: 32 bit. Length of serialized message bytes, excluding header length.

StartMessage

The StartMessage carries the metadata required to bootstrap the invocation state machine, including:

  • known_entries: The known journal length
  • state_map: The eager state map (see Eager state)

Header

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             0x0000            |            Reserved           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             Length                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Flags:

  • 16 bits: Reserved

Entries and Completions

For each journal entry the runtime commits the entry message and executes the corresponding action atomically. The runtime won't commit the entry, nor perform the action, if the entry is invalid. If an entry is not committed, all the subsequent entries are not committed as well.

Entries can be:

  • Completable or not: These represent actions the runtime will perform, and for which consequently provide a completion value. All these entries have a result field defined in the message descriptor, defining the different variants of the completion value, and have a COMPLETED flag in the header.
  • Fallible or not: These can be rejected by the runtime when trying to commit them. The failure is not recorded in the journal, thus the runtime will abort the stream after receiving an invalid entry from the SDK.

The type of the journal entry is intrinsic in the definition of the journal action itself.

The header format for journal entries applies both when the runtime is sending entries to the SDK during a replay, and when the SDK sends entries to the runtime during processing.

Headers

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              Type             |A|          Reserved         |C|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             Length                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Flags:

  • 1 bit (MSB) A: REQUIRES_ACK flag. Mask: 0x0000_8000_0000_0000
  • 14 bits: Reserved
  • 1 bit C: COMPLETED flag (only Completable journal entries). Mask: 0x0000_0001_0000_0000

Completable journal entries and CompletionMessage

A completable journal entry at any point in time is either completed or not. After a completable journal entry is completed, it cannot change its state back to not completed.

There are three situations where a completable journal entry can be completed:

  • At creation time: when the SDK creates a completable journal entry, it can fill its result field and set the COMPLETED flag before sending the entry to the runtime. When replaying, the same result will be used.
  • At suspension time: when the invocation is suspended, meaning there is no in-flight message stream, the runtime might internally complete a journal entry filling its result field.
  • During the invocation processing: when the message stream is active and in Full duplex mode, the runtime can notify a completion by sending a CompletionMessage.

A CompletionMessage holds the result of the JournalEntry and its entry_index. A CompletionMessage can hold all the possible variants of a result field, and the SDK MUST be able to correlate the result field of the entry with the result field of CompletionMessage through the entry_index. After the completion is notified, the SDK MUST NOT send any additional messages related to this specific entry. On subsequent replays, the runtime automatically fills the result field of this entry, without sending a subsequent CompletionMessage.

The runtime can send CompletionMessage in a different order than the one used to store journal entries. The SDK might also not be interested in the result of completable journal entries, or it might be interested in the results in a different order used to create the related journal entries. Usually it's the service business logic that dictates in which results the SDK is interested, and in which order.

CompletionMessage Header

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             0x0001            |            Reserved           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             Length                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Acknowledgment of stored entries

If the SDK needs an acknowledgment that a journal entry, of any type, has been persisted, it can set the REQUIRES_ACK flag in the header. When set, as soon as the entry is persisted, the runtime will send back a EntryAckMessage with the index of the corresponding entry.

EntryAckMessage Header

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             0x0004            |            Reserved           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             Length                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Entry names

Every Journal entry has a field string name = 12, which can be set by the SDK when recording the entry. This field is used for observability purposes by Restate observability tools.

Journal entries reference

The following tables describe the currently available journal entries. For more details, check the protobuf message descriptions in protocol.proto.

Message Type Completable Fallible Description
InputEntryMessage 0x0400 No No Carries the invocation input message(s) of the invocation.
GetStateEntryMessage 0x0800 Yes No Get the value of a service instance state key.
GetStateKeysEntryMessage 0x0804 Yes No Get all the known state keys for this service instance. Note: the completion value for this message is a protobuf of type GetStateKeysEntryMessage.StateKeys.
SleepEntryMessage 0x0C00 Yes No Initiate a timer that completes after the given time.
CallEntryMessage 0x0C01 Yes Yes Invoke another Restate service.
AwakeableEntryMessage 0x0C03 Yes No Arbitrary result container which can be completed from another service, given a specific id. See Awakeable identifier for more details.
OneWayCallEntryMessage 0x0C02 No Yes Invoke another Restate service at the given time, without waiting for the response.
CompleteAwakeableEntryMessage 0x0C04 No Yes Complete an Awakeable, given its id. See Awakeable identifier for more details.
OutputEntryMessage 0x0401 No No Carries the invocation output message(s) or terminal failure of the invocation.
SetStateEntryMessage 0x0800 No No Set the value of a service instance state key.
ClearStateEntryMessage 0x0801 No No Clear the value of a service instance state key.
ClearAllStateEntryMessage 0x0802 No No Clear all the values of the service instance state.
RunEntryMessage 0x0C05 No No Run non-deterministic user provided code and persist the result.
GetPromiseEntryMessage 0x0808 Yes No Get or wait the value of the given promise. If the value is not present yet, this entry will block waiting for the value.
PeekPromiseEntryMessage 0x0809 Yes No Get the value of the given promise. If the value is not present, this entry completes immediately with empty completion.
CompletePromiseEntryMessage 0x080A Yes No Complete the given promise. If the promise was completed already, this entry completes with a failure.
CancelInvocationEntryMessage 0x0C06 No Yes Cancel the target invocation id or the target journal entry.
GetCallInvocationIdEntryMessage 0x0C07 Yes Yes Get the invocation id of a previously created call/one way call.
AttachInvocationEntryMessage 0x0C08 Yes Yes Attach to an existing invocation. If the invocation is still in-flight, this entry will be completed when the target invocation completes.
GetInvocationOutputEntryMessage 0x0C09 Yes Yes Get output of an existing invocation. If the invocation is still in-flight, this entry will be completed with empty value.

Awakeable identifier

When creating an AwakeableEntryMessage, the SDK MUST expose to the user code an id, required to later complete the entry, using either CompleteAwakeableEntryMessage or some other mechanism provided by the runtime.

The id format is a string starts with prom_1 concatenated with a Base64 URL Safe string encoding of a byte array that concatenates:

  • StartMessage.id
  • The index of the Awakeable entry, encoded as unsigned 32 bit integer big endian.

An example of a valid identifier would look like prom_1NMyOAvDK2CcBjUH4Rmb7eGBp0DNNDnmsAAAAAQ

Suspension

As mentioned in Replaying and processing, an invocation can be suspended while waiting for some journal entries to complete. When suspended, no message stream is in-flight for the given invocation.

To suspend an invocation, the SDK MUST send a SuspensionMessage containing entry indexes of the journal entry results required to continue the computation. This set MUST contain only indexes of completable journal entries that are not completed and that have been sent to the runtime. After sending the SuspensionMessage, the stream MUST be closed.

The runtime will resume the invocation as soon as at least one of the given indexes is completed.

Failures

There are a number of failures that can incur during a service invocation, including:

  • Transient network failures that interrupt the message stream
  • SDK bugs
  • Protocol violations
  • Business logic bugs
  • User thrown retryable errors

To notify a failure, the SDK can either:

  • Close the stream with ErrorMessage as last message. This message is used by the runtime for accurate reporting to the user.
  • Close the stream without EndMessage or SuspensionMessage or ErrorMessage. This is equivalent to sending an ErrorMessage with unknown reason.

The runtime takes care of retrying to execute the invocation after such failures occur, following a defined set of policies. When retrying, the previous stored journal will be reused. Moreover, the SDK MUST NOT assume that every journal entry previously sent on the same message stream has been correctly stored.

The SDK can allow users to end/terminate invocations with an exceptional return value. This is done in a similar fashion to the successful return value case, by generating a OutputStreamEntry with the failure variant set, sending it and closing the stream afterward.

ErrorMessage Header

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             0x0003            |            Reserved           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             Length                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Endpoint discovery

Restate expects SDKs to provide reflective information about the exposed services and the supported protocol versions at /discovery. These reflective information are propagated through an endpoint manifest. This document MUST follow the schema defined in endpoint_manifest_schema.json and is identified by the content-type string application/vnd.restate.endpointmanifest.vX+json, where X is the manifest version.

When sending the discovery request, the Restate runtime might specify a set of supported endpoint manifest schemas in the Accept header, for example:

accept: application/vnd.restate.endpointmanifest.v2+json, application/vnd.restate.endpointmanifest.v1+json

When replying, the content-type MUST contain the chosen endpoint manifest type/version:

content-type: application/vnd.restate.endpointmanifest.v1+json

The service discovery protocol version is defined by ServiceDiscoveryProtocolVersion in discovery.proto.

Optional features

The following section describes optional features SDK developers MAY implement to improve the experience and provide additional features to the users.

Custom entry messages

The protocol allows the SDK to register an arbitrary entry type within the journal. The type MUST be >= 0xFC00. The runtime will treat this entry as any other entry, persisting it and sending it during replay in the correct order.

Custom entries MAY have the entry name field 12, as described in entry names.

The field numbers 13, 14 and 15 MUST not be used, as they're reserved for completable journal entries, as described in completable journal entries.

Header

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              Type             |A|           Reserved          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             Length                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  • Type MUST be >= 0xFC00

Flags:

Eager state

As described in Journal entries reference, to get a service instance state entry, the SDK creates a GetStateEntryMessage without a result, and waits for a Completion with the result, or alternatively suspends and expects the GetStateEntryMessage.result is filled when replaying.

SDKs MAY optimize the state access operations by reading the partial_state and state_map fields within the StartMessage. The state_map field contains key-value pairs of the current state of the service instance. When partial_state is set, the state_map is partial/incomplete, meaning there might be entries stored in the Runtime that are not part of state_map. When partial_state is unset, the state_map is complete, thus if an entry is not within the map, the SDK can assume it's not stored in the runtime either.

A possible implementation could be the following. Given a user requests a state entry with key my-key:

  • If my-key is available in state_map, generate a GetStateEntryMessage with filled result, and return the value to the user
  • If my-key is not available in state_map
    • If partial_state is unset, generate a GetStateEntryMessage with empty result, and return empty to the user
    • If partial_state is set, generate a GetStateEntryMessage without a result, and wait for the runtime to send a Completion back (same logic as without eager state)

In order for the aforementioned algorithm to work, set, clear and clear all state operations must be reflected on the local state_map as well.