6 changes: 3 additions & 3 deletions docs/cloud/connecting-services.mdx
@@ -12,10 +12,10 @@ When using Restate Cloud, you can run your services anywhere: on Kubernetes, as
The only requirement is that your services need to be reachable from Restate Cloud's infrastructure.

You can connect your services to Restate Cloud in several ways, depending on where they run:
- Connect [Kubernetes services](/cloud/connecting-services#connecting-kubernetes-services) via a secure tunnel with Restate Operator.
- Connect [Kubernetes services](/cloud/connecting-services#kubernetes-services) via a secure tunnel with Restate Operator.
- Connect [serverless functions (Vercel, Cloudflare Workers, Deno Deploy, etc.) or other public endpoints](/cloud/connecting-services#serverless-functions-and-other-public-endpoints), by signing requests with your cloud environment's public key.
- Connect [AWS Lambda functions](#connecting-aws-lambda-services), by granting Restate Cloud permission to assume a role in your AWS account.
- Connect [services in private environments](#connecting-services-in-private-environments), by setting up a tunnel.
- Connect [AWS Lambda functions](#aws-lambda-functions), by granting Restate Cloud permission to assume a role in your AWS account.
- Connect [services in private environments](#services-in-private-environments), by setting up a tunnel.

If you prefer a video walkthrough, check out this webinar on getting started with cloud:

1 change: 1 addition & 0 deletions docs/docs.json
@@ -179,6 +179,7 @@
"services/deploy/kubernetes",
"services/deploy/vercel",
"services/deploy/lambda",
"services/deploy/cloud-run",
"services/deploy/cloudflare-workers",
"services/deploy/deno-deploy",
"services/deploy/standalone"
50 changes: 16 additions & 34 deletions docs/guides/kafka-quickstart.mdx
@@ -73,20 +73,18 @@ This way, you can use Restate to process events in a lightweight, flexible, tran
</Step>
<Step title="Running Restate Server">

Now, let's start the Restate Server and let it know about the Kafka cluster via the configuration file.
Now, let's start the Restate Server:

Store the following configuration in a file named `restate.toml`:

```toml restate.toml
[[ingress.kafka-clusters]]
name = "my-cluster"
brokers = ["PLAINTEXT://localhost:9092"]
```shell
restate-server
```
</Step>
<Step title="Register the Kafka cluster">

Start the Restate Server from the same location as the configuration file:
Let the Restate Server know about the Kafka cluster by registering it via the CLI:

```shell
restate-server --config-file restate.toml
restate kafka-clusters create my-cluster bootstrap.servers=localhost:9092
```
</Step>
<Step title="Register the service">
@@ -101,19 +99,15 @@ This way, you can use Restate to process events in a lightweight, flexible, tran

Now, we need to make Restate subscribe to the Kafka topics and tell it where it should push the events that arrive on the topic.

Execute the following curl command to create a subscription, and invoke the handler for each event:
Execute the following command to create a subscription, and invoke the handler for each event:

```shell
curl localhost:9070/subscriptions --json '{
"source": "kafka://my-cluster/greetings",
"sink": "service://Greeter/greet",
"options": {"auto.offset.reset": "earliest"}
}'
restate subscriptions create kafka://my-cluster/greetings service://Greeter/greet auto.offset.reset=earliest
```

For Go, you need to capitalize the handler name: `service://Greeter/Greet`.

This curl command calls the Admin API of the Restate Server and tells it to invoke the `greet` handler of the `Greeter` service for each event that arrives on the `greetings` topic in the `my-cluster` Kafka cluster.
This command tells Restate to invoke the `greet` handler of the `Greeter` service for each event that arrives on the `greetings` topic in the `my-cluster` Kafka cluster.
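
The source and sink arguments follow a fixed shape: `kafka://<cluster>/<topic>` and `service://<Service>/<handler>`. As a minimal sketch, the command above could be assembled programmatically like this. The helper names (`kafka_source`, `service_sink`, `subscription_command`) are hypothetical and exist only for illustration; the CLI arguments are exactly those shown in this guide.

```python
from typing import Dict, List, Optional


def kafka_source(cluster: str, topic: str) -> str:
    """Build the subscription source URI: kafka://<cluster>/<topic>."""
    return f"kafka://{cluster}/{topic}"


def service_sink(service: str, handler: str) -> str:
    """Build the subscription sink URI: service://<Service>/<handler>."""
    return f"service://{service}/{handler}"


def subscription_command(
    cluster: str,
    topic: str,
    service: str,
    handler: str,
    options: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Assemble the argument list for `restate subscriptions create`.

    Hypothetical helper: it only mirrors the command shown in this guide.
    """
    argv = [
        "restate", "subscriptions", "create",
        kafka_source(cluster, topic),
        service_sink(service, handler),
    ]
    # Options are passed as trailing key=value pairs.
    argv += [f"{key}={value}" for key, value in (options or {}).items()]
    return argv


cmd = subscription_command(
    "my-cluster", "greetings", "Greeter", "greet",
    {"auto.offset.reset": "earliest"},
)
print(" ".join(cmd))
```

To execute the command instead of printing it, pass the list to `subprocess.run(cmd, check=True)`. For Go services, pass the capitalized handler name (`"Greet"`).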
</Step>
<Step title="Invoke the handler by publishing an event">

@@ -150,36 +144,24 @@ This way, you can use Restate to process events in a lightweight, flexible, tran
</Step>
<Step title="Cleanup: removing the subscription">

You can see the subscriptions that are active via the Admin API:
You can see the subscriptions that are active via the CLI:

```shell
curl localhost:9070/subscriptions
restate subscriptions list
```

Example output:
```json
{
"subscriptions": [
{
"id": "sub_11XHoawrCiWtv8kzhEyGtsR",
"source": "kafka://my-cluster/my-topic",
"sink": "service://Greeter/greet",
"options": {
"auto.offset.reset": "earliest",
"client.id": "restate",
"group.id": "sub_11XHoawrCiWtv8kzhEyGtsR"
}
}
]
}
```text
ID SOURCE SINK OPTIONS
sub_11XHoawrCiWtv8kzhEyGtsR kafka://my-cluster/greetings service://Greeter/greet 3
```

As you can see, subscriptions have an ID that starts with `sub_`.

Now you can use the subscription ID to delete the subscription:

```shell
curl -X DELETE localhost:9070/subscriptions/sub_11XHoawrCiWtv8kzhEyGtsR
restate subscriptions delete sub_11XHoawrCiWtv8kzhEyGtsR
```
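
If you have several subscriptions to clean up, you can extract the IDs from the listing. The following is a rough sketch that assumes the output looks like the example above and that IDs consist of `sub_` followed by alphanumeric characters; the CLI output format is not a stable contract, and the function name is hypothetical.

```python
import re

# Assumption: subscription IDs are "sub_" followed by alphanumeric characters,
# as in the example output shown in this guide.
SUBSCRIPTION_ID = re.compile(r"\bsub_[A-Za-z0-9]+\b")


def subscription_ids(cli_output: str) -> list:
    """Return every subscription ID found in the given text."""
    return SUBSCRIPTION_ID.findall(cli_output)


sample_output = """ID SOURCE SINK OPTIONS
sub_11XHoawrCiWtv8kzhEyGtsR kafka://my-cluster/greetings service://Greeter/greet 3
"""

ids = subscription_ids(sample_output)
# One `restate subscriptions delete <id>` argument list per subscription.
delete_commands = [["restate", "subscriptions", "delete", sub_id] for sub_id in ids]
print(delete_commands)
```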
</Step>
</Steps>
Binary file added docs/img/monitoring/tracing_invocation_spans.png
Binary file removed docs/img/monitoring/tracing_span_tags.png
Binary file not shown.
Binary file removed docs/img/monitoring/tracing_tour.png
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/quickstart.mdx
@@ -374,7 +374,7 @@ export default {

<Accordion title="Restate Cloud">
When using [Restate Cloud](https://restate.dev/cloud), your service must be accessible over the public internet so Restate can invoke it.
If you want to develop with a local service, you can expose it using our [tunnel](/cloud/connecting-services#connecting-services-in-private-environments) feature.
If you want to develop with a local service, you can expose it using our [tunnel](/cloud/connecting-services#services-in-private-environments) feature.
</Accordion>
</Step>
<Step title="Send a request to the Greeter service">
87 changes: 73 additions & 14 deletions docs/server/monitoring/tracing.mdx
@@ -5,9 +5,10 @@ description: "Export OTEL traces of your invocations."

Restate supports the following tracing features:

* Runtime execution tracing per invocation
* Runtime execution tracing per invocation, exported in real time while the invocation is running
* Exporting traces to OTLP-compatible systems (e.g. Jaeger)
* Correlating parent traces of incoming HTTP requests, using the [W3C TraceContext](https://github.com/w3c/trace-context) specification.
* Correlating parent traces of incoming HTTP requests, using the [W3C TraceContext](https://github.com/w3c/trace-context) specification
* Propagating the trace context to your services, so spans created in your handlers join the same trace (see [End-to-end tracing with the SDKs](#end-to-end-tracing-with-the-sdks))

## Setting up OTLP exporter

@@ -38,7 +39,6 @@ restate-server --tracing-endpoint otlp+http://localhost:4318/v1/traces # for HTT
If you run Restate in Docker, then instead add the environment variable `-e RESTATE_TRACING_ENDPOINT=http://host.docker.internal:4317`.

If you now spin up your services and send requests to them, you will see the traces appear in the Jaeger UI at http://localhost:16686
<img src="/img/monitoring/tracing_tour.png"/>

<AccordionGroup>
<Accordion title="Specifying additional tracing headers">
@@ -64,27 +64,86 @@ If you now spin up your services and send requests to them, you will see the tra

You can import the trace files using the Jaeger UI:

<img src="/img/monitoring/jaeger-import-file.png"/>
<img src="/img/monitoring/jaeger-import-file.png" alt="Import trace files in the Jaeger UI"/>
</Accordion>
</AccordionGroup>

## Understanding traces
The traces contain detailed information about the context calls that were done during the invocation (e.g. sleep, one-way calls, interaction with state):

<img src="/img/monitoring/tracing_span_tags.png"/>
Restate traces represent what is physically happening during an invocation, while it is happening.
For every invocation, Restate emits the following spans:

The initial `ingress_invoke` spans show when the HTTP request was received by Restate. The `invoke` span beneath it shows when Restate invoked the service deployment to process the request.
| Span | Description |
|------|-------------|
| `ingress <target>` | The HTTP request was received by the Restate ingress. Emitted only for invocations made over the ingress. |
| `invocation-start <target>` | The invocation started. This is the anchor span: all other spans of the invocation are children of it. |
| `invocation-attempt <target>` | One span per invocation attempt, emitted as soon as the attempt ends. Attempts that fail with a retryable error are marked with error status, so you can spot retry loops at a glance. |
| `invocation-end <target>` | The invocation completed, recording whether it succeeded or failed. |

The tags of the spans contain the metadata of the context calls (e.g. call arguments, invocation id).
Spans are exported **as soon as they end**, not when the whole invocation completes.
This means you can inspect invocations that are still running, for example to debug an invocation stuck in a retry loop.

<Info title="One-way call traces are detached from the parent">
When a service invokes another service, the child invocation is linked automatically to the parent invocation, as you can see in the image.
Note that the spans of one-way calls are shown as separate traces. The parent invocation only shows that the one-way call was scheduled, not its entire tracing span.
To see this information, search for the trace of the one-way call by filtering on the invocation id tag `restate.invocation.id="inv_19maBIcE9uRD0gIu30mu6eqhZ4pQT"`.
<img src="/img/monitoring/tracing_invocation_spans.png" alt="Jaeger UI showing an invocation trace with the start span, multiple attempt spans (failed attempts marked in red), SDK spans, and the end span"/>

The example above shows an invocation that was retried a few times: each failed attempt is shown as a red `invocation-attempt` span, published right when the attempt failed.
The spans below each attempt are emitted by the service itself, using the [SDK tracing integrations](#end-to-end-tracing-with-the-sdks).

Operations performed by the handler (e.g. `ctx.run`, calls, sleeps, state access) are recorded as **events on the attempt span**, rather than as separate spans:

| Event | Description |
|-------|-------------|
| `restate.invocation.lifecycle.new_command` | The handler created a new journal command. The attributes `restate.journal.command.type` and `restate.journal.command.name` describe the command. |
| `restate.invocation.lifecycle.run_ended` | A `ctx.run` block finished executing. |
| `restate.invocation.lifecycle.suspended` | The invocation suspended, waiting for some condition (e.g. a timer, a call result). |
| `restate.invocation.lifecycle.yielded` | The invocation yielded the execution. |
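
Because handler operations are events rather than spans, summarizing what an attempt did means iterating over the events of its `invocation-attempt` span. Here is a minimal sketch, assuming the span has already been exported and loaded into a plain dictionary. The dictionary shape and the command type values (`"Run"`, `"Sleep"`) are placeholders for illustration, not the format of any particular tracing backend.

```python
from collections import Counter

NEW_COMMAND = "restate.invocation.lifecycle.new_command"


def command_counts(attempt_span: dict) -> Counter:
    """Count the journal commands recorded on one attempt span, by command type."""
    counts = Counter()
    for event in attempt_span.get("events", []):
        if event["name"] == NEW_COMMAND:
            command_type = event["attributes"].get(
                "restate.journal.command.type", "unknown"
            )
            counts[command_type] += 1
    return counts


# Hypothetical exported span; the attribute values are made up.
sample_attempt = {
    "name": "invocation-attempt Greeter/greet",
    "events": [
        {"name": NEW_COMMAND,
         "attributes": {"restate.journal.command.type": "Run",
                        "restate.journal.command.name": "charge"}},
        {"name": "restate.invocation.lifecycle.run_ended", "attributes": {}},
        {"name": NEW_COMMAND,
         "attributes": {"restate.journal.command.type": "Sleep"}},
        {"name": "restate.invocation.lifecycle.suspended", "attributes": {}},
    ],
}

counts = command_counts(sample_attempt)
print(dict(counts))
```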

<Info title="One-way call traces are linked to the parent">
When a handler calls another service, the child invocation joins the same trace.
One-way calls (send) are shown as separate traces instead, linked to the trace of the parent invocation.
To find them, search for the trace of the one-way call by filtering on the invocation id attribute `restate.invocation.id`.
</Info>

<Note>
Spans emitted by Restate are exported with the resource service name `Restate`, the process that physically produces them.
The logical, per-invocation view remains available in the Restate UI.
</Note>

### Span attributes

Restate spans carry the following attributes, which you can use to build dashboards, alerts, and queries:

| Attribute | Spans | Description |
|-----------|-------|-------------|
| `restate.invocation.id` | All | The invocation ID. |
| `restate.invocation.target` | All | The invocation target (e.g. `Greeter/greet` for services, `Greeter/myKey/greet` for keyed services). |
| `rpc.service` / `rpc.method` | All | The service name and the handler name. |
| `restate.deployment.id` | `invocation-attempt` | The ID of the deployment processing the attempt. |
| `restate.deployment.address` | `invocation-attempt` | The address of the deployment processing the attempt. |
| `restate.deployment.service_protocol_version` | `invocation-attempt` | The service protocol version used by the deployment. |
| `restate.invocation.result` | `invocation-end` | The invocation result: `success` or `failure`. |
| `restate.invocation.error.code` | `invocation-end` | The error code, if the invocation failed. |
| `error.message` | `invocation-end` | The error message, if the invocation failed. |
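
As an illustration of querying on these attributes, the sketch below picks failed invocations out of a batch of exported spans and counts how many attempts each invocation needed. It assumes spans are available as dictionaries with `name` and `attributes` keys; that shape, the function names, and the sample values are assumptions for illustration, so adapt them to whatever your tracing backend returns.

```python
from collections import Counter


def failed_invocations(spans: list) -> dict:
    """Map invocation ID to error message for invocations that ended in failure."""
    failures = {}
    for span in spans:
        attrs = span["attributes"]
        if (span["name"].startswith("invocation-end")
                and attrs.get("restate.invocation.result") == "failure"):
            failures[attrs["restate.invocation.id"]] = attrs.get("error.message")
    return failures


def attempts_per_invocation(spans: list) -> Counter:
    """Count the invocation-attempt spans for each invocation ID."""
    return Counter(
        span["attributes"]["restate.invocation.id"]
        for span in spans
        if span["name"].startswith("invocation-attempt")
    )


# Hypothetical sample data; the IDs and error message are made up.
sample_spans = [
    {"name": "invocation-start Greeter/greet",
     "attributes": {"restate.invocation.id": "inv_A"}},
    {"name": "invocation-attempt Greeter/greet",
     "attributes": {"restate.invocation.id": "inv_A"}},
    {"name": "invocation-attempt Greeter/greet",
     "attributes": {"restate.invocation.id": "inv_A"}},
    {"name": "invocation-end Greeter/greet",
     "attributes": {"restate.invocation.id": "inv_A",
                    "restate.invocation.result": "failure",
                    "error.message": "example failure"}},
    {"name": "invocation-attempt Greeter/greet",
     "attributes": {"restate.invocation.id": "inv_B"}},
    {"name": "invocation-end Greeter/greet",
     "attributes": {"restate.invocation.id": "inv_B",
                    "restate.invocation.result": "success"}},
]

failures = failed_invocations(sample_spans)
attempts = attempts_per_invocation(sample_spans)
print(failures, dict(attempts))
```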

## End-to-end tracing with the SDKs

Restate propagates the [W3C TraceContext](https://github.com/w3c/trace-context) to your service on every invocation attempt.
The SDK tracing integrations use it to create a span per handler attempt, and a child span per `ctx.run` block, all joining the same trace:

* [TypeScript SDK tracing](/develop/ts/tracing)
* [Java/Kotlin SDK tracing](/develop/java/tracing)

The SDK spans carry the same `restate.invocation.id` and `restate.invocation.target` attributes as the runtime spans, so you can correlate them easily.
Spans created per `ctx.run` block carry the run name in the `restate.run.name` attribute.

Trace context propagation also works at the boundaries:

* **Upstream**: if the incoming HTTP request to the ingress carries a `traceparent` header, the invocation trace continues from it.
* **Downstream**: spans you create yourself inside your handlers (e.g. instrumented HTTP clients, database calls) attach to the trace of the current attempt.
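
For the upstream case, the `traceparent` header has the form `version-traceid-parentid-flags`, as defined by the W3C Trace Context specification. In practice, an OpenTelemetry-instrumented HTTP client sets this header for you. The following minimal sketch only illustrates the format; the function names are hypothetical.

```python
import re
import secrets

# Version 00: 32 hex chars of trace ID, 16 hex chars of parent span ID, 2 hex chars of flags.
TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def new_traceparent(sampled: bool = True) -> str:
    """Create a version-00 traceparent value with random trace and parent IDs."""
    trace_id = secrets.token_hex(16)   # 16 random bytes -> 32 hex characters
    parent_id = secrets.token_hex(8)   # 8 random bytes -> 16 hex characters
    flags = "01" if sampled else "00"  # lowest bit is the "sampled" flag
    return f"00-{trace_id}-{parent_id}-{flags}"


def parse_traceparent(header: str):
    """Return the parsed fields, or None if the header is not a valid version-00 value."""
    match = TRACEPARENT.fullmatch(header)
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are invalid according to the specification.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


header = new_traceparent()
parsed = parse_traceparent(header)
print(header, parsed["sampled"])
```

Sending the resulting value as the `traceparent` header on the HTTP request to the ingress makes the invocation trace continue from that parent span.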

## Searching traces

Traces export attributes and tags that correlate the trace with the service and/or invocation. For example, in the Jaeger UI, you can filter on the invocation id (`restate.invocation.id`) or any other tag:
Traces export attributes that correlate the trace with the service and/or invocation. For example, in the Jaeger UI, you can filter on the invocation id (`restate.invocation.id`) or any other attribute:

<img src="/img/monitoring/jaeger_invocationid_search_handler.png" width={"80%"} alt="Searching traces by invocation id in the Jaeger UI"/>

<img src="/img/monitoring/jaeger_invocationid_search_handler.png" width={"80%"}/>
This also lets you navigate between the Restate UI and your tracing system: copy the invocation ID from the Restate UI and search for it in your tracing system, or vice versa.
22 changes: 12 additions & 10 deletions docs/server/snapshots.mdx
@@ -217,7 +217,7 @@ This gives the snapshot repository time to replicate snapshots to other regions

Each partition leader runs a **durability tracker** that monitors:
- **Durable LSN**: The log position that has been flushed to local storage on each replica (partition store flush)
- **Archived LSN**: The log position of the latest published snapshot in the object store (or the oldest retained snapshot if `worker.snapshots.experimental-num-retained` is configured)
- **Archived LSN**: The log position of the latest published snapshot in the object store (or the oldest retained snapshot if `worker.snapshots.num-retained` is greater than 1)

Based on the configured durability mode, the tracker calculates the **durability point**: the LSN up to which the partition store state is considered safely persisted. Once determined:

@@ -262,27 +262,29 @@ There are three notable persistence-related attributes in `restatectl`'s partiti

- **Applied LSN** - the latest log record applied by this processor
- **Durable LSN** - the log position of the latest partition store flushed to local node storage; by default processors optimize performance by relying on Bifrost for durability and only periodically flush partition store to disk
- **Archived LSN** - if a snapshot repository is configured, this LSN represents the latest published snapshot (or the oldest retained snapshot if `worker.snapshots.experimental-num-retained` is configured); this determines the log safe trim point in multi-node clusters
- **Archived LSN** - if a snapshot repository is configured, this LSN represents the latest published snapshot (or the oldest retained snapshot if `worker.snapshots.num-retained` is greater than 1); this determines the log safe trim point in multi-node clusters

### Snapshot retention

By default, Restate adds new snapshots without removing old ones. You can configure automatic pruning using the experimental `experimental-num-retained` option:
Restate automatically prunes old snapshots so that snapshot storage does not grow without bound. The number of snapshots to retain per partition is controlled by `worker.snapshots.num-retained`, which defaults to `1`:

```toml
[worker.snapshots]
experimental-num-retained = 1
num-retained = 1
```

This keeps only the most recent snapshot and automatically deletes older ones.

<Info>
This feature is only available in Restate v1.6 and newer. Only newly uploaded snapshots after the experimental feature was activated will be pruned. Existing snapshots predating the configuration change will not be affected.
</Info>
By default, Restate keeps exactly one snapshot per partition and deletes older snapshots once a newer one has been published. Set `num-retained` to a higher value for added resiliency against corrupted snapshots.

<Warning>
When `experimental-num-retained` is greater than 1, the archived LSN advances to the *oldest* retained snapshot rather than the latest. This delays log trimming and increases storage usage on log servers. For most deployments, `experimental-num-retained = 1` is recommended unless you need the ability to fall back to older snapshots.
Restate considers the archived LSN to be that of the *oldest* retained snapshot. Retaining multiple snapshots lets you fall back to an older one if the most recent is corrupted. However, it also increases storage usage on the log servers, because log trimming follows the oldest snapshot that Restate retains. For most deployments, the default `num-retained = 1` is recommended.
</Warning>
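
To make the trade-off concrete, here is a small illustrative model of how the archived LSN relates to `num-retained`. It is a sketch of the rule described above, not Restate's implementation; the function names and LSN values are made up.

```python
def retained_snapshots(published_lsns: list, num_retained: int = 1) -> list:
    """Return the LSNs of the snapshots kept after pruning: the newest `num_retained`."""
    if num_retained < 1:
        raise ValueError("num_retained must be at least 1")
    return sorted(published_lsns)[-num_retained:]


def archived_lsn(published_lsns: list, num_retained: int = 1):
    """Archived LSN in this model: the LSN of the oldest retained snapshot."""
    retained = retained_snapshots(published_lsns, num_retained)
    return retained[0] if retained else None


published = [100, 250, 400]  # made-up snapshot LSNs

print(archived_lsn(published, num_retained=1))  # newest snapshot: the log can be trimmed further
print(archived_lsn(published, num_retained=3))  # oldest of three: more log must be kept
```

With a single retained snapshot, the archived LSN tracks the latest snapshot. With more retained snapshots it lags behind, which is why log servers hold on to more data.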

<Info title="Upgrading from Restate 1.6">
Snapshot retention was previously opt-in via the `experimental-num-retained` key. Any configuration that still sets `experimental-num-retained` must be renamed to `num-retained` before upgrading; Restate ignores the old key. No action is required for existing snapshot repositories: the first upload after the upgrade rewrites `latest.json` from V1 to V2 in place, and snapshot data files are unchanged. Rolling back to a Restate binary older than v1.6 is not supported once V2 pointers have been written; rolling back to v1.6.x remains safe.

Note that, of the snapshots produced by the older version, only the most recent is considered for pruning after the upgrade. You must still clean up older snapshots yourself, just as under v1.6 and earlier.
</Info>


## Data Backups
