Consistency model

Kessel is a distributed system with two distinct data stores: the inventory database holds resource data, and the authorization graph holds relationships between resources, users, and permissions. When you report a resource, the data is written to the inventory database first and then flows asynchronously to the authorization backend through a Change Data Capture (CDC) pipeline. This means Kessel is eventually consistent — there is always a window of time between a write and that write becoming visible to permission checks.

This is not a bug or a limitation to work around. It is a deliberate architectural choice. Decoupling the write path from the authorization path allows Kessel to accept resource reports at high throughput without blocking on the authorization backend. Most authorization checks tolerate small amounts of staleness, and the replication window is typically measured in hundreds of milliseconds.

Understanding this consistency model is essential for building integrations that behave correctly. This document explains how data flows through the system, what consistency guarantees are available, and how to choose the right strategy for your use case.

When you call ReportResource, Kessel commits the write to the inventory database first and then replicates the change asynchronously to the authorization graph, so the write path never blocks on the authorization backend.

The replication happens through an event-driven pipeline that ensures changes eventually reach the authorization system. Under normal conditions, replication completes in 100-500ms, but this can vary with load, network conditions, and the number of relationships per resource.

This is why a permission check issued shortly after a write may not see it yet: the check is evaluated against the authorization graph, which the write does not reach until replication completes.
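
The flow described above can be modeled with a toy, in-memory sketch. It is not Kessel's implementation or API: the class `ToyKessel`, its method names, and the use of a deque as the pipeline are all assumptions made for illustration. It only shows why a check issued between the write and its replication does not see the write.

```python
from collections import deque


class ToyKessel:
    """Hypothetical in-memory stand-in for the two stores and the pipeline."""

    def __init__(self):
        self.inventory = {}     # resource_id -> resource data (inventory database)
        self.graph = set()      # (resource_id, relation, subject) (authorization graph)
        self.pending = deque()  # change events not yet replicated (the pipeline)

    def report_resource(self, resource_id, data, relationships):
        # The write path only touches the inventory store and queues an event.
        self.inventory[resource_id] = data
        self.pending.append((resource_id, relationships))

    def replicate_once(self):
        # The consumer applies one queued change to the authorization graph.
        if not self.pending:
            return False
        resource_id, relationships = self.pending.popleft()
        for relation, subject in relationships:
            self.graph.add((resource_id, relation, subject))
        return True

    def check(self, resource_id, relation, subject):
        # Checks read the graph only, so they cannot see unreplicated writes.
        return (resource_id, relation, subject) in self.graph


kessel = ToyKessel()
kessel.report_resource("doc-1", {"title": "Q3 plan"}, [("viewer", "alice")])
before = kessel.check("doc-1", "viewer", "alice")  # write not replicated yet
kessel.replicate_once()
after = kessel.check("doc-1", "viewer", "alice")   # write now visible
```

In this model `before` is false and `after` is true, even though the inventory store held the resource the whole time.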

When you perform a permission check, you choose how fresh the authorization data needs to be. Kessel provides six check methods:

  • Check — Basic permission check
  • CheckSelf — Permission check where the subject is the caller (determined from authentication context)
  • CheckForUpdate — Strongly consistent permission check intended for use before update/delete operations
  • CheckForUpdateBulk — Bulk strongly consistent permission checks
  • CheckBulk — Bulk permission checks for multiple resource-subject-relation combinations
  • CheckSelfBulk — Bulk permission checks where the subject is the caller for each item

The most important distinction is between Check and CheckForUpdate:

  • Check lets the caller choose a consistency mode (see below). If none is specified, it defaults to minimize_latency, which is fast but may return results based on slightly stale data. Use Check for read-only operations like loading a page or listing resources.

  • CheckForUpdate always uses full consistency. There is no option to change this. The authorization backend evaluates the check against the latest committed state, guaranteeing the result reflects all writes that have completed. Use CheckForUpdate before update or delete operations where acting on stale data could cause a conflict or security issue.

| Method | Consistency | Caller control | Use when |
|---|---|---|---|
| Check | Caller’s choice (default: minimize_latency) | Full | Reading, browsing, listing |
| CheckForUpdate | Always fully consistent | None | Before modifying or deleting a resource |

The same distinction applies to their bulk variants: CheckBulk supports caller-chosen consistency, while CheckForUpdateBulk always uses full consistency.
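
As a rough illustration of this distinction, a service might route its checks through a small helper like the one below. The function name `pick_check_method` and its parameters are hypothetical; only the four method names it returns come from Kessel.

```python
def pick_check_method(mutates: bool, bulk: bool = False) -> str:
    """Return the name of the check method suited to the operation.

    mutates: True when the check guards an update or delete.
    bulk:    True when several resource/subject pairs are checked at once.
    """
    if mutates:
        # Acting on stale data is unsafe here, so use the always-consistent variants.
        return "CheckForUpdateBulk" if bulk else "CheckForUpdate"
    # Read-only paths can let the caller pick a consistency mode.
    return "CheckBulk" if bulk else "Check"


for_list_page = pick_check_method(mutates=False, bulk=True)
for_delete = pick_check_method(mutates=True)
```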

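Check methods other than CheckForUpdate and CheckForUpdateBulk let the caller choose among three consistency modes (CheckBulk and CheckSelfBulk support only minimize_latency and at_least_as_fresh):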

minimize_latency is the default mode. The authorization backend serves the check from the nearest replica without waiting for any particular replication state. It provides the lowest latency but may return results based on slightly stale data.

Use this for dashboards, list pages, and any read-heavy path where a brief replication lag is acceptable.

With at_least_as_fresh, the caller provides a consistency token obtained from a previous operation. Kessel guarantees that the check reflects at least the state represented by that token. If the replica has not yet caught up, the system waits until it does.

Use this when you need causal consistency between two operations — for example, ensuring a check reflects a specific tuple write you performed earlier.

With at_least_as_acknowledged, the Inventory API looks up the consistency token stored in the resource’s ktn column in the inventory database. This token represents the last state that the Consumer successfully replicated to the authorization backend. The check is guaranteed to reflect at least this committed state.

Use this for critical authorization decisions where you need confidence that the check reflects the resource’s current state in the database, without requiring the caller to manage tokens.

| Mode | Token source | Freshness guarantee | Latency | Best for |
|---|---|---|---|---|
| minimize_latency | None | May read stale data | Lowest | Dashboards, list views, read-heavy paths |
| at_least_as_fresh | Caller-provided | At least the caller’s known state | Medium | Causal consistency between operations |
| at_least_as_acknowledged | Database lookup (ktn) | At least the last committed state | Higher | Critical access decisions |
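
The difference between minimize_latency and at_least_as_fresh can be illustrated with a toy model of a lagging replica. Everything here is an assumption made for illustration: `ToyGraph` is not a Kessel class, real consistency tokens are opaque values rather than integers, and a real backend waits for the replica instead of fast-forwarding it.

```python
class ToyGraph:
    """Hypothetical authorization graph with one primary log and one replica."""

    def __init__(self):
        self.primary = []     # ordered log of committed tuples
        self.replica_rev = 0  # number of log entries the replica has applied

    def write(self, tup):
        self.primary.append(tup)
        return len(self.primary)  # toy token: the revision after this write

    def check(self, tup, at_least_as_fresh=None):
        if at_least_as_fresh is not None and self.replica_rev < at_least_as_fresh:
            # A real system would block until the replica catches up;
            # the toy simply advances the replica to the requested revision.
            self.replica_rev = at_least_as_fresh
        # The check is answered from whatever the replica has applied so far.
        return tup in self.primary[: self.replica_rev]


graph = ToyGraph()
token = graph.write(("doc-1", "viewer", "alice"))

# minimize_latency: no token, so the lagging replica answers immediately.
stale = graph.check(("doc-1", "viewer", "alice"))

# at_least_as_fresh: the caller's token forces the replica to include the write.
fresh = graph.check(("doc-1", "viewer", "alice"), at_least_as_fresh=token)
```

The first check misses the write and the second sees it, which is the trade the modes offer: lower latency against a freshness guarantee.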

Kessel uses consistency tokens to track replication state. A consistency token is an opaque value that represents a specific point in the authorization backend’s transaction log. You can think of it as a logical timestamp for the authorization graph.

When changes are replicated to the authorization backend, a consistency token is returned and stored with the resource in the inventory database. This token represents the last state successfully replicated.

For at_least_as_acknowledged: Kessel looks up the stored token for the target resource and ensures the check is evaluated against a state at least as recent as that token.

For at_least_as_fresh: The token comes from the caller instead of the database. This is useful when your application holds a token from a previous write and wants to ensure subsequent reads are causally consistent with that write.
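
One way to picture where the token comes from in each mode is the sketch below. The names `Mode`, `resolve_token`, and the `stored_tokens` dictionary (standing in for the ktn column) are hypothetical and do not describe Kessel's internal code.

```python
from enum import Enum
from typing import Dict, Optional


class Mode(Enum):
    MINIMIZE_LATENCY = "minimize_latency"
    AT_LEAST_AS_FRESH = "at_least_as_fresh"
    AT_LEAST_AS_ACKNOWLEDGED = "at_least_as_acknowledged"


def resolve_token(
    mode: Mode,
    resource_id: str,
    stored_tokens: Dict[str, str],
    caller_token: Optional[str] = None,
) -> Optional[str]:
    """Return the token the check must be at least as fresh as, if any."""
    if mode is Mode.MINIMIZE_LATENCY:
        return None  # no freshness requirement
    if mode is Mode.AT_LEAST_AS_FRESH:
        if caller_token is None:
            raise ValueError("at_least_as_fresh requires a caller-provided token")
        return caller_token
    # at_least_as_acknowledged: use the token stored with the resource.
    # Assumption: a resource with no stored token yields None in this sketch;
    # the text above does not say what Kessel does in that case.
    return stored_tokens.get(resource_id)


stored = {"doc-1": "token-from-last-replication"}
fastest = resolve_token(Mode.MINIMIZE_LATENCY, "doc-1", stored)
fresh = resolve_token(Mode.AT_LEAST_AS_FRESH, "doc-1", stored, caller_token="token-from-my-write")
acknowledged = resolve_token(Mode.AT_LEAST_AS_ACKNOWLEDGED, "doc-1", stored)
```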

By default, ReportResource returns as soon as the inventory database transaction commits. The caller does not wait for replication to the authorization backend. This means a permission check issued immediately after a report may not yet reflect the change.

For scenarios where you need the report and the authorization state to be synchronized before returning to the caller, Kessel supports a write visibility option called IMMEDIATE.

When IMMEDIATE mode is enabled, the write operation waits for the replication pipeline to complete before returning success to the caller. This guarantees that by the time the caller receives a successful response, the authorization graph reflects the reported resource.

To prevent requests from hanging indefinitely if the replication pipeline is slow or unavailable, IMMEDIATE mode includes circuit breaker protection that fails fast after detecting consecutive timeouts or errors.

IMMEDIATE mode adds the full replication latency to the request — typically 100-500ms on top of the base write latency. Use IMMEDIATE mode only when your application flow genuinely requires the authorization state to be updated before proceeding.
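
The fail-fast behavior can be sketched with a minimal consecutive-failure circuit breaker. This is a generic illustration, not Kessel's implementation: the class name, the threshold of two, and the stubbed wait function are assumptions, and the sketch omits the recovery (half-open) step that production breakers usually have.

```python
class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses to call the protected operation."""


class ConsecutiveFailureBreaker:
    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0  # consecutive failures seen so far

    def call(self, operation):
        if self.failures >= self.threshold:
            # Fail fast instead of waiting on a pipeline that keeps timing out.
            raise CircuitOpenError("replication wait skipped: circuit is open")
        try:
            result = operation()
        except Exception:
            self.failures += 1
            raise
        self.failures = 0  # any success resets the streak
        return result


def wait_for_replication_stub():
    # Hypothetical stand-in for "wait until the write is replicated".
    raise TimeoutError("replication did not finish in time")


breaker = ConsecutiveFailureBreaker(threshold=2)
outcomes = []
for _ in range(3):
    try:
        breaker.call(wait_for_replication_stub)
    except CircuitOpenError:
        outcomes.append("fail-fast")
    except TimeoutError:
        outcomes.append("timeout")
```

The first two calls pay the full timeout; the third is rejected immediately, which is the protection the paragraph above describes.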

The right consistency mode depends on what your application is doing at the moment of the check:

User is browsing a dashboard or list page. Use minimize_latency. A stale result for a few hundred milliseconds is invisible to the user, and the lower latency makes the page feel responsive.

User just performed a write and is viewing the result. Use at_least_as_acknowledged or set write visibility to IMMEDIATE. The user expects to see the effect of their action. Showing stale data here creates confusion (“I just shared this document, why can’t my collaborator see it?”).

Your service is chaining operations. Use at_least_as_fresh with a token from the prior operation. This gives you causal ordering without paying for a database lookup on every check.

Your service is making a security-critical decision. Use at_least_as_acknowledged. The small additional latency is worth the guarantee that the check reflects the committed state.

If you choose not to use IMMEDIATE mode, consider these patterns:

  • Optimistic UI — assume the operation will succeed and update the UI immediately. Reconcile later if the backend state differs.
  • Processing indicator — show a brief “updating permissions…” state after a write, then refresh.
  • Client-side caching — after a successful write, cache the expected authorization state on the client and use it until the next full refresh.
  • Retry with backoff — for service-to-service flows where a check must reflect a recent write, retry the check with a short backoff (e.g., 100ms, 200ms, 400ms) rather than adding a fixed sleep(). This keeps latency low in the common case while handling the occasional slower replication. Prefer at_least_as_fresh with a consistency token from the write over blind retries when possible.
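
The retry-with-backoff pattern from the list above might look like the following sketch. The helper name `check_with_backoff` and the injected `check` callable are hypothetical; in a real integration `check` would wrap a Kessel permission check. Note that a negative result can also mean the permission is legitimately denied, so the retries should stay few and short.

```python
import time


def check_with_backoff(check, delays=(0.1, 0.2, 0.4), sleep=time.sleep):
    """Call check() until it returns True or the delays are exhausted.

    check:  zero-argument callable returning True when access is allowed.
    delays: seconds to wait before each retry (100 ms, 200 ms, 400 ms).
    sleep:  injectable for tests; defaults to time.sleep.
    """
    if check():
        return True  # common case: no waiting at all
    for delay in delays:
        sleep(delay)
        if check():
            return True
    return False


# Simulated check that only succeeds once replication has caught up.
answers = iter([False, False, True])
waits = []
allowed = check_with_backoff(lambda: next(answers), sleep=waits.append)
```

Here the simulated check succeeds on the third attempt, after waits of 0.1 and 0.2 seconds have been recorded instead of slept.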

There are several constraints to be aware of when working with Kessel’s consistency model:

  • Bulk operations do not support at_least_as_acknowledged. CheckBulk and CheckSelfBulk span multiple resources, and looking up a consistency token per resource would be prohibitively expensive. These methods support minimize_latency or at_least_as_fresh only. (CheckForUpdateBulk is not affected by this limitation because it always uses full consistency, as described above.)

  • Multi-resource scenarios require care. If your operation touches multiple resources, each resource has its own stored consistency token, and no single stored token covers all of them. For cross-resource consistency, use at_least_as_fresh with a caller-held token that is at least as recent as the latest write across all the resources involved.

  • IMMEDIATE mode requires the replication pipeline to be operational. If the replication pipeline is unavailable or slow, IMMEDIATE mode will trigger the circuit breaker and fail fast.

  • Replication lag is not bounded. Under extreme load or component failure, the replication pipeline can fall behind. Monitor replication metrics to detect and respond to these situations. See the monitoring guide for recommended metrics.
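
The mode restrictions described in this document can be captured in a small client-side lookup, sketched below. The table and the `is_supported` helper are hypothetical conveniences; how Kessel itself responds to an unsupported combination is not covered here.

```python
ALL_MODES = frozenset(
    {"minimize_latency", "at_least_as_fresh", "at_least_as_acknowledged"}
)
BULK_MODES = frozenset({"minimize_latency", "at_least_as_fresh"})

# Modes a caller may choose, per check method.
SUPPORTED_MODES = {
    "Check": ALL_MODES,
    "CheckSelf": ALL_MODES,
    "CheckBulk": BULK_MODES,
    "CheckSelfBulk": BULK_MODES,
    # These always use full consistency, so there is nothing to choose.
    "CheckForUpdate": frozenset(),
    "CheckForUpdateBulk": frozenset(),
}


def is_supported(method: str, mode: str) -> bool:
    """Return True if the caller may select this mode for this method."""
    return mode in SUPPORTED_MODES[method]


bulk_ack = is_supported("CheckBulk", "at_least_as_acknowledged")
single_ack = is_supported("Check", "at_least_as_acknowledged")
```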