
Multi-Tenant Agent Memory: What We Got Wrong the First Time

By CoreCast AI Team • March 24, 2026 • 10 min read

Architecture diagram showing multi-tenant memory isolation with per-tenant and per-user partitioning

Building agent memory for a single user is comparatively simple. Building it for a multi-tenant SaaS product — where hundreds of customers have their own users, their own agents, and their own data — adds a layer of complexity that most early implementations underestimate badly. We made three specific architectural mistakes in our first version of multi-tenant memory support. All of them were fixable, none of them were cheap to fix, and all of them could have been avoided with better upfront design. This is the post we wish we'd read before building.

Mistake 1: Query-Time Isolation Instead of Structural Isolation

Our first multi-tenant implementation used a shared memory store with tenant identifiers on each record. Isolation was enforced by adding a tenant filter to every query. This is the most common approach because it's the easiest to implement — one store, one index, tenant filtering as a WHERE clause equivalent.

The problem is that query-time filtering is a policy enforced in application code, not in the storage layer. Every isolation guarantee depends on every query correctly applying the tenant filter. Application bugs, misconfigured retrieval paths, or edge cases in new features can silently return cross-tenant data without any infrastructure-level safeguard. In a memory store containing sensitive user conversations, this is not an acceptable risk.
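The sketch below makes the failure mode concrete. It is a minimal Python model of a shared store in which the tenant filter is an optional argument. The class and method names are hypothetical, and the substring match stands in for real vector retrieval. The only point is that nothing below the application layer stops a caller from omitting the filter.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Memory:
    tenant_id: str
    user_id: str
    text: str


class SharedMemoryStore:
    """One store for every tenant; isolation depends on each caller filtering."""

    def __init__(self) -> None:
        self._records: List[Memory] = []

    def write(self, memory: Memory) -> None:
        self._records.append(memory)

    def search(self, query: str, tenant_id: Optional[str] = None) -> List[Memory]:
        # Substring match stands in for real vector retrieval.
        hits = [m for m in self._records if query.lower() in m.text.lower()]
        # The tenant filter is optional: nothing in the store enforces it.
        if tenant_id is not None:
            hits = [m for m in hits if m.tenant_id == tenant_id]
        return hits


store = SharedMemoryStore()
store.write(Memory("acme", "u1", "Patient prefers morning appointments"))
store.write(Memory("globex", "u9", "Prefers morning standups"))

# Correct retrieval path: the caller remembers the tenant filter.
scoped = store.search("morning", tenant_id="acme")

# Buggy retrieval path: a new feature forgets the filter and leaks globex data.
leaked = store.search("morning")
```

Both calls are valid as far as the store is concerned. That is exactly the problem: the leak is a silent correctness bug, not an error.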

The fix was migrating to per-tenant partitioning: each tenant's memories live in a separate namespace, index shard, or physical partition, depending on the storage layer. A query physically cannot return cross-tenant results, because the query plan itself is scoped to that tenant's partition. Isolation is structural, not policy-dependent.
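The following sketch shows the same store reshaped so that the tenant is resolved before any read or write can happen. It rests on a few assumptions. The class names are hypothetical. The in-memory dictionary stands in for separate namespaces or shards, where a real deployment would also scope the connection or credential to the partition. In-process Python can only show the shape of the API, not enforce physical separation.

```python
from __future__ import annotations

from typing import Dict, List


class TenantPartition:
    """Holds one tenant's memories and has no reference to any other tenant."""

    def __init__(self, tenant_id: str) -> None:
        self.tenant_id = tenant_id
        self._records: List[str] = []

    def write(self, text: str) -> None:
        self._records.append(text)

    def search(self, query: str) -> List[str]:
        # The search scope is fixed when the partition is resolved, not per query.
        return [t for t in self._records if query.lower() in t.lower()]


class PartitionedMemoryStore:
    """Routes every read and write through a per-tenant partition."""

    def __init__(self) -> None:
        self._partitions: Dict[str, TenantPartition] = {}

    def for_tenant(self, tenant_id: str) -> TenantPartition:
        # There is deliberately no store-wide search method.
        if tenant_id not in self._partitions:
            self._partitions[tenant_id] = TenantPartition(tenant_id)
        return self._partitions[tenant_id]


store = PartitionedMemoryStore()
store.for_tenant("acme").write("Patient prefers morning appointments")
store.for_tenant("globex").write("Prefers morning standups")

acme = store.for_tenant("acme")
acme_hits = acme.search("morning")
```

Forgetting the tenant is no longer possible here, because there is no code path that searches without first choosing a partition.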

The migration was painful: we had to re-index every existing memory into partitioned stores without taking the service offline. If you're building multi-tenant memory today, start with structural partitioning. The storage overhead is justified by the security properties, and retrofitting it later is far more expensive than building it correctly from the start.
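A common pattern for this kind of zero-downtime migration works in three steps. Note that this is an assumption for illustration, not a record of exactly what we ran. First, dual-write new memories to both layouts. Second, backfill historical records from a snapshot taken before dual-writes began. Third, cut reads over only after the per-tenant counts match. The sketch below models that pattern; the class, its methods, and the count-based check are hypothetical simplifications.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (tenant_id, text)


class MigratingStore:
    """Hypothetical wrapper used while moving from a shared store to partitions."""

    def __init__(self, legacy: List[Record]) -> None:
        self.legacy = legacy  # old shared store with a tenant_id column
        self.partitions: Dict[str, List[str]] = defaultdict(list)
        self.read_from_partitions = False  # flipped only after verification

    def write(self, tenant_id: str, text: str) -> None:
        # Dual-write: new memories land in both layouts during the migration.
        self.legacy.append((tenant_id, text))
        self.partitions[tenant_id].append(text)

    def backfill(self, snapshot: List[Record]) -> None:
        # Copy records that existed before dual-writes started.
        for tenant_id, text in snapshot:
            self.partitions[tenant_id].append(text)

    def verify(self) -> bool:
        # Per-tenant record counts must match before reads are cut over.
        expected: Dict[str, int] = defaultdict(int)
        for tenant_id, _ in self.legacy:
            expected[tenant_id] += 1
        actual = {t: len(v) for t, v in self.partitions.items()}
        return dict(expected) == actual

    def cut_over(self) -> None:
        if not self.verify():
            raise RuntimeError("partition counts do not match the legacy store")
        self.read_from_partitions = True


legacy = [("acme", "a1"), ("globex", "g1")]
store = MigratingStore(legacy)
snapshot = list(legacy)      # taken before any dual-writes
store.write("acme", "a2")    # traffic keeps flowing mid-migration
store.backfill(snapshot)
store.cut_over()
```

A production migration would verify content, not just counts. It would also re-embed or re-index rather than copy strings, but the ordering of the steps is the part that keeps serving uninterrupted.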

Mistake 2: Flat Memory Scope

Our first model had two scopes: tenant-level and user-level. Tenant-level memories were shared across all users of that tenant. User-level memories were private to an individual. This seemed sufficient until we started supporting enterprise customers with more complex organizational structures.

Enterprise customers needed team-level memory: memories shared within a department but not across the whole organization. They needed project-level memory: context tied to a specific product or initiative, visible to everyone working on that project. They needed role-level memory: information that should be visible to managers but not to individual contributors. The flat two-scope model couldn't represent any of this without hacky workarounds that became impossible to maintain.

The redesigned model uses a hierarchical scope tree: organization at the root, then teams, then projects, then users. Each memory is tagged with a scope level and a scope identifier. Retrieval walks up the tree from the current user (user-level memories first, then project, then team, then organization) and merges the results, de-duplicating memories that appear at more than one level. Enterprise administrators can configure which scope levels are available and how they map to their organizational structure.
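Here is a minimal sketch of the walk-up retrieval, assuming the org, team, project, and user tree described above. Two details are assumptions made for illustration. De-duplication is keyed by a hypothetical `key` field, and "the most specific scope wins" is one reasonable merge policy, not necessarily ours. A linear scan replaces per-scope index lookups for readability.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Scope:
    level: str  # "org", "team", "project", or "user"
    scope_id: str
    parent: Optional[Scope] = None


@dataclass(frozen=True)
class Memory:
    scope_level: str
    scope_id: str
    key: str  # hypothetical de-duplication key, e.g. a normalized fact id
    text: str


def ancestors(scope: Scope) -> List[Scope]:
    """Return the scope followed by its ancestors, most specific first."""
    chain: List[Scope] = []
    node: Optional[Scope] = scope
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def retrieve(memories: List[Memory], user_scope: Scope) -> List[Memory]:
    """Walk up from the user's scope, keeping the most specific memory per key."""
    seen: Dict[str, Memory] = {}
    ordered: List[Memory] = []
    for scope in ancestors(user_scope):
        for m in memories:
            in_scope = m.scope_level == scope.level and m.scope_id == scope.scope_id
            if in_scope and m.key not in seen:
                seen[m.key] = m
                ordered.append(m)
    return ordered


org = Scope("org", "acme")
team = Scope("team", "payments", org)
project = Scope("project", "checkout-v2", team)
alice = Scope("user", "alice", project)

memories = [
    Memory("org", "acme", "tone", "Use formal tone with customers"),
    Memory("team", "payments", "tone", "Use plain language in payment errors"),
    Memory("project", "checkout-v2", "owner", "Checkout questions go to the project channel"),
    Memory("user", "alice", "format", "Alice prefers bullet summaries"),
    Memory("team", "risk", "tone", "Escalate disputes to the risk team"),  # sibling team
]
results = retrieve(memories, alice)
```

In this example the team-level `tone` memory shadows the org-level one, and the sibling `risk` team is never visited. Making the tree configurable per organization only changes how the `parent` links are built.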

This model adds complexity, but it's the right abstraction for real enterprise deployments. Starting with two flat scopes was a false economy: we saved a few weeks of design work and spent months rebuilding.

Mistake 3: No Per-Tenant Retention Controls

Our initial implementation had one retention policy: memories are kept until the user or tenant explicitly deletes them. This turned out to be operationally and legally insufficient almost immediately.

Healthcare customers needed automatic deletion of conversation memories after a configurable period — their compliance requirements specified maximum retention windows for patient-related interactions. Financial services customers needed audit-compliant retention — memories couldn't be deleted before a minimum retention period, even by users. Consumer-facing customers wanted to give users granular control over what was stored and for how long. The enterprise customers were asking for self-hosted deployments specifically so they could control the retention policy at the infrastructure level.

None of this is possible without per-tenant retention configuration built into the memory layer. We had to add retention policies as a first-class system primitive — configurable retention windows per scope level, minimum retention floors for compliance-constrained industries, user-level controls that the tenant administrator can enable or restrict, and automated deletion workflows with audit logs.
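The sketch below captures the primitives just listed. It is a minimal illustration, and every name in it is hypothetical. A policy carries an optional retention window, a compliance floor, and a tenant-controlled switch for user deletion. An enforcer checks user deletions against the floor and sweeps expired memories into an audit log. Per-scope-level policies would simply be a mapping from scope level to one of these objects. The durations in the usage lines are placeholders, not regulatory guidance.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class RetentionPolicy:
    max_age: Optional[timedelta]  # auto-delete after this window; None keeps until deleted
    min_age: timedelta = timedelta(0)  # compliance floor: no deletion before this age
    users_may_delete: bool = True  # tenant admin switch for user-level controls

    def __post_init__(self) -> None:
        # A window shorter than the floor would force deletions the floor forbids.
        if self.max_age is not None and self.max_age < self.min_age:
            raise ValueError("retention window is shorter than the compliance floor")


@dataclass
class StoredMemory:
    memory_id: str
    created_at: datetime


@dataclass
class RetentionEnforcer:
    policy: RetentionPolicy
    audit_log: List[str] = field(default_factory=list)

    def can_user_delete(self, memory: StoredMemory, now: datetime) -> bool:
        age = now - memory.created_at
        return self.policy.users_may_delete and age >= self.policy.min_age

    def sweep(self, memories: List[StoredMemory], now: datetime) -> List[StoredMemory]:
        """Return the memories to keep and log every automated deletion."""
        kept: List[StoredMemory] = []
        for m in memories:
            expired = (
                self.policy.max_age is not None
                and now - m.created_at > self.policy.max_age
            )
            if expired:
                self.audit_log.append(f"{now.isoformat()} auto-deleted {m.memory_id}")
            else:
                kept.append(m)
        return kept


now = datetime(2026, 3, 24, tzinfo=timezone.utc)
short_window = RetentionEnforcer(RetentionPolicy(max_age=timedelta(days=30)))
with_floor = RetentionEnforcer(RetentionPolicy(max_age=None, min_age=timedelta(days=365)))

old = StoredMemory("m1", now - timedelta(days=45))
fresh = StoredMemory("m2", now - timedelta(days=5))
kept = short_window.sweep([old, fresh], now)
```

Validating the window against the floor when the policy is created means the sweeper can never delete something the floor protects. Misconfiguration surfaces at setup time rather than during an audit.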

Retention policy is not a nice-to-have for multi-tenant memory. Any product targeting enterprise or regulated industries needs it, and compliance requirements come up in the first sales call. Build it before you have enterprise customers, not after.

What the Correct Architecture Looks Like

The version we have today reflects the lessons from all three mistakes. Tenant partitioning is structural: memories are physically separated at the storage layer, so cross-tenant queries are impossible by construction. The scope hierarchy is flexible: organizations configure a scope tree that matches their structure. Retention is configurable per tenant, with sensible defaults and compliance-specific options.

CoreCast's row-level isolation feature exposes this as a developer-facing API. You define the scope when writing a memory, attach retention metadata when the memory is created, and the system handles partitioning, scope-aware retrieval, and retention enforcement without additional application logic. The lessons from our mistakes are built into the design as safeguards: developers using the SDK can't accidentally reintroduce the query-time isolation vulnerability or the flat-scope limitation that cost us so much to fix.
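The following is not CoreCast's SDK. It is a hypothetical Python client written to show the design idea in the paragraph above: a client that makes the unsafe patterns hard to express. The client is bound to one tenant when it is constructed. Scope and retention are required keyword-only arguments, so a write with no scope fails immediately instead of falling back to a silent default. All names, including `TenantMemoryClient` and `SCOPE_LEVELS`, are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional

SCOPE_LEVELS = ("org", "team", "project", "user")  # hypothetical fixed set


@dataclass(frozen=True)
class WriteRequest:
    tenant_id: str
    scope_level: str
    scope_id: str
    text: str
    retention: Optional[timedelta]  # None means "use the tenant default" explicitly


class TenantMemoryClient:
    """Hypothetical client: one instance per tenant, no cross-tenant entry points."""

    def __init__(self, tenant_id: str) -> None:
        self._tenant_id = tenant_id
        self._writes: List[WriteRequest] = []

    def write(self, text: str, *, scope_level: str, scope_id: str,
              retention: Optional[timedelta]) -> WriteRequest:
        # Required keyword-only arguments turn an unscoped write into a TypeError.
        if scope_level not in SCOPE_LEVELS:
            raise ValueError(f"unknown scope level: {scope_level}")
        req = WriteRequest(self._tenant_id, scope_level, scope_id, text, retention)
        self._writes.append(req)
        return req


client = TenantMemoryClient("acme")
req = client.write("Prefers bullet summaries", scope_level="user",
                   scope_id="alice", retention=timedelta(days=90))

rejected = None
try:
    client.write("Unscoped note")  # missing scope and retention arguments
except TypeError as exc:
    rejected = exc
```

Forcing callers to pass `retention=None` explicitly, rather than giving it a default, keeps the choice visible in code review.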

CoreCast's row-level isolation gives your multi-tenant agent the structural memory separation you need — without rebuilding it twice.
