
bug: cache hits bypass the RAG and memory injection pipeline #1500

@yossiovadia


Summary

Cache reads happen before RAG and memory injection in the request processing pipeline. On a cache hit, the cached response is returned immediately and RAG/memory retrieval never runs, even if the matched decision has RAG enabled or the user has conversation memory.

Severity: Low. This is a correctness issue, not a security issue: the user gets a valid but generic answer instead of their personalized one.

Pipeline order (current)

runRequestPreRoutingStages():
  1. applyRateLimitAndCacheChecks()  →  handleCaching() — cache READ here
  2. executeRAGPlugin()              →  sets ctx.RAGRetrievedContext
  3. prepareRequestForModelRouting() →  handleMemoryRetrieval() — sets ctx.MemoryContext

If step 1 returns a cache hit, steps 2-3 never execute.
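To make the short-circuit concrete, here is a minimal Go sketch of the stage order above. It assumes a toy exact-match map in place of the real semantic cache. The `RequestContext` type, the `cache` variable and the stage bodies are hypothetical stand-ins; only the function name `runRequestPreRoutingStages` and the field names `RAGRetrievedContext` and `MemoryContext` come from this issue.

```go
package main

import "fmt"

// Hypothetical, simplified stand-in for the router's request context.
// Only RAGRetrievedContext and MemoryContext are names taken from the issue.
type RequestContext struct {
	Query               string
	RAGRetrievedContext string
	MemoryContext       string
	Response            string
}

// Hypothetical toy exact-match cache; the real router uses a semantic cache.
var cache = map[string]string{
	"what is our refund policy?": "generic cached answer",
}

// runRequestPreRoutingStages mirrors the stage order described in the issue:
// cache read first, then RAG, then memory retrieval.
func runRequestPreRoutingStages(ctx *RequestContext) (served bool) {
	// 1. Cache read (handleCaching). A hit returns before stages 2 and 3.
	if resp, ok := cache[ctx.Query]; ok {
		ctx.Response = resp
		return true
	}
	// 2. RAG plugin (executeRAGPlugin) would populate this field.
	ctx.RAGRetrievedContext = "retrieved documents"
	// 3. Memory retrieval (handleMemoryRetrieval) would populate this field.
	ctx.MemoryContext = "conversation history"
	return false
}

func main() {
	ctx := &RequestContext{Query: "what is our refund policy?"}
	served := runRequestPreRoutingStages(ctx)
	// On a hit, both context fields are still empty.
	fmt.Printf("served=%v rag=%q memory=%q\n", served, ctx.RAGRetrievedContext, ctx.MemoryContext)
}
```

Running it prints `served=true rag="" memory=""`: the request was answered, but neither the RAG nor the memory stage ever touched it.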

Impact

  • A user whose request matches a RAG-enabled decision gets a generic cached response instead of a document-augmented one
  • A user with memory enabled gets a cached response that ignores their conversation history
  • Only requests that semantically match a query previously cached from a non-personalized request are affected

Possible fixes

  1. Move the cache read after RAG/memory: check the cache only after all context augmentation, using the full augmented query as the cache key
  2. Skip cache reads for decisions with RAG or memory enabled: simpler, but reduces cache effectiveness for those decisions
  3. Include a "personalization hash" in the cache key: a hash of (has_rag, has_memory, user_id), so personalized and generic queries don't collide
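A minimal Go sketch of fix 3, assuming SHA-256 over a small formatted string as the hash and a (namespace, query) pair as a stand-in for the real cache key. In a semantic cache the query side is matched by embedding similarity, so the personalization hash would more likely act as a partition or filter than as part of an exact key. `personalizationHash` and `entryKey` are hypothetical names, not existing router code.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// personalizationHash hashes the tuple (has_rag, has_memory, user_id) from fix 3.
// Hypothetical helper; userID is last, so the formatted string is unambiguous.
func personalizationHash(hasRAG, hasMemory bool, userID string) string {
	h := sha256.New()
	fmt.Fprintf(h, "rag=%t|mem=%t|user=%s", hasRAG, hasMemory, userID)
	return hex.EncodeToString(h.Sum(nil))
}

// entryKey is a hypothetical cache key: the personalization hash acts as a
// namespace, and Query stands in for the semantic match used by the real cache.
type entryKey struct {
	Namespace string
	Query     string
}

func main() {
	q := "summarize the Q3 report"
	generic := entryKey{personalizationHash(false, false, ""), q}
	personal := entryKey{personalizationHash(true, true, "alice"), q}
	// Same query text, different namespaces: the entries cannot collide.
	fmt.Println(generic == personal) // false
}
```

Because `user_id` is hashed unconditionally, generic queries also stop being shared across users; a variant could leave the user ID out when both flags are false so generic entries stay shared.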

Option 2 is probably the right tradeoff: decisions with RAG/memory enabled should not serve cached responses, since the response depends on user-specific context.
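Option 2 amounts to a guard in front of the cache read. The sketch below is hypothetical: `Decision`, `shouldReadCache` and `lookup` are illustrative names rather than the router's actual types, and a plain map stands in for the semantic cache. A skipped read is reported as a miss, so the request continues into the RAG and memory stages.

```go
package main

import "fmt"

// Decision is a hypothetical stand-in for a routing decision's configuration.
type Decision struct {
	Name          string
	RAGEnabled    bool
	MemoryEnabled bool
}

// shouldReadCache implements fix 2: decisions whose responses depend on
// user-specific context never serve from the cache.
func shouldReadCache(d Decision) bool {
	return !d.RAGEnabled && !d.MemoryEnabled
}

// lookup is a hypothetical cache read guarded by shouldReadCache.
func lookup(cache map[string]string, d Decision, query string) (string, bool) {
	if !shouldReadCache(d) {
		// Treat as a miss so RAG and memory retrieval still run.
		return "", false
	}
	resp, ok := cache[query]
	return resp, ok
}

func main() {
	cache := map[string]string{"what is kubernetes?": "cached answer"}
	general := Decision{Name: "general"}
	docs := Decision{Name: "internal-docs", RAGEnabled: true}
	_, hitGeneral := lookup(cache, general, "what is kubernetes?")
	_, hitDocs := lookup(cache, docs, "what is kubernetes?")
	fmt.Println(hitGeneral, hitDocs) // true false
}
```

The guard keys only on the decision's flags, which is why it is simpler than fix 3: no user identity reaches the cache layer, at the cost of never caching for RAG/memory decisions.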

Found during work on #1448.

Metadata

Type: No type

Project status: In progress

Relationships: None yet

Development: No branches or pull requests
