Summary
Cache reads happen before RAG and memory injection in the request processing pipeline. When there's a cache hit, the response is returned immediately and RAG/memory retrieval never runs — even if the decision has RAG enabled or the user has conversation memory.
Severity: Low — this is a correctness issue, not a security issue. The user gets a valid (generic) answer, just not their personalized one.
Pipeline order (current)
runRequestPreRoutingStages():
1. applyRateLimitAndCacheChecks() → handleCaching() — cache READ here
2. executeRAGPlugin() → sets ctx.RAGRetrievedContext
3. prepareRequestForModelRouting() → handleMemoryRetrieval() — sets ctx.MemoryContext
If step 1 returns a cache hit, steps 2-3 never execute.
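The short-circuit can be reproduced in isolation. The following is a minimal sketch, not the router's actual code: `RequestContext`, `stage`, `runStages`, and `currentPipeline` are hypothetical names introduced only for illustration, and the stage bodies are stubs. It only shows that when the first stage reports a hit, the later stages never populate the context.

```go
package main

import "fmt"

// RequestContext is a hypothetical stand-in for the router's request context.
// Only the two fields named in this issue are modelled.
type RequestContext struct {
	RAGRetrievedContext string
	MemoryContext       string
	StagesRun           []string // records which stages executed
}

// stage is one pre-routing step; run returns true when the pipeline
// should stop and respond immediately (for example on a cache hit).
type stage struct {
	name string
	run  func(ctx *RequestContext) bool
}

// runStages executes stages in order and stops at the first one that
// reports it has produced a response.
func runStages(ctx *RequestContext, stages []stage) {
	for _, s := range stages {
		ctx.StagesRun = append(ctx.StagesRun, s.name)
		if s.run(ctx) {
			return
		}
	}
}

// currentPipeline mirrors the ordering described above: cache read first,
// then RAG, then memory retrieval. The stage bodies are stubs.
func currentPipeline(cacheHit bool) []stage {
	return []stage{
		{"cache", func(ctx *RequestContext) bool { return cacheHit }},
		{"rag", func(ctx *RequestContext) bool { ctx.RAGRetrievedContext = "docs"; return false }},
		{"memory", func(ctx *RequestContext) bool { ctx.MemoryContext = "history"; return false }},
	}
}

func main() {
	hit := &RequestContext{}
	runStages(hit, currentPipeline(true))
	fmt.Println(hit.StagesRun, hit.RAGRetrievedContext == "") // [cache] true

	miss := &RequestContext{}
	runStages(miss, currentPipeline(false))
	fmt.Println(miss.StagesRun) // [cache rag memory]
}
```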
Impact
- User with RAG-enabled decision gets a generic cached response instead of their document-augmented one
- User with memory enabled gets a cached response without their conversation history
- Only affects requests where a semantically similar query was previously cached from a non-personalized request
Possible fixes
1. Move cache read after RAG/memory: check cache only after all context augmentation, using the full augmented query as the cache key
2. Skip cache reads for decisions with RAG or memory enabled: simpler, but reduces cache effectiveness for those decisions
3. Include a "personalization hash" in the cache key: hash of (has_rag, has_memory, user_id) so personalized and generic queries don't collide
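For the "personalization hash" idea, one possible shape is sketched below. This is an assumption-laden illustration: `personalizationHash` and `cacheNamespace` are hypothetical helpers, and because the cache matches on semantic similarity rather than exact keys, the sketch treats the hash as a namespace that lookups would be restricted to, not as the key itself. It also assumes that non-personalized requests should keep sharing one namespace regardless of user, which the issue does not specify.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// personalizationHash derives a short, stable tag from the three inputs
// named in the proposal: has_rag, has_memory, and user_id.
func personalizationHash(hasRAG, hasMemory bool, userID string) string {
	raw := fmt.Sprintf("rag=%t|mem=%t|user=%s", hasRAG, hasMemory, userID)
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:8]) // 16 hex characters is enough for a namespace tag
}

// cacheNamespace returns the partition a request should read from and
// write to. Assumption: generic requests share a single namespace so the
// cache stays effective for them; personalized requests are isolated.
func cacheNamespace(hasRAG, hasMemory bool, userID string) string {
	if !hasRAG && !hasMemory {
		return "generic"
	}
	return "p-" + personalizationHash(hasRAG, hasMemory, userID)
}

func main() {
	fmt.Println(cacheNamespace(false, false, "alice"))
	fmt.Println(cacheNamespace(true, false, "alice"))
	fmt.Println(cacheNamespace(true, false, "bob"))
}
```

With this partitioning, a personalized request can never hit an entry written by a generic request or by another user, at the cost of a lower hit rate for personalized traffic.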
Option 2 is probably the right tradeoff — decisions with RAG/memory enabled should not serve cached responses since the response depends on user-specific context.
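A guard for skipping cache reads could look roughly like the sketch below. `Decision`, its fields, and `shouldReadCache` are hypothetical names, not the router's actual types. The sketch also assumes the matched decision is already known when the cache check runs, which would need to be confirmed against the real pipeline.

```go
package main

import "fmt"

// Decision is a hypothetical stand-in for the matched routing decision.
type Decision struct {
	Name          string
	RAGEnabled    bool
	MemoryEnabled bool
}

// shouldReadCache reports whether a cached response may be served.
// Decisions that depend on user-specific context (RAG or memory) bypass
// the cache read; requests with no matched decision keep using it.
func shouldReadCache(d *Decision) bool {
	if d == nil {
		return true
	}
	return !d.RAGEnabled && !d.MemoryEnabled
}

func main() {
	decisions := []*Decision{
		{Name: "generic"},
		{Name: "docs-qa", RAGEnabled: true},
		{Name: "assistant", MemoryEnabled: true},
	}
	for _, d := range decisions {
		fmt.Printf("%s: read cache = %t\n", d.Name, shouldReadCache(d))
	}
}
```

The same predicate could gate cache writes as well, so that personalized responses are never stored for later generic requests; whether that is needed depends on how the write path is keyed.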
Found during work on #1448.