AI Virtual Memory - Roadmap

Treating AI context like operating system memory: pages, working sets, faults, and swapping.

Why This Isn't RAG

Capability	RAG / Memori	AI Virtual Memory
Page identity & lifecycle	❌ Chunks are anonymous	✅ Pages have IDs, versions, lineage
Multi-resolution per page	❌ Single embedding	✅ Full → Summary → Caption → Reference
Working set semantics	❌ Top-K retrieval	✅ Explicit hot/warm/cold tracking
Deterministic eviction	❌ Implicit	✅ Policy-driven (LRU, importance-weighted)
Coherency guarantees	❌ None	✅ Dirty tracking, flush, checkpoint consistency
Copy-on-write	❌ Impossible	✅ Fork memory state efficiently
Cross-modal causality	❌ Separate indexes	✅ Unified page model, cross-references
Checkpoint consistency	❌ None	✅ Point-in-time snapshots
Tools as memory writers	❌ External	✅ First-class page creators/mutators
Streaming fault handling	❌ Block or fail	✅ Speculative placeholders, async upgrade
Distinct content types	❌ Everything is chunks	✅ Transcript vs derived vs claims
Provenance tracking	❌ Lost on retrieval	✅ Claims cite source pages

Core difference in one sentence:

RAG optimizes retrieval. AI Virtual Memory virtualizes memory.

Current Status (v0.11)

Shipped February 2025. Full VM infrastructure with architecture cleanup: typed Pydantic models throughout, structured exceptions, DictCompatModel for backward compat, from __future__ import annotations everywhere, full observability with session-scoped logging. 1719 tests passing, CI enforces ≥ 80% coverage.

What's Shipped

Component	Module	Status
MemoryPage, PageTableEntry, all enums	`models/`	v0.8
PageType taxonomy + provenance	`models/`	v0.8
FaultPolicy, FaultReason, FaultRecord	`models/`	v0.8
MemoryABI, PageManifestEntry	`models/`	v0.8
UserExperienceMetrics	`models/`	v0.8
PageTable with dirty/tier/modality/type tracking	`page_table.py`	v0.8
PageTLB + TLBWithPageTable	`tlb.py`	v0.8
WorkingSetManager + PinnedSet + AntiThrash	`working_set.py`	v0.8
ContextPacker (cross-modal formatters)	`context_packer.py`	v0.8
ManifestBuilder + VMManifest	`manifest.py`	v0.8
PageFaultHandler + PageSearchHandler	`fault_handler.py`	v0.8
ArtifactsBridge (chuk-artifacts + fallback)	`artifacts_bridge.py`	v0.8
MutationLogLite (append-only, replay)	`mutation_log.py`	v0.8
SimplePrefetcher (heuristic-based)	`prefetcher.py`	v0.8
ContextPackCache (LRU)	`pack_cache.py`	v0.8
VM prompts (strict/relaxed/passive) + tools	`vm_prompts.py`	v0.8
MemoryManager orchestrator	`manager.py`	v0.9
SessionManager VM integration (`enable_vm`)	`session_manager.py`	v0.9
Event-to-Page mapper (auto page creation)	`session_manager.py`	v0.9
Segmentation hook (evict on rollover)	`session_manager.py`	v0.9
VM-aware prompt builder (`VM:CONTEXT`)	`session_manager.py`	v0.9
DemandPagingPrePass (recall + prefetch)	`demand_paging.py`	v0.9
EvictionPolicy protocol (3 implementations)	`eviction_policy.py`	v0.10
CompressorRegistry (text, image, passthrough)	`compressor.py`	v0.10
Compress-before-evict in MemoryManager	`manager.py`	v0.10
VMMetrics compression tracking	`models/`	v0.10
Typed stats models (VMContext, MemoryManagerStats, etc.)	`models/stats.py`	v0.11
Structured exceptions (StorageError, InvalidSessionOperation)	`exceptions.py`	v0.11
DictCompatModel backward-compat base	`base_models.py`	v0.11
Session-scoped logging in MemoryManager	`manager.py`	v0.11
`_default_summary()` extraction (separation of concerns)	`session_manager.py`	v0.11

What's Next

Priority	Component	Phase
P1	StreamingFaultHandler	v0.12
P1	Real multi-modal handlers	v0.12
P2	Memory pressure monitoring	v0.12
P2	Agent Planning & Skills (mcp-cli)	v0.12
P3	Coherency, versioning, time-travel	v0.13
P3	Advanced (shared memory, COW, GC, NUMA)	v1.0+

Vision

Just as operating systems virtualize physical memory to give processes the illusion of infinite address space, AI Virtual Memory virtualizes context windows to give AI conversations the illusion of infinite memory across all modalities (text, images, audio, video).

┌────────────────────────────────────────────────────────────────┐
│                     APPLICATION LAYER                          │
│         AI Agent / Chat / Tool Orchestration                   │
└────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────────┐
│                   VIRTUAL MEMORY LAYER                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐ │
│  │ Page Table   │  │ Working Set  │  │ Fault Handler        │ │
│  │              │  │ Manager      │  │                      │ │
│  └──────────────┘  └──────────────┘  └──────────────────────┘ │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐ │
│  │ Compressor   │  │ Eviction     │  │ Prefetch             │ │
│  │ (per-modal)  │  │ Policy       │  │ Manager              │ │
│  └──────────────┘  └──────────────┘  └──────────────────────┘ │
└────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────────┐
│                    STORAGE HIERARCHY                           │
│                                                                │
│   L0 (Registers)  │  Current prompt context window            │
│   L1 (Cache)      │  Recent pages, hot data (session state)   │
│   L2 (RAM)        │  Session storage (chuk-sessions)          │
│   L3 (Disk)       │  Artifact storage (chuk-artifacts)        │
│   L4 (Cold)       │  Archive/S3 (chuk-artifacts vfs-s3)       │
│                                                                │
└────────────────────────────────────────────────────────────────┘

Storage Hierarchy Detail

Tier	Name	Latency	Capacity	Content	Implementation
L0	Registers	0ms	~128K tokens	Active prompt window	In-memory list
L1	Cache	<10ms	~1M tokens	Derived materializations (summaries, claims, packed context)	Session.state
L2	RAM	<100ms	~100M tokens	Authoritative event store (append-only session events, tool results)	chuk-sessions
L3	Disk	<1s	~10GB	Artifacts, media, checkpoints	chuk-artifacts (vfs-filesystem)
L4	Cold	1-10s	Unlimited	Archives, old sessions	chuk-artifacts (vfs-s3)

L1/L2 Contract (Critical)

L2 is source-of-truth. Append-only event log of everything that happened:

User messages, assistant responses, tool calls/results
Never mutated, only appended
Coherency is guaranteed because it's immutable

L1 is derived. Materialized views that can be rebuilt from L2:

Summaries, extracted claims, packed context
Dirty tracking is about L1 pages that haven't been persisted yet
If L1 is lost, rebuild from L2

This makes coherency tractable: L2 is the log, L1 is the cache.

Phase 1: Foundations (v0.8) — SHIPPED

1.1 Page Type Taxonomy

Not all pages are equal. Different page types have different eviction/compression rules:

Page Type	Description	Eviction Priority	Compression
`transcript`	Raw turns, tool outputs	Normal	Full → Summary → Reference
`summary`	LLM-generated summaries	Low (derived, rebuildable)	Already compressed
`artifact`	Tool-created content	Normal	Type-dependent
`claim`	Decisions, facts, conclusions	Very Low (high-value, referenced constantly)	Rarely compressed
`procedure`	"When calling tool X, we do Y"	Low	Rarely compressed
`index`	Page metadata for search	Very Low	Never compressed

Why claim pages matter: Without first-class claims, high-value decisions get buried in summaries and you pay fault/compression costs repeatedly. Claims are token-cheap, high-importance, and have explicit provenance back to source pages.

1.2 MemoryPage Model

Core abstraction representing any piece of content across modalities.

class MemoryPage(BaseModel):
    page_id: str
    modality: Modality = Modality.TEXT
    storage_tier: StorageTier = StorageTier.L0

    # Page type (critical for eviction/compression decisions)
    page_type: PageType = PageType.TRANSCRIPT

    # Provenance: what pages justify this one (for claims, summaries)
    provenance: List[str] = []  # page_ids that this page derives from

    # Representation linking (for compression chain)
    represents: Optional[str] = None  # page_id this is a compressed version of
    representation_level: CompressionLevel = CompressionLevel.FULL

    # Content or reference
    content: Optional[Any] = None
    artifact_id: Optional[str] = None

    # Multi-resolution representations (artifact_ids for each level)
    representations: Dict[int, str] = {}  # level -> artifact_id

    # Access tracking
    size_bytes: int
    size_tokens: Optional[int] = None
    created_at: datetime
    last_accessed: datetime
    access_count: int = 0
    importance: float = 0.5

    # Modality-specific metadata
    mime_type: str
    duration_seconds: Optional[float] = None  # audio/video
    dimensions: Optional[Tuple[int, int]] = None  # image/video

Claim Page Example

# When user says "Let's use PostgreSQL for the database"
claim_page = MemoryPage(
    page_id="claim_db_choice_001",
    page_type=PageType.CLAIM,
    modality=Modality.TEXT,
    content="Decision: Use PostgreSQL for the database",
    provenance=["msg_042", "msg_043"],  # The messages where this was decided
    importance=0.95,  # High importance = low eviction priority
    size_tokens=12,   # Token-cheap
)

Procedure Page Example

# Learned pattern: "When calling weather_tool, we usually load location claims first"
procedure_page = MemoryPage(
    page_id="proc_weather_pattern_001",
    page_type=PageType.PROCEDURE,
    modality=Modality.STRUCTURED,
    content={
        "tool": "weather_tool",
        "prereqs": ["claim_location_*"],
        "typical_pages_read": ["msg_user_location", "claim_timezone"],
    },
    provenance=["tool_trace_087", "tool_trace_092"],
    importance=0.8,
)

1.3 PageTableEntry (with dirty tracking)

Core metadata for each page - includes dirty bit from day one.

class PageTableEntry(BaseModel):
    page_id: str
    tier: StorageTier
    artifact_id: Optional[str] = None
    compression_level: CompressionLevel = CompressionLevel.FULL

    # Dirty tracking (critical for coherency)
    dirty: bool = False
    last_flushed: Optional[datetime] = None

    # Access tracking
    last_accessed: datetime
    access_count: int = 0

    # Locality hints (for future NUMA awareness)
    affinity: Affinity = Affinity.LOCAL

1.4 PageTable

Maps page IDs to their current location and state.

class PageTable:
    pages: Dict[str, PageTableEntry]

    async def lookup(self, page_id: str) -> PageTableEntry
    async def register(self, page: MemoryPage) -> None
    async def update_location(self, page_id: str, tier: str, artifact_id: str) -> None
    async def mark_accessed(self, page_id: str) -> None
    async def mark_dirty(self, page_id: str) -> None
    async def get_dirty_pages(self) -> List[PageTableEntry]
    async def get_by_tier(self, tier: str) -> List[PageTableEntry]

1.5 TLB (Translation Lookaside Buffer)

Fast cache for recent page lookups - avoids PageTable + storage hops.

class PageTLB:
    """
    Without this, fault rate metrics lie - you'll be bottlenecked
    on metadata lookups, not content.
    """
    max_entries: int = 512
    entries: Dict[str, PageTableEntry]
    lru: OrderedDict  # For eviction

    def lookup(self, page_id: str) -> Optional[PageTableEntry]:
        """O(1) lookup, updates LRU on hit"""

    def insert(self, entry: PageTableEntry) -> None:
        """Insert with LRU eviction if full"""

    def invalidate(self, page_id: str) -> None:
        """Remove entry (on page mutation/eviction)"""

    def flush(self) -> None:
        """Clear entire TLB (on checkpoint/context switch)"""

1.6 Compression Levels

Define standard compression levels per modality.

Level	Text	Image	Audio	Video
0 (Full)	Complete messages	Full resolution	Full waveform	All frames
1 (Reduced)	Recent + summaries	Thumbnail	Transcript + timestamps	Keyframes + transcript
2 (Abstract)	Key points only	Caption + embedding	Summary + key quotes	Scene descriptions
3 (Reference)	Topic tags	"image:{id}"	Duration + topic	"video:{id}"

1.7 Context Pack Cache

The TLB caches address translations. But in LLM terms, the expensive part is often:

Selecting pages
Choosing compression levels
Rendering into VM:CONTEXT

Add a cache for the packed context output:

class ContextPackCache:
    """
    Cache packed context to avoid re-packing on small incremental turns.
    Invalidate on working set changes.
    """
    max_entries: int = 32

    def get(
        self,
        session_id: str,
        model_id: str,
        token_budget: int,
        working_set_hash: str,  # Hash of page_ids + versions in working set
    ) -> Optional[PackedContext]:
        """O(1) lookup for cached pack"""

    def put(
        self,
        session_id: str,
        model_id: str,
        token_budget: int,
        working_set_hash: str,
        packed: PackedContext,
    ) -> None:
        """Store packed context, evict LRU if full"""

    def invalidate_session(self, session_id: str) -> None:
        """Invalidate all cached packs for session (on working set change)"""

This drastically reduces overhead for "small incremental turns" - user says something short, assistant responds, no pages changed, reuse the pack.

1.8 Deliverables

MemoryPage model with page_type and provenance in memory/models.py
PageTableEntry with dirty tracking
PageTable class in memory/page_table.py
PageTLB class in memory/tlb.py
ContextPackCache class in memory/pack_cache.py
CompressionLevel enum and per-modality schemas
Unit tests for page lifecycle

Phase 2: Working Set Management (v0.9) — SHIPPED

2.1 WorkingSetManager

Tracks which pages are "hot" and manages L0/L1 capacity.

class WorkingSetManager:
    max_l0_tokens: int  # Context window limit
    max_l1_pages: int   # Cache size

    async def add_to_working_set(self, page: MemoryPage) -> None
    async def get_working_set(self) -> List[MemoryPage]
    async def get_token_usage(self) -> TokenBudget
    async def needs_eviction(self) -> bool
    async def get_eviction_candidates(self) -> List[MemoryPage]

2.2 Token Budget

Track token allocation across modalities.

class TokenBudget:
    total_limit: int
    text_tokens: int
    image_tokens: int  # Estimated token cost
    audio_tokens: int  # Transcript tokens
    reserved: int      # System prompt, tools
    available: int

2.3 Pinned Pages

Some pages should never be evicted. These form the "stable context" the model relies on.

class PinnedSet:
    """
    Pages that are never evicted from working set.
    Pinning prevents thrash on critical context.
    """
    pinned: Set[str]  # page_ids

    # Auto-pinned by default:
    # - System prompt page
    # - Active goal/plan page
    # - User preferences page
    # - Current tool schemas (see ABI)
    # - Last N turns (configurable, typically 2-4)

    def pin(self, page_id: str) -> None
    def unpin(self, page_id: str) -> None
    def is_pinned(self, page_id: str) -> bool

Default pinned set (configurable):

System prompt
Active plan/goals
User preferences
Last 3 turns (user + assistant pairs)
Current tool schemas

2.4 Anti-Thrash Protection

OS working sets fail when you thrash. LLM working sets thrash when user toggles between topics.

class AntiThrashPolicy:
    """
    Prevent evicting pages that were just faulted in.
    """
    # Recently-evicted pages get a "do not evict again" window
    eviction_cooldown_turns: int = 3

    # Recently-faulted pages get temporary protection
    fault_protection_turns: int = 2

    # Track eviction/fault history
    eviction_history: Dict[str, int]  # page_id -> turn_evicted
    fault_history: Dict[str, int]     # page_id -> turn_faulted

    def can_evict(self, page_id: str, current_turn: int) -> bool:
        """Check if page is in cooldown period"""
        if page_id in self.fault_history:
            if current_turn - self.fault_history[page_id] < self.fault_protection_turns:
                return False
        return True

    def get_eviction_penalty(self, page_id: str, current_turn: int) -> float:
        """Higher penalty = less likely to evict"""
        # Recently faulted = high penalty
        # Recently evicted = high penalty (avoid re-evicting)

This is the difference between "looks good on paper" and "feels stable in chat".

2.5 Access Pattern Tracking

LRU tracking per page
Frequency counting (LFU hybrid)
Importance scoring (user-marked, referenced by tools)
Topic affinity: Pages accessed together get co-eviction bonus

2.6 Simple Prefetch Heuristics (v0.9)

Don't wait for ML prediction. Start with dumb heuristics that work:

class SimplePrefetcher:
    """
    Basic prefetch that doesn't need prediction models.
    Good enough for v0.9.
    """
    async def prefetch_on_turn_start(self, session_id: str) -> List[str]:
        pages_to_prefetch = []

        # 1. Last segment summary (almost always needed for "what did we discuss")
        pages_to_prefetch.append(await self.get_last_segment_summary(session_id))

        # 2. Most-referenced claim pages (high access_count claims)
        pages_to_prefetch.extend(await self.get_top_claims(session_id, limit=3))

        # 3. Tool traces for likely tools (based on recent tool usage)
        likely_tools = await self.get_likely_tools(session_id)
        for tool in likely_tools:
            pages_to_prefetch.extend(await self.get_tool_prereq_pages(tool))

        return pages_to_prefetch

2.7 Lite Mutation Log (v0.9)

Start event-sourcing early (not full time-travel, but enough to debug and replay):

class MutationLogLite:
    """
    Append-only log of page operations.
    Not full event-sourcing, but enough for:
    - Debugging: "what was in context for turn T?"
    - Replay: Reconstruct state for testing
    - Grounding story: Prove what the model saw
    """

    async def append(self, mutation: PageMutation) -> None:
        """Append mutation to log"""

    async def get_context_at_turn(self, session_id: str, turn: int) -> List[str]:
        """Replay: what page_ids were in L0 at turn T?"""

    async def get_history(self, page_id: str) -> List[PageMutation]:
        """All mutations for a page"""

class PageMutation(BaseModel):
    """Immutable record of a page change"""
    mutation_id: str
    page_id: str
    timestamp: datetime
    turn: int
    mutation_type: MutationType  # CREATE, FAULT_IN, EVICT, COMPRESS, PIN, UNPIN

    # Context at mutation time
    tier_before: Optional[StorageTier]
    tier_after: StorageTier

    # Who caused it
    actor: Actor  # USER, MODEL, TOOL, SYSTEM
    cause: Optional[str]  # "eviction_pressure", "page_fault", "explicit_request"

This is gold for debugging agent weirdness and supports the strict grounding story.

2.8 Deliverables

WorkingSetManager class with pinning support
PinnedSet management
AntiThrashPolicy implementation
SimplePrefetcher with basic heuristics
MutationLogLite for debugging/replay
TokenBudget model with modality breakdown
Access pattern tracking integration
Working set size configuration per model

Phase 3: Page Faults & Loading (v0.10) — SHIPPED (except StreamingFaultHandler)

3.1 PageFaultHandler

Handles requests for pages not in L0/L1.

class PageFaultHandler:
    async def handle_fault(self, page_id: str, target_level: int = 0) -> MemoryPage:
        """
        1. Lookup page location in PageTable
        2. Load from appropriate tier (L2/L3/L4)
        3. Decompress if needed
        4. Add to working set
        5. Trigger eviction if needed
        """

3.2 Streaming Fault Handling

Handle page faults during streaming responses without blocking.

class DeferredPage:
    """
    Placeholder for pages being loaded mid-stream.
    Like how browsers render HTML while images load.
    """
    page_id: str
    placeholder: str  # "Loading context about X..."
    compression_hint: int  # What level to load at
    future: asyncio.Future[MemoryPage]

    async def await_or_placeholder(self, timeout_ms: int = 100) -> Union[MemoryPage, str]:
        """Return page if ready, placeholder if still loading"""

class StreamingFaultHandler:
    pending: Dict[str, DeferredPage]

    async def request_async(self, page_id: str, urgency: str = "normal") -> DeferredPage:
        """
        Non-blocking fault request.
        Returns immediately with DeferredPage.
        Background task loads actual content.
        """

    async def upgrade_placeholder(self, page_id: str, page: MemoryPage) -> None:
        """
        Called when page loads - can inject into ongoing stream
        if model supports mid-stream context injection.
        """

3.3 Fault Policies

The real failure mode isn't "can't load a page" - it's the model spiraling into repeated faults because it's unsure.

class FaultPolicy:
    """
    Guardrails to prevent fault spirals and budget blowouts.
    """
    # Existing
    max_faults_per_turn: int = 3

    # NEW: Token budget for fault resolution
    max_fault_tokens_per_turn: int = 8192  # Don't let faults blow the budget

    # NEW: Confidence threshold - only fault if explicitly needed
    fault_confidence_threshold: Literal["explicit", "referenced", "speculative"] = "referenced"
    # explicit: Only fault when page_id is directly requested
    # referenced: Fault if page content is referenced/needed
    # speculative: Fault on potential relevance (aggressive)

    # NEW: Track fault intent for metrics
    def validate_fault(
        self,
        page_id: str,
        reason: FaultReason,
        tokens_used_this_turn: int,
    ) -> bool:
        """Check if fault is allowed under current policy"""
        if tokens_used_this_turn + estimated_tokens > self.max_fault_tokens_per_turn:
            return False
        if self.fault_confidence_threshold == "explicit" and reason != "user_requested":
            return False
        return True

class FaultReason(str, Enum):
    """Make fault intent explicit for measurement"""
    USER_REQUESTED_RECALL = "user_requested_recall"  # "What did we say about X?"
    RESOLVE_REFERENCE = "resolve_reference"          # Model references page_id
    TOOL_PREREQUISITE = "tool_prereq"                # Tool needs this page
    SPECULATIVE = "speculative"                      # Might be relevant

Fault Request with Intent

# Updated page_fault tool call
page_fault(
    page_id: str,
    target_level: int = 2,
    reason: FaultReason = "resolve_reference"  # NEW: explicit intent
)

This enables:

Measuring why faults happen, not just how many
Distinguishing good faults (user asked) from bad faults (model guessing)
Tuning policies based on fault reason success rates

3.4 Lazy Loading Strategies

On-demand: Load when explicitly referenced
Predictive: Prefetch based on access patterns
Contextual: Load related pages (same session segment)
Speculative: Start loading on reference detection, use placeholder if not ready

3.5 ArtifactsBridge

Integration layer with chuk-artifacts.

class ArtifactsBridge:
    store: ArtifactStore

    async def store_page(self, page: MemoryPage, tier: str) -> str  # Returns artifact_id
    async def load_page(self, artifact_id: str) -> MemoryPage
    async def stream_page(self, artifact_id: str) -> AsyncIterator[bytes]
    async def checkpoint_pages(self, page_ids: List[str], name: str) -> str

3.6 Deliverables

PageFaultHandler class
FaultPolicy with token budget and confidence threshold
FaultReason enum for intent tracking
StreamingFaultHandler with DeferredPage (deferred — model-dependent, not needed to prove concept)
ArtifactsBridge integration
Lazy loading configuration
Fault metrics and logging (with reason breakdown)

Phase 4: Eviction & Compression (v0.10) — SHIPPED

Status: Complete. Generic EvictionPolicy protocol with 3 implementations (ImportanceWeightedLRU, LRUEvictionPolicy, ModalityAwareLRU). Compressor protocol with TextCompressor, ImageCompressor, PassthroughCompressor, and CompressorRegistry. Compress-before-evict integrated into MemoryManager._run_eviction(). VMMetrics tracks compression stats. SessionManager passes through vm_eviction_policy and vm_compressor_registry.

4.1 EvictionPolicy

Decides which pages to move to lower tiers.

class EvictionPolicy(Protocol):
    async def select_victims(
        self,
        pages: List[MemoryPage],
        required_tokens: int
    ) -> List[MemoryPage]

class LRUEvictionPolicy(EvictionPolicy): ...
class ImportanceWeightedLRU(EvictionPolicy): ...
class ModalityAwareLRU(EvictionPolicy): ...  # Keep text, evict media first

4.2 Modality Compressors

Transform content to lower compression levels.

class Compressor(Protocol):
    modality: str
    async def compress(self, page: MemoryPage, target_level: int) -> MemoryPage

class TextCompressor(Compressor):
    async def compress(self, page, level):
        if level == 1: return await self.summarize(page)
        if level == 2: return await self.extract_key_points(page)
        if level == 3: return self.extract_topics(page)

class ImageCompressor(Compressor):
    async def compress(self, page, level):
        if level == 1: return await self.create_thumbnail(page)
        if level == 2: return await self.generate_caption(page)
        if level == 3: return self.create_reference(page)

class AudioCompressor(Compressor):
    async def compress(self, page, level):
        if level == 1: return await self.transcribe(page)
        if level == 2: return await self.summarize_transcript(page)
        if level == 3: return self.create_reference(page)

class VideoCompressor(Compressor):
    async def compress(self, page, level):
        if level == 1: return await self.extract_keyframes_and_transcript(page)
        if level == 2: return await self.generate_scene_descriptions(page)
        if level == 3: return self.create_reference(page)

4.3 Compression Triggers

Token pressure (approaching context limit)
Time-based (pages older than threshold)
Explicit (user/system request)

4.4 Deliverables

EvictionPolicy interface and implementations (ImportanceWeightedLRU, LRUEvictionPolicy, ModalityAwareLRU)
Modality-specific compressors (TextCompressor, ImageCompressor, PassthroughCompressor, CompressorRegistry)
Compression level transitions (FULL → REDUCED → ABSTRACT → REFERENCE, compress-before-evict)
Eviction metrics and logging (VMMetrics.record_compression(), mutation log COMPRESS type)

Phase 5: Multi-Modal Integration (v0.11) — PARTIAL

Status: ContextPacker has cross-modal formatters (text, image, audio, video, structured). MemoryABI model exists in models.py. What's missing: real modality handlers for LLM context building (actual base64 image injection, audio transcription pipelines, etc.), and cross-modal reference tracking.

5.1 Unified Content Addressing

Reference any content uniformly.

# Text reference
"Regarding what you said earlier..."  →  page_fault("msg_abc123")

# Image reference
"In that chart I showed you..."  →  page_fault("img_xyz789")

# Audio reference
"Remember when I mentioned..."  →  page_fault("audio_def456")

# Cross-modal
"The data from the spreadsheet matches the graph"  →  [page_fault("xlsx_..."), page_fault("img_...")]

5.2 Context Window Packing

Intelligently pack multi-modal content into context.

class ContextPacker:
    async def pack(
        self,
        pages: List[MemoryPage],
        token_budget: int,
        priorities: Dict[str, float]  # modality -> weight
    ) -> PackedContext:
        """
        1. Sort by importance * recency * priority_weight
        2. Select compression level per page to fit budget
        3. Return packed context with manifest
        """

5.3 Modality Handlers

Text: Direct inclusion or summary
Image: Inline (base64/URL) or caption
Audio: Transcript with timestamps
Video: Keyframes + transcript or description

5.4 Memory ABI (Application Binary Interface)

Explicit contract between memory system and models.

class MemoryABI(BaseModel):
    """
    Lets different models negotiate memory requirements.
    Smaller models survive with aggressive compression.
    Tool processors reason about memory cost.
    """
    # What's in context
    pages: List[PageManifestEntry]

    # Capabilities
    faults_allowed: bool = True  # Can model request pages mid-turn?
    upgrade_budget_tokens: int = 2048  # Tokens reserved for fault resolution

    # Constraints
    max_context_tokens: int
    reserved_tokens: int  # System prompt, etc.
    available_tokens: int

    # NEW: Tool schema budget (often the hidden token hog)
    tool_schema_tokens_reserved: int = 0  # Tokens consumed by tool definitions
    active_toolset_hash: Optional[str] = None  # For cache invalidation when tools change

    # Preferences
    modality_weights: Dict[str, float] = {
        "text": 1.0,
        "image": 0.8,
        "audio": 0.6,
        "video": 0.4
    }

class PageManifestEntry(BaseModel):
    page_id: str
    modality: str
    page_type: str  # NEW: transcript, summary, artifact, claim, procedure, index
    compression_level: int
    tokens: int
    importance: float
    provenance: List[str] = []  # NEW: source page_ids (for claims/summaries)
    can_evict: bool = True
    can_compress: bool = True

Tool Schema as Pinned Pages

In real agent loops, tool schemas are often the hidden token hog. The ABI makes this explicit:

# Tool schemas should be treated as quasi-pinned pages
# They're implicitly in context but consume budget

abi = MemoryABI(
    max_context_tokens=128_000,
    tool_schema_tokens_reserved=4_500,  # 15 tools @ ~300 tokens each
    reserved_tokens=2_000,              # System prompt
    # available_tokens = 128_000 - 4_500 - 2_000 = 121_500 for content
)

# When tools change, invalidate context pack cache
if new_toolset_hash != abi.active_toolset_hash:
    context_pack_cache.invalidate_session(session_id)

This enables:

Models to request specific pages by ID
Negotiation of compression levels
Cost-aware tool planning
Graceful degradation for smaller models

5.5 Deliverables

Unified page reference syntax
ContextPacker with multi-modal support (formatters for all modalities shipped in v0.8)
MemoryABI specification and serialization (model shipped in v0.8)
Modality handlers for LLM context building (real image/audio/video pipeline integration)
Cross-modal reference tracking

Phase 6: Prefetch & Prediction (v0.11) — PARTIAL

Status: SimplePrefetcher shipped in v0.8 with heuristic-based prefetch (summary + claims + tool prereqs). What's missing: ML-based prediction, background loading infrastructure, plugin system for custom strategies.

6.1 PrefetchManager

Proactively load pages likely to be needed.

class PrefetchManager:
    async def predict_needed_pages(self, context: ConversationContext) -> List[str]
    async def prefetch(self, page_ids: List[str]) -> None
    async def learn_from_access(self, accessed: str, context: ConversationContext) -> None

6.2 Prediction Strategies

Sequential: Next segment in session chain
Similarity: Pages with similar embeddings to current context
Tool-based: Pages referenced by likely tool calls
User pattern: Based on historical access patterns

6.3 Background Loading

Async prefetch during idle time
Priority queue for prefetch requests
Cancel prefetch on context switch

6.4 Deliverables

PrefetchManager class (SimplePrefetcher shipped in v0.8)
Prediction strategy plugins
Background loading infrastructure
Prefetch hit rate metrics (basic stats in SimplePrefetcher)

Phase 7: Memory Pressure & Health (v0.11)

7.1 MemoryPressureMonitor

Track and respond to memory pressure.

class MemoryPressureMonitor:
    async def get_pressure_level(self) -> PressureLevel  # LOW, MEDIUM, HIGH, CRITICAL
    async def get_metrics(self) -> MemoryMetrics
    def on_pressure_change(self, callback: Callable) -> None

class MemoryMetrics:
    l0_utilization: float  # 0-1
    l1_utilization: float
    pages_by_tier: Dict[str, int]
    evictions_per_minute: float
    fault_rate: float
    compression_ratio: float

7.2 Adaptive Thresholds

Adjust eviction aggressiveness based on fault rate
Tune compression levels based on retrieval patterns
Model-specific context window optimization

7.3 Health Checks

Consistency between PageTable and storage
Orphaned artifacts cleanup
Corruption detection

7.4 Deliverables

MemoryPressureMonitor class
Adaptive threshold tuning
Health check routines
Observability dashboard integration

Phase 8: Coherency & Versioning (v0.12)

8.1 Page Versioning

Track page mutations over time.

class PageVersion:
    version_id: str
    page_id: str
    timestamp: datetime
    compression_level: int
    artifact_id: str
    parent_version: Optional[str]

8.2 Coherency Guarantees

Read-after-write: Immediate visibility of writes
Checkpoint consistency: All pages at checkpoint are consistent
Cross-tier sync: Track dirty pages, flush on demand

8.3 Conflict Resolution

Last-write-wins for simple cases
Version vectors for distributed scenarios
User-resolvable conflicts for important content

8.4 Full Event-Sourced Mutations

Extends MutationLogLite (shipped in v0.8) with full time-travel capabilities.

class PageMutationFull(PageMutation):
    """Extended mutation with version tracking"""

    # What changed (extends lite version)
    previous_version: Optional[str]
    new_version: str
    delta: Optional[Dict[str, Any]]  # For partial updates

    # Extended actor info
    actor_id: Optional[str]  # tool_name, model_id, etc.

class MutationLog(MutationLogLite):
    """
    Full mutation log with time-travel.
    Extends MutationLogLite from v0.9.
    """

    async def replay_to(self, page_id: str, timestamp: datetime) -> MemoryPage:
        """Reconstruct page state at any point in time"""

    async def get_mutations_by_actor(self, actor_id: str) -> List[PageMutationFull]:
        """All mutations by a specific tool/model"""

    async def get_context_diff(self, turn_a: int, turn_b: int) -> ContextDiff:
        """What changed in context between turns?"""

This enables:

Time travel: Reconstruct page state at any point
Deterministic replay: Debug agent behavior exactly
Audit trail: Who changed what and why
Undo/redo: Revert to previous states

8.5 Deliverables

Phase 9: Tools as Memory Writers (v0.12)

Tools are first-class memory citizens - they create, read, and mutate pages.

9.1 Tool Memory Interface

Every tool can interact with memory.

class ToolMemoryContext:
    """Passed to tools during execution"""
    memory: MemoryManager

    # Read
    async def get_page(self, page_id: str) -> MemoryPage
    async def search_pages(self, query: str, modality: Optional[str] = None) -> List[MemoryPage]

    # Write
    async def create_page(self, content: Any, modality: str, **metadata) -> MemoryPage
    async def update_page(self, page_id: str, content: Any) -> MemoryPage
    async def annotate_page(self, page_id: str, annotation: str) -> None

    # Reference
    async def link_pages(self, source_id: str, target_id: str, relation: str) -> None

class ToolResult(BaseModel):
    """Extended tool result with memory effects"""
    value: Any
    pages_created: List[str] = []
    pages_read: List[str] = []
    pages_modified: List[str] = []

9.2 Tool-Created Pages

Tools naturally produce memory artifacts.

# Calculator tool creates structured result page
@tool
async def calculate(expression: str, ctx: ToolMemoryContext) -> ToolResult:
    result = eval_safe(expression)
    page = await ctx.create_page(
        content={"expression": expression, "result": result},
        modality="structured",
        importance=0.8,  # Computed values are important
        tags=["calculation", "numeric"]
    )
    return ToolResult(value=result, pages_created=[page.page_id])

# Image analysis tool creates caption page
@tool
async def analyze_image(image_page_id: str, ctx: ToolMemoryContext) -> ToolResult:
    image = await ctx.get_page(image_page_id)
    caption = await vision_model.caption(image.content)

    # Create L2 representation
    caption_page = await ctx.create_page(
        content=caption,
        modality="text",
        parent_page_id=image_page_id,
        compression_level=2  # This IS a compressed representation
    )

    # Link as representation
    await ctx.link_pages(image_page_id, caption_page.page_id, "has_caption")

    return ToolResult(
        value=caption,
        pages_read=[image_page_id],
        pages_created=[caption_page.page_id]
    )

9.3 Tool Learning from Memory

Procedural memory learns from tool-created pages.

class ProceduralMemoryIntegration:
    """Connect tool execution to procedural memory"""

    async def on_tool_complete(self, tool_name: str, result: ToolResult):
        # Record what pages the tool touched
        trace = ToolTrace(
            tool_name=tool_name,
            pages_read=result.pages_read,
            pages_created=result.pages_created,
            pages_modified=result.pages_modified,
            outcome="success" if result.value else "failure"
        )
        await self.procedural_memory.record(trace)

    async def suggest_pages_for_tool(self, tool_name: str) -> List[str]:
        """Based on past successful invocations, suggest relevant pages"""
        patterns = await self.procedural_memory.get_patterns(tool_name)
        return patterns.commonly_read_pages

9.4 Deliverables

ToolMemoryContext interface
Extended ToolResult with memory effects
Tool decorators for memory-aware tools
Procedural memory integration
Page lineage tracking (tool → page relationships)

Phase 10: Advanced Features (v1.0+)

10.1 Shared Memory

Pages shared across sessions/users.

class SharedMemoryRegion:
    region_id: str
    scope: Literal["session", "user", "sandbox", "global"]
    pages: List[str]
    access_control: AccessPolicy

10.2 Memory-Mapped Files

Direct artifact access without full loading.

class MemoryMappedArtifact:
    async def read_range(self, offset: int, length: int) -> bytes
    async def get_slice(self, start: int, end: int) -> MemoryPage

10.3 Copy-on-Write

Efficient forking of memory state.

async def fork_memory_state(source_session: str, target_session: str) -> None:
    """Create new session with COW references to source pages"""

10.4 Garbage Collection

Automatic cleanup of unreferenced pages.

class GarbageCollector:
    async def collect(self) -> GCStats
    async def mark_reachable(self, root_sessions: List[str]) -> Set[str]
    async def sweep_unreachable(self, reachable: Set[str]) -> int

10.5 Memory Isolation

Process-like isolation between contexts.

class MemoryNamespace:
    namespace_id: str
    page_table: PageTable
    working_set: WorkingSetManager
    # Isolated view of memory

10.6 NUMA Awareness (Locality Optimization)

For distributed deployments with multiple storage backends.

class LocalityManager:
    """
    Critical when artifacts live in S3 or across regions.
    Penalize cross-region page faults, bias eviction away from expensive fetches.
    """
    regions: Dict[str, StorageRegion]

    async def get_affinity(self, page_id: str) -> Affinity:
        """Determine where page lives relative to current compute"""

    async def estimate_fault_cost(self, page_id: str) -> FaultCost:
        """Latency + bandwidth + monetary cost to fetch"""

    async def suggest_replication(self, page_id: str) -> Optional[str]:
        """Should we replicate this page closer?"""

class FaultCost(BaseModel):
    latency_ms: int
    bandwidth_bytes: int
    monetary_cost_usd: float
    tier: str

class EvictionPolicyWithLocality(EvictionPolicy):
    """Factor in fetch cost when deciding what to evict"""

    async def select_victims(self, pages, required_tokens):
        # Prefer evicting pages that are cheap to re-fetch
        # Keep expensive remote pages longer
        scored = [
            (p, self.eviction_score(p) * (1 / await self.locality.estimate_fault_cost(p.page_id).latency_ms))
            for p in pages
        ]
        return sorted(scored, key=lambda x: x[1])[:required_tokens]

This becomes critical when:

Different MCP servers have different storage backends
Artifacts live in S3 (high latency)
Multi-region deployments
Hybrid cloud/local setups

Phase 11: Demand Paging Strategy (v0.9) — SHIPPED

Critical design decision: How does the system decide what to page in?

The page_fault tool exists and works. The open question is who initiates the fault and when. This is the difference between a system that technically pages and one that feels seamless.

11.1 Three Paging Modes

Mode	Initiator	Latency	Quality	Use Case
Model-initiated	Model calls `page_fault`	+1 round trip	High (model knows what it needs)	Complex recall, multi-hop reasoning
Runtime-initiated	Pre-pass heuristic scans user message	0 (prefetched)	Medium (heuristic may miss)	Conversational chat, "what did we say about X?"
Hybrid	Runtime prefetches likely pages, model faults for gaps	0 for common cases, +1 for misses	High	Production agents

11.2 Runtime Pre-Pass Heuristic

Before sending to the model, scan the user message for recall signals:

class DemandPagingPrePass:
    """
    Scan user message for signals that old context will be needed.
    Prefetch before the model even sees the message.
    """

    # Recall signal patterns
    RECALL_PATTERNS = [
        r"(as we|what did we|earlier|previously|remember when)",
        r"(that (decision|choice|plan|approach) (about|for|regarding))",
        r"(go back to|revisit|return to)",
    ]

    async def prefetch_for_message(
        self,
        user_message: str,
        page_table: PageTable,
        working_set: WorkingSetManager,
    ) -> List[str]:
        """
        Returns page_ids to prefetch into L0 before model invocation.
        """
        prefetch_ids = []

        # 1. Check for recall patterns
        if self._has_recall_signal(user_message):
            # Load claim pages (decisions are the most commonly recalled)
            claims = page_table.get_by_type(PageType.CLAIM)
            prefetch_ids.extend(c.page_id for c in claims[:5])

        # 2. Check for topic references
        topics = self._extract_topics(user_message)
        for topic in topics:
            matching = page_table.search_by_content_hint(topic)
            prefetch_ids.extend(m.page_id for m in matching[:2])

        # 3. Always prefetch last segment summary
        prefetch_ids.append(await self._get_last_segment_summary())

        return prefetch_ids

11.3 The "Swap File" Pattern

With infinite context + demand paging, the system becomes an LLM swap file:

Active conversation → L0 (context window)
Recent history      → L1 (session state cache)
Full transcript     → L2 (session events, source of truth)
Artifacts           → L3/L4 (disk/S3)

Summary = compressed swap
Claim = pinned swap page (never swaps out)
Prefetch = readahead

The summarization callback in _create_new_segment is literally writing to the swap file. The question is whether to make this explicit in the API or keep it transparent.

11.4 Deliverables

DemandPagingPrePass — user message scanner with recall heuristics
Configurable paging mode (model-initiated / runtime / hybrid)
Pre-pass integration into SessionManager.user_says()
Metrics: prefetch hit rate vs. model-initiated fault rate
Documentation: when to use which mode

Phase 12: Closed-Loop Learning (v0.10)

Key insight: Procedural memory and guards are currently parallel systems. Wire them together and you get genuine agent skill acquisition, not just pattern replay.

12.1 The Feedback Loop

┌─────────────────────────────────────────────────────────────┐
│                    CLOSED-LOOP LEARNING                       │
│                                                               │
│  ToolMemoryManager ──records──► Pattern DB                    │
│       ▲                              │                        │
│       │                              ▼                        │
│       │                     ProceduralContextFormatter         │
│       │                              │                        │
│       │                              ▼                        │
│       │                     Context injection                 │
│       │                              │                        │
│       │                              ▼                        │
│       │                     Model uses patterns               │
│       │                              │                        │
│       │                              ▼                        │
│   NEW: Guard ◄──────────── ToolStateManager                   │
│   Feedback                   │                                │
│       │                      ▼                                │
│       │              Detects: failure, runaway, ungrounded    │
│       │                      │                                │
│       └──────────────────────┘                                │
│                                                               │
│   "This tool sequence failed 3x → suggest alternative"        │
│   "Tool X always needs claim Y prefetched → auto-prefetch"    │
│   "Tool X → Y → Z pattern succeeds 90% → promote to skill"   │
│                                                               │
└─────────────────────────────────────────────────────────────┘

12.2 Guard-Informed Procedural Memory

class GuardFeedbackIntegration:
    """
    Feed guard outcomes back into procedural memory.
    Turns failures into learned avoidance patterns.
    """

    def __init__(
        self,
        tool_memory: ToolMemoryManager,
        tool_state: ToolStateManager,
    ):
        self.tool_memory = tool_memory
        self.tool_state = tool_state

    async def on_guard_trigger(
        self,
        guard_type: str,  # "runaway", "ungrounded", "budget"
        tool_name: str,
        context: Dict[str, Any],
    ):
        """Record guard triggers as negative patterns."""
        await self.tool_memory.record(
            tool_name=tool_name,
            arguments=context.get("arguments", {}),
            result=None,
            success=False,
            failure_reason=f"guard:{guard_type}",
            metadata={
                "guard_type": guard_type,
                "call_count": context.get("call_count"),
                "sequence": context.get("recent_sequence"),
            },
        )

    async def get_avoidance_patterns(self, tool_name: str) -> List[str]:
        """
        Generate warnings for the model based on failure history.
        'Tool X with args matching Y has failed 3 times due to Z.'
        """
        failures = await self.tool_memory.get_failures(tool_name)
        if len(failures) >= 3:
            return [self._format_avoidance(f) for f in failures[:3]]
        return []

12.3 Skill Promotion

When a tool sequence succeeds repeatedly, promote it to a named skill:

class SkillPromoter:
    """
    Detect stable tool patterns and promote them to named skills.
    'search → extract → summarize' used 10 times → Skill: 'research_topic'
    """
    promotion_threshold: int = 5  # Successes before promotion

    async def check_for_promotion(self, session_id: str) -> List[Skill]:
        patterns = await self.tool_memory.get_successful_sequences(
            min_occurrences=self.promotion_threshold
        )
        return [
            Skill(
                name=self._generate_name(p),
                tool_sequence=p.tools,
                typical_args=p.common_args,
                success_rate=p.success_rate,
                page_prereqs=p.commonly_read_pages,
            )
            for p in patterns
        ]

12.4 Memory-Aware Tool Context

Procedural memory informs prefetch: "Tool X always reads claim Y first."

class MemoryAwareToolContext:
    """
    Before executing a tool, prefetch pages it typically needs.
    Based on procedural memory of past successful invocations.
    """

    async def prepare_for_tool(self, tool_name: str) -> List[str]:
        patterns = await self.tool_memory.get_patterns(tool_name)
        prereq_pages = patterns.commonly_read_pages
        # Prefetch these into L0 before tool execution
        for page_id in prereq_pages:
            await self.working_set.ensure_in_l0(page_id)
        return prereq_pages

12.5 Deliverables

GuardFeedbackIntegration — wire guard triggers to procedural memory
SkillPromoter — detect and promote stable tool patterns
MemoryAwareToolContext — prefetch pages for tools based on learned patterns
Avoidance pattern injection into model context
Metrics: skill promotion rate, avoidance pattern effectiveness

Phase 13: Storage Query Layer (v0.10)

Problem: chuk-sessions is key-value. The VM layer needs richer query semantics.

13.1 The Query Gap

Current chuk-sessions operations:

get(session_id) → Session
put(session_id, session) → None
list() → List[session_id]

What the VM layer needs:

Query pages by PageType (all claims, all summaries)
Query by provenance chain (what pages derive from msg_042?)
Query by recency/importance score (top 5 most-accessed claims)
Query by content (pages mentioning "PostgreSQL")
Cross-session queries (all claims across a user's sessions)

13.2 Options

Approach	Pros	Cons
A: Lightweight index layer on K/V	No new dependencies, works with existing backends	Query performance limited, must maintain index consistency
B: Document store (e.g., SQLite)	Rich queries, ACID, well-understood	New dependency, migration complexity
C: Hybrid	K/V for hot path, index for queries	Complexity, two sources of truth risk

Recommended: Option A for v0.10, evolve to B if query complexity grows.

13.3 Lightweight Page Index

class PageIndex:
    """
    In-memory index over page metadata, rebuilt from PageTable on startup.
    Persisted to Session.state for fast recovery.
    """

    # Indexes
    _by_type: Dict[PageType, Set[str]]          # PageType → page_ids
    _by_provenance: Dict[str, Set[str]]          # source_page_id → derived_page_ids
    _by_importance: SortedList[Tuple[float, str]] # (importance, page_id) sorted
    _by_content_hint: Dict[str, Set[str]]        # keyword → page_ids

    def query_by_type(self, page_type: PageType) -> List[str]:
        """All pages of a given type."""

    def query_provenance_chain(self, page_id: str) -> List[str]:
        """Walk the provenance graph: what derives from this page?"""

    def query_top_by_importance(self, n: int, page_type: Optional[PageType] = None) -> List[str]:
        """Top N pages by importance, optionally filtered by type."""

    def query_by_content(self, query: str) -> List[str]:
        """Simple keyword match against content hints."""

    def rebuild_from_page_table(self, page_table: PageTable) -> None:
        """Rebuild all indexes from the authoritative PageTable."""

    def persist_to_session_state(self, session: Session) -> None:
        """Store index in Session.state['vm:page_index'] for fast recovery."""

13.4 Deliverables

PageIndex with type, provenance, importance, and content indexes
Index persistence to Session.state
Index rebuild from PageTable
Integration with PageSearchHandler
Cross-session index queries (stretch goal)

Phase 14: Agent Planning & Skills — mcp-cli Foundation (v0.12)

North star: The VM gives agents continuity. Plans and skills give agents intentionality. Together, they're the foundation for agents that persist, learn, and improve across sessions.

14.1 The Connection

┌─────────────────────────────────────────────────────────────┐
│                       mcp-cli                                │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐   │
│  │ Agent Loop   │  │ Plans        │  │ Skills           │   │
│  │              │  │              │  │                  │   │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────────┘   │
│         │                 │                 │               │
└─────────┼─────────────────┼─────────────────┼───────────────┘
          │                 │                 │
          ▼                 ▼                 ▼
┌─────────────────────────────────────────────────────────────┐
│              chuk-ai-session-manager                         │
│                                                              │
│  Plans ──────► Claim pages (pinned, high-importance)         │
│                "Goal: Build auth system"                     │
│                "Decision: Use JWT"                           │
│                "Status: In progress"                         │
│                                                              │
│  Skills ─────► Procedure pages + promoted patterns           │
│                "When user asks for code review, do X→Y→Z"    │
│                "Tool sequence: search→read→analyze"          │
│                                                              │
│  Continuity ─► VM working set + demand paging                │
│                Agent remembers across turns & restarts        │
│                Claims survive eviction                       │
│                                                              │
│  Learning ───► Procedural memory + guard feedback            │
│                Agent improves tool use over time              │
│                Failed patterns become avoidance rules         │
│                                                              │
└─────────────────────────────────────────────────────────────┘

14.2 Plans as Claim Pages

A plan is a set of claim pages with structure:

class PlanPage(MemoryPage):
    """
    A plan is a high-importance claim page that tracks goals and progress.
    Plans are auto-pinned — they survive eviction pressure.
    """
    page_type: Literal[PageType.CLAIM] = PageType.CLAIM

    # Plan-specific
    plan_status: Literal["active", "completed", "abandoned"] = "active"
    goals: List[str] = []
    decisions: List[str] = []  # Each references a claim page_id
    progress: float = 0.0     # 0-1

    # Auto-pin when active
    @property
    def importance(self) -> float:
        return 0.99 if self.plan_status == "active" else 0.5

Why this works: plans are just claim pages with extra structure. They get all the VM guarantees for free — pinning, anti-thrash, provenance tracking, demand paging.

14.3 Skills as Promoted Procedures

Skills are procedure pages that have been validated through the closed-loop learning system:

class SkillPage(MemoryPage):
    """
    A skill is a validated procedure page.
    Promoted from procedural memory when a pattern is stable.
    """
    page_type: Literal[PageType.PROCEDURE] = PageType.PROCEDURE

    # Skill-specific
    skill_name: str
    tool_sequence: List[str]
    success_rate: float
    invocation_count: int
    page_prereqs: List[str]  # Pages to prefetch before executing

    # Skills are injected into context when relevant tools are active
    @property
    def importance(self) -> float:
        return min(0.9, 0.5 + (self.success_rate * 0.4))

14.4 Agent Lifecycle

Session 1: Agent encounters task, creates plan (claim pages)
Session 2: Agent continues plan, makes decisions (more claims)
Session 3: Plan completes, tool patterns promoted to skills
Session 4: Agent uses skills from Session 3 on new tasks
           Claims from Session 1-2 available via demand paging

The VM makes this possible because:

Claims persist — decisions and plans survive across sessions
Procedures learn — tool patterns accumulate and get promoted
Demand paging — old context is always available, never truly lost
Provenance — every decision traces back to the conversation that created it

14.5 Deliverables

PlanPage model with goal/decision/progress tracking
SkillPage model with tool sequence and prereqs
Plan lifecycle (create → update → complete/abandon)
Skill injection into model context when relevant
Cross-session plan continuity via demand paging
mcp-cli integration layer (Plans API, Skills API)

Chat Completions VM Protocol

The VM layer maps cleanly onto standard Chat Completions API - no custom extensions required.

Core Insight

VM Concept	Chat Completions Mapping
Working set	Messages included in `messages[]`
Page table	`<VM:MANIFEST_JSON>` in developer message
Page fault	`page_fault` tool call
Page discovery	`search_pages` tool call
Page load	`role:"tool"` response with `tool_call_id`
Context window	What you choose to include

Design Principles

VM:MANIFEST_JSON is machine-readable (strict JSON) for deterministic parsing, diffing, replay
VM:CONTEXT is human-readable (compact transcript) for model comprehension
Hints are NOT evidence - only VM:CONTEXT and tool-returned content are evidence
Canonical envelope for all tool results enables consistent downstream processing

Mode 1: Passive VM (No Tools)

VM as developer metadata only - runtime handles paging transparently.

{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "developer",
      "content": "You are a helpful assistant.\n\n<VM:MANIFEST_JSON>\n{\"session_id\":\"sess_123\",\"working_set\":[{\"page_id\":\"msg_211\",\"modality\":\"text\",\"level\":0,\"tokens_est\":320},{\"page_id\":\"msg_212\",\"modality\":\"text\",\"level\":0,\"tokens_est\":180}],\"available_pages\":[{\"page_id\":\"summary_seg_01\",\"modality\":\"text\",\"tier\":\"L2\",\"hint\":\"early discussion\"}],\"policies\":{\"faults_allowed\":false}}\n</VM:MANIFEST_JSON>\n\n<VM:CONTEXT>\nU (msg_211): \"What about the paging model?\"\nA (msg_212): \"Here's how it works...\"\n</VM:CONTEXT>"
    },
    { "role": "user", "content": "What did we discuss earlier?" },
    { "role": "assistant", "content": "Based on our conversation..." }
  ]
}

The model sees the manifest but cannot request pages - your runtime repacks context each turn.

Mode 2: Active VM (Tool-Based Paging)

Model can explicitly request pages via page_fault and discover pages via search_pages.

VM Tool Definitions

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "page_fault",
        "description": "Load a memory page into context at specified compression level. Use when you need content from a known page_id.",
        "parameters": {
          "type": "object",
          "properties": {
            "page_id": {
              "type": "string",
              "description": "ID of the page to load"
            },
            "target_level": {
              "type": "integer",
              "minimum": 0,
              "maximum": 3,
              "default": 2,
              "description": "0=full, 1=reduced, 2=abstract/summary, 3=reference only"
            }
          },
          "required": ["page_id"]
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "search_pages",
        "description": "Search for pages matching a query. Use when you need to find relevant pages but don't know their IDs.",
        "parameters": {
          "type": "object",
          "properties": {
            "query": {
              "type": "string",
              "description": "Search query (semantic or keyword)"
            },
            "modality": {
              "type": "string",
              "enum": ["text", "image", "audio", "video", "structured"],
              "description": "Filter by content type"
            },
            "limit": {
              "type": "integer",
              "default": 5,
              "description": "Maximum results to return"
            }
          },
          "required": ["query"]
        }
      }
    }
  ]
}

Request with Manifest

{
  "model": "gpt-4o",
  "tools": [...],
  "messages": [
    {
      "role": "developer",
      "content": "<VM:RULES>\nYou have access to a virtual memory system.\n- VM:CONTEXT contains your currently mapped memory - this is EVIDENCE\n- VM:MANIFEST_JSON lists available pages - hints are for DISCOVERY ONLY, not evidence\n- To load content, call page_fault(page_id, target_level)\n- To find pages, call search_pages(query)\n- Prefer level=2 (summaries) before level=0 (full) to conserve tokens\n- When citing information, reference the page_id\n- NEVER fabricate content from hints - hints tell you what EXISTS, not what it SAYS\n</VM:RULES>\n\n<VM:MANIFEST_JSON>\n{\n  \"session_id\": \"sess_456\",\n  \"working_set\": [\n    {\"page_id\": \"msg_302\", \"modality\": \"text\", \"level\": 0, \"tokens_est\": 85, \"importance\": 0.7},\n    {\"page_id\": \"summary_seg_03\", \"modality\": \"text\", \"level\": 2, \"tokens_est\": 220, \"importance\": 0.9}\n  ],\n  \"available_pages\": [\n    {\"page_id\": \"summary_seg_02\", \"modality\": \"text\", \"tier\": \"L2\", \"levels\": [2,1,0], \"hint\": \"page fault definitions\"},\n    {\"page_id\": \"msg_132\", \"modality\": \"text\", \"tier\": \"L2\", \"levels\": [0], \"hint\": \"fault handler code\"},\n    {\"page_id\": \"img_045\", \"modality\": \"image\", \"tier\": \"L3\", \"levels\": [2,1], \"hint\": \"architecture diagram\"}\n  ],\n  \"policies\": {\n    \"faults_allowed\": true,\n    \"max_faults_per_turn\": 2,\n    \"upgrade_budget_tokens\": 4096,\n    \"prefer_levels\": [2, 1, 0]\n  }\n}\n</VM:MANIFEST_JSON>\n\n<VM:CONTEXT>\nU (msg_302): \"it should be compatible with openai format\"\nA (msg_303): \"Yes, here's how...\"\nS (summary_seg_03): \"This segment covered: VM protocol design, Chat Completions mapping, tool-based paging.\"\n</VM:CONTEXT>"
    },
    { "role": "user", "content": "Show me that architecture diagram you mentioned" }
  ]
}

Model Issues Page Fault

{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {
      "id": "call_abc123",
      "type": "function",
      "function": {
        "name": "page_fault",
        "arguments": "{\"page_id\":\"img_045\",\"target_level\":2}"
      }
    }
  ]
}

Canonical Tool Result Envelope

All VM tool results use a consistent envelope for downstream processing:

{
  "role": "tool",
  "tool_call_id": "call_abc123",
  "content": "{\"page\":{\"page_id\":\"img_045\",\"modality\":\"image\",\"level\":2,\"tier\":\"L1\",\"content\":{\"caption\":\"Architecture diagram showing L0-L4 storage tiers with bidirectional arrows indicating page promotion and eviction flows.\",\"url\":\"https://artifacts.example.com/img_045.png\"},\"meta\":{\"dimensions\":[1200,800],\"mime_type\":\"image/png\",\"size_bytes\":245000,\"source_tier\":\"L3\"}},\"effects\":{\"promoted_to_working_set\":true,\"tokens_est\":95,\"evictions\":null}}"
}

Envelope Schema

interface VMToolResult {
  page: {
    page_id: string;
    modality: "text" | "image" | "audio" | "video" | "structured";
    level: 0 | 1 | 2 | 3;
    tier: "L0" | "L1" | "L2" | "L3" | "L4";
    content: ModalityContent;  // Varies by modality
    meta: {
      mime_type?: string;
      size_bytes?: number;
      source_tier: string;
      // Modality-specific
      dimensions?: [number, number];      // image/video
      duration_seconds?: number;          // audio/video
      word_count?: number;                // text
    };
  };
  effects: {
    promoted_to_working_set: boolean;
    tokens_est: number;
    evictions?: string[];  // page_ids evicted to make room
  };
}

Content by Modality

// Text (level 0-3)
{ text: string }

// Image
{ caption: string; url?: string; base64?: string; embedding?: number[] }

// Audio
{ transcript: string; timestamps?: Array<{time: number, text: string}>; duration_seconds: number }

// Video
{ scenes: Array<{timestamp: number, description: string, keyframe_url?: string}>; transcript?: string }

// Structured
{ data: object; schema?: string }

search_pages Result

{
  "role": "tool",
  "tool_call_id": "call_search_xyz",
  "content": "{\"results\":[{\"page_id\":\"msg_087\",\"modality\":\"text\",\"tier\":\"L2\",\"levels\":[0,1,2],\"hint\":\"discussion of compression levels\",\"relevance\":0.92},{\"page_id\":\"summary_seg_01\",\"modality\":\"text\",\"tier\":\"L2\",\"levels\":[2],\"hint\":\"early architecture decisions\",\"relevance\":0.78}],\"total_available\":47}"
}

VM:MANIFEST_JSON Schema

interface VMManifest {
  session_id: string;
  working_set: Array<{
    page_id: string;
    modality: string;
    level: number;
    tokens_est: number;
    importance?: number;  // 0-1, affects eviction priority
  }>;
  available_pages: Array<{
    page_id: string;
    modality: string;
    tier: "L2" | "L3" | "L4";
    levels: number[];     // Available compression levels
    hint: string;         // Discovery hint - NOT evidence
  }>;
  policies: {
    faults_allowed: boolean;
    max_faults_per_turn: number;
    upgrade_budget_tokens: number;
    prefer_levels: number[];  // Preference order, e.g. [2,1,0]
  };
}

VM:CONTEXT Format

Compact, human-readable representation of mapped pages:

<VM:CONTEXT>
U (msg_301): "User message text here"
A (msg_302): "Assistant response here"
T (tool_result_045): {"calculator": {"expression": "2+2", "result": 4}}
S (summary_seg_02): "Key points: 1) VM maps to Chat Completions, 2) page_fault is a tool..."
I (img_045): [IMAGE: architecture diagram showing storage tiers, 1200x800]
D (audio_012): [AUDIO: 5:42 duration, transcript: "So the key insight is..."]
V (video_007): [VIDEO: 12:30 duration, 8 scenes, topic: "system walkthrough"]
</VM:CONTEXT>

Prefixes: U=user, A=assistant, T=tool, S=summary, I=image, D=audio, V=video

Strict Mode Developer Prompt

For agents that must never fabricate context. Available at memory/vm_prompts.py:VM_STRICT_PROMPT.

You are operating under STRICT Virtual Memory grounding rules.

Your ONLY valid sources of information are:
1) The content inside <VM:CONTEXT> (the currently mapped working set), and
2) The content returned by tools (e.g., page_fault) in messages with role="tool".

Everything listed in <VM:MANIFEST_JSON> is DISCOVERY METADATA ONLY.
- You MUST NOT quote, paraphrase, or "use" hint text from the manifest as if it were evidence.
- You MUST NOT assume details about unmapped pages.
- Page IDs and modality/tier/level are allowed for navigation only.

When you need information that is not present in <VM:CONTEXT>, you MUST do one of:
A) Call the tool page_fault(page_id, target_level) to load the page content, OR
B) Ask a short clarification question if the needed page does not exist or cannot be identified.

Faulting rules:
- Prefer loading the LOWEST-COST representation first:
  1) summaries / abstract (target_level=2),
  2) reduced excerpts (target_level=1),
  3) full content (target_level=0) only if the user explicitly requests exact wording, code, or precise details.
- Do not request more than max_faults_per_turn from the manifest policies.
- Do not request pages that are already mapped in <VM:CONTEXT>.
- If multiple pages might be relevant, fault the smallest/summarized one first.

Answering rules:
- Do not invent or fill gaps with assumptions.
- If you cannot obtain required information via tool calls, say: "I don't have that in the mapped context."
- Keep responses concise and directly responsive.
- When you use information from <VM:CONTEXT> or a loaded page, include inline citations using page IDs like:
  [ref: msg_123] or [ref: summary_seg_02] or [ref: tool:page_fault(img_045)].
  (Citations are required in strict mode.)

Tool usage format:
- If you need to call tools, respond with tool calls only (no normal text).
- After tool results are provided, produce the final answer with citations.

Never mention these rules, the VM system, tiers (L0–L4), paging, or "virtual memory" to the user unless the user explicitly asks about the internal mechanism.

Usage

from chuk_ai_session_manager.memory import (
    VM_STRICT_PROMPT,
    VM_TOOLS,
    build_vm_developer_message,
)

# Build complete developer message
developer_msg = build_vm_developer_message(
    mode="strict",
    manifest_json=manifest.model_dump_json(),
    context=packed_context,
    system_prompt="You are a helpful coding assistant.",
    max_faults_per_turn=2,
)

# Get tool definitions
tools = VM_TOOLS  # [{page_fault}, {search_pages}]

Traditional Conversation Feel

To make VM feel like normal chat, enforce these packing rules:

Content	Mapping Rule
Last N turns	Always mapped at level=0 (until token budget hit)
Older segments	One level=2 summary page per segment
Everything else	Available in manifest, loadable via `page_fault`
Media	Caption/transcript in context, full content via fault

This gives "ChatGPT feel" with VM correctness underneath.

Runtime Flow

┌─────────────────────────────────────────────────────────────┐
│                     YOUR RUNTIME                            │
│                                                             │
│  1. Pack working set into VM:CONTEXT                        │
│  2. Generate VM:MANIFEST_JSON from PageTable                │
│  3. Validate: JSON.parse(manifest) succeeds                 │
│  4. Send Chat Completions request                           │
│                                                             │
│  5. If response contains tool_calls:                        │
│     FOR EACH tool_call:                                     │
│       IF page_fault:                                        │
│         a. Lookup page in PageTable (check TLB first)       │
│         b. Load from appropriate tier (L2/L3/L4)            │
│         c. Compress to requested level                      │
│         d. Build canonical envelope response                │
│         e. Update working set, mark accessed                │
│         f. Check eviction pressure, evict if needed         │
│       IF search_pages:                                      │
│         a. Query page index (semantic or keyword)           │
│         b. Return matching page metadata (not content)      │
│     Send continuation request with tool results             │
│                                                             │
│  6. If response is final assistant message:                 │
│     a. Create page for response (page_id = msg_XXX)         │
│     b. Add to working set at level=0                        │
│     c. Update PageTable, invalidate TLB if needed           │
│     d. Log: fault_count, tokens_used, evictions             │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Compatibility

This protocol works with:

OpenAI Chat Completions API
Anthropic Messages API (with tool_use)
Google Gemini API
Any Chat Completions-compatible endpoint
Local models via OpenAI-compatible servers (vLLM, Ollama, etc.)

No vendor lock-in, no custom extensions.

Integration Points

With Existing Session Manager

class SessionManager:
    memory_manager: MemoryManager  # NEW

    async def user_says(self, content: Union[str, bytes, MemoryPage], ...):
        # Create page for content
        page = await self.memory_manager.create_page(content, modality="text")
        # Add to working set
        await self.memory_manager.add_to_working_set(page)
        # Existing event tracking
        await self._add_message_event(...)

    async def attach_media(self, data: bytes, mime: str, ...):
        # Create media page
        page = await self.memory_manager.create_page(data, modality=detect_modality(mime))
        # Compress for context
        compressed = await self.memory_manager.compress(page, level=1)
        # Store full version
        await self.memory_manager.store(page, tier="L3")

With Infinite Conversation

class InfiniteConversationManager:
    async def _create_new_segment(self):
        # Existing segment creation
        new_session = await self._create_child_session()

        # NEW: Page out old segment
        old_pages = await self.memory_manager.get_pages_for_session(self.session_id)
        for page in old_pages:
            await self.memory_manager.evict(page, target_tier="L2")

        # NEW: Create summary page
        summary_page = await self.memory_manager.compress_session(self.session_id, level=2)
        await self.memory_manager.add_to_working_set(summary_page)

With chuk-artifacts

# Configuration
ARTIFACT_PROVIDER=vfs-s3  # For L4 cold storage
ARTIFACT_MEMORY_PROVIDER=vfs-memory  # For L3 hot artifacts

# Page storage uses artifacts
page_artifact_id = await artifacts_store.store(
    data=page.serialize(),
    mime="application/x-memory-page",
    scope=StorageScope.SESSION,
    metadata={
        "page_id": page.page_id,
        "modality": page.modality,
        "compression_level": page.compression_level,
    }
)

Configuration

class MemoryConfig:
    # Tier sizes
    l0_max_tokens: int = 128_000  # Context window
    l1_max_pages: int = 100       # Hot cache
    l2_max_bytes: int = 100_000_000  # 100MB session storage

    # Eviction
    eviction_policy: str = "importance_weighted_lru"
    eviction_threshold: float = 0.8  # Trigger at 80% capacity

    # Compression
    auto_compress_after_seconds: int = 300  # 5 minutes
    default_compression_level: int = 1

    # Prefetch
    prefetch_enabled: bool = True
    prefetch_depth: int = 2  # Segments ahead

    # Storage
    artifacts_provider: str = "vfs-filesystem"
    cold_storage_provider: str = "vfs-s3"

    # Per-modality settings
    image_max_inline_size: int = 1_000_000  # 1MB
    audio_auto_transcribe: bool = True
    video_extract_keyframes: bool = True

Success Metrics

Core Performance Metrics

Metric	Target	Description
Fault rate	< 5%	Percentage of accesses requiring tier promotion
Compression ratio	> 10:1	Average reduction from L0 to L2
Context utilization	> 90%	How well we fill the context window
Prefetch hit rate	> 60%	Prefetched pages actually used
P95 fault latency	< 500ms	Time to resolve page fault
Memory overhead	< 10%	Metadata vs. content size

User Experience Metrics (Most Important)

These tell you whether the system "feels good" to users:

Metric	Target	Description
Recall success rate	> 95%	When user asks "what did we say about X?", answer correctly without correction
Thrash index	< 0.5	Faults per turn after the first fault in same topic window
Effective tokens delivered	> 60%	Tokens in context that are actually referenced in the answer

class UserExperienceMetrics:
    """
    Metrics that correlate with user satisfaction.
    """

    def recall_success_rate(self, session_id: str) -> float:
        """
        Track: user asks recall question → assistant answers → user confirms/corrects
        Success = no correction needed
        """
        recall_attempts = self.get_recall_attempts(session_id)
        successes = sum(1 for r in recall_attempts if not r.user_corrected)
        return successes / len(recall_attempts) if recall_attempts else 1.0

    def thrash_index(self, session_id: str, window_turns: int = 5) -> float:
        """
        Faults after first fault in a topic window.
        Low = stable working set. High = constantly missing what we need.

        Thrash index = (total_faults - unique_first_faults) / turns
        """
        faults = self.get_faults_in_window(session_id, window_turns)
        first_faults = set()
        thrash_faults = 0
        for fault in faults:
            topic = self.get_topic(fault.page_id)
            if topic not in first_faults:
                first_faults.add(topic)
            else:
                thrash_faults += 1
        return thrash_faults / window_turns

    def effective_tokens_ratio(self, turn_id: str) -> float:
        """
        What fraction of context tokens actually contributed to the answer?
        Track by seeing which page_ids are cited or clearly used.
        """
        context_tokens = self.get_context_tokens(turn_id)
        referenced_tokens = self.get_referenced_tokens(turn_id)
        return referenced_tokens / context_tokens if context_tokens else 0.0

Fault Reason Breakdown

Track why faults happen to tune policies:

Reason	Expected %	Red Flag If
`user_requested_recall`	40-60%	< 20% (model faulting too speculatively)
`resolve_reference`	20-40%	> 60% (poor working set selection)
`tool_prereq`	10-20%	> 40% (tools not getting needed pages)
`speculative`	< 10%	> 20% (wasting budget on guesses)

Timeline Overview

Phase	Version	Focus	Status
1	v0.8	Foundations — MemoryPage, PageTable, TLB	SHIPPED
2	v0.8	Working Set — PinnedSet, AntiThrash, TokenBudget	SHIPPED
3	v0.8	Page Faults — FaultHandler, ArtifactsBridge, FaultPolicy	SHIPPED
11	v0.9	MemoryManager orchestrator + demand paging strategy	SHIPPED
4	v0.10	Eviction — Generic EvictionPolicy, modality compressors	SHIPPED
12	v0.10	Closed-loop learning — procedural memory + guards feedback	NEXT
13	v0.10	Storage query layer — page index beyond K/V	NEXT
5	v0.11	Multi-Modal — Real modality handlers, ABI negotiation	Partial
6	v0.11	Prefetch — ML prediction, background loading	Partial
7	v0.11	Memory Pressure — Active monitoring, adaptive thresholds	Planned
8	v0.12	Coherency — Versioning, time-travel, full event sourcing	Planned
9	v0.12	Tools as Memory Writers — First-class tool memory interface	Planned
14	v0.12	Agent Planning & Skills — mcp-cli foundation	Planned
10	v1.0+	Advanced — Shared Memory, COW, GC, NUMA	Planned

Minimum Viable v0.8 Scope — SHIPPED

v0.8 exceeded its planned scope. Originally targeted core models + page table + basic faults. Shipped all of Phases 1-3 including mutation log, prefetcher, pack cache, and full fault policy system.

What shipped to prove VM works. Text-only, single-session, synchronous faults.

Ship First (v0.8)

memory/
├── models.py          # MemoryPage, PageTableEntry, CompressionLevel
├── page_table.py      # PageTable with dirty tracking
├── tlb.py             # PageTLB (512 entries, LRU)
├── working_set.py     # WorkingSetManager + TokenBudget
├── context_packer.py  # Pack working set → VM:CONTEXT (text only)
├── manifest.py        # Generate VM:MANIFEST_JSON from PageTable
├── fault_handler.py   # Synchronous PageFaultHandler
└── artifacts_bridge.py # Store/load pages via chuk-artifacts

Core Models (v0.8)

class MemoryPage(BaseModel):
    page_id: str
    modality: Modality = Modality.TEXT
    storage_tier: StorageTier = StorageTier.L0
    compression_level: CompressionLevel = CompressionLevel.FULL

    # Page type from day one (critical for eviction/compression)
    page_type: PageType = PageType.TRANSCRIPT

    # Provenance from day one (linked representations)
    provenance: List[str] = []  # page_ids this derives from
    represents: Optional[str] = None  # page_id this is a compressed version of

    content: Optional[str] = None
    artifact_id: Optional[str] = None
    size_bytes: int
    tokens_est: int
    created_at: datetime
    last_accessed: datetime
    access_count: int = 0
    importance: float = 0.5
    dirty: bool = False

class PageTableEntry(BaseModel):
    page_id: str
    tier: StorageTier
    page_type: PageType = PageType.TRANSCRIPT
    artifact_id: Optional[str]
    compression_level: CompressionLevel = CompressionLevel.FULL
    dirty: bool = False
    last_accessed: datetime
    access_count: int = 0

Tools (v0.8)

# Just these two tools
page_fault(page_id: str, target_level: int = 2) -> VMToolResult
search_pages(query: str, limit: int = 5) -> SearchResult

Metrics to Track (v0.8)

class VMMetrics:
    faults_total: int
    faults_per_turn: float
    tlb_hits: int
    tlb_misses: int
    evictions_total: int
    tokens_in_working_set: int
    tokens_available: int
    pages_by_tier: Dict[str, int]

Integration (v0.8)

class SessionManager:
    # Add these
    memory: MemoryManager

    async def user_says(self, content: str, ...):
        page = self.memory.create_page(content, modality="text")
        await self.memory.add_to_working_set(page)
        # ... existing logic

    def get_vm_developer_message(self) -> str:
        """Generate the developer message with VM:RULES, VM:MANIFEST_JSON, VM:CONTEXT"""

Defer to Later

Feature	Reason to Defer
StreamingFaultHandler	Hard, model-dependent, not needed to prove concept
Multi-modal (image/audio/video)	Start with text, add modalities incrementally
Shared memory regions	Complex access control, not needed for single-session
NUMA / locality	Only matters at scale with distributed storage
Copy-on-write	Optimization, not core functionality
Garbage collection	Can manually clean up initially
Full event sourcing	Lite version in v0.9, full time-travel in v0.15
ML-based prefetch	Simple heuristics in v0.9, ML prediction in v0.13

v0.8 Success Criteria — ALL MET

Metric	Target	Result
Tests passing	100%	1704 tests passing
Fault handling works	Model can page_fault, get content	Complete — async, protocol-extensible
TLB hit rate	> 80% for repeated accesses	Complete — LRU with stats tracking
Manifest generation	Valid JSON, parseable	Complete — ManifestBuilder with policies
Context packing	Stays under token budget	Complete — cross-modal formatters
Integration	Works with existing SessionManager	Complete — MemoryManager orchestrator + SessionManager enable_vm (v0.9)

v0.8 Non-Goals

Perfect eviction policy (LRU is fine)
Full compression (stub summarizer OK)
Multi-modal anything
Distributed/shared state
Sub-100ms fault latency

Open Questions

Resolved

Real-time streaming: How to page in/out during streaming responses? → DeferredPage with speculative placeholders (Phase 3)
Cross-session sharing: How to handle shared pages across sessions? → SharedMemoryRegion with scope-based access control (Phase 10)
Phase ordering: Should we build the orchestrator first or the primitives? → Primitives first (Phases 1-3 shipped in v0.8), orchestrator next (Phase 11/v0.9)
How much to ship in v0.8?: Core models only or full fault handling? → Shipped everything through fault handling, mutation log, prefetcher, pack cache. Implementation ran ahead of planned phases.

Active

Storage query semantics (NEW — see Phase 13): chuk-sessions is K/V. The VM layer needs queries by PageType, provenance chain, importance score, content hints. Build a lightweight index layer on top of K/V or push toward a document store? Current recommendation: lightweight PageIndex on K/V for v0.10, evaluate doc store if query complexity grows.
Demand paging initiation (NEW — see Phase 11): Who triggers page faults? Model-initiated (current page_fault tool), runtime pre-pass (scan user message for recall signals), or hybrid?
Embedding storage: Store vectors in artifacts or separate vector DB?
- Option A: Artifacts (unified, simpler)
- Option B: Dedicated vector DB (faster similarity search)
- Option C: Hybrid (embeddings in vector DB, linked to artifact pages)
Compression callbacks: Use LLM for summarization or local models?
- LLM: Higher quality, higher latency, cost
- Local: Fast, cheap, lower quality
- Hybrid: Local for L1, LLM for L2+
Cost optimization: Balance storage vs. compute (recompression)?
- Store all compression levels (storage heavy)
- Recompute on demand (compute heavy)
- Adaptive based on access patterns
Multi-model memory sharing: How do different models share memory?
- Same Memory ABI, different token budgets?
- Model-specific compression preferences?
- Handoff protocol between models?
Memory as capability: Should tools declare memory requirements?
- "This tool needs 10K tokens of context"
- "This tool creates large pages"
- Memory budgeting in tool selection
Closed-loop learning scope (NEW — see Phase 12): How aggressive should skill promotion be? Threshold for promoting a tool sequence to a named skill? Should guard-triggered avoidance patterns be ephemeral (session-scoped) or persistent (cross-session)?

North Star Acceptance Test

One direct acceptance test that proves the whole system works:

A 200+ turn conversation where the user repeatedly says "as we decided earlier…" across 5 topics, and the assistant reliably recalls the right decision with page_id citations, while keeping L0 under budget and faulting <2 pages per turn.

This test instantly reveals whether working-set logic, pinning, anti-thrash, and claim extraction are doing their job.

Test Scenario

async def test_north_star_vm_acceptance():
    """
    200+ turn conversation across 5 topics with reliable recall.
    """
    sm = SessionManager(
        system_prompt="You are a project planning assistant.",
        vm_config=VMConfig(
            max_l0_tokens=32_000,  # Tight budget to force paging
            faults_allowed=True,
            max_faults_per_turn=2,
        )
    )

    topics = [
        "database_choice",      # "Let's use PostgreSQL"
        "api_framework",        # "We'll go with FastAPI"
        "frontend_stack",       # "React with TypeScript"
        "deployment_strategy",  # "Kubernetes on GCP"
        "testing_approach",     # "Pytest with 80% coverage"
    ]

    # Phase 1: Make decisions (turns 1-50)
    for i, topic in enumerate(topics):
        await sm.user_says(f"For {topic}, what should we use?")
        await sm.ai_responds(f"I recommend X for {topic}...")
        await sm.user_says(f"Agreed, let's go with X for {topic}")
        # This should create a claim page

    # Phase 2: Extended discussion (turns 51-150)
    # Talk about unrelated things to push decisions out of L0
    for i in range(100):
        await sm.user_says(f"Tell me about implementation detail {i}")
        await sm.ai_responds(f"Here's detail {i}...")

    # Phase 3: Recall challenges (turns 151-200)
    recall_successes = 0
    total_faults = 0

    for topic in topics:
        await sm.user_says(f"As we decided earlier, what's our {topic}?")
        response = await sm.get_last_response()

        # Check: correct recall with citation
        assert f"[ref: claim_{topic}" in response.content or correct_decision in response.content
        recall_successes += 1

        # Check: fault count for this turn
        turn_faults = sm.memory.get_faults_this_turn()
        assert turn_faults <= 2, f"Too many faults: {turn_faults}"
        total_faults += turn_faults

    # Assertions
    assert recall_successes == 5, "Should recall all 5 decisions"
    assert total_faults <= 10, f"Should fault <= 2 per recall turn, got {total_faults}"
    assert sm.memory.get_l0_tokens() <= 32_000, "L0 budget exceeded"

    # Thrash index should be low
    thrash = sm.memory.metrics.thrash_index()
    assert thrash < 0.5, f"Thrash index too high: {thrash}"

    # Recall success rate should be high
    recall_rate = sm.memory.metrics.recall_success_rate()
    assert recall_rate >= 0.95, f"Recall rate too low: {recall_rate}"

What This Tests

Capability	How It's Tested
Claim extraction	Decisions become token-cheap claim pages
Pinning	Claims stay accessible despite 100 turns of other content
Anti-thrash	Same topics don't repeatedly fault
Working set	L0 stays under budget across 200 turns
Fault budget	Never exceeds 2 faults per turn
Provenance	Citations link back to source pages
UX metrics	Recall success and thrash index within targets

If this test passes, the VM system "feels good" end-to-end.

Integration with chuk-sessions

The VM system is designed to layer cleanly on top of the existing chuk-sessions infrastructure. This section details how the storage tiers map to existing components.

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                   Application Layer                          │
│               (Chat, Agents, Tool Calls)                     │
└──────────────────────────┬──────────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────┐
│                    SessionManager                            │
│  ├─ user_says() / ai_responds()                              │
│  ├─ get_messages_for_llm() ◄─── VM context built here        │
│  └─ memory: MemoryManager (NEW)                              │
└──────────────────────────┬──────────────────────────────────┘
                           │
         ┌─────────────────┴─────────────────┐
         │                                   │
┌────────▼────────────┐          ┌───────────▼───────────┐
│  MemoryManager      │          │  Session (L2)         │
│  ├─ PageTable       │          │  ├─ events[]          │◄── Source of truth
│  ├─ WorkingSet      │ derives  │  ├─ state{}           │◄── L1 cache
│  ├─ TLB             │  from    │  └─ parent/children   │◄── Chain links
│  └─ ContextPacker   │──────────│                       │
└─────────────────────┘          └───────────┬───────────┘
                                             │
                                 ┌───────────▼───────────┐
                                 │  chuk-sessions        │
                                 │  (Redis/Memory)       │
                                 └───────────────────────┘

Tier Mapping to Existing Infrastructure

VM Tier	Maps To	Already Exists	Description
L0	`WorkingSetManager`	New	In-memory, current context window
L1	`Session.state{}`	✅ Yes	Derived content (summaries, claims, cache)
L2	`Session.events[]`	✅ Yes	Append-only event log (source of truth)
L3	`chuk-artifacts`	✅ Yes	Artifact storage (vfs-filesystem)
L4	`chuk-artifacts`	✅ Yes	Cold storage (vfs-s3)

L2: Session.events (Source of Truth)

The existing Session.events list IS the L2 authoritative tier:

class Session(BaseModel):
    events: List[SessionEvent]  # Append-only, immutable

    async def add_event(self, event: SessionEvent):
        self.events.append(event)  # Never mutated

Key properties:

✅ Append-only (no mutation)
✅ Immutable once written
✅ Persistent via chuk-sessions
✅ Already tracks tokens, timestamps, source

VM Strategy: Don't duplicate this. Reference it.

L1: Session.state (Derived Cache)

The existing Session.state dict stores derived content:

class Session(BaseModel):
    state: Dict[str, Any]  # Mutable, derived from events

# VM uses this for:
session.state["vm:summaries"] = {
    "summary_seg_01": {"content": "...", "provenance": ["msg_1", "msg_2"]},
}
session.state["vm:claims"] = {
    "claim_db_choice": {"content": "Use PostgreSQL", "importance": 0.95},
}
session.state["vm:pack_cache"] = {
    "hash_abc123": {"vm_context": "...", "tokens": 5000},
}

Key insight: L1 can be rebuilt from L2 if lost.

L0: WorkingSetManager (New)

Active context window, managed in-memory:

# What goes into the LLM prompt
l0_page_ids = working_set.get_l0_page_ids()

# Maps to pages derived from Session.events
# Plus summaries/claims from Session.state

Session Chain = Page Hierarchy

Parent-child relationships already support multi-segment conversations:

# Session navigation (already exists)
ancestors = await session.ancestors()      # Walk to root
descendants = await session.descendants()  # Walk to leaves

# VM interpretation:
# - Root session = oldest content (L3/L4)
# - Current session = hot content (L0/L1)
# - Parent sessions = warm content (L2)

SessionManager Integration

class SessionManager:
    def __init__(
        self,
        # ... existing params ...
        enable_vm: bool = False,
        vm_config: Optional[WorkingSetConfig] = None,
    ):
        # NEW: VM memory layer (optional)
        self.memory: Optional[MemoryManager] = None
        if enable_vm:
            self.memory = MemoryManager(
                session_id=self._session_id,
                config=vm_config or WorkingSetConfig(max_l0_tokens=32_000),
            )

    async def user_says(self, content: str, **metadata):
        # Existing: add event to session (L2)
        event = await SessionEvent.create_with_tokens(...)
        await self._session.add_event(event)

        # NEW: Create page and add to working set (L0)
        if self.memory:
            page = MemoryPage(
                page_id=f"msg_{event.id}",
                page_type=PageType.TRANSCRIPT,
                content=content,
            )
            self.memory.page_table.register(page)
            self.memory.working_set.add_to_l0(page)

    async def get_messages_for_llm(self, **kwargs):
        if self.memory:
            # Use VM context packer
            return self.memory.build_llm_context()
        else:
            # Existing behavior (no VM)
            return await self._build_traditional_prompt()

Infinite Context Integration

When a session segment is created:

async def _create_new_segment(self):
    # Existing: Create child session
    new_session = await Session.create(parent_id=self._session.id)

    # NEW: Evict old pages from L0 to L2
    if self.memory:
        for page_id in self.memory.working_set.get_l0_page_ids():
            page = self.memory.get_page(page_id)
            # Store to artifacts (L3)
            artifact_id = await self.memory.storage.store_page(page)
            # Update location in page table
            self.memory.page_table.update_location(
                page_id, tier=StorageTier.L2, artifact_id=artifact_id
            )
            # Remove from working set
            self.memory.working_set.remove_from_l0(page_id, page)

        # Create summary page for the segment
        summary = MemoryPage(
            page_id=f"summary_{self._session.id}",
            page_type=PageType.SUMMARY,
            content=await self._generate_summary(),
            provenance=[p.page_id for p in old_pages],
        )
        self.memory.page_table.register(summary)
        self.memory.working_set.add_to_l0(summary)

Event-to-Page Mapping

SessionEvents map directly to MemoryPages:

def event_to_page(event: SessionEvent) -> MemoryPage:
    """Convert a SessionEvent to a MemoryPage."""
    return MemoryPage(
        page_id=f"msg_{event.id}",
        page_type=PageType.TRANSCRIPT,
        modality=Modality.TEXT,
        content=event.content,
        storage_tier=StorageTier.L0,
        size_tokens=event.token_usage.total_tokens if event.token_usage else None,
        created_at=event.timestamp,
        metadata={
            "event_id": event.id,
            "source": event.source.value,
            "model": event.token_usage.model if event.token_usage else None,
        },
    )

VM Metadata in Session

Store VM configuration and state:

# In Session.metadata.properties
session.metadata.properties["vm:mode"] = "strict"
session.metadata.properties["vm:max_l0_tokens"] = 32000
session.metadata.properties["vm:fault_limit"] = 3

# In Session.state (persisted, derived)
session.state["vm:page_locations"] = {
    "msg_001": {"tier": "L2", "artifact_id": "art_xyz"},
    "claim_db": {"tier": "L1", "pinned": True},
}

What Already Works

Component	Status	Notes
Session.events (L2)	✅ Exists	Immutable event log
Session.state (L1)	✅ Exists	Derived content cache
Parent-child chain	✅ Exists	Segment navigation
Token tracking	✅ Exists	Per-event token counts
ChukSessionsStore	✅ Exists	Persistence layer
MemoryPage, PageTableEntry, all enums	✅ Done	v0.8 — Pydantic models, no magic strings
PageTable (dirty/tier/modality tracking)	✅ Done	v0.8 — Full index management
PageTLB + TLBWithPageTable	✅ Done	v0.8 — LRU cache with hit/miss stats
WorkingSetManager + PinnedSet + AntiThrash	✅ Done	v0.8 — Importance-weighted eviction scoring
ContextPacker (cross-modal formatters)	✅ Done	v0.8 — Text, image, audio, video, structured
ManifestBuilder + VMManifest	✅ Done	v0.8 — JSON manifest with policies
PageFaultHandler + PageSearchHandler	✅ Done	v0.8 — Async, protocol-extensible
FaultPolicy + FaultReason + FaultRecord	✅ Done	v0.8 — Token budget, confidence threshold, intent tracking
ArtifactsBridge (chuk-artifacts + fallback)	✅ Done	v0.8 — In-memory fallback when artifacts unavailable
MutationLogLite (append-only, replay)	✅ Done	v0.8 — Context snapshots, per-turn replay
SimplePrefetcher (heuristic-based)	✅ Done	v0.8 — Summary + claims + tool prereqs
ContextPackCache (LRU)	✅ Done	v0.8 — Session-aware invalidation
VM prompts (strict/relaxed/passive)	✅ Done	v0.8 — Tool definitions + mode-specific rules
MemoryABI model	✅ Done	v0.8 — Model in models.py, serialization ready
UserExperienceMetrics	✅ Done	v0.8 — Recall success rate, thrash index

What Needs Building

Priority	Component	Description	Phase
P0	~~`MemoryManager`~~	~~Orchestrator tying PageTable, WorkingSet, FaultHandler, Packer together~~	~~v0.9~~ ✅
P0	~~`SessionManager.memory`~~	~~Optional VM integration property on SessionManager~~	~~v0.9~~ ✅
P0	~~Event↔Page mapper~~	~~Convert SessionEvents to MemoryPages automatically~~	~~v0.9~~ ✅
P0	~~Segmentation hook~~	~~Trigger eviction on new segment creation~~	~~v0.9~~ ✅
P0	~~VM-aware prompt builder~~	~~Use `VM:CONTEXT` when `enable_vm=True`~~	~~v0.9~~ ✅
P0	~~`DemandPagingPrePass`~~	~~Scan user messages for recall signals, prefetch~~	~~v0.9~~ ✅
P1	`GuardFeedbackIntegration`	Wire guard triggers into procedural memory	v0.10
P1	`PageIndex`	Lightweight query index over page metadata	v0.10
P1	~~`EvictionPolicy` protocol~~	~~Swappable eviction strategies (LRU, importance-weighted, etc.)~~	~~v0.10~~ ✅
P1	`SkillPromoter`	Detect and promote stable tool patterns	v0.10
P2	`StreamingFaultHandler`	Non-blocking faults during streaming responses	v0.11
P2	Modality handlers	Real image/audio/video LLM context building	v0.11
P2	`MemoryPressureMonitor`	Active pressure tracking with adaptive thresholds	v0.11
P3	`PlanPage` / `SkillPage`	Plan & skill models for mcp-cli integration	v0.12
P3	Full event sourcing	Time-travel, version vectors, deterministic replay	v0.12

Configuration Example

from chuk_ai_session_manager import SessionManager
from chuk_ai_session_manager.memory import WorkingSetConfig, VMMode

# Create session with VM enabled
sm = SessionManager(
    session_id="my_session",
    system_prompt="You are a helpful assistant.",
    infinite_context=True,
    enable_vm=True,  # NEW
    vm_config=WorkingSetConfig(
        max_l0_tokens=32_000,
        max_l1_pages=100,
        eviction_threshold=0.85,
    ),
    vm_mode=VMMode.STRICT,  # Enforce grounding rules
)

# Use normally - VM handles context automatically
await sm.user_says("Let's use PostgreSQL for the database")
await sm.ai_responds("Great choice! PostgreSQL is excellent for...")

# Later, when context pressure builds:
# - Old transcripts evicted to L2/L3
# - Claims stay pinned in L0/L1
# - Model can fault in old content as needed

Why This Architecture Works

L2 is immutable - Session.events never mutates, so no coherency issues
L1 is derived - Session.state can be rebuilt from L2 if lost
L0 is ephemeral - WorkingSetManager is in-memory only
Existing chain navigation - Parent-child already supports multi-segment
Token tracking exists - Per-event counts feed directly into TokenBudget
Storage abstraction exists - ChukSessionsStore handles Redis/memory

References

OS Virtual Memory: Page tables, TLB, working sets, page replacement algorithms
chuk-artifacts: Blob storage with VFS abstraction
chuk-sessions: Session state management
chuk-ai-session-manager: Conversation tracking, infinite context

FilesExpand file tree

ROADMAP-VIRTUAL-MEMORY.md

Latest commit

History

ROADMAP-VIRTUAL-MEMORY.md

File metadata and controls

AI Virtual Memory - Roadmap

Why This Isn't RAG

Current Status (v0.11)

What's Shipped

What's Next

Vision

Storage Hierarchy Detail

L1/L2 Contract (Critical)

Phase 1: Foundations (v0.8) — SHIPPED

1.1 Page Type Taxonomy

1.2 MemoryPage Model

Claim Page Example

Procedure Page Example

1.3 PageTableEntry (with dirty tracking)

1.4 PageTable

1.5 TLB (Translation Lookaside Buffer)

1.6 Compression Levels

1.7 Context Pack Cache

1.8 Deliverables

Phase 2: Working Set Management (v0.9) — SHIPPED

2.1 WorkingSetManager

2.2 Token Budget

2.3 Pinned Pages

2.4 Anti-Thrash Protection

2.5 Access Pattern Tracking

2.6 Simple Prefetch Heuristics (v0.9)

2.7 Lite Mutation Log (v0.9)

2.8 Deliverables

Phase 3: Page Faults & Loading (v0.10) — SHIPPED (except StreamingFaultHandler)

3.1 PageFaultHandler

3.2 Streaming Fault Handling

3.3 Fault Policies

Fault Request with Intent

3.4 Lazy Loading Strategies

3.5 ArtifactsBridge

3.6 Deliverables

Phase 4: Eviction & Compression (v0.10) — SHIPPED

4.1 EvictionPolicy

4.2 Modality Compressors

4.3 Compression Triggers

4.4 Deliverables

Phase 5: Multi-Modal Integration (v0.11) — PARTIAL

5.1 Unified Content Addressing

5.2 Context Window Packing

5.3 Modality Handlers

5.4 Memory ABI (Application Binary Interface)

Tool Schema as Pinned Pages

5.5 Deliverables

Phase 6: Prefetch & Prediction (v0.11) — PARTIAL

6.1 PrefetchManager

6.2 Prediction Strategies

6.3 Background Loading

6.4 Deliverables

Phase 7: Memory Pressure & Health (v0.11)

7.1 MemoryPressureMonitor

7.2 Adaptive Thresholds

7.3 Health Checks

7.4 Deliverables

Phase 8: Coherency & Versioning (v0.12)

8.1 Page Versioning

8.2 Coherency Guarantees

8.3 Conflict Resolution

8.4 Full Event-Sourced Mutations

8.5 Deliverables

Phase 9: Tools as Memory Writers (v0.12)

9.1 Tool Memory Interface

9.2 Tool-Created Pages

9.3 Tool Learning from Memory

9.4 Deliverables

Phase 10: Advanced Features (v1.0+)

10.1 Shared Memory

10.2 Memory-Mapped Files

10.3 Copy-on-Write

10.4 Garbage Collection