disttae: add workspace safety valve to prevent OOM during HNSW index creation#24148
aptend wants to merge 3 commits into matrixorigin:main
When the safety valve triggers from the write path, use the current statement's start offset (offsets[statementID-1]) instead of 0. This avoids compacting prior statements' entries, which could break the offsets[] array used by RollbackLastStatement.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔍 Code Review Report: Seven-Member Jury
📊 Overview
📝 Summary: This PR in …
🧨 Destructive Testing Trial
🔴 Must Fix
1. The safety valve falls through to the quota check and may skip the dump
fixed
Add forceDump flag so that when the safety valve triggers, the quota check is bypassed entirely. This prevents a theoretical scenario where quota acquisition succeeds and returns nil without dumping, defeating the safety valve's guarantee. Also add Test_WorkspaceForceDumpNoIncrStatement to cover the real HNSW scenario where IncrStatementID is never called (statementID==0).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
What type of PR is this?
Which issue(s) this PR fixes:
issue #24041
What this PR does / why we need it:
When IncrStatementID is disabled (e.g., HNSW index creation via RunSql with WithDisableIncrStatement), the primary workspace dump entry point (IncrStatementID → dumpBatchLocked(ctx, 0)) is skipped. The write-path dump (dumpBatchLocked(ctx, snapshotWriteOffset)) becomes the only check, but it only scans the current statement's writes from snapshotWriteOffset. Since each INSERT (~131MB) is below the quota-raised writeWorkspaceThreshold (136MB), dump never triggers. Memory accumulates unbounded until OOM.

Root cause: The workspace dump mechanism has three entry points designed for different roles:

IncrStatementID (offset=0): Primary, full workspace scan at statement boundaries

When the primary entry point is disabled, the safety net's windowed view cannot detect global accumulation. Combined with quota "free-riding" (only the first INSERT acquires quota, subsequent INSERTs bypass quota because each individual write < raised threshold), the workspace grows without bound.
Fix: Add a global safety valve (extraWorkspaceThreshold, default 500MB). When the write-path scan finds current writes below threshold but approximateInMemInsertSize >= extraWorkspaceThreshold:

This ensures workspace memory stays bounded even when IncrStatementID is disabled.

Test: Added Test_WorkspaceForceDumpOnGlobalAccumulation that simulates the HNSW scenario (no IncrStatementID, repeated writes with advancing snapshotWriteOffset) and verifies: