Skip to content

[improvement](be) Decouple build index memory limit from schema change and make it dynamic#62398

Open
zhiqiang-hhhh wants to merge 4 commits intoapache:masterfrom
zhiqiang-hhhh:feature/build-index-dynamic-memory-limit
Open

[improvement](be) Decouple build index memory limit from schema change and make it dynamic#62398
zhiqiang-hhhh wants to merge 4 commits intoapache:masterfrom
zhiqiang-hhhh:feature/build-index-dynamic-memory-limit

Conversation

@zhiqiang-hhhh
Copy link
Copy Markdown
Contributor

@zhiqiang-hhhh zhiqiang-hhhh commented Apr 11, 2026

Summary

Build index tasks previously reused the schema change memory limit (schema_change_mem_limit_frac), which was statically divided by alter_tablet_worker_count. This caused two problems:

  1. Single build index task was under-provisioned — it shared the budget with ALTER TABLE and was divided by a worker count that didn't even match the actual build index worker pool (alter_index_worker_count).
  2. No real-time memory pressure awareness — under heavy system load, build index tasks still claimed the same amount of memory, increasing OOM risk during concurrent execution.

This PR introduces an independent, dynamic memory limit strategy for build index tasks, optimized for the common case (single task, give it maximum memory) while protecting against OOM during rare concurrent execution.

New Strategy

Each build index task gets up to soft_mem_limit × build_index_mem_limit_frac (default 60%), no division by task count. The actual limit is then constrained by two mechanisms:

1. Remaining Memory Cap

limit = min(total_budget, soft_mem_limit - process_memory_usage)

When the first task is already consuming memory, subsequent tasks naturally see less remaining memory. If the remaining memory is below min_limit (default 2GB), the task will likely fail due to the mem tracker — this is the intended behavior to prevent OOM.

2. Watermark-based Throttling

System Memory Usage Effect
< 75% of soft_mem_limit Full budget
≥ 75% (low watermark) Budget halved
≥ 85% (high watermark) Budget reduced to min_limit only

Configuration (all mutable at runtime)

Config Default Description
build_index_mem_limit_frac 0.6 Total budget as fraction of soft_mem_limit
build_index_min_memory_per_task_bytes 2GB Floor per task, prevents starvation
build_index_memory_high_watermark_pct 85 Aggressive throttling threshold
build_index_memory_low_watermark_pct 75 Moderate throttling threshold

Mem Tracker & Task Counting Lifecycle

The MemTrackerLimiter and task counting (notify_build_index_task_begin/end) are managed in the constructor/destructor of EngineIndexChangeTask / EngineCloudIndexChangeTask. This is necessary because alter_inverted_index_callback calls SCOPED_ATTACH_TASK(engine_task.mem_tracker()) before execute(), so _mem_tracker must be initialized by the time the task object is constructed. The task counting must also happen in the constructor (before _mem_tracker creation) so that memory_limitation_bytes_for_build_index() sees the correct concurrent task count.

EngineIndexChangeTask::EngineIndexChangeTask(StorageEngine& engine, ...) {
    _engine.notify_build_index_task_begin();
    _mem_tracker = MemTrackerLimiter::create_shared(
            MemTrackerLimiter::Type::SCHEMA_CHANGE, ...,
            _engine.memory_limitation_bytes_for_build_index());
}

EngineIndexChangeTask::~EngineIndexChangeTask() {
    _engine.notify_build_index_task_end();
}

Key Code Changes

  • config.h/cpp — 4 new mutable configs (DECLARE_m*/DEFINE_m*)
  • storage_engine.h/cppmemory_limitation_bytes_for_build_index() with remaining-memory cap + watermark throttling; notify_build_index_task_{begin,end}() for task counting; _running_build_index_tasks atomic counter
  • engine_index_change_task.cpp — Switched to new dynamic memory limit API; task counting and mem tracker in constructor/destructor
  • engine_cloud_index_change_task.cpp — Same changes for cloud path
  • build_index_memory_limit_test.cpp — 7 unit tests covering: task counting, full budget, remaining memory cap, min floor, high/low watermark throttling, config mutability

Test Plan

  • BE unit test: BuildIndexMemoryLimitTest.* (7 tests, all passed)
  • Regression test
  • Manual test with concurrent build index

…e and make it dynamic

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

Previously, build index tasks reused the schema change memory limit
(`schema_change_mem_limit_frac`), which was shared with ALTER TABLE
operations and divided statically by `alter_tablet_worker_count` (a
wrong divisor — should have been `alter_index_worker_count`). This
caused two problems:

1. Single build index tasks were unnecessarily constrained — they could
   not leverage available memory even when the system was idle.
2. There was no awareness of real-time memory pressure. Under heavy
   system load, build index tasks would still claim the same amount of
   memory, increasing OOM risk.

This PR introduces an independent, dynamic memory limit strategy for
build index tasks with the following behavior:

- Each task gets up to `soft_mem_limit * build_index_mem_limit_frac`
  (default 0.6) — no static division by worker count.
- The limit is capped by actual remaining memory
  (`soft_mem_limit - process_memory_usage`), so concurrent tasks
  naturally get less when the system is under pressure.
- Two-tier watermark throttling further reduces the budget:
  - Low watermark (75%): limit halved.
  - High watermark (85%): limit reduced to `min_limit` (effectively
    rejecting the task if it needs more).
- A floor of `build_index_min_memory_per_task_bytes` (default 2GB)
  prevents starvation.
- All four config parameters are mutable (`DEFINE_m*`), adjustable at
  runtime without BE restart.

### Release note

Added independent mutable configuration for build index memory limits:
`build_index_mem_limit_frac`, `build_index_min_memory_per_task_bytes`,
`build_index_memory_high_watermark_pct`,
`build_index_memory_low_watermark_pct`. Build index tasks now use
dynamic memory limits based on real-time system memory pressure instead
of sharing the static schema change memory limit.

### Check List (For Author)

- Test: Unit Test
- Behavior changed: Yes — build index tasks now have independent memory
  limits and are sensitive to system memory pressure. Default budget
  is `soft_mem_limit * 0.6` (previously `soft_mem_limit * 0.6 /
  alter_tablet_worker_count`), so single-task scenarios get
  significantly more memory.
- Does this need documentation: No

🐘 Generated with Crush

Assisted-by: Claude Opus 4.6 via Crush <crush@charm.land>
@Thearas
Copy link
Copy Markdown
Contributor

Thearas commented Apr 11, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: Fix code formatting issues detected by clang-format 16 in storage_engine.cpp and build_index_memory_limit_test.cpp

### Release note

None

### Check List (For Author)

- Test: No need to test (formatting-only changes, no logic change)
- Behavior changed: No
- Does this need documentation: No
@zhiqiang-hhhh
Copy link
Copy Markdown
Contributor Author

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 81.82% (36/44) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.08% (20162/37981)
Line Coverage 36.62% (189583/517747)
Region Coverage 32.89% (147262/447744)
Branch Coverage 34.01% (64447/189468)

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 81.82% (36/44) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.59% (27373/37195)
Line Coverage 57.25% (295488/516167)
Region Coverage 54.52% (246355/451880)
Branch Coverage 56.14% (106687/190050)

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 81.82% (36/44) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.59% (27371/37195)
Line Coverage 57.24% (295443/516167)
Region Coverage 54.50% (246261/451880)
Branch Coverage 56.13% (106667/190050)

1 similar comment
@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 81.82% (36/44) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.59% (27371/37195)
Line Coverage 57.24% (295443/516167)
Region Coverage 54.50% (246261/451880)
Branch Coverage 56.13% (106667/190050)

### What problem does this PR solve?

Problem Summary:
1. SIGSEGV crash in alter_inverted_index_callback: _mem_tracker was created
   in execute() but SCOPED_ATTACH_TASK accessed it before execute() was
   called, causing null pointer dereference in memory_context.h:82.
2. When build_index_mem_limit_frac=0, the function returned 0 instead of
   min_limit, bypassing the floor guarantee.
3. running_build_index_tasks() was public but only used in tests.
4. No logging for actual memory limit assigned to build index tasks.

Fix:
- Move _mem_tracker creation and notify_build_index_task_begin to
  constructor, notify_build_index_task_end to destructor (RAII) for both
  EngineIndexChangeTask and EngineCloudIndexChangeTask.
- Add std::max(limit, min_limit) as final floor guarantee in
  memory_limitation_bytes_for_build_index().
- Move running_build_index_tasks() to private with FRIEND_TEST.
- Add INFO log on task creation with tablet_id and mem_limit.
- Add 4 boundary-value tests: zero frac, frac>1, inverted watermarks,
  zero min_bytes.

### Release note

None

### Check List (For Author)

- Test: Unit Test (BuildIndexMemoryLimitTest, 11 tests all passed)
- Behavior changed: No
- Does this need documentation: No

🐮 Generated with Crush

Assisted-by: Claude Opus 4.6 via Crush <crush@charm.land>
@zhiqiang-hhhh
Copy link
Copy Markdown
Contributor Author

run buildall

@zhiqiang-hhhh zhiqiang-hhhh marked this pull request as ready for review April 15, 2026 02:57
@hello-stephen
Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 82.22% (37/45) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.07% (20165/37999)
Line Coverage 36.64% (189842/518111)
Region Coverage 32.92% (147527/448128)
Branch Coverage 34.03% (64537/189620)

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 82.22% (37/45) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.47% (27322/37187)
Line Coverage 57.16% (295004/516134)
Region Coverage 54.42% (245929/451901)
Branch Coverage 56.02% (106452/190011)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants