Skip to content

[Improvement](unlock)optimize the place of unlock which deadlock occurs when database tables are frequently created or deleted#62391

Open
LuGuangming wants to merge 1 commit intoapache:masterfrom
LuGuangming:master_unlock
Open

[Improvement](unlock)optimize the place of unlock which deadlock occurs when database tables are frequently created or deleted#62391
LuGuangming wants to merge 1 commit intoapache:masterfrom
LuGuangming:master_unlock

Conversation

@LuGuangming
Copy link
Copy Markdown
Contributor

…rs when database tables are frequently created or deleted

What problem does this PR solve?

Issue Number: close #62390

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

…rs when database tables are frequently created or deleted
@Thearas
Copy link
Copy Markdown
Contributor

Thearas commented Apr 11, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morrySnow
Copy link
Copy Markdown
Contributor

run buildall

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 13, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Copy Markdown
Contributor

PR approved by anyone and no changes requested.

@yiguolei
Copy link
Copy Markdown
Contributor

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 1 blocking issue.

  1. InsertIntoTableCommand.initPlan() now calls insertExecutor.onFail(e) before releasing newestTargetTableIf.readLock(). For the OLAP insert path, OlapInsertExecutor.onFail() aborts the transaction through GlobalTransactionMgr.abortTransaction(), which acquires the target table write lock via MetaLockUtils.tryWriteLockTablesOrMetaException(...). Because the current thread still holds the read lock, the abort path can only time out after 5s instead of cleaning the transaction up immediately. This regresses the failure-path locking behavior that the existing comment is explicitly protecting.

Critical checkpoints

  • Goal of the task: simplify unlock handling around initPlan(). The current code does not preserve the failure-path behavior, and there is no test proving the intended deadlock fix.
  • Scope/minimality: the diff is small and focused, but the new finally changes lock ordering on the error path.
  • Concurrency: not safe. The insert thread now holds a table read lock while invoking a failure handler that upgrades to table write locking.
  • Lifecycle: no additional lifecycle issues beyond the lock-ownership regression above.
  • Configuration: not applicable.
  • Compatibility: not applicable.
  • Parallel code paths: the generic initPlan() path serves all insert executors, but the regression is observable on OLAP insert executors whose onFail() aborts a transaction.
  • Conditional checks: the schema/id recheck logic is still fine.
  • Test coverage: no regression or unit test was added for beginTransaction()/finalizeSink() failures under concurrent metadata changes.
  • Observability: existing WARN logs are enough to diagnose once reproduced.
  • Transaction/persistence: not safe. The abort path may fail to take the required table write lock, delaying cleanup of a PREPARE transaction.
  • Data writes/modifications: the failure path no longer guarantees prompt transactional cleanup.
  • FE/BE variable passing: not applicable.
  • Performance: the regression adds a 5s write-lock timeout on the failure path.
  • Other issues: none beyond the blocking lock-order regression above.

}
Throwables.throwIfInstanceOf(e, RuntimeException.class);
throw new IllegalStateException(e.getMessage(), e);
} finally {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling insertExecutor.onFail(e) before this readUnlock() reintroduces the lock-order problem the old code was explicitly avoiding. For the normal OLAP path, onFail() goes through OlapInsertExecutor.abortTransactionOnFail() -> GlobalTransactionMgr.abortTransaction() -> MetaLockUtils.tryWriteLockTablesOrMetaException(...), which needs the same table's write lock. Because the current thread is still holding newestTargetTableIf.readLock(), that write-lock acquisition times out instead of aborting the txn immediately.

Please keep the unlock ahead of onFail() on the exception path, and use finally only for the success / retry branches if you still want to deduplicate the non-error unlocks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/4.1.x reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Enhancement] optimize the place of unlock which deadlock occurs when database tables are frequently created or deleted

4 participants