Before Creating the Enhancement Request
Summary
Add defensive checks in NettyRemotingServer#registerProcessor to detect and prohibit/warn against the usage of CallerRunsPolicy in custom user executors. Using this policy triggers blocking operations on Netty EventLoop threads when the thread pool is exhausted, leading to severe performance degradation.
Motivation
In NettyRemotingServer.java, the registerProcessor method allows users (e.g., Broker developers) to register a custom ExecutorService for specific request codes.
Currently, there is no validation on the RejectedExecutionHandler of the provided executor.
In high-concurrency scenarios (common in Cloud/K8s environments), if a user mistakenly configures a ThreadPoolExecutor with CallerRunsPolicy and the pool becomes full:
- The task execution is rejected.
CallerRunsPolicy forces the task to run in the caller's thread.
- In this context, the caller thread is the Netty IO Thread (EventLoop) (invoked from
NettyServerHandler.channelRead0).
Consequence:
The Netty IO thread gets blocked processing business logic. This stops the server from reading/writing other requests or handling heartbeats on that channel, potentially causing the node to be marked as "down" by the cluster or causing a "Stop-the-World" like pause in network throughput (DPSE - Death by Poorly Scheduled Executor).
Adding a check will prevent this silent misconfiguration.
Describe the Solution You'd Like
Modify org.apache.rocketmq.remoting.netty.NettyRemotingServer#registerProcessor to include a validation step.
Logic:
- Check if the passed
executor is an instance of ThreadPoolExecutor.
- If yes, check if
((ThreadPoolExecutor) executor).getRejectedExecutionHandler() is an instance of CallerRunsPolicy.
- If detected, throw an
IllegalArgumentException (fast fail) or at least log a generic ERROR/WARN to alert the user that this configuration is dangerous.
Sample Code Logic:
if (executor instanceof ThreadPoolExecutor) {
if (((ThreadPoolExecutor) executor).getRejectedExecutionHandler() instanceof ThreadPoolExecutor.CallerRunsPolicy) {
throw new IllegalArgumentException("CallerRunsPolicy is strictly forbidden in RocketMQ Remoting as it blocks Netty IO threads.");
}
}
### Describe Alternatives You've Considered
### 6. Describe Alternatives You've Considered
```text
1. **Documentation warning only:** We could just add a warning in Javadoc, but users often miss documentation details when configuring custom thread pools.
2. **Wrapper Executor:** Wrapping the user executor to intercept the rejection. This is overly complex and might introduce overhead.
3. **Log only:** Just logging a WARN message. This is a softer approach but might be ignored during startup, leading to runtime failures later. Fast-fail is preferred for such critical configuration errors.
### Additional Context
Relevant Code Location:
- Registration: `org.apache.rocketmq.remoting.netty.NettyRemotingServer#registerProcessor`
- Execution path: `org.apache.rocketmq.remoting.netty.NettyRemotingServer.NettyServerHandler#channelRead0` calling `remotingAbstract.processMessageReceived`
Before Creating the Enhancement Request
Summary
Add defensive checks in
NettyRemotingServer#registerProcessorto detect and prohibit/warn against the usage ofCallerRunsPolicyin custom user executors. Using this policy triggers blocking operations on Netty EventLoop threads when the thread pool is exhausted, leading to severe performance degradation.Motivation
In
NettyRemotingServer.java, theregisterProcessormethod allows users (e.g., Broker developers) to register a customExecutorServicefor specific request codes.Currently, there is no validation on the
RejectedExecutionHandlerof the provided executor.In high-concurrency scenarios (common in Cloud/K8s environments), if a user mistakenly configures a
ThreadPoolExecutorwithCallerRunsPolicyand the pool becomes full:CallerRunsPolicyforces the task to run in the caller's thread.NettyServerHandler.channelRead0).Consequence:
The Netty IO thread gets blocked processing business logic. This stops the server from reading/writing other requests or handling heartbeats on that channel, potentially causing the node to be marked as "down" by the cluster or causing a "Stop-the-World" like pause in network throughput (DPSE - Death by Poorly Scheduled Executor).
Adding a check will prevent this silent misconfiguration.
Describe the Solution You'd Like
Modify
org.apache.rocketmq.remoting.netty.NettyRemotingServer#registerProcessorto include a validation step.Logic:
executoris an instance ofThreadPoolExecutor.((ThreadPoolExecutor) executor).getRejectedExecutionHandler()is an instance ofCallerRunsPolicy.IllegalArgumentException(fast fail) or at least log a genericERROR/WARNto alert the user that this configuration is dangerous.Sample Code Logic: