End-users currently receive error messages with minimal context about how errors occurred, making it difficult to diagnose and fix issues. When exceptions propagate through multiple layers of the system (parsing → semantic analysis → execution → REST response), contextual information is lost, leaving users with generic error messages that don't provide actionable guidance.
The OpenSearch SQL/PPL plugin uses a simple exception hierarchy where exceptions carry only:
Exception type (e.g., SemanticCheckException)
A single error message string
Optional cause chain (for wrapped exceptions)
Architecture limitations:
No context accumulation: As errors bubble up through layers (Parser → Analyzer → Executor → REST), intermediate layers cannot add context without modifying the original exception message
Single-point error creation: All context must be known at the point where the exception is first thrown
Loss of structured information: Important details like query text, table names, field positions, datasource IDs, and execution context are unavailable to error handlers
Generic error responses: The ErrorMessage class can only extract type, reason, and details from the exception's localized message
Example error flow:
[Semantic Analyzer] throw new SemanticCheckException("Field 'x' not found")
↓
[Query Service] catch + rethrow (no context added)
↓
[REST Handler] ErrorMessage.toString() → {"type": "SemanticCheckException", "details": "Field 'x' not found"}
↓
[User] Receives minimal context, must guess which table, datasource, or query caused the issue
Long-Term Goals
Rich error context chains: Users should see the full story of how an error occurred, with context added at each layer:
Error: Field 'timestamp' not found
-> in table 'logs' (index pattern: 'logs-*')
-> while analyzing query: "source=logs-* | fields timestamp, message | where status > 500"
-> at position: line 1, column 25
-> datasource: 'default-opensearch'
Actionable error messages: Each error should guide users toward resolution with specific details about what went wrong and how to fix it
Structured error information: Enable programmatic error handling with machine-readable error codes and structured context fields
Consistent error experience: Unified error handling across SQL, PPL, Calcite, and legacy engines
How confident are we that this solution solves the problem?
High confidence. Similar mechanisms (Rust's anyhow/eyre, Java's Spring NestedExceptionUtils, Python's exception chaining) have proven effective in providing rich error context in existing projects. The OpenSearch SQL plugin's multi-layered architecture works well for context accumulation at each layer.
Proposal
Introduce a Report-building error mechanism inspired by Rust's anyhow/eyre libraries that:
Wraps exceptions with contextual information as they propagate through system layers
Accumulates context without modifying original exception messages
Formats rich error responses with context chains for end-users
Maintains backwards compatibility with existing exception types and error handlers
Core components:
// Report wrapper that accumulates context
public class ErrorReport {
    private final Throwable rootCause;
    private final List<ErrorContext> contextChain;

    // Idempotent wrapper: determines whether `e` contains a Report and either
    // returns that report or starts a new one.
    public static ErrorReport wrap(Throwable e);
    public ErrorReport context(String key, Object value);
    public ErrorReport context(ErrorContext ctx);
    public String toDetailedMessage();
    public JSONObject toJSON();
}

// Context information at each layer
public class ErrorContext {
    private final String layer; // "Parser", "Analyzer", "Executor". Can come from a constants file for consistency.
    private final Map<String, Object> attributes; // query, table, position, datasource, etc.
}
// Enhanced error response
{
"status": 400,
"error": {
"type": "SemanticCheckException",
"code": "FIELD_NOT_FOUND",
"reason": "Invalid Query",
"details": "Field 'timestamp' not found in table 'logs'",
"context": [
{"layer": "semantic_analysis", "table": "logs", "index_pattern": "logs-*"},
{"layer": "query_parser", "query": "source=logs-* | fields timestamp", "position": {"line": 1, "column": 25}},
{"layer": "datasource", "datasource_name": "default-opensearch"}
],
"suggestion": "Check that field 'timestamp' exists in indices matching 'logs-*' pattern"
}
}
This structured output approach makes it straightforward for the front-end to enrich the error presentation, e.g. highlighting the position of the error.
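To make the wrapper idea more concrete, here is a minimal sketch of how `ErrorReport.wrap` could stay idempotent while context accumulates. It is an illustration under assumptions, not the plugin's implementation: `ErrorReportException`, `toException()`, `ErrorContext.with(...)`, and the exact message layout are hypothetical names and choices introduced here, and only the `context(ErrorContext)` overload is shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** One layer's worth of context: a layer name plus ordered attributes. */
final class ErrorContext {
    final String layer;
    final Map<String, Object> attributes = new LinkedHashMap<>();

    ErrorContext(String layer) {
        this.layer = layer;
    }

    /** Hypothetical fluent helper for adding an attribute. */
    ErrorContext with(String key, Object value) {
        attributes.put(key, value);
        return this;
    }
}

/** Hypothetical unchecked carrier so a report can travel up the stack. */
final class ErrorReportException extends RuntimeException {
    final ErrorReport report;

    ErrorReportException(ErrorReport report) {
        super(report.rootCause().getMessage(), report.rootCause());
        this.report = report;
    }
}

final class ErrorReport {
    private final Throwable rootCause;
    private final List<ErrorContext> contextChain = new ArrayList<>();

    private ErrorReport(Throwable rootCause) {
        this.rootCause = rootCause;
    }

    /** Idempotent: reuses the report already carried by e, otherwise starts a new one. */
    static ErrorReport wrap(Throwable e) {
        if (e instanceof ErrorReportException) {
            return ((ErrorReportException) e).report;
        }
        return new ErrorReport(e);
    }

    /** Appends context without touching the root cause or its message. */
    ErrorReport context(ErrorContext ctx) {
        contextChain.add(ctx);
        return this;
    }

    Throwable rootCause() {
        return rootCause;
    }

    List<ErrorContext> contextChain() {
        return Collections.unmodifiableList(contextChain);
    }

    ErrorReportException toException() {
        return new ErrorReportException(this);
    }

    /** Renders the root message followed by one line per accumulated context. */
    String toDetailedMessage() {
        StringBuilder sb = new StringBuilder("Error: ").append(rootCause.getMessage());
        for (ErrorContext ctx : contextChain) {
            sb.append("\n  -> [").append(ctx.layer).append("] ").append(ctx.attributes);
        }
        return sb.toString();
    }
}

class ErrorReportDemo {
    public static void main(String[] args) {
        ErrorReport report = ErrorReport
            .wrap(new IllegalStateException("Field 'timestamp' not found"))
            .context(new ErrorContext("Analyzer").with("table", "logs"));
        // A higher layer catches the carrier and keeps adding to the same report.
        ErrorReport same = ErrorReport.wrap(report.toException())
            .context(new ErrorContext("REST").with("datasource", "default-opensearch"));
        System.out.println(same.toDetailedMessage());
    }
}
```

Because `wrap` returns the existing report when one is already attached, each layer can call it unconditionally without nesting reports inside reports.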
Problem Statement
Examples of current pain points:
"Failed to read mapping for index pattern [[Ljava.lang.String;@9acdf8]" instead of explaining which indices have incompatible mappings
"ArrayIndexOutOfBoundsException when querying index with disabled objects containing dot-only field names" (#4896): the error is "ArrayIndexOutOfBoundsException: Index 0 out of bounds for length 0", with no context on what type of array might have been involved. In this case, we removed the error condition entirely, but debugging took several hours from multiple engineers
"QueryShardException[failed to create query: field expansion for [*] matches too many fields, limit: 1024, got: 1572]", where the ideal error message would be "Query failed: Index has 1572 fields, exceeds limit of 1024. Use explicit field selection with '| fields field1, field2' or contact admin to increase cluster limit."
Approach
Phase 1: Core Infrastructure (Non-breaking)
Create Report Wrapper Classes
ErrorReport: Wrapper that accumulates context around any Throwable
ErrorContext: Structured context at each layer (query, position, datasource, table, etc.)
ErrorCode: Enum of machine-readable error codes (FIELD_NOT_FOUND, SYNTAX_ERROR, INDEX_NOT_FOUND, etc.)
Enhance Exception Base Classes
Add optional ErrorReport field to QueryEngineException base class
Methods: .withContext(key, value), .withQuery(query), .withPosition(line, col)
Update ErrorMessage Formatting
Update ErrorMessage.java to extract and format ErrorReport if present
Phase 2: Incremental Context Adoption
Add Context at Key Layers (can be done incrementally per component):
PPLSyntaxParser, SQLSyntaxParser: Add query text, position info
Analyzer: Add table names, field names, datasource context
QueryService: Add execution engine (Calcite/V2/Legacy), memory usage
RestPPLQueryAction, RestSQLQueryAction: Add request ID, user context
Enhance Specific Error Types (prioritize by user impact):
SemanticCheckException: Add table, field, datasource context
MemoryUsageException: Add current/max memory, settings to adjust
Implementation details
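As a rough illustration of the phased approach, the sketch below shows an exception base class with fluent `.withContext(key, value)`, `.withQuery(query)`, and `.withPosition(line, col)` methods, and three layers that each add what they know before rethrowing. `ContextualQueryException` is a hypothetical stand-in for `QueryEngineException`, and `LayeredPipeline` with its methods is invented for this example; the real plugin classes are not modeled here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for QueryEngineException with fluent context methods. */
class ContextualQueryException extends RuntimeException {
    private final Map<String, Object> context = new LinkedHashMap<>();

    ContextualQueryException(String message) {
        super(message);
    }

    ContextualQueryException withContext(String key, Object value) {
        context.put(key, value);
        return this;
    }

    ContextualQueryException withQuery(String query) {
        return withContext("query", query);
    }

    ContextualQueryException withPosition(int line, int col) {
        return withContext("position", "line " + line + ", column " + col);
    }

    Map<String, Object> context() {
        return context;
    }
}

class LayeredPipeline {
    // Analyzer layer: knows the table and field, nothing about the request.
    static void analyze(String table, String field) {
        throw new ContextualQueryException("Field '" + field + "' not found")
            .withContext("table", table);
    }

    // Query service layer: adds the query text and engine, then rethrows the same exception.
    static void execute(String query) {
        try {
            analyze("logs", "timestamp");
        } catch (ContextualQueryException e) {
            throw e.withQuery(query).withContext("engine", "Calcite");
        }
    }

    // REST layer: adds request-scoped context last and returns everything collected.
    static Map<String, Object> handle(String requestId, String query) {
        try {
            execute(query);
            return Map.of();
        } catch (ContextualQueryException e) {
            return e.withContext("request_id", requestId).context();
        }
    }

    public static void main(String[] args) {
        System.out.println(handle("req-1", "source=logs-* | fields timestamp, message"));
    }
}
```

The original message is never rewritten; each layer only appends attributes, which is what lets adoption proceed one component at a time.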
Alternatives
1. Structured Exception Messages (Simpler approach)
Instead of wrapping exceptions, enhance exception constructors to accept structured context:
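One way this alternative could look is sketched below. The class name `StructuredSemanticCheckException` and its constructor shape are assumptions made for illustration, not the plugin's existing `SemanticCheckException` API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical exception whose constructor takes all structured context up front. */
class StructuredSemanticCheckException extends RuntimeException {
    private final Map<String, Object> context;

    StructuredSemanticCheckException(String message, Map<String, Object> context) {
        super(message);
        // Defensive copy: the context is fixed at the throw site.
        this.context = Collections.unmodifiableMap(new LinkedHashMap<>(context));
    }

    Map<String, Object> getContext() {
        return context;
    }
}

class StructuredExceptionDemo {
    public static void main(String[] args) {
        Map<String, Object> ctx = new LinkedHashMap<>();
        ctx.put("table", "logs");
        ctx.put("field", "timestamp");
        try {
            throw new StructuredSemanticCheckException("Field 'timestamp' not found", ctx);
        } catch (StructuredSemanticCheckException e) {
            System.out.println(e.getMessage() + " " + e.getContext());
        }
    }
}
```

In this shape every attribute has to be available where the exception is first thrown; outer layers cannot add to it later without catching and rebuilding the exception.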
Pros:
Cons:
2. ThreadLocal Context (MDC-style)
Use ThreadLocal storage to accumulate context throughout request lifecycle:
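A minimal sketch of the idea, assuming a hypothetical `QueryErrorContext` holder (not an existing plugin class): layers write into thread-local storage, an error handler takes a snapshot, and the request boundary clears it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical per-thread context holder, in the spirit of logging MDC. */
final class QueryErrorContext {
    private static final ThreadLocal<Map<String, Object>> CONTEXT =
        ThreadLocal.withInitial(LinkedHashMap::new);

    private QueryErrorContext() {}

    static void put(String key, Object value) {
        CONTEXT.get().put(key, value);
    }

    /** Copy of the current thread's context, safe to attach to an error response. */
    static Map<String, Object> snapshot() {
        return new LinkedHashMap<>(CONTEXT.get());
    }

    /** Must run at the end of every request, or context leaks into the next one. */
    static void clear() {
        CONTEXT.remove();
    }

    /** Demonstrates that values set on one thread are not visible on another. */
    static boolean visibleOnOtherThread(String key) {
        boolean[] seen = new boolean[1];
        Thread t = new Thread(() -> seen[0] = CONTEXT.get().containsKey(key));
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }
}

class ThreadLocalContextDemo {
    public static void main(String[] args) {
        try {
            QueryErrorContext.put("query", "source=logs-* | fields timestamp");
            QueryErrorContext.put("datasource", "default-opensearch");
            throw new IllegalStateException("Field 'timestamp' not found");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage() + " " + QueryErrorContext.snapshot());
        } finally {
            QueryErrorContext.clear();
        }
    }
}
```

The `visibleOnOtherThread` helper highlights the usual caveat with this style: context set on the request thread does not follow work handed off to another thread unless it is copied explicitly.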
Pros:
Cons:
3. Status Quo with Better Messages (Minimal change)
Keep current architecture but improve individual error messages:
ErrorMessage.fetchDetails()
Pros:
Cons: