Skip to content

Xyseries command implementation#5343

Open
asifabashar wants to merge 11 commits intoopensearch-project:mainfrom
asifabashar:xyseries2
Open

Xyseries command implementation#5343
asifabashar wants to merge 11 commits intoopensearch-project:mainfrom
asifabashar:xyseries2

Conversation

@asifabashar
Copy link
Copy Markdown
Contributor

@asifabashar asifabashar commented Apr 14, 2026

Description

Problem Statement
xyseries command is missing in PPL which is part of todo in roadmap

This RFC proposes adding a new PPL transforming command, xyseries, with syntax and behavior aligned with SPL .

xyseries converts row-oriented grouped results into a wide table where:

  • One field is the X axis (x-field) and stays as a row key.
  • One field (y-name-field) provides a part of column name from proivded paramter values to choose from and used as pivot values in conjuction with (y-data-field) field name , where data for this column are pivoted cells that are agrregated value fields.

OpenSearch PPL already supports stats, chart, and timechart, but there is no direct equivalent for SPL xyseries.

Adding xyseries improves:

  • Query readability for pivot-style result shaping.
  • Dashboard interoperability for charts requiring wide-table series layout.

Current State
There is no xyseries compatiable command.

Long-Term Goals
Provide xyseries functionality.

Proposal
Summarizes the suggested solution or improvement.

Approach

User Syntax

xyseries  [sep=<string>] [format=<string>] <x-field> <y-name-field> in (<value1>, <value2>, ...) <y-data-field>[,<y-data-field2>

Arguments

  • x-field (required): row key in output.
  • y-name-field (required): Number of values used to generate output series column names based on provided in paramter values such as <value1>, <value2> etc
  • y-data-field... (required, at least one): value field(s) used to fill cells.

Options

Option Default Behavior
sep ":" Separator between y-data-field name and 'in' parameter values , etc.
format None Naming template for output series columns represented as $AGG$<sep>$VAL$ . For each data row pivot value for y-name-field and $AGG$ (y-data-field name). The format parameter inserts the (y-data-field name) uses “:” as the built-in separator and then pivot value. $VAL$

If format is omitted:

  • Single y-data-field: output column is based row number .
  • Multiple y-data-field: output column is $AGG$<sep>$VAL$ where $VAL$ represent provided pivot values in 'in' operator.

Examples

Actual data:
ppl
 source=weblogs| stats count(host) as host_cnt, count(method) as method_cnt by url, response
host_cnt method_cnt response url
3 3 200 /page1
1 1 404 /page1
5 5 200 /page2
2 2 500 /page2

After xyseries transformation:
ppl

source=weblogs
| stats count(host) as host_cnt, count(method) as method_cnt by url, response
| xyseries url response in ("200","404","500") host_cnt, method_cnt

url host_cnt: 200 host_cnt: 404 host_cnt: 500 method_cnt: 200 method_cnt: 404 method_cnt: 500
/page1 3 1 null 3 1 null
/page2 5 null 2 5 null 2

If format is not specified, by default, you’d see something like:
url:count(host) and url:count(method) as column names.

If you provide your own separator through the format option, it overrides anything defined with sep. If for example, sep is set to “-”, but format specifies “+”, and because format has higher priority, the “+” is used.
Here, $VAL$ and $AGG$ are placeholders represent and the respectively. In the output, you can see that the name field (url) and the data field count(host) may appear in the position of $VAL$ and $AGG$, depending on how the format string arranges them.

Semantics

Input shape and type rules

Support xyseries only when pivot values are explicitly provided.

  • x-field and y-name-field must exist in input schema..
  • y-name-field values are converted to string for output column naming.
  • y-data-field One or more fields that contain the data to chart. If there are multiple fields specified, separate the field names with commas.

Null handling

  • Rows with null in a given y-data-field do not contribute a value for that generated series column.

Implementation Plan

Validate required fields/options and defaults.
Project to x, y_name selected from provided pivot values in 'in' operator, and selected y_data fields.
Build generated series column name using format or default naming with sep .
Pivot to wide schema based on provided pivot values passed as parameter.

Today our direction is to keep PPL commands translatable to SQL/Calcite plans as much as possible. We also plan to run PPL on Spark; SQL-native xyseries (with explicit in (...) values) is portable to Spark SQL, while post-processing would require separate Spark-side custom implementation.
Composability: if xyseries is implemented after query execution, it effectively must be terminal and cannot be reliably chained with downstream commands
. (Reference: Comments from penghuo )

out of scope

grouping option which does not apply for multifile input in OpenSearch.
Alternative
chart , timeseries can be used for similar use cases but not exactly same.

Limitations

As the rows will be transposed to columns, a limit is required field as calcite planning phase number of rows are not known.

Related Issues

Resolves [ #5142 ]

Check List

  • [ x] New functionality includes testing.
  • [x ] New functionality has been documented.
  • [x ] New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • [x ] New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • [ x] Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Asif Bashar <asif.bashar@gmail.com>
Signed-off-by: Asif Bashar <asif.bashar@gmail.com>
Signed-off-by: Asif Bashar <asif.bashar@gmail.com>
Signed-off-by: Asif Bashar <asif.bashar@gmail.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 14, 2026

PR Reviewer Guide 🔍

(Review updated until commit 1858a98)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Xyseries AST node, grammar, and parser

Relevant files:

  • core/src/main/java/org/opensearch/sql/ast/tree/Xyseries.java
  • core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/analysis/Analyzer.java
  • ppl/src/main/antlr/OpenSearchPPLLexer.g4
  • ppl/src/main/antlr/OpenSearchPPLParser.g4
  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
  • ppl/src/test/java/org/opensearch/sql/ppl/parser/AstBuilderTest.java

Sub-PR theme: Xyseries Calcite execution, integration tests, and documentation

Relevant files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteXySeriesCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java
  • integ-test/src/test/java/org/opensearch/sql/security/CrossClusterSearchIT.java
  • docs/user/ppl/cmd/xyseries.md

⚡ Recommended focus areas for review

Empty Pivot Values

If pivotValues is empty (no values in the IN clause), the pivot operation will produce only the x-field column with no pivoted columns. There is no validation or error thrown for an empty pivot values list, which could lead to confusing results or downstream errors.

List<String> pivotValues = node.getPivotValues() != null ? node.getPivotValues() : List.of();
String separator = node.getSeparator();
String format = node.getFormat();

// Build the pivot axis - cast to VARCHAR if needed for string comparison
RexNode yNameRef = b.field(yNameFieldName);
RexNode axis;
if (!SqlTypeUtil.isCharacter(yNameRef.getType())) {
  RelDataType varchar =
      rx.getTypeFactory()
          .createTypeWithNullability(
              rx.getTypeFactory().createSqlType(SqlTypeName.VARCHAR), true);
  axis = rx.makeCast(varchar, yNameRef, true);
} else {
  axis = yNameRef;
}

// Build aggregate calls - MAX for each y-data field
List<AggCall> aggCalls =
    yDataFieldNames.stream()
        .map(name -> b.max(b.field(name)).as(name))
        .collect(Collectors.toList());

// Build pivot value entries: alias -> [literal(value)]
// LinkedHashMap preserves insertion order for deterministic column ordering
LinkedHashMap<String, List<RexNode>> pivotValueMap = new LinkedHashMap<>();
for (String val : pivotValues) {
  pivotValueMap.put(val, ImmutableList.of(b.literal(val)));
}

// Execute pivot: decomposes into GROUP BY x-field with FILTER-based aggregation
// Produces columns: x-field, {val1}_{agg1}, {val1}_{agg2}, {val2}_{agg1}, ...
b.pivot(
    b.groupKey(b.field(xFieldName)),
    aggCalls,
    ImmutableList.of(axis),
    pivotValueMap.entrySet());
Pivot Column Name Assumption

The reorder projection step assumes pivot output columns are named {value}_{agg} (line 3458). This relies on Calcite's internal pivot column naming convention, which may not be stable across versions or may differ for special characters in values/field names. If the naming convention changes or values contain underscores, column lookup by b.field(pivotColName) will fail at runtime.

for (String aggName : yDataFieldNames) {
  for (String pivotVal : pivotValues) {
    // Reference pivot output column by its generated name: {value}_{agg}
    String pivotColName = pivotVal + "_" + aggName;
    reorderProjections.add(b.field(pivotColName));
    reorderNames.add(generateColumnName(aggName, pivotVal, separator, format));
  }
}
Weak Test Assertions

The testXyseriesCommand* tests in NewAddedCommandsIT catch ResponseException and call verifyQuery(result) on the error response, meaning the test passes even if the command fails with an error. The tests do not assert on actual output schema or data values, making them ineffective for validating correctness.

public void testXyseriesCommand() throws IOException {

  JSONObject result;
  try {
    result =
        executeQuery(
            StringEscapeUtils.escapeJson(
                String.format(
                    "search source=%s | stats avg(balance) as avg_balance by gender, state"
                        + " | xyseries state gender in (\"F\", \"M\") avg_balance",
                    TEST_INDEX_BANK)));

  } catch (ResponseException e) {
    result = new JSONObject(TestUtils.getResponseBody(e.getResponse()));
  }
  verifyQuery(result);
}

@Test
public void testXyseriesCommandMultipleDataFields() throws IOException {

  JSONObject result;
  try {
    result =
        executeQuery(
            StringEscapeUtils.escapeJson(
                String.format(
                    "search source=%s | stats avg(balance) as avg_balance, count() as cnt by"
                        + " gender, state | xyseries state gender in (\"F\", \"M\") avg_balance,"
                        + " cnt",
                    TEST_INDEX_BANK)));
  } catch (ResponseException e) {
    result = new JSONObject(TestUtils.getResponseBody(e.getResponse()));
  }
  verifyQuery(result);
}

@Test
public void testXyseriesCommandWithSep() throws IOException {
  JSONObject result;
  try {
    result =
        executeQuery(
            StringEscapeUtils.escapeJson(
                String.format(
                    "search source=%s | stats avg(balance) as avg_balance by gender, state"
                        + " | xyseries sep=\"-\" state gender in (\"F\", \"M\") avg_balance",
                    TEST_INDEX_BANK)));
  } catch (ResponseException e) {
    result = new JSONObject(TestUtils.getResponseBody(e.getResponse()));
  }
  verifyQuery(result);
}

@Test
public void testXyseriesCommandWithFormat() throws IOException {
  JSONObject result;
  try {
    result =
        executeQuery(
            StringEscapeUtils.escapeJson(
                String.format(
                    "search source=%s | stats avg(balance) as avg_balance by gender, state"
                        + " | xyseries format=\"$VAL$_$AGG$\" state gender in (\"F\", \"M\")"
                        + " avg_balance",
                    TEST_INDEX_BANK)));

  } catch (ResponseException e) {
    result = new JSONObject(TestUtils.getResponseBody(e.getResponse()));
  }
  verifyQuery(result);
}
Null Separator Handling

In generateColumnName, if format is null the method uses separator directly. However, if separator is also null (e.g., due to unexpected AST construction), a NullPointerException will occur. The Xyseries AST node does not enforce non-null separator at construction time.

private String generateColumnName(
    String yDataFieldName, String pivotValue, String separator, String format) {
  if (format != null) {
    return format.replace("$AGG$", yDataFieldName).replace("$VAL$", pivotValue);
  }
  return yDataFieldName + separator + pivotValue;
}

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 14, 2026

PR Code Suggestions ✨

Latest suggestions up to 1858a98

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Use actual pivot output column names instead of assumed names

The pivot column name is constructed as pivotVal + "_" + aggName, but Calcite's
pivot output column naming convention may differ (e.g., it may use a different
delimiter or ordering). If the generated pivot column name doesn't match what
Calcite actually produces, b.field(pivotColName) will throw a runtime exception. You
should verify the exact column names produced by Calcite's pivot operation and use
those names, or use positional references instead of name-based lookups.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3455-3462]

+RelNode pivotRel = b.peek();
+List<String> pivotOutputNames = pivotRel.getRowType().getFieldNames();
+// Skip the x-field (first column)
+int colIdx = 1;
 for (String aggName : yDataFieldNames) {
   for (String pivotVal : pivotValues) {
-    // Reference pivot output column by its generated name: {value}_{agg}
-    String pivotColName = pivotVal + "_" + aggName;
+    String pivotColName = pivotOutputNames.get(colIdx++);
     reorderProjections.add(b.field(pivotColName));
     reorderNames.add(generateColumnName(aggName, pivotVal, separator, format));
   }
 }
Suggestion importance[1-10]: 7

__

Why: The assumption that Calcite's pivot produces columns named {value}_{agg} may not hold in all versions or configurations. If the naming convention differs, b.field(pivotColName) will throw a runtime exception. Using actual output column names from pivotRel.getRowType().getFieldNames() is more robust.

Medium
General
Don't silently swallow query execution failures in tests

Catching ResponseException and continuing to call verifyQuery(result) on an error
response means the test will pass even when the query fails with an error. The test
should either not catch the exception (letting it fail the test) or explicitly
assert that the result is not an error response before verifying it.

integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java [538-552]

-JSONObject result;
-try {
-  result =
-      executeQuery(
-          StringEscapeUtils.escapeJson(...));
-} catch (ResponseException e) {
-  result = new JSONObject(TestUtils.getResponseBody(e.getResponse()));
-}
+JSONObject result =
+    executeQuery(
+        StringEscapeUtils.escapeJson(
+            String.format(
+                "search source=%s | stats avg(balance) as avg_balance by gender, state"
+                    + " | xyseries state gender in (\"F\", \"M\") avg_balance",
+                TEST_INDEX_BANK)));
 verifyQuery(result);
Suggestion importance[1-10]: 6

__

Why: Catching ResponseException and calling verifyQuery(result) on an error response means the test can pass even when the query fails. Removing the try-catch would make the test correctly fail on query errors, improving test reliability.

Low
Validate that pivot values list is not empty

If pivotValues is empty, the inner loop body never executes, but
b.project(reorderProjections, reorderNames, true) will still be called with only the
x-field projection. This results in a plan that silently drops all data columns
rather than failing with a meaningful error. An empty pivot values list should be
validated and rejected early with a clear error message.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3409]

 List<String> pivotValues = node.getPivotValues() != null ? node.getPivotValues() : List.of();
-...
-for (String aggName : yDataFieldNames) {
-  for (String pivotVal : pivotValues) {
-    String pivotColName = pivotVal + "_" + aggName;
-    reorderProjections.add(b.field(pivotColName));
+if (pivotValues.isEmpty()) {
+  throw new SemanticCheckException("xyseries requires at least one pivot value in the IN clause");
+}
Suggestion importance[1-10]: 5

__

Why: An empty pivotValues list would silently produce a plan with only the x-field column, which is misleading. Adding an early validation with a clear error message improves user experience, though the grammar already requires at least one value in the IN clause.

Low
Guard against null pivot values context

If ctx.xyseriesPivotValues() returns null (e.g., due to a parse error or optional
grammar rule), calling .stringLiteral() on it will throw a NullPointerException. A
null check or grammar-level enforcement should be added to handle this case
gracefully.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java [1674-1677]

 List<String> pivotValues =
-    ctx.xyseriesPivotValues().stringLiteral().stream()
-        .map(s -> StringUtils.unquoteText(s.getText()))
-        .collect(Collectors.toList());
+    ctx.xyseriesPivotValues() == null
+        ? List.of()
+        : ctx.xyseriesPivotValues().stringLiteral().stream()
+            .map(s -> StringUtils.unquoteText(s.getText()))
+            .collect(Collectors.toList());
Suggestion importance[1-10]: 3

__

Why: The grammar rule xyseriesPivotValues is required (not optional) in the xyseriesCommand rule, so ctx.xyseriesPivotValues() should never be null in practice. This is a low-impact defensive check.

Low

Previous suggestions

Suggestions up to commit 13ddcb5
CategorySuggestion                                                                                                                                    Impact
Possible issue
Avoid fragile pivot column name assumptions

The pivot output column name is hardcoded as pivotVal + "_" + aggName, but Calcite's
RelBuilder.pivot() may generate column names in a different format (e.g.,
aggName_pivotVal or with different delimiters depending on the Calcite version). If
the generated name doesn't match, b.field(pivotColName) will throw a runtime
exception. The actual generated column name should be verified against Calcite's
pivot output naming convention, or the columns should be referenced by index rather
than by name.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3457-3459]

-// Reference pivot output column by its generated name: {value}_{agg}
-String pivotColName = pivotVal + "_" + aggName;
-reorderProjections.add(b.field(pivotColName));
+// Reference pivot output column by index to avoid dependency on Calcite's internal naming
+// Pivot produces: xField, then for each pivotVal: agg1, agg2, ...
+// So column index = 1 + (pivotValIndex * yDataFieldNames.size()) + aggIndex
+int pivotValIndex = pivotValues.indexOf(pivotVal);
+int aggIndex = yDataFieldNames.indexOf(aggName);
+int colIndex = 1 + pivotValIndex * yDataFieldNames.size() + aggIndex;
+reorderProjections.add(b.field(colIndex));
Suggestion importance[1-10]: 6

__

Why: The hardcoded pivotVal + "_" + aggName naming convention for pivot output columns may not match Calcite's actual generated column names, which could cause runtime exceptions. However, the integration tests appear to pass with this naming, suggesting it may work with the current Calcite version, making this a moderate concern.

Low
Test silently swallows query execution failures

The test silently catches ResponseException and continues to call
verifyQuery(result) on the error response body. This means if the query fails (e.g.,
because Calcite is not enabled), the test will pass as long as verifyQuery doesn't
fail on the error JSON, masking real failures. The exception should either be
rethrown or the test should assert that no error occurred.

integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java [538-552]

-JSONObject result;
-try {
-  result =
-      executeQuery(
-          StringEscapeUtils.escapeJson(
-              String.format(
-                  "search source=%s | stats avg(balance) as avg_balance by gender, state"
-                      + " | xyseries state gender in (\"F\", \"M\") avg_balance",
-                  TEST_INDEX_BANK)));
-
-} catch (ResponseException e) {
-  result = new JSONObject(TestUtils.getResponseBody(e.getResponse()));
-}
+JSONObject result =
+    executeQuery(
+        StringEscapeUtils.escapeJson(
+            String.format(
+                "search source=%s | stats avg(balance) as avg_balance by gender, state"
+                    + " | xyseries state gender in (\"F\", \"M\") avg_balance",
+                TEST_INDEX_BANK)));
 verifyQuery(result);
Suggestion importance[1-10]: 5

__

Why: The catch block silently swallows ResponseException and passes the error JSON to verifyQuery, which could mask real test failures. However, looking at the context, this pattern appears intentional to handle both Calcite-enabled and non-Calcite environments, so removing it entirely may break the test's design intent.

Low
Guard against null separator causing NullPointerException

The separator field is declared as final and set via the constructor, but the
default value ": " is only enforced in the AstBuilder parser. If Xyseries is
constructed directly (e.g., in tests or other code paths) with a null separator,
generateColumnName in CalciteRelNodeVisitor will produce a NullPointerException when
concatenating yDataFieldName + separator + pivotValue. A null-check or default
fallback should be added in generateColumnName.

core/src/main/java/org/opensearch/sql/ast/tree/Xyseries.java [3482-3488]

-/** Separator between y-data-field name and pivot value in column names. Default ":". */
-private final String separator;
+private String generateColumnName(
+    String yDataFieldName, String pivotValue, String separator, String format) {
+  if (format != null) {
+    return format.replace("$AGG$", yDataFieldName).replace("$VAL$", pivotValue);
+  }
+  String sep = separator != null ? separator : ": ";
+  return yDataFieldName + sep + pivotValue;
+}
Suggestion importance[1-10]: 5

__

Why: The separator field could be null if Xyseries is constructed directly without going through AstBuilder, which would cause a NullPointerException in generateColumnName. Adding a null-check fallback is a valid defensive measure, though the risk is limited since the parser enforces the default.

Low
General
Document aggregation assumption to prevent silent data errors

Using MAX hardcoded for all y-data fields may produce incorrect results when the
upstream data has multiple rows per group (e.g., for count fields, MAX instead of
SUM would give wrong answers). The aggregation function used in the pivot should
match the semantic intent, or the command should document that it only works
correctly when the input already has at most one row per (x-field, y-name-field)
combination.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3427-3430]

+// Note: xyseries assumes input is already aggregated to at most one row per
+// (xField, yNameField) group (e.g., output of a preceding stats command).
+// MAX is used as a passthrough aggregation; if multiple rows exist per group,
+// results may be incorrect.
 List<AggCall> aggCalls =
     yDataFieldNames.stream()
         .map(name -> b.max(b.field(name)).as(name))
         .collect(Collectors.toList());
Suggestion importance[1-10]: 2

__

Why: This suggestion only adds a comment to document an existing assumption without changing any code logic. The improved_code is functionally identical to the existing_code, making it a documentation-only change with minimal impact.

Low
Suggestions up to commit a72ffe1
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix tests silently swallowing query failures

Silently catching ResponseException and continuing to call verifyQuery on the error
response means the test will pass even when the query fails with an error. This
pattern masks failures — if the non-Calcite engine throws
getOnlyForCalciteException, the test should either assert the error or skip, not
silently accept it as a valid result.

integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java [538-552]

 JSONObject result;
 try {
   result =
       executeQuery(
           StringEscapeUtils.escapeJson(...));
+  verifyQuery(result);
 } catch (ResponseException e) {
   result = new JSONObject(TestUtils.getResponseBody(e.getResponse()));
+  // xyseries is only supported with Calcite; verify the error is the expected one
+  Assert.assertTrue(
+      result.optString("details", "").contains("xyseries")
+          || result.optString("type", "").contains("SemanticCheckException"));
 }
-verifyQuery(result);
Suggestion importance[1-10]: 5

__

Why: The concern is valid — catching ResponseException and calling verifyQuery on an error response could mask failures. However, the improved_code uses ... placeholders and doesn't accurately reflect the actual code structure, making it hard to evaluate precisely. The pattern of accepting both success and expected error responses is intentional for non-Calcite engine compatibility tests.

Low
Guard against missing pivot column references

The pivot output column name is hardcoded as {value}_{agg} with an underscore
separator, but this naming convention depends on Calcite's internal pivot
implementation and may not always match. If Calcite uses a different delimiter or
ordering, b.field(pivotColName) will throw a runtime exception. The actual generated
column name should be verified against Calcite's pivot output, or the column should
be referenced by position rather than by name.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3457-3460]

 // Reference pivot output column by its generated name: {value}_{agg}
+// Calcite pivot generates columns as: {value}_{agg}
 String pivotColName = pivotVal + "_" + aggName;
-reorderProjections.add(b.field(pivotColName));
+try {
+  reorderProjections.add(b.field(pivotColName));
+} catch (IllegalArgumentException e) {
+  throw new IllegalStateException(
+      "Pivot column '" + pivotColName + "' not found in pivot output. "
+      + "Available fields: " + b.peek().getRowType().getFieldNames(), e);
+}
Suggestion importance[1-10]: 4

__

Why: The concern about Calcite's internal pivot column naming is valid, but wrapping with a try-catch for IllegalArgumentException is not the cleanest approach. The improved code adds error handling that would help diagnose issues, but this is a minor defensive improvement rather than a critical bug fix.

Low
General
Throw on unsupported field expression types

The fallback expr.toString() for unrecognized expression types will silently produce
an incorrect field name (e.g., the full expression string), causing a runtime
failure when b.field(...) is called with that name. An explicit exception should be
thrown for unsupported expression types to surface the problem clearly.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3472-3480]

 private String resolveFieldName(UnresolvedExpression expr) {
   if (expr instanceof Field) {
     return ((Field) expr).getField().toString();
   }
   if (expr instanceof Alias) {
     return ((Alias) expr).getName();
   }
-  return expr.toString();
+  throw new IllegalArgumentException(
+      "Unsupported expression type for xyseries field resolution: "
+          + expr.getClass().getSimpleName() + " (" + expr + ")");
 }
Suggestion importance[1-10]: 5

__

Why: Replacing the silent expr.toString() fallback with an explicit exception is a good defensive practice that surfaces bugs early. The improved code accurately reflects the change and would make debugging easier when unsupported expression types are encountered.

Low
Validate non-empty pivot values list

If pivotValues is empty, the pivot operation will produce no pivoted columns, and
the subsequent b.field(pivotColName) calls in the reorder loop will never execute —
but the b.pivot(...) call itself may fail or produce an unexpected result. An
explicit check for an empty pivot values list should be added to provide a clear
error message.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3409-3437]

 List<String> pivotValues = node.getPivotValues();
+if (pivotValues == null || pivotValues.isEmpty()) {
+  throw new IllegalArgumentException("xyseries requires at least one pivot value in the IN clause");
+}
 ...
 LinkedHashMap<String, List<RexNode>> pivotValueMap = new LinkedHashMap<>();
 for (String val : pivotValues) {
   pivotValueMap.put(val, ImmutableList.of(b.literal(val)));
 }
Suggestion importance[1-10]: 4

__

Why: Adding a guard for empty pivotValues is a reasonable defensive check, but the grammar already requires at least one stringLiteral in xyseriesPivotValues, making this scenario unlikely in practice. The improved_code also uses ... placeholders which don't accurately reflect the actual code.

Low
Suggestions up to commit a72ffe1
CategorySuggestion                                                                                                                                    Impact
Possible issue
Avoid fragile hardcoded pivot column name assumption

The pivot output column name is hardcoded as {value}_{agg} with an underscore
separator, but this naming convention depends on Calcite's internal pivot
implementation and may not always match. If Calcite uses a different delimiter or
ordering, b.field(pivotColName) will throw a runtime exception. The actual generated
column name should be verified against Calcite's pivot output, or the column should
be referenced by position rather than by name.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3457-3459]

 // Reference pivot output column by its generated name: {value}_{agg}
-String pivotColName = pivotVal + "_" + aggName;
-reorderProjections.add(b.field(pivotColName));
+// Calcite pivot generates columns in order: for each pivot value, for each agg
+// Use positional reference to avoid dependency on Calcite's internal naming
+int pivotColIndex = 1 + yDataFieldNames.indexOf(aggName) + pivotValues.indexOf(pivotVal) * yDataFieldNames.size();
+reorderProjections.add(b.field(pivotColIndex));
Suggestion importance[1-10]: 7

__

Why: The hardcoded {value}_{agg} naming convention for pivot output columns depends on Calcite's internal implementation. If Calcite changes its naming or uses a different delimiter, b.field(pivotColName) will throw a runtime exception. However, the improved code using positional indexing also has potential issues with ordering assumptions, so the score is moderate.

Medium
Silent exception swallowing masks test failures

Catching ResponseException and continuing to call verifyQuery on the error response
silently masks test failures. If the query fails (e.g., because Calcite is not
enabled or the command is unsupported), the test will pass with an error response
body instead of failing. The exception should either be rethrown or the test should
assert that no error occurred.

integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java [538-552]

-JSONObject result;
-try {
-  result =
-      executeQuery(
-          StringEscapeUtils.escapeJson(...));
-} catch (ResponseException e) {
-  result = new JSONObject(TestUtils.getResponseBody(e.getResponse()));
-}
+JSONObject result =
+    executeQuery(
+        StringEscapeUtils.escapeJson(
+            String.format(
+                "search source=%s | stats avg(balance) as avg_balance by gender, state"
+                    + " | xyseries state gender in (\"F\", \"M\") avg_balance",
+                TEST_INDEX_BANK)));
 verifyQuery(result);
Suggestion importance[1-10]: 6

__

Why: Catching ResponseException and continuing with verifyQuery on the error response body silently masks test failures. If the query fails, the test would pass incorrectly, reducing test reliability. However, the existing_code in the suggestion uses a placeholder ... rather than the actual code, making the match imprecise.

Low
General
Throw explicit error for unsupported expression types

The fallback expr.toString() for unrecognized expression types will likely produce
an incorrect field name (e.g., the full expression string representation), causing a
runtime failure when b.field(...) is called with that name. An explicit error should
be thrown for unsupported expression types to provide a clear and actionable error
message.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3472-3480]

 private String resolveFieldName(UnresolvedExpression expr) {
   if (expr instanceof Field) {
     return ((Field) expr).getField().toString();
   }
   if (expr instanceof Alias) {
     return ((Alias) expr).getName();
   }
-  return expr.toString();
+  throw new IllegalArgumentException(
+      "xyseries: unsupported expression type for field resolution: "
+          + expr.getClass().getSimpleName() + " (" + expr + ")");
 }
Suggestion importance[1-10]: 5

__

Why: The fallback expr.toString() in resolveFieldName could produce incorrect field names for unsupported expression types, leading to confusing runtime errors. Throwing an explicit IllegalArgumentException would provide a clearer error message, improving debuggability.

Low
Document aggregation assumption for pivot correctness

Using MAX as the aggregation function for all y-data fields is incorrect in the
general case. The input to xyseries is already aggregated (typically from a
preceding stats command), so the pivot should use the actual value rather than
applying MAX. This could produce wrong results if multiple rows share the same
x-field and pivot value combination, and silently discards data. Consider using MAX
only as a fallback and documenting this assumption clearly, or using a more
appropriate aggregation like ANY_VALUE if supported.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3427-3430]

+// Use MAX as the pivot aggregation - input is expected to be pre-aggregated by stats,
+// so each (xField, yNameField) combination should have at most one row.
+// MAX is used as a standard SQL-compatible way to select the single value.
 List<AggCall> aggCalls =
     yDataFieldNames.stream()
         .map(name -> b.max(b.field(name)).as(name))
         .collect(Collectors.toList());
Suggestion importance[1-10]: 2

__

Why: The suggestion only adds comments to document the existing MAX aggregation behavior without changing the logic. This is a documentation-only improvement with minimal functional impact.

Low
Suggestions up to commit c7b7536
CategorySuggestion                                                                                                                                    Impact
Possible issue
Resolve unresolved merge conflict in code

The code contains unresolved Git merge conflict markers (<<<<<<< HEAD, =======,
>>>>>>> 59ad4dc...). This will cause a compilation failure. The conflict must be
resolved by choosing one of the two implementations (or merging them) and removing
all conflict markers before merging this PR.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3457-3473]

 for (String aggName : yDataFieldNames) {
   for (String pivotVal : pivotValues) {
-<<<<<<< HEAD
-      // CASE WHEN CAST(y_name_field AS VARCHAR) = 'pivotVal' THEN y_data_field ELSE NULL END
-      RexNode condition = b.call(SqlStdOperatorTable.EQUALS, yNameAsString, b.literal(pivotVal));
-      RexNode dataRef = b.field(yDataFieldName);
-      RexNode caseExpr = b.call(SqlStdOperatorTable.CASE, condition, dataRef, b.literal(null));
-
-      // Generate the output column name
-      String colName = generateColumnName(yDataFieldName, pivotVal, separator, format);
-      projections.add(b.alias(caseExpr, colName));
-      projectionNames.add(colName);
-=======
-      // Reference pivot output column by its generated name: {value}_{agg}
-      String pivotColName = pivotVal + "_" + aggName;
-      reorderProjections.add(b.field(pivotColName));
-      reorderNames.add(generateColumnName(aggName, pivotVal, separator, format));
->>>>>>> 59ad4dc48ce98b5121320f8a226e6148a9386e27
+    // Reference pivot output column by its generated name: {value}_{agg}
+    String pivotColName = pivotVal + "_" + aggName;
+    reorderProjections.add(b.field(pivotColName));
+    reorderNames.add(generateColumnName(aggName, pivotVal, separator, format));
   }
 }
Suggestion importance[1-10]: 10

__

Why: The code contains unresolved Git merge conflict markers (<<<<<<< HEAD, =======, >>>>>>>), which will cause a compilation failure. This is a critical issue that must be resolved before the PR can be merged.

High
Silent exception swallowing hides test failures

The tests silently catch ResponseException and continue to call verifyQuery(result)
on the error response body. This means if the xyseries command throws an error
(e.g., because Calcite is not enabled), the test will not fail but will instead
verify the error JSON, potentially masking real failures. The exception should be
re-thrown or the test should assert that no exception occurred.

integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java [538-552]

-JSONObject result;
-try {
-  result =
-      executeQuery(
-          StringEscapeUtils.escapeJson(...));
-} catch (ResponseException e) {
-  result = new JSONObject(TestUtils.getResponseBody(e.getResponse()));
-}
+JSONObject result =
+    executeQuery(
+        StringEscapeUtils.escapeJson(
+            String.format(
+                "search source=%s | stats avg(balance) as avg_balance by gender, state"
+                    + " | xyseries state gender in (\"F\", \"M\") avg_balance",
+                TEST_INDEX_BANK)));
 verifyQuery(result);
Suggestion importance[1-10]: 6

__

Why: The catch (ResponseException e) block silently swallows errors and continues to call verifyQuery(result) on the error response, which can mask real test failures. However, this pattern appears to be intentional for testing both Calcite-enabled and non-Calcite scenarios, so the impact depends on the test's intent.

Low
General
Hardcoded MAX aggregation may fail for non-numeric types

The pivot always uses MAX as the aggregation function regardless of the original
aggregation applied in the upstream stats command. Since the pivot is applied after
the stats aggregation, the y-data fields already contain aggregated values. Using
MAX is a reasonable choice here, but if the upstream aggregation produces
non-numeric types (e.g., strings), MAX may fail or produce unexpected results.
Consider using a more neutral aggregation like ANY_VALUE or FIRST_VALUE to avoid
type-related issues.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3427-3431]

-// Build aggregate calls - MAX for each y-data field
+// Build aggregate calls - use ANY_VALUE since data is already aggregated upstream
 List<AggCall> aggCalls =
     yDataFieldNames.stream()
-        .map(name -> b.max(b.field(name)).as(name))
+        .map(name -> b.aggregateCall(SqlStdOperatorTable.ANY_VALUE, b.field(name)).as(name))
         .collect(Collectors.toList());
Suggestion importance[1-10]: 4

__

Why: Using MAX as the pivot aggregation is a reasonable choice since data is already aggregated upstream, but it could fail for non-numeric types. The suggestion to use ANY_VALUE is valid but the improved_code references SqlStdOperatorTable.ANY_VALUE which may not exist in all Calcite versions, making this a minor concern.

Low
Mutable list field allows unintended external modification

The pivotValues list is declared as final but its contents can still be mutated if a
mutable list (e.g., ArrayList) is passed in. Since Xyseries is an AST node that
should be immutable after construction, consider wrapping the list in
Collections.unmodifiableList() or using ImmutableList.copyOf() in the constructor to
prevent external mutation.

core/src/main/java/org/opensearch/sql/ast/tree/Xyseries.java [35-36]

 /** Explicit pivot values from the IN (...) clause. */
 private final List<String> pivotValues;
 
+// In the constructor or via a defensive copy:
+// this.pivotValues = ImmutableList.copyOf(pivotValues);
+
Suggestion importance[1-10]: 3

__

Why: While the immutability concern is valid, the improved_code is essentially the same as existing_code with only a comment added, making it a low-impact suggestion. The fix would need to be applied in the constructor, not just as a comment.

Low
Suggestions up to commit 59ad4dc
CategorySuggestion                                                                                                                                    Impact
Possible issue
Remove silent exception swallowing in tests

The test silently catches ResponseException and continues to call
verifyQuery(result) on the error response body. This means if the query fails (e.g.,
because Calcite is not enabled), the test will pass as long as verifyQuery doesn't
fail on the error JSON, masking real failures. The exception should either be
rethrown or the test should assert that no exception occurred.

integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java [538-552]

-JSONObject result;
-try {
-  result =
-      executeQuery(
-          StringEscapeUtils.escapeJson(
-              String.format(
-                  "search source=%s | stats avg(balance) as avg_balance by gender, state"
-                      + " | xyseries state gender in (\"F\", \"M\") avg_balance",
-                  TEST_INDEX_BANK)));
-
-} catch (ResponseException e) {
-  result = new JSONObject(TestUtils.getResponseBody(e.getResponse()));
-}
+JSONObject result =
+    executeQuery(
+        StringEscapeUtils.escapeJson(
+            String.format(
+                "search source=%s | stats avg(balance) as avg_balance by gender, state"
+                    + " | xyseries state gender in (\"F\", \"M\") avg_balance",
+                TEST_INDEX_BANK)));
 verifyQuery(result);
Suggestion importance[1-10]: 6

__

Why: The pattern of catching ResponseException and calling verifyQuery on the error response is a real testing anti-pattern that can mask failures. The suggestion to remove the try-catch is valid and would make the tests more reliable, though the existing pattern appears intentional to handle both Calcite-enabled and non-Calcite environments.

Low
Fix hardcoded pivot column name ordering

The pivot output column name is hardcoded as pivotVal + "_" + aggName, but the
actual column name generated by Calcite's pivot() may differ (e.g., it could be
aggName_pivotVal or use a different separator). If the generated column name doesn't
match, b.field(pivotColName) will throw a runtime exception. You should verify the
exact column naming convention used by Calcite's RelBuilder pivot and align
accordingly, or look up the column by index rather than by name.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3457-3459]

 // Reference pivot output column by its generated name: {value}_{agg}
-String pivotColName = pivotVal + "_" + aggName;
+// Calcite pivot generates columns as: {agg}_{value} - verify this matches your Calcite version
+String pivotColName = aggName + "_" + pivotVal;
 reorderProjections.add(b.field(pivotColName));
Suggestion importance[1-10]: 5

__

Why: The suggestion raises a valid concern about whether Calcite's pivot() generates columns as pivotVal_aggName or aggName_pivotVal. However, the integration tests in CalciteXySeriesCommandIT appear to pass with the current naming, suggesting the existing code may already be correct for the Calcite version in use. The suggestion to verify is reasonable but the proposed fix (reversing the order) may itself be wrong.

Low
General
Guard against missing child node

The getChild() method returns an empty list when child is null, but visitChildren in
the visitor may rely on having a child to traverse. If Xyseries is visited before
attach() is called (e.g., during validation or early traversal), this could silently
skip child processing. Consider adding a null-check guard or ensuring attach() is
always called before visiting.

core/src/main/java/org/opensearch/sql/ast/tree/Xyseries.java [55-58]

 @Override
 public List<UnresolvedPlan> getChild() {
-  return this.child == null ? ImmutableList.of() : ImmutableList.of(this.child);
+  if (this.child == null) {
+    throw new IllegalStateException("Xyseries node has no child attached");
+  }
+  return ImmutableList.of(this.child);
 }
Suggestion importance[1-10]: 2

__

Why: Other UnresolvedPlan nodes in the codebase (like Chart) follow the same pattern of returning an empty list when child is null, making this a consistent design choice. Throwing an exception would break the existing pattern and could cause issues during early traversal phases.

Low
Document implicit MAX aggregation assumption

Using MAX as the aggregation function for all y-data fields is a hardcoded
assumption. The xyseries command is typically applied after a stats command that has
already aggregated the data, so the pivot should use MAX (or MIN) to select the
single non-null value per group. However, if the upstream data has multiple rows per
(x-field, pivot-value) combination, silently using MAX could produce incorrect
results without any warning to the user. This assumption should at least be
documented clearly in the code.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3427-3430]

+// Use MAX to select the single non-null value per (x-field, pivot-value) group.
+// xyseries is expected to be applied after stats, so each group should have at most one row.
 List<AggCall> aggCalls =
     yDataFieldNames.stream()
         .map(name -> b.max(b.field(name)).as(name))
         .collect(Collectors.toList());
Suggestion importance[1-10]: 2

__

Why: This is purely a documentation/comment suggestion with no code change. The improved_code only adds a comment, which doesn't warrant a high score per the scoring guidelines.

Low

Signed-off-by: Asif Bashar <asif.bashar@gmail.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 59ad4dc

Signed-off-by: Asif Bashar <asif.bashar@gmail.com>
Signed-off-by: Asif Bashar <asif.bashar@gmail.com>
@github-actions
Copy link
Copy Markdown
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit c7b7536.

PathLineSeverityDescription
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java3430mediumUnresolved merge conflict markers (<<<<<<< HEAD / ======= / >>>>>>>) left in production code. The HEAD branch block references undefined variables (yNameAsString, yDataFieldName, projections, projectionNames) that do not exist in the surrounding scope. If this somehow bypassed compilation checks, the dead code block could produce unexpected runtime behavior. This appears to be an accidental inclusion of conflicting code rather than intentional malice, but warrants investigation to ensure the correct implementation was chosen and no unintended logic was merged.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 1 | Low: 0


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit c7b7536

Signed-off-by: Asif Bashar <asif.bashar@gmail.com>
Signed-off-by: Asif Bashar <asif.bashar@gmail.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit a72ffe1

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit a72ffe1

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 13ddcb5

Signed-off-by: Asif Bashar <asif.bashar@gmail.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 1858a98

@asifabashar asifabashar changed the title Xyseries2 Xyseries command implementation Apr 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant