[AURON #2178] [AURON #2179] Implement native support for first_value and last_value window functions#2200

Open
officialasishkumar wants to merge 1 commit into apache:master from officialasishkumar:feat/native-window-first-last-value

@officialasishkumar

Which issue does this PR close?

Closes #2178
Closes #2179

Rationale for this change

Spark first_value(...) and last_value(...) are not supported in Auron's native window execution path, so queries that use them fall back to the Spark path instead of executing natively.

This PR extends native window coverage to include both functions using the existing aggregate window infrastructure.

What changes are included in this PR?

first_value:

  • adds First and First IGNORE NULLS handling to NativeWindowBase
  • maps to the existing AggFunction::First / AggFunction::FirstIgnoresNull Rust aggregates via AggProcessor
  • the running-aggregate semantics of ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW naturally produce correct first-value behavior through the existing AggProcessor
  • no new Rust aggregate code is needed for first_value
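
To illustrate why a running aggregate over ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is enough for first_value, here is a minimal standalone sketch. It is not Auron's AggProcessor code: the function name `running_first` and the use of `Option` to stand in for SQL NULL are assumptions made for illustration only.

```rust
/// Running `first_value` over one partition that is already in window order,
/// under the frame ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
/// `None` stands in for SQL NULL. Hypothetical helper, not part of Auron.
pub fn running_first<T: Clone>(rows: &[Option<T>], ignore_nulls: bool) -> Vec<Option<T>> {
    let mut acc: Option<T> = None; // value held by the accumulator
    let mut is_set = false; // whether the accumulator has been fixed yet
    let mut out = Vec::with_capacity(rows.len());
    for row in rows {
        // RESPECT NULLS: the very first row fixes the result, even if it is NULL.
        // IGNORE NULLS: the first non-NULL row fixes the result.
        if !is_set && (!ignore_nulls || row.is_some()) {
            acc = row.clone();
            is_set = true;
        }
        // Each output row sees the accumulator state as of the current row.
        out.push(acc.clone());
    }
    out
}
```

For the partition `[NULL, 1, 2]`, this sketch yields `[NULL, NULL, NULL]` with RESPECT NULLS and `[NULL, 1, 1]` with IGNORE NULLS: once the accumulator is fixed, later rows never change it, which is the behavior the running frame needs.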

last_value:

  • introduces AggFunction::Last and AggFunction::LastIgnoresNull to the Rust aggregate infrastructure
  • adds AggLast (always updates accumulator, including with null values) and AggLastIgnoresNull (updates accumulator only for non-null values)
  • extends the protobuf AggFunction enum with LAST = 10 and LAST_IGNORES_NULL = 11
  • adds planner and lib.rs mappings for the new proto values
  • adds Last and Last IGNORE NULLS handling to NativeWindowBase
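
The difference between the two new accumulators can be sketched in isolation as follows. This is an illustrative model of the update rules described above, not the actual AggLast / AggLastIgnoresNull implementation; `running_last` is a hypothetical name and `Option` stands in for SQL NULL.

```rust
/// Running `last_value` over one partition that is already in window order,
/// under the frame ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
/// `None` stands in for SQL NULL. Hypothetical helper, not part of Auron.
pub fn running_last<T: Clone>(rows: &[Option<T>], ignore_nulls: bool) -> Vec<Option<T>> {
    let mut acc: Option<T> = None; // value held by the accumulator
    let mut out = Vec::with_capacity(rows.len());
    for row in rows {
        // RESPECT NULLS (AggLast-style): always overwrite, including with NULL.
        // IGNORE NULLS (AggLastIgnoresNull-style): overwrite only for non-NULL input.
        if !ignore_nulls || row.is_some() {
            acc = row.clone();
        }
        // Each output row sees the accumulator state as of the current row.
        out.push(acc.clone());
    }
    out
}
```

For the partition `[1, NULL, 3]`, the sketch yields `[1, NULL, 3]` with RESPECT NULLS (each row sees its own value, since the frame ends at the current row) and `[1, 1, 3]` with IGNORE NULLS.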

Additional changes:

  • adds Last import and conversion case to NativeConverters.scala so last() as a group aggregate also works natively
  • adds Scala regression tests for first_value and last_value (both RESPECT NULLS and IGNORE NULLS variants)

Both functions use ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, consistent with all other aggregate window functions in Auron.

Are there any user-facing changes?

Yes.
Queries using first_value(...) and last_value(...) can now remain on Auron's native window execution path.
Both RESPECT NULLS (default) and IGNORE NULLS variants are supported.

How was this patch tested?

CI, including the new Scala regression tests for first_value and last_value.

…first_value and last_value window functions

Spark `first_value(...)` and `last_value(...)` are not supported in
Auron's native window execution path, causing queries using them to fall
back to the Spark path instead of being executed natively.

This PR extends native window coverage to include both functions:

first_value:
- maps `First(child, ignoreNulls)` in the window expression to the
  existing `AggFunction::First` / `AggFunction::FirstIgnoresNull` Rust
  aggregates via `NativeWindowBase`
- the running-aggregate semantics of `ROWS BETWEEN UNBOUNDED PRECEDING
  AND CURRENT ROW` naturally produce the correct first-value behavior
  through the existing `AggProcessor`
- no new Rust aggregate code is required for first_value

last_value:
- introduces `AggFunction::Last` and `AggFunction::LastIgnoresNull`
  to the Rust aggregate infrastructure
- adds `AggLast` (always updates accumulator, including with null values)
  and `AggLastIgnoresNull` (updates accumulator only for non-null values)
- extends the protobuf `AggFunction` enum with `LAST = 10` and
  `LAST_IGNORES_NULL = 11`
- adds planner and lib.rs mappings for the new proto values
- maps `Last(child, ignoreNulls)` in the window expression to the new
  `AggFunction::Last` / `AggFunction::LastIgnoresNull` aggregates

Both functions use the existing `AggProcessor` with frame
`ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, consistent with
all other aggregate window functions in Auron.

Additional changes:
- adds `Last` import and conversion case to `NativeConverters.scala`
  so `last()` as a group aggregate also works natively
- adds `First` and `Last` imports and match cases to `NativeWindowBase`
- adds Scala regression tests for first_value and last_value, covering
  both RESPECT NULLS and IGNORE NULLS variants

Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>

Copilot AI left a comment


Pull request overview

Adds native execution support for Spark first_value / last_value window functions (including IGNORE NULLS) by wiring them through Auron’s existing aggregate-window infrastructure and extending the native aggregate set.

Changes:

  • Extend Spark-side window/aggregate conversion to emit FIRST(_IGNORES_NULL) and new LAST(_IGNORES_NULL) agg functions.
  • Add native-engine Rust implementations for Last and LastIgnoresNull, plus factory/planner/proto enum wiring.
  • Add Scala regression tests covering first_value / last_value with RESPECT/IGNORE NULLS.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeWindowBase.scala | Adds window-function mapping for First/Last aggregates to protobuf agg functions. |
| spark-extension/src/main/scala/org/apache/spark/sql/auron/NativeConverters.scala | Adds group-aggregate conversion support for Last / Last IGNORE NULLS. |
| spark-extension-shims-spark/src/test/scala/org/apache/auron/AuronWindowSuite.scala | Adds regression coverage for first_value / last_value (incl. IGNORE NULLS). |
| native-engine/datafusion-ext-plans/src/agg/mod.rs | Registers new Last / LastIgnoresNull modules and enum variants. |
| native-engine/datafusion-ext-plans/src/agg/last.rs | Implements AggLast (RESPECT NULLS) aggregate behavior. |
| native-engine/datafusion-ext-plans/src/agg/last_ignores_null.rs | Implements AggLastIgnoresNull aggregate behavior. |
| native-engine/datafusion-ext-plans/src/agg/agg.rs | Wires new agg functions into the aggregate factory. |
| native-engine/auron-planner/src/planner.rs | Maps protobuf window agg functions to native AggFunction::Last*. |
| native-engine/auron-planner/src/lib.rs | Adds protobuf-to-native enum conversion for Last*. |
| native-engine/auron-planner/proto/auron.proto | Extends protobuf AggFunction enum with LAST and LAST_IGNORES_NULL. |
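
As a rough picture of the protobuf-to-native conversion for the two new values, the sketch below maps the wire tags stated in the PR description (LAST = 10, LAST_IGNORES_NULL = 11) to a native enum. The enum `LastAggFunction` and the function `last_agg_from_proto` are hypothetical names introduced here; the real conversion in lib.rs and planner.rs covers the full AggFunction enum and may be structured differently.

```rust
/// Hypothetical stand-in for the two new native aggregate variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LastAggFunction {
    Last,
    LastIgnoresNull,
}

/// Maps a protobuf enum tag to the native variant.
/// Only the two tags named in the PR description are handled; every other
/// tag returns `None` because this sketch does not model the rest of the enum.
pub fn last_agg_from_proto(tag: i32) -> Option<LastAggFunction> {
    match tag {
        10 => Some(LastAggFunction::Last),            // LAST = 10
        11 => Some(LastAggFunction::LastIgnoresNull), // LAST_IGNORES_NULL = 11
        _ => None,
    }
}
```

Returning `Option` rather than panicking keeps an unknown tag a recoverable condition in this sketch, which is a design choice of the example and not a statement about how Auron reports unsupported functions.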


package org.apache.auron

import org.apache.spark.sql.AuronQueryTest
import org.apache.spark.sql.execution.auron.plan.NativeWindowBase
Comment on lines +152 to +160
let accs = downcast_any!(accs, mut AccScalarValueColumn)?;
idx_for_zipped! {
    ((acc_idx, partial_arg_idx) in (acc_idx, partial_arg_idx)) => {
        if partial_arg.is_valid(partial_arg_idx) {
            accs.set_value(acc_idx, compacted_scalar_value_from_array(partial_arg, partial_arg_idx)?);
        } else {
            accs.set_value(acc_idx, ScalarValue::try_from(&self.data_type)?);
        }
    }
Development

Successfully merging this pull request may close these issues.

Implement native support for last_value window function
Implement native support for first_value window function
