[AURON #2178] [AURON #2179] Implement native support for first_value and last_value window functions#2200
Open
officialasishkumar wants to merge 1 commit intoapache:masterfrom
Conversation
…first_value and last_value window functions Spark `first_value(...)` and `last_value(...)` are not supported in Auron's native window execution path, causing queries using them to fall back to the Spark path instead of being executed natively. This PR extends native window coverage to include both functions: first_value: - maps `First(child, ignoreNulls)` in the window expression to the existing `AggFunction::First` / `AggFunction::FirstIgnoresNull` Rust aggregates via `NativeWindowBase` - the running-aggregate semantics of `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` naturally produce the correct first-value behavior through the existing `AggProcessor` - no new Rust aggregate code is required for first_value last_value: - introduces `AggFunction::Last` and `AggFunction::LastIgnoresNull` to the Rust aggregate infrastructure - adds `AggLast` (always updates accumulator, including with null values) and `AggLastIgnoresNull` (updates accumulator only for non-null values) - extends the protobuf `AggFunction` enum with `LAST = 10` and `LAST_IGNORES_NULL = 11` - adds planner and lib.rs mappings for the new proto values - maps `Last(child, ignoreNulls)` in the window expression to the new `AggFunction::Last` / `AggFunction::LastIgnoresNull` aggregates Both functions use the existing `AggProcessor` with frame `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, consistent with all other aggregate window functions in Auron. Additional changes: - adds `Last` import and conversion case to `NativeConverters.scala` so `last()` as a group aggregate also works natively - adds `First` and `Last` imports and match cases to `NativeWindowBase` - adds Scala regression tests for first_value and last_value, covering both RESPECT NULLS and IGNORE NULLS variants Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Adds native execution support for Spark first_value / last_value window functions (including IGNORE NULLS) by wiring them through Auron’s existing aggregate-window infrastructure and extending the native aggregate set.
Changes:
- Extend Spark-side window/aggregate conversion to emit
FIRST(_IGNORES_NULL)and newLAST(_IGNORES_NULL)agg functions. - Add native-engine Rust implementations for
LastandLastIgnoresNull, plus factory/planner/proto enum wiring. - Add Scala regression tests covering
first_value/last_valuewith RESPECT/IGNORE NULLS.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeWindowBase.scala | Adds window-function mapping for First/Last aggregates to protobuf agg functions. |
| spark-extension/src/main/scala/org/apache/spark/sql/auron/NativeConverters.scala | Adds group-aggregate conversion support for Last / Last IGNORE NULLS. |
| spark-extension-shims-spark/src/test/scala/org/apache/auron/AuronWindowSuite.scala | Adds regression coverage for first_value / last_value (incl. IGNORE NULLS). |
| native-engine/datafusion-ext-plans/src/agg/mod.rs | Registers new Last / LastIgnoresNull modules and enum variants. |
| native-engine/datafusion-ext-plans/src/agg/last.rs | Implements AggLast (RESPECT NULLS) aggregate behavior. |
| native-engine/datafusion-ext-plans/src/agg/last_ignores_null.rs | Implements AggLastIgnoresNull aggregate behavior. |
| native-engine/datafusion-ext-plans/src/agg/agg.rs | Wires new agg functions into the aggregate factory. |
| native-engine/auron-planner/src/planner.rs | Maps protobuf window agg functions to native AggFunction::Last*. |
| native-engine/auron-planner/src/lib.rs | Adds protobuf-to-native enum conversion for Last*. |
| native-engine/auron-planner/proto/auron.proto | Extends protobuf AggFunction enum with LAST and LAST_IGNORES_NULL. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| package org.apache.auron | ||
|
|
||
| import org.apache.spark.sql.AuronQueryTest | ||
| import org.apache.spark.sql.execution.auron.plan.NativeWindowBase |
Comment on lines
+152
to
+160
| let accs = downcast_any!(accs, mut AccScalarValueColumn)?; | ||
| idx_for_zipped! { | ||
| ((acc_idx, partial_arg_idx) in (acc_idx, partial_arg_idx)) => { | ||
| if partial_arg.is_valid(partial_arg_idx) { | ||
| accs.set_value(acc_idx, compacted_scalar_value_from_array(partial_arg, partial_arg_idx)?); | ||
| } else { | ||
| accs.set_value(acc_idx, ScalarValue::try_from(&self.data_type)?); | ||
| } | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #2178
Closes #2179
Rationale for this change
Spark
first_value(...)andlast_value(...)are not supported in Auron's native window execution path, causing queries using them to fall back instead of being executed natively.This PR extends native window coverage to include both functions using the existing aggregate window infrastructure.
What changes are included in this PR?
first_value:
FirstandFirst IGNORE NULLShandling toNativeWindowBaseAggFunction::First/AggFunction::FirstIgnoresNullRust aggregates viaAggProcessorROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROWnaturally produce correct first-value behavior through the existingAggProcessorfirst_valuelast_value:
AggFunction::LastandAggFunction::LastIgnoresNullto the Rust aggregate infrastructureAggLast(always updates accumulator, including with null values) andAggLastIgnoresNull(updates accumulator only for non-null values)AggFunctionenum withLAST = 10andLAST_IGNORES_NULL = 11LastandLast IGNORE NULLShandling toNativeWindowBaseAdditional changes:
Lastimport and conversion case toNativeConverters.scalasolast()as a group aggregate also works nativelyfirst_valueandlast_value(both RESPECT NULLS and IGNORE NULLS variants)Both functions use
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, consistent with all other aggregate window functions in Auron.Are there any user-facing changes?
Yes.
Queries using
first_value(...)andlast_value(...)can now remain on Auron's native window execution path.Both RESPECT NULLS (default) and IGNORE NULLS variants are supported.
How was this patch tested?
CI.