Expected Behavior
DurableLogData.attempt should provide an accurate attempt number when a step retries for any reason.
Actual Behavior
When AWS retries after any Sandbox.* invocation error, DurableLogData.attempt doesn't increment. The easiest example to trigger is Sandbox.Timedout, but I’ve also seen it happen with a segmentation fault.
A quick way to reproduce this is with a custom logger that stores the logging context and exposes an attempt getter that can be read from within a step:
import type {
  DurableLogger,
  DurableLoggingContext,
} from "@aws/durable-execution-sdk-js";

// eslint-disable-next-line @typescript-eslint/no-explicit-any
type LoggerParams = any;
type DurableLogLevel = Parameters<NonNullable<DurableLogger["log"]>>[0];

export class AttemptNumberLogger implements DurableLogger {
  private context?: DurableLoggingContext;

  get attempt() {
    return this.context?.getDurableLogData().attempt ?? 1;
  }

  constructor(private baseLogger: DurableLogger) {}

  log?(level: `${DurableLogLevel}`, ...params: LoggerParams): void {
    this.baseLogger.log?.(level, ...params);
  }

  error(...params: LoggerParams): void {
    this.baseLogger.error(...params);
  }

  warn(...params: LoggerParams): void {
    this.baseLogger.warn(...params);
  }

  info(...params: LoggerParams): void {
    this.baseLogger.info(...params);
  }

  debug(...params: LoggerParams): void {
    this.baseLogger.debug(...params);
  }

  configureDurableLoggingContext?(
    durableLoggingContext: DurableLoggingContext,
  ): void {
    this.context = durableLoggingContext;
    this.baseLogger.configureDurableLoggingContext?.(durableLoggingContext);
  }
}
context.configureLogger({
  customLogger: new AttemptNumberLogger(context.logger),
});

//...

await context.step(
  'example',
  async (context) => {
    const attempt = (context.logger as AttemptNumberLogger).attempt - 1;
    // The logger's attempt is always 1 here after a sandbox retry,
    // so this zero-based value never changes.
    // Spin without blocking the event loop until the function times out.
    while (true) {
      await new Promise(resolve => setTimeout(resolve, 100));
    }
  },
  {
    retryStrategy: (error, attemptCount) => { /* not called */ }
  }
);
My actual use case is reading the attempt number from the logger and using it in my own progress event reporting within the step (similar to logging, but more user-facing). Each event has a unique sequence number, which is automatically bumped when the attempt number is greater than 1, to leave room for events that fired during the previous failed attempt.
This works fine when the error is handled within the Lambda function (the attempt number increments on each retry), but after any sandbox error the attempt number is always 1, no matter how many times the step has been retried. As a result, the sequence numbers in my reported events are duplicated.
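To illustrate the collision, here is a simplified sketch of the sequence numbering. The names sequenceNumber and SLOTS_PER_ATTEMPT are made up for this example (they are not SDK APIs and not my real code), and the range size is an arbitrary assumption:

```typescript
// Hypothetical: how many sequence numbers are reserved for each attempt.
const SLOTS_PER_ATTEMPT = 1000;

// Derive a sequence number from the 1-based attempt number and the
// index of the event within that attempt.
function sequenceNumber(attempt: number, eventIndex: number): number {
  return (attempt - 1) * SLOTS_PER_ATTEMPT + eventIndex;
}

// Handled error: the attempt number increments, so the retry gets a fresh range.
const firstTry = sequenceNumber(1, 0);
const handledRetry = sequenceNumber(2, 0);

// Sandbox error: the attempt number is still reported as 1 on the retry,
// so the retry reuses the range of the first try.
const sandboxRetry = sequenceNumber(1, 0);

console.log({ firstTry, handledRetry, sandboxRetry });
```

With a correct attempt number, the retry's events start in a new range. With the attempt stuck at 1, sandboxRetry equals firstTry, which is the duplication I'm seeing.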
Somewhat related: I also noticed that function timeouts are retried many times, using a default backoff strategy. It would be nice if the timeout surfaced in the next replay and followed the step's retry strategy, though I understand it may not be clear whether the sandbox error happened within a particular step or outside of it.
Steps to Reproduce
- Register custom logger to expose attempt number (code above)
- Within step, check the attempt number
- Deliberately cause a Sandbox.Timedout error, ideally without blocking the event loop, as that can cause some other problems.
- Observe that attempt number from custom logger is always 1
- Optionally, change the example to throw an Error to verify that the attempt number does increase when a step is retried after a JavaScript error.
SDK Version
1.0.2
Node.js Version
22.x
Is this a regression?
No
Last Working Version
No response
Additional Context
No response