Align TraceIdRatioBasedSampler with Java/Python implementations for cross-language compatibility #6503

@Shrey93

Is your feature request related to a problem? Please describe.

The current TraceIdRatioBasedSampler implementation in opentelemetry-js is incompatible with the Java and Python implementations, causing inconsistent sampling decisions across polyglot distributed systems. When trace context propagates between services via HTTP headers, all services receive the same trace ID but make different sampling decisions due to divergent hash algorithms. This breaks end-to-end trace visibility and creates gaps in distributed traces.

The JavaScript implementation XORs all 8-character chunks across the entire trace ID to produce a 32-bit hash, while both Java and Python implementations extract only the lower 64 bits of the trace ID. For the same trace ID and sampling ratio, JavaScript produces different sampling decisions than Java/Python, resulting in incomplete traces where some service spans are recorded while others are dropped.
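To make the divergence concrete, here is a minimal sketch of the XOR-style hash described above. It is written from the description in this issue, not copied from the SDK source; the function names, the example trace IDs, and the comparison against `ratio * 0xffffffff` are assumptions for illustration. The two trace IDs share the same lower 64 bits, so any sampler keyed on those bits must treat them identically, yet the XOR hash separates them.

```typescript
// Sketch of the XOR-style hash (hypothetical helper, not the SDK code):
// split the trace ID into 8-hex-character chunks and XOR them into one
// unsigned 32-bit value.
function xorAccumulate(traceId: string): number {
  let acc = 0;
  for (let i = 0; i < traceId.length / 8; i++) {
    const chunk = parseInt(traceId.slice(i * 8, i * 8 + 8), 16);
    acc = (acc ^ chunk) >>> 0; // >>> 0 keeps the result unsigned
  }
  return acc;
}

// Assumption: the 32-bit hash is compared against ratio * 0xffffffff.
function sampledByXor(traceId: string, ratio: number): boolean {
  return xorAccumulate(traceId) < Math.floor(ratio * 0xffffffff);
}

// Two hypothetical trace IDs that differ only in their upper 64 bits.
const idA = "00000000000000000000000000000001";
const idB = "ffffffff000000000000000000000001";

console.log(idA.slice(16, 32) === idB.slice(16, 32)); // true
console.log(sampledByXor(idA, 0.5)); // true  (hash is 0x00000001)
console.log(sampledByXor(idB, 0.5)); // false (hash is 0xfffffffe)
```

Because the upper 64 bits feed into the XOR hash, the decision for `idB` flips even though a lower-64-bit sampler would see the same input for both IDs.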

This issue is particularly problematic when services need independent sampling control. Parent-based sampling would force downstream services to inherit upstream sampling decisions, preventing scenarios where Service A reduces sampling to zero during incidents while Service B maintains normal sampling rates for its own observability needs.

Describe the solution you'd like

Align the Node.js TraceIdRatioBasedSampler implementation with the Java and Python implementations by:

  • Extracting the lower 64 bits of the trace ID (last 16 hex characters) using traceId.slice(16, 32)
  • Converting to a signed 64-bit integer equivalent using BigInt
  • Calculating the boundary as ratio × (2^63 - 1), with special cases for ratio 0.0 (use the minimum signed 64-bit value, -2^63) and ratio 1.0 (use the maximum signed 64-bit value, 2^63 - 1)
  • Comparing the absolute value against the boundary using strict less-than

This ensures deterministic, cross-language compatible sampling decisions while preserving independent sampling control per service.
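The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed patch: the function names are hypothetical, the special-case bounds are taken to be the signed 64-bit minimum and maximum, and the hex-format guard is my own addition. `BigInt(...)` calls are used instead of `n` literals so the snippet does not depend on literal support in the compile target.

```typescript
// Signed 64-bit limits, built from strings to avoid BigInt literals.
const INT64_MAX = BigInt("9223372036854775807"); // 2^63 - 1
const INT64_MIN = BigInt("-9223372036854775808"); // -2^63

// Boundary: ratio × (2^63 - 1), with special cases at 0.0 and 1.0.
// Assumption: NaN and negative ratios are treated like 0.0.
function idUpperBound(ratio: number): bigint {
  if (!(ratio > 0)) return INT64_MIN; // no magnitude is below this
  if (ratio >= 1) return INT64_MAX;
  // Computed in double precision, then truncated to an integer.
  return BigInt(Math.floor(ratio * Number(INT64_MAX)));
}

// Lower 64 bits (last 16 hex characters) as a signed 64-bit integer.
function lower64Signed(traceId: string): bigint {
  return BigInt.asIntN(64, BigInt("0x" + traceId.slice(16, 32)));
}

function shouldSample(traceId: string, ratio: number): boolean {
  // Assumption: malformed trace IDs are never sampled.
  if (!/^[0-9a-f]{32}$/i.test(traceId)) return false;
  const value = lower64Signed(traceId);
  const magnitude = value < BigInt(0) ? -value : value;
  return magnitude < idUpperBound(ratio); // strict less-than
}

console.log(shouldSample("ffffffffffffffff0000000000000001", 0.5)); // true
console.log(shouldSample("00000000000000007fffffffffffffff", 0.5)); // false
```

One edge case worth checking against the reference implementations: for lower 64 bits of 0x8000000000000000, this sketch yields a magnitude of 2^63, whereas a fixed-width absolute value would overflow and could produce a different decision.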

Describe alternatives you've considered

Alternative 1: Standardize all language implementations to use the JavaScript XOR approach. This would require coordinated changes across multiple SDKs (Java, Python, and others) and break existing deployments across the ecosystem.

Alternative 2: Use parent-based sampling exclusively. This forces downstream services to inherit upstream sampling decisions, eliminating the ability to adjust sampling rates independently per service. This approach fails when upstream services need to disable tracing during incidents while downstream services require continued observability.

Alternative 3: Implement custom sampling logic outside OpenTelemetry. This creates maintenance burden, loses community support, and fragments the observability ecosystem.

Additional context

The OpenTelemetry specification does not mandate a specific hash algorithm for TraceIdRatioBasedSampler, resulting in implementation divergence across language SDKs. Cross-language consistency is critical for polyglot microservice architectures where trace context propagates across service boundaries.

Both Java and Python implementations use the lower 64-bit extraction approach, making JavaScript the outlier. Aligning Node.js with this established pattern would improve interoperability for distributed systems using multiple language SDKs.

Reference implementations:

Why breaking backward compatibility is acceptable

This change will alter sampling decisions for existing JavaScript deployments, but the benefits outweigh the costs:

Cross-language consistency is a correctness issue, not a feature enhancement. The current implementation produces incorrect behavior in polyglot systems where trace context crosses service boundaries. Traces are silently fragmented, making distributed debugging difficult.

The impact is limited to sampling decisions, not trace functionality. Existing instrumentation code requires no changes. Only the statistical distribution of which traces are sampled will shift. No data loss occurs since unsampled traces were already being dropped.

Sampling ratios are probabilistic by design. Users already accept that exact trace counts vary based on traffic patterns and trace ID distribution. A one-time shift in which specific trace IDs are sampled is consistent with the probabilistic nature of sampling.

The alternative is permanent fragmentation. Without this change, JavaScript will remain incompatible with the broader OpenTelemetry ecosystem indefinitely, forcing users to choose between JavaScript and other languages for distributed tracing.
