# ADR-001: API Stability Contract for float8 v1.0.0

- **Status:** Accepted
- **Date:** 2026-03-29
- **Authors:** Daniel Ndungu

## Context

The `github.com/zerfoo/float8` package provides FP8 E4M3FN arithmetic (an IEEE 754-style 8-bit format with 4 exponent bits and 3 mantissa bits) for the Zerfoo ML ecosystem. It is imported by `ztensor` for quantized tensor storage and compute. The API surface has stabilized, and downstream consumers need an explicit stability contract so they can depend on it without risk of breakage.

## Decision

### Stable (v1 guarantee)

The following API surface is covered by the Go module compatibility promise: it will not change in a backward-incompatible way within the v1.x line.

**Core type:**
- `Float8` (defined as `uint8`)

**Constructors and conversions:**
- `ToFloat8(float32) Float8`
- `ToFloat8WithMode(float32, ConversionMode) (Float8, error)`
- `FromFloat64(float64) Float8`
- `FromInt(int) Float8`
- `FromBits(uint8) Float8`
- `Parse(string) (Float8, error)`
- `Zero() Float8`, `One() Float8`
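
The conversion functions above fix names and signatures, but this ADR does not spell out the rounding rule. As a point of reference, the sketch below maps a `float32` to the nearest E4M3FN code by brute force. It assumes the standard E4M3FN layout, round half to even on ties, and that magnitudes above 448 (the largest finite E4M3FN value) are reported as an error rather than saturated; that last choice is only a guess at what a strict mode might do. `decodeE4M3FN`, `encodeNearest`, and `errOverflow` are hypothetical names, not the package's `ToFloat8`.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// errOverflow is a hypothetical error for magnitudes above the E4M3FN maximum (448).
var errOverflow = errors.New("e4m3fn: value out of range")

// decodeE4M3FN widens an E4M3FN code (1 sign, 4 exponent, 3 mantissa bits, bias 7).
func decodeE4M3FN(b uint8) float32 {
	exp := int(b>>3) & 0x0F
	man := float64(b & 0x07)
	var v float64
	switch {
	case b&0x7F == 0x7F:
		return float32(math.NaN()) // S.1111.111 is the only NaN pattern; no infinities
	case exp == 0:
		v = math.Ldexp(man, -9) // zero or subnormal: m * 2^-9
	default:
		v = math.Ldexp(1+man/8, exp-7) // normal: 2^(e-7) * (1 + m/8)
	}
	if b&0x80 != 0 {
		v = -v
	}
	return float32(v)
}

// encodeNearest returns the E4M3FN code nearest to x, resolving ties to the
// code with an even mantissa (round half to even). Hypothetical reference encoder.
func encodeNearest(x float32) (uint8, error) {
	if x != x { // NaN
		return 0x7F, nil
	}
	var sign uint8
	if math.Signbit(float64(x)) {
		sign = 0x80
	}
	a := math.Abs(float64(x))
	if a > 448 {
		return 0, errOverflow
	}
	best, bestDiff := uint8(0), math.Inf(1)
	// Scan every non-negative finite code 0x00..0x7E.
	for c := uint8(0); c <= 0x7E; c++ {
		d := math.Abs(float64(decodeE4M3FN(c)) - a)
		// Bit 0 of a code is its mantissa LSB, so an even code has an even mantissa.
		if d < bestDiff || (d == bestDiff && c%2 == 0) {
			best, bestDiff = c, d
		}
	}
	return sign | best, nil
}

func main() {
	for _, x := range []float32{1, 1.0625, 1.1875, -2, 500} {
		c, err := encodeNearest(x)
		fmt.Printf("%-7v -> 0x%02X  err=%v\n", x, c, err)
	}
}
```

Scanning all 127 non-negative finite codes is far too slow for a hot path, but a reference like this is useful for checking a faster converter. For example, `1.0625` sits exactly between `1.0` (`0x38`) and `1.125` (`0x39`) and rounds to the even code `0x38`, while `1.1875` rounds up to `0x3A`.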

**Methods on Float8:**
- `ToFloat32() float32`, `ToFloat64() float64`, `ToInt() int`
- `Bits() uint8`
- `Abs() Float8`, `Neg() Float8`
- `Sign() int`
- `IsZero() bool`, `IsNaN() bool`, `IsInf() bool`, `IsFinite() bool`, `IsNormal() bool`, `IsValid() bool`
- `String() string`, `GoString() string`
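
Because `Float8` is a defined `uint8`, several of the methods listed above can in principle be answered from the bit pattern alone, without widening to `float32`. The sketch below assumes the standard E4M3FN layout (sign in bit 7, exponent in bits 6-3, mantissa in bits 2-0, NaN when the exponent and mantissa bits are all ones, no infinity encodings). The `fp8` type and its lowercase methods are hypothetical stand-ins, not the package's implementation.

```go
package main

import "fmt"

// fp8 is a hypothetical stand-in for a Float8 defined as uint8.
type fp8 uint8

const (
	signBit  fp8 = 0x80 // bit 7
	expField fp8 = 0x78 // bits 6-3
)

// isNaN reports whether the exponent and mantissa bits are all ones (S.1111.111).
func (f fp8) isNaN() bool { return f&0x7F == 0x7F }

// isZero reports +0 (0x00) or -0 (0x80).
func (f fp8) isZero() bool { return f&0x7F == 0 }

// isNormal reports a nonzero exponent field that is not the NaN pattern.
func (f fp8) isNormal() bool { return f&expField != 0 && !f.isNaN() }

// abs clears the sign bit and neg flips it; exponent and mantissa are untouched.
func (f fp8) abs() fp8 { return f &^ signBit }
func (f fp8) neg() fp8 { return f ^ signBit }

func main() {
	for _, f := range []fp8{0x00, 0x80, 0x07, 0x08, 0x38, 0xB8, 0x7F} {
		fmt.Printf("0x%02X zero=%-5v normal=%-5v nan=%-5v abs=0x%02X neg=0x%02X\n",
			uint8(f), f.isZero(), f.isNormal(), f.isNaN(), uint8(f.abs()), uint8(f.neg()))
	}
}
```

Since `abs` and `neg` change only the sign bit, applying them to a NaN pattern still yields a NaN pattern.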

**Arithmetic functions:**
- `Add`, `Sub`, `Mul`, `Div` (and `*WithMode` variants)
- `AddSlice`, `MulSlice`, `ScaleSlice`, `SumSlice`

**Math functions:**
- `Sqrt`, `Pow`, `Exp`, `Log`
- `Sin`, `Cos`, `Tan`
- `Floor`, `Ceil`, `Round`, `Trunc`, `Fmod`
- `Min`, `Max`, `Clamp`, `Lerp`, `Sign`, `CopySign`

**Comparison functions:**
- `Equal`, `Less`, `LessEqual`, `Greater`, `GreaterEqual`

**Batch conversions:**
- `ToSlice8([]float32) []Float8`
- `ToSlice32([]Float8) []float32`

**Configuration:**
- `Config`, `DefaultConfig`, `Configure`
- `ConversionMode` (constants: `ModeDefault`, `ModeStrict`, `ModeFast`)
- `ArithmeticMode` (constants: `ArithmeticAuto`, `ArithmeticAlgorithmic`, `ArithmeticLookup`)
- `EnableFastConversion`, `DisableFastConversion`
- `EnableFastArithmetic`, `DisableFastArithmetic`
- `DefaultConversionMode`, `DefaultArithmeticMode` (package-level variables)
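
This ADR fixes the names of the fast-path switches (`EnableFastConversion`, `ArithmeticLookup`) but not their mechanism. Because an 8-bit format has only 256 bit patterns, one common approach is to decode every pattern once into a table. The sketch below assumes that approach and the standard E4M3FN layout; `decodeTable`, `rebias`, and `fastToFloat32` are hypothetical names, and the package may work differently.

```go
package main

import (
	"fmt"
	"math"
)

// decodeTable maps each of the 256 bit patterns to its float32 value (1 KiB), built once.
var decodeTable = buildDecodeTable()

// rebias widens one E4M3FN code by moving its fields into float32 positions:
// the exponent bias goes from 7 to 127 (add 120) and the 3 mantissa bits
// move to the top of float32's 23-bit mantissa field.
func rebias(b uint8) float32 {
	s := uint32(b>>7) << 31
	e := uint32(b>>3) & 0x0F
	m := uint32(b & 0x07)
	switch {
	case e == 0x0F && m == 0x07:
		return float32(math.NaN()) // S.1111.111
	case e == 0: // zero or subnormal: m * 2^-9, exact in float32
		v := float32(m) / 512
		if s != 0 {
			v = -v
		}
		return v
	default:
		return math.Float32frombits(s | (e+120)<<23 | m<<20)
	}
}

func buildDecodeTable() (t [256]float32) {
	for i := range t {
		t[i] = rebias(uint8(i))
	}
	return t
}

// fastToFloat32 is a hypothetical lookup-based conversion: one indexed load per value.
func fastToFloat32(b uint8) float32 { return decodeTable[b] }

func main() {
	in := []uint8{0x38, 0x7E, 0xC0, 0x01}
	out := make([]float32, len(in))
	for i, b := range in {
		out[i] = fastToFloat32(b)
	}
	fmt.Println(out)
}
```

The table costs 1 KiB and replaces per-value bit manipulation with a single memory load.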

**Constants:**
- Bit masks: `SignMask`, `ExponentMask`, `MantissaMask`, `MantissaLen`
- Exponent: `ExponentBias`, `ExponentMax`, `ExponentMin`, `Float32Bias`
- Special values: `PositiveZero`, `NegativeZero`, `PositiveInfinity`, `NegativeInfinity`, `NaN`, `MaxValue`, `MinValue`, `SmallestPositive`
- Math constants: `E`, `Pi`, `Phi`, `Sqrt2`, `SqrtE`, `SqrtPi`, `Ln2`, `Log2E`, `Ln10`, `Log10E`
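
The bit-mask and exponent constants describe how a byte splits into sign, exponent, and mantissa. The standalone decoder below shows how those pieces combine. It uses lowercase local constants holding the standard E4M3FN values (sign `0x80`, exponent `0x78`, mantissa `0x07`, 3 mantissa bits, bias 7); that the package's exported constants hold the same values is an assumption. Standard E4M3FN has no infinity encodings, so this sketch never returns ±Inf.

```go
package main

import (
	"fmt"
	"math"
)

// Local stand-ins for the layout constants, assuming standard E4M3FN values.
const (
	signMask     uint8 = 0x80 // bit 7
	exponentMask uint8 = 0x78 // bits 6-3
	mantissaMask uint8 = 0x07 // bits 2-0
	mantissaLen        = 3
	exponentBias       = 7
)

// decode splits b into its fields and widens it to float64.
func decode(b uint8) float64 {
	exp := int((b & exponentMask) >> mantissaLen)
	man := float64(b & mantissaMask)
	var v float64
	switch {
	case b&^signMask == exponentMask|mantissaMask: // S.1111.111
		return math.NaN()
	case exp == 0: // zero or subnormal: 2^(1-bias) * m/8
		v = math.Ldexp(man/(1<<mantissaLen), 1-exponentBias)
	default: // normal: 2^(exp-bias) * (1 + m/8)
		v = math.Ldexp(1+man/(1<<mantissaLen), exp-exponentBias)
	}
	if b&signMask != 0 {
		v = -v
	}
	return v
}

func main() {
	for _, b := range []uint8{0x01, 0x07, 0x08, 0x38, 0x7E, 0xFE, 0x7F} {
		fmt.Printf("0x%02X = %g\n", b, decode(b))
	}
}
```

Under this encoding the smallest positive subnormal is 1/512 (`0x01`), the smallest normal is 1/64 (`0x08`), and the largest finite value is 448 (`0x7E`).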

**Error types:**
- `Float8Error` (struct with `Op`, `Value`, `Msg` fields)
- Sentinel errors: `ErrOverflow`, `ErrUnderflow`, `ErrNaN`

**Utilities:**
- `Initialize()`, `GetVersion()`, `GetMemoryUsage()`, `DebugInfo()`

### Explicitly deferred

The following are **not** part of v1 and are candidates for v1.1+:

- **FP8 E5M2 format:** A second 8-bit format with 5 exponent bits and 2 mantissa bits, used in some gradient representations. It will be added as a separate type (e.g., `Float8E5M2`) without altering the existing `Float8` (E4M3FN) type.
- **SIMD-accelerated batch operations:** Platform-specific vectorized paths for slice operations.
- **Stochastic rounding mode:** A `ConversionMode` variant that uses probabilistic rounding for training workloads.

### Versioning policy

- Patch releases (v1.0.x): bug fixes, performance improvements, documentation.
- Minor releases (v1.x.0): new functions, types, or constants that do not break existing callers.
- The `Version` constant tracks the current release and is updated by release-please.

## Consequences

- Downstream packages (`ztensor`, `zerfoo`) can pin `float8 v1.x` and upgrade freely within the major version.
- New FP8 formats (E5M2) will be additive and will not modify the `Float8` type or its semantics.
- Any behavioral change to existing functions (e.g., rounding rules, special-value handling) requires a new major version.