Commit bf640ad

docs(adr): add API stability contract for float8 v1.0.0
1 parent da7e1cc commit bf640ad

1 file changed: +93 -0 lines changed

docs/adr/001-api-stability-v1.md

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
# ADR-001: API Stability Contract for float8 v1.0.0

- **Status:** Accepted
- **Date:** 2026-03-29
- **Authors:** Daniel Ndungu

## Context

The `github.com/zerfoo/float8` package provides IEEE 754-style FP8 E4M3FN arithmetic for the Zerfoo ML ecosystem. It is imported by `ztensor` for quantized tensor storage and compute. The package has reached a stable API surface and needs a clear stability contract so downstream consumers can depend on it without fear of breakage.

## Decision

### Stable (v1 guarantee)

The following API surface is covered by Go module compatibility and will not have breaking changes within the v1.x line:

**Core type:**
- `Float8` (defined as `uint8`)

**Constructors and conversions:**
- `ToFloat8(float32) Float8`
- `ToFloat8WithMode(float32, ConversionMode) (Float8, error)`
- `FromFloat64(float64) Float8`
- `FromInt(int) Float8`
- `FromBits(uint8) Float8`
- `Parse(string) (Float8, error)`
- `Zero() Float8`, `One() Float8`

**Methods on Float8:**
- `ToFloat32() float32`, `ToFloat64() float64`, `ToInt() int`
- `Bits() uint8`
- `Abs() Float8`, `Neg() Float8`
- `Sign() int`
- `IsZero() bool`, `IsNaN() bool`, `IsInf() bool`, `IsFinite() bool`, `IsNormal() bool`, `IsValid() bool`
- `String() string`, `GoString() string`

**Arithmetic functions:**
- `Add`, `Sub`, `Mul`, `Div` (and `*WithMode` variants)
- `AddSlice`, `MulSlice`, `ScaleSlice`, `SumSlice`

**Math functions:**
- `Sqrt`, `Pow`, `Exp`, `Log`
- `Sin`, `Cos`, `Tan`
- `Floor`, `Ceil`, `Round`, `Trunc`, `Fmod`
- `Min`, `Max`, `Clamp`, `Lerp`, `Sign`, `CopySign`

**Comparison functions:**
- `Equal`, `Less`, `LessEqual`, `Greater`, `GreaterEqual`

**Batch conversions:**
- `ToSlice8([]float32) []Float8`
- `ToSlice32([]Float8) []float32`

**Configuration:**
- `Config`, `DefaultConfig`, `Configure`
- `ConversionMode` (constants: `ModeDefault`, `ModeStrict`, `ModeFast`)
- `ArithmeticMode` (constants: `ArithmeticAuto`, `ArithmeticAlgorithmic`, `ArithmeticLookup`)
- `EnableFastConversion`, `DisableFastConversion`
- `EnableFastArithmetic`, `DisableFastArithmetic`
- `DefaultConversionMode`, `DefaultArithmeticMode` (package-level variables)

**Constants:**
- Bit masks: `SignMask`, `ExponentMask`, `MantissaMask`, `MantissaLen`
- Exponent: `ExponentBias`, `ExponentMax`, `ExponentMin`, `Float32Bias`
- Special values: `PositiveZero`, `NegativeZero`, `PositiveInfinity`, `NegativeInfinity`, `NaN`, `MaxValue`, `MinValue`, `SmallestPositive`
- Math constants: `E`, `Pi`, `Phi`, `Sqrt2`, `SqrtE`, `SqrtPi`, `Ln2`, `Log2E`, `Ln10`, `Log10E`

**Error types:**
- `Float8Error` (struct with `Op`, `Value`, `Msg` fields)
- Sentinel errors: `ErrOverflow`, `ErrUnderflow`, `ErrNaN`

**Utilities:**
- `Initialize()`, `GetVersion()`, `GetMemoryUsage()`, `DebugInfo()`

### Explicitly deferred

The following are **not** part of v1.0.0 and are candidates for v1.1 or later:

- **FP8 E5M2 format:** A second 8-bit format with 5 exponent bits and 2 mantissa bits, used in some gradient representations. Will be added as a separate type (e.g., `Float8E5M2`) without altering the existing `Float8` (E4M3FN) type.
- **SIMD-accelerated batch operations:** Platform-specific vectorized paths for slice operations.
- **Stochastic rounding mode:** A `ConversionMode` variant that uses probabilistic rounding for training workloads.
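
For readers unfamiliar with the last item, stochastic rounding picks one of the two neighboring representable values at random, weighted so that the expected result equals the input. The following is a minimal sketch of that idea only. `stochasticRound` and its parameters are hypothetical and say nothing about how a future `ConversionMode` would be implemented.

```go
package main

import (
	"fmt"
	"math/rand"
)

// stochasticRound chooses between two neighboring representable values
// with lo <= x <= hi. u is a uniform sample in [0, 1). The result is hi
// with probability (x-lo)/(hi-lo), so the expected value equals x.
func stochasticRound(x, lo, hi, u float64) float64 {
	if hi == lo {
		return lo
	}
	p := (x - lo) / (hi - lo)
	if u < p {
		return hi
	}
	return lo
}

func main() {
	rng := rand.New(rand.NewSource(1))

	// 1.0 and 1.125 are adjacent values in a format with 3 mantissa bits.
	const x, lo, hi = 1.03, 1.0, 1.125
	const n = 100000

	sum := 0.0
	for i := 0; i < n; i++ {
		sum += stochasticRound(x, lo, hi, rng.Float64())
	}
	fmt.Printf("mean of %d rounded samples: %.4f (target %.2f)\n", n, sum/n, x)
}
```

Round-to-nearest would map every 1.03 to 1.0, which biases accumulated sums. The randomized choice removes that bias on average at the cost of per-sample noise.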

### Versioning policy

- Patch releases (v1.0.x): bug fixes, performance improvements, documentation.
- Minor releases (v1.x.0): new functions, types, or constants that do not break existing callers.
- The `Version` constant tracks the current release and is updated by release-please.

## Consequences

- Downstream packages (`ztensor`, `zerfoo`) can pin `float8 v1.x` and upgrade freely within the major version.
- New FP8 formats (E5M2) will be additive and will not modify the `Float8` type or its semantics.
- Any behavioral change to existing functions (e.g., rounding rules, special-value handling) requires a new major version.
