Commit 88a6fc7

docs(api): add WithTieredKV GeneratorOption reference
1 parent 5a938cf commit 88a6fc7

1 file changed

Lines changed: 34 additions & 0 deletions

content/docs/api/generate.md

@@ -215,6 +215,40 @@ func WithGeneratorKVDtype(dtype string) GeneratorOption

Sets the KV cache storage dtype. Supported: `"fp32"` (default), `"fp16"`. FP16 halves KV cache memory bandwidth.

### func WithTieredKV

```go
func WithTieredKV(cfg TieredKVStoreConfig) GeneratorOption
```

Enables a three-tier KV cache with automatic promotion and demotion across hot (GPU/CPU memory), warm (compressed CPU memory), and cold (disk) storage. Each layer's access count is tracked: infrequently used layers are demoted to lower tiers, while frequently accessed layers remain in uncompressed memory. An asynchronous prefetch goroutine moves cold layers back to the hot tier before they are needed.

When `cfg.ColdDir` is empty, a temporary directory is created and then removed by `Close()`. When `cfg.ColdDir` is non-empty, the directory is left intact after `Close()` so that cold-tier files can be reused across generation calls.

```go
gen := generate.NewGenerator[float32](graph, tok, engine, cfg,
	generate.WithTieredKV(generate.TieredKVStoreConfig{
		ChunkSize:        64, // warm-tier compression block size
		DemoteThreshold:  2,  // demote layers accessed < 2 times
		PromoteThreshold: 8,  // promote layers accessed ≥ 8 times
		// ColdDir: "/var/cache/kv", // omit to use a temp dir
	}),
)
```
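To make the threshold comments in the example concrete, here is a small sketch of a tier decision based on access counts. The `Tier` type and `nextTier` function are hypothetical and do not come from the library. The boundaries follow the example's comments (`<` for demotion, `≥` for promotion). Demoting one tier at a time and promoting straight to hot are assumptions, and the real store may behave differently.

```go
package main

import "fmt"

// Tier is a hypothetical enum for this sketch; the library's types may differ.
type Tier int

const (
	Hot Tier = iota
	Warm
	Cold
)

func (t Tier) String() string {
	return [...]string{"hot", "warm", "cold"}[t]
}

// nextTier returns the tier a layer would occupy after one evaluation.
// Assumed rule: accessCount >= promote moves the layer to hot,
// accessCount < demote moves it one tier down, anything else stays put.
func nextTier(cur Tier, accessCount, demote, promote int) Tier {
	switch {
	case accessCount >= promote:
		return Hot
	case accessCount < demote && cur < Cold:
		return cur + 1
	default:
		return cur
	}
}

func main() {
	const demote, promote = 2, 8 // thresholds from the example above

	fmt.Println(nextTier(Hot, 1, demote, promote))  // warm
	fmt.Println(nextTier(Warm, 0, demote, promote)) // cold
	fmt.Println(nextTier(Warm, 4, demote, promote)) // warm
	fmt.Println(nextTier(Cold, 8, demote, promote)) // hot
}
```

Counts between the two thresholds leave a layer where it is, which keeps layers from bouncing between tiers on every access.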

`TieredKVStoreConfig` fields:

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `NumLayers` | `int` | model config | Number of transformer layers |
| `MaxSeqLen` | `int` | model config | Maximum sequence length |
| `ChunkSize` | `int` | 64 | Compression chunk size for warm tier |
| `DemoteThreshold` | `int` | 2 | Access count below which layers are demoted |
| `PromoteThreshold` | `int` | 5 | Access count above which layers are promoted |
| `ColdDir` | `string` | `""` (temp dir) | Directory for cold-tier binary files |

`NumLayers` and `MaxSeqLen` are filled from the model config if left at zero.
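The zero-value filling can be pictured with a short sketch. The struct below mirrors the documented fields for illustration only (the real type is `generate.TieredKVStoreConfig`), and `withDefaults` is a hypothetical helper. The source confirms that `NumLayers` and `MaxSeqLen` are taken from the model config when left at zero. Treating zero as "use the default" for the other integer fields is an assumption here, as are the model values 32 and 4096.

```go
package main

import "fmt"

// tieredKVConfig mirrors the documented fields for illustration only.
type tieredKVConfig struct {
	NumLayers, MaxSeqLen, ChunkSize, DemoteThreshold, PromoteThreshold int
	ColdDir                                                            string
}

// withDefaults is a hypothetical helper that fills zero-valued fields.
// modelLayers and modelMaxSeqLen stand in for values read from the model config.
func withDefaults(cfg tieredKVConfig, modelLayers, modelMaxSeqLen int) tieredKVConfig {
	if cfg.NumLayers == 0 {
		cfg.NumLayers = modelLayers
	}
	if cfg.MaxSeqLen == 0 {
		cfg.MaxSeqLen = modelMaxSeqLen
	}
	// Assumption: zero means "use the documented default" for these fields.
	if cfg.ChunkSize == 0 {
		cfg.ChunkSize = 64
	}
	if cfg.DemoteThreshold == 0 {
		cfg.DemoteThreshold = 2
	}
	if cfg.PromoteThreshold == 0 {
		cfg.PromoteThreshold = 5
	}
	return cfg
}

func main() {
	// Only PromoteThreshold is set explicitly; everything else is filled in.
	cfg := withDefaults(tieredKVConfig{PromoteThreshold: 8}, 32, 4096)
	fmt.Printf("%+v\n", cfg)
}
```

Under this zero-means-default convention an explicit threshold of 0 cannot be expressed, so whether the library follows it is worth confirming against the source.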
### func WithMetrics

```go
