
Commit e1a539a

docs: fix empty section pages, add Granite TS docs, update Mistral benchmark
- Add body content to all section _index.md files so sidebar links render a proper page instead of a blank one
- Update Mistral 7B benchmark from 11 to 44 tok/s (0.94x Ollama)
- Add Granite Time Series reference page (TTM, FlowState, TSPulse)
- Add Granite TS link to Reference section index
1 parent 888defc commit e1a539a

12 files changed: +212 -2 lines changed


content/docs/api/_index.md

Lines changed: 8 additions & 0 deletions
@@ -3,3 +3,11 @@ title: API Reference
 weight: 3
 bookCollapseSection: true
 ---
+
+# API Reference
+
+Detailed documentation for the public Go APIs.
+
+- [Generate]({{< relref "generate" >}}) -- text generation, streaming, and sampling options
+- [Inference]({{< relref "inference" >}}) -- model loading, GGUF parsing, architecture builders
+- [Serve]({{< relref "serve" >}}) -- OpenAI-compatible HTTP server and middleware

content/docs/architecture/_index.md

Lines changed: 7 additions & 0 deletions
@@ -3,3 +3,10 @@ title: Architecture
 weight: 5
 bookCollapseSection: true
 ---
+
+# Architecture
+
+How Zerfoo works under the hood.
+
+- [Overview]({{< relref "overview" >}}) -- inference pipeline, package layout, and design decisions
+- [GPU Setup]({{< relref "gpu-setup" >}}) -- CUDA, ROCm, and OpenCL configuration

content/docs/blog/_index.md

Lines changed: 4 additions & 0 deletions
@@ -3,3 +3,7 @@ title: Blog
 weight: 11
 bookCollapseSection: true
 ---
+
+# Blog
+
+Development updates and technical deep dives.

content/docs/contributing/_index.md

Lines changed: 6 additions & 0 deletions
@@ -3,3 +3,9 @@ title: Contributing
 weight: 9
 bookCollapseSection: true
 ---
+
+# Contributing
+
+How to contribute to the Zerfoo ecosystem.
+
+- [Overview]({{< relref "overview" >}}) -- code style, PR process, and testing requirements

content/docs/cookbooks/_index.md

Lines changed: 17 additions & 0 deletions
@@ -3,3 +3,20 @@ title: Cookbooks
 weight: 4
 bookCollapseSection: true
 ---
+
+# Cookbooks
+
+Ready-to-use code recipes. Each cookbook is a self-contained example you can copy into your project.
+
+- [Basic Text Generation]({{< relref "basic-text-generation" >}})
+- [Streaming Chat]({{< relref "streaming-chat" >}})
+- [Structured JSON Output]({{< relref "structured-json-output" >}})
+- [Embedding Similarity]({{< relref "embedding-similarity" >}})
+- [Tool Calling]({{< relref "tool-calling" >}})
+- [RAG]({{< relref "rag" >}})
+- [LoRA Fine-Tuning]({{< relref "lora-fine-tuning" >}})
+- [Vision / Multimodal]({{< relref "vision-multimodal" >}})
+- [OpenAI Server]({{< relref "openai-server" >}})
+- [Batch Inference]({{< relref "batch-inference" >}})
+- [Speculative Decoding]({{< relref "speculative-decoding" >}})
+- [Custom Sampling]({{< relref "custom-sampling" >}})

content/docs/deployment/_index.md

Lines changed: 7 additions & 0 deletions
@@ -3,3 +3,10 @@ title: Deployment
 weight: 6
 bookCollapseSection: true
 ---
+
+# Deployment
+
+Guides for running Zerfoo in production environments.
+
+- [Production]({{< relref "production" >}}) -- configuration, monitoring, and scaling
+- [Enterprise]({{< relref "enterprise" >}}) -- multi-tenant, TLS/mTLS, and compliance

content/docs/getting-started/_index.md

Lines changed: 8 additions & 0 deletions
@@ -3,3 +3,11 @@ title: Getting Started
 weight: 1
 bookCollapseSection: true
 ---
+
+# Getting Started
+
+Install Zerfoo, pull a model, and run your first inference in minutes.
+
+- [Installation]({{< relref "installation" >}}) -- install the Go module and CLI
+- [Quick Start]({{< relref "quickstart" >}}) -- generate text with three lines of Go
+- [First Inference]({{< relref "first-inference" >}}) -- a guided walkthrough of the inference pipeline

content/docs/reference/_index.md

Lines changed: 10 additions & 0 deletions
@@ -3,3 +3,13 @@ title: Reference
 weight: 10
 bookCollapseSection: true
 ---
+
+# Reference
+
+Benchmarks, API stability guarantees, and migration guides.
+
+- [Benchmarks]({{< relref "benchmarks" >}}) -- throughput numbers and comparison methodology
+- [Granite Time Series]({{< relref "granite-timeseries" >}}) -- IBM Granite TTM, FlowState, and TSPulse models
+- [API Stability]({{< relref "api-stability" >}}) -- versioning and compatibility guarantees
+- [Extensions]({{< relref "extensions" >}}) -- plugin and extension points
+- [Migration to v1]({{< relref "migration-v1" >}}) -- upgrading from pre-v1 releases

content/docs/reference/benchmarks.md

Lines changed: 3 additions & 2 deletions
@@ -45,10 +45,11 @@ Ollama v0.17.7.
 | Gemma 3 1B Q4_K_M | gemma3 | 1B | **241** (256 tok) | 201 (256 tok) | **1.20x** | Zerfoo |
 | DeepSeek R1 1.5B Q4_K_M | deepseek2 | 1.5B | **192.83** | 184.75 | **1.04x** | Zerfoo |
 | Llama 3.2 3B Q4_K_M | llama | 3B | 96.06 | 97.66 | 0.98x | ~Even |
-| Mistral 7B Q4_K_M | mistral | 7B | 11.61 | 46.77 | 0.25x | Ollama |
+| Mistral 7B Q4_K_M | mistral | 7B | **44** | 46.77 | **0.94x** | ~Even |
 
 Zerfoo wins on small models (1B-1.5B). Llama 3.2 3B is at parity. Mistral 7B
-has a known performance regression ([investigation pending](https://github.com/zerfoo/zerfoo/issues)).
+was previously at 11 tok/s due to a performance regression; after the fix it
+runs at 44 tok/s (0.94x Ollama -- near parity).
 Additional architectures (Qwen, Phi, Mixtral, Command-R, Falcon, Mamba, RWKV)
 will be added as GGUF files are acquired and parser compatibility is resolved.

Lines changed: 125 additions & 0 deletions
@@ -0,0 +1,125 @@
---
title: "Granite Time Series"
weight: 2
bookToc: true
---

# Granite Time Series

Zerfoo supports IBM Granite Time Series foundation models for time-series
inference. Three model families are available, each targeting different tasks.

## Model Families

| Model | Parameters | Tasks | Key Feature |
|-------|-----------|-------|-------------|
| **Granite TTM** | 1M-5M | Forecasting | Zero-shot and few-shot forecasting with channel mixing |
| **Granite FlowState** | 2M-10M | Forecasting | Continuous forecasting across arbitrary timescales |
| **Granite TSPulse** | 1M-5M | Anomaly detection, classification, imputation, embedding | Lightweight encoder for multi-task time-series analysis |

## Supported Tasks

- **Forecasting** -- predict future values from historical context (TTM, FlowState)
- **Anomaly detection** -- identify outliers and anomalous patterns (TSPulse)
- **Classification** -- classify time-series segments (TSPulse)
- **Imputation** -- fill in missing values (TSPulse)
- **Embedding** -- extract fixed-size representations for downstream tasks (TSPulse)
## Converting Models

Granite Time Series models are published on HuggingFace in SafeTensors format.
Use the `granite2gguf` converter (part of `zonnx`) to produce GGUF files:

```bash
go install github.com/zerfoo/zonnx/cmd/granite2gguf@latest

# Convert a TTM model
granite2gguf \
  --model ibm-granite/granite-timeseries-ttm-r2 \
  --output granite-ttm-r2.gguf

# Convert a FlowState model
granite2gguf \
  --model ibm-granite/granite-timeseries-flowstate \
  --output granite-flowstate.gguf

# Convert a TSPulse model
granite2gguf \
  --model ibm-granite/granite-timeseries-tspulse \
  --output granite-tspulse.gguf
```

The converter downloads weights from HuggingFace, maps the architecture to GGUF
tensor names, and writes a self-contained `.gguf` file.
## Running Inference

### Forecasting (TTM)

```go
import "github.com/zerfoo/zerfoo/inference/timeseries"

model, err := timeseries.LoadGGUF("granite-ttm-r2.gguf", engine)
if err != nil {
    log.Fatal(err)
}
defer model.Close()

// Input: [batch, channels, context_length]
// Output: [batch, channels, forecast_length]
input := tensor.New[float32](engine, []int{1, 3, 512})
// ... fill input with historical data ...

forecast, err := model.Forecast(ctx, input)
if err != nil {
    log.Fatal(err)
}
fmt.Println("forecast shape:", forecast.Shape())
```
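The `[batch, channels, context_length]` input above can be pictured as a single flat buffer. The sketch below packs a multivariate series into that shape, assuming a row-major layout; `packBCL` is an illustrative helper written for this page, not part of the Zerfoo tensor API:

```go
package main

import "fmt"

// packBCL flattens series[channel][t] into one row-major buffer of shape
// [1, channels, contextLen], truncating or zero-padding each channel to
// contextLen. Illustrative helper; the row-major layout is an assumption,
// and this is not the Zerfoo tensor API.
func packBCL(series [][]float32, contextLen int) []float32 {
	buf := make([]float32, len(series)*contextLen)
	for c, ch := range series {
		for t := 0; t < contextLen && t < len(ch); t++ {
			buf[c*contextLen+t] = ch[t]
		}
	}
	return buf
}

func main() {
	series := [][]float32{{1, 2}, {3, 4}} // 2 channels, 2 timesteps each
	fmt.Println(packBCL(series, 3))       // each channel zero-padded to length 3
}
```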
### Anomaly Detection (TSPulse)

```go
model, err := timeseries.LoadGGUF("granite-tspulse.gguf", engine)
if err != nil {
    log.Fatal(err)
}
defer model.Close()

scores, err := model.DetectAnomalies(ctx, input)
if err != nil {
    log.Fatal(err)
}
// scores: per-timestep anomaly scores
```
### Embedding Extraction (TSPulse)

```go
embeddings, err := model.Embed(ctx, input)
if err != nil {
    log.Fatal(err)
}
// embeddings: [batch, embed_dim] fixed-size representations
```
## Architecture Details

All three model families use a patch-based transformer encoder architecture:

1. **Patching** -- the input time series is segmented into fixed-size patches
2. **Channel mixing** -- multivariate channels are projected into a shared space
3. **Transformer encoder** -- standard multi-head self-attention over patches
4. **Task head** -- a linear projection head specific to the task (forecast, classify, reconstruct, embed)

GGUF metadata stores the model family (`granite-ttm`, `granite-flowstate`,
`granite-tspulse`), context length, forecast length, patch size, and number of
channels. The inference runtime auto-configures based on these fields.
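The patching step is easy to picture in plain Go. `patchSeries` below is an illustrative stand-in for step 1, not the runtime's internal implementation:

```go
package main

import "fmt"

// patchSeries segments a univariate series into non-overlapping fixed-size
// patches, dropping any trailing remainder. Illustrative helper only; the
// runtime's internal patching may differ (e.g. overlap or padding).
func patchSeries(series []float32, patchSize int) [][]float32 {
	n := len(series) / patchSize
	patches := make([][]float32, 0, n)
	for i := 0; i < n; i++ {
		patches = append(patches, series[i*patchSize:(i+1)*patchSize])
	}
	return patches
}

func main() {
	series := make([]float32, 512) // context_length = 512
	for i := range series {
		series[i] = float32(i)
	}
	patches := patchSeries(series, 16) // patch size 16 -> 32 patches
	fmt.Println(len(patches), len(patches[0]))
}
```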
## Model Sources

| Model | HuggingFace Repo |
|-------|-----------------|
| Granite TTM R2 | [ibm-granite/granite-timeseries-ttm-r2](https://huggingface.co/ibm-granite/granite-timeseries-ttm-r2) |
| Granite FlowState | [ibm-granite/granite-timeseries-flowstate](https://huggingface.co/ibm-granite/granite-timeseries-flowstate) |
| Granite TSPulse | [ibm-granite/granite-timeseries-tspulse](https://huggingface.co/ibm-granite/granite-timeseries-tspulse) |
