
Commit e1a539a

docs: fix empty section pages, add Granite TS docs, update Mistral benchmark
- Add body content to all section _index.md files so sidebar links render a proper page instead of a blank one
- Update Mistral 7B benchmark from 11 to 44 tok/s (0.94x Ollama)
- Add Granite Time Series reference page (TTM, FlowState, TSPulse)
- Add Granite TS link to Reference section index
1 parent 888defc commit e1a539a

12 files changed: +212 -2 lines changed


content/docs/api/_index.md

Lines changed: 8 additions & 0 deletions
@@ -3,3 +3,11 @@ title: API Reference
 weight: 3
 bookCollapseSection: true
 ---
+
+# API Reference
+
+Detailed documentation for the public Go APIs.
+
+- [Generate]({{< relref "generate" >}}) -- text generation, streaming, and sampling options
+- [Inference]({{< relref "inference" >}}) -- model loading, GGUF parsing, architecture builders
+- [Serve]({{< relref "serve" >}}) -- OpenAI-compatible HTTP server and middleware

content/docs/architecture/_index.md

Lines changed: 7 additions & 0 deletions
@@ -3,3 +3,10 @@ title: Architecture
 weight: 5
 bookCollapseSection: true
 ---
+
+# Architecture
+
+How Zerfoo works under the hood.
+
+- [Overview]({{< relref "overview" >}}) -- inference pipeline, package layout, and design decisions
+- [GPU Setup]({{< relref "gpu-setup" >}}) -- CUDA, ROCm, and OpenCL configuration

content/docs/blog/_index.md

Lines changed: 4 additions & 0 deletions
@@ -3,3 +3,7 @@ title: Blog
 weight: 11
 bookCollapseSection: true
 ---
+
+# Blog
+
+Development updates and technical deep dives.

content/docs/contributing/_index.md

Lines changed: 6 additions & 0 deletions
@@ -3,3 +3,9 @@ title: Contributing
 weight: 9
 bookCollapseSection: true
 ---
+
+# Contributing
+
+How to contribute to the Zerfoo ecosystem.
+
+- [Overview]({{< relref "overview" >}}) -- code style, PR process, and testing requirements

content/docs/cookbooks/_index.md

Lines changed: 17 additions & 0 deletions
@@ -3,3 +3,20 @@ title: Cookbooks
 weight: 4
 bookCollapseSection: true
 ---
+
+# Cookbooks
+
+Ready-to-use code recipes. Each cookbook is a self-contained example you can copy into your project.
+
+- [Basic Text Generation]({{< relref "basic-text-generation" >}})
+- [Streaming Chat]({{< relref "streaming-chat" >}})
+- [Structured JSON Output]({{< relref "structured-json-output" >}})
+- [Embedding Similarity]({{< relref "embedding-similarity" >}})
+- [Tool Calling]({{< relref "tool-calling" >}})
+- [RAG]({{< relref "rag" >}})
+- [LoRA Fine-Tuning]({{< relref "lora-fine-tuning" >}})
+- [Vision / Multimodal]({{< relref "vision-multimodal" >}})
+- [OpenAI Server]({{< relref "openai-server" >}})
+- [Batch Inference]({{< relref "batch-inference" >}})
+- [Speculative Decoding]({{< relref "speculative-decoding" >}})
+- [Custom Sampling]({{< relref "custom-sampling" >}})

content/docs/deployment/_index.md

Lines changed: 7 additions & 0 deletions
@@ -3,3 +3,10 @@ title: Deployment
 weight: 6
 bookCollapseSection: true
 ---
+
+# Deployment
+
+Guides for running Zerfoo in production environments.
+
+- [Production]({{< relref "production" >}}) -- configuration, monitoring, and scaling
+- [Enterprise]({{< relref "enterprise" >}}) -- multi-tenant, TLS/mTLS, and compliance

content/docs/getting-started/_index.md

Lines changed: 8 additions & 0 deletions
@@ -3,3 +3,11 @@ title: Getting Started
 weight: 1
 bookCollapseSection: true
 ---
+
+# Getting Started
+
+Install Zerfoo, pull a model, and run your first inference in minutes.
+
+- [Installation]({{< relref "installation" >}}) -- install the Go module and CLI
+- [Quick Start]({{< relref "quickstart" >}}) -- generate text with three lines of Go
+- [First Inference]({{< relref "first-inference" >}}) -- a guided walkthrough of the inference pipeline

content/docs/reference/_index.md

Lines changed: 10 additions & 0 deletions
@@ -3,3 +3,13 @@ title: Reference
 weight: 10
 bookCollapseSection: true
 ---
+
+# Reference
+
+Benchmarks, API stability guarantees, and migration guides.
+
+- [Benchmarks]({{< relref "benchmarks" >}}) -- throughput numbers and comparison methodology
+- [Granite Time Series]({{< relref "granite-timeseries" >}}) -- IBM Granite TTM, FlowState, and TSPulse models
+- [API Stability]({{< relref "api-stability" >}}) -- versioning and compatibility guarantees
+- [Extensions]({{< relref "extensions" >}}) -- plugin and extension points
+- [Migration to v1]({{< relref "migration-v1" >}}) -- upgrading from pre-v1 releases

content/docs/reference/benchmarks.md

Lines changed: 3 additions & 2 deletions
@@ -45,10 +45,11 @@ Ollama v0.17.7.
 | Gemma 3 1B Q4_K_M | gemma3 | 1B | **241** (256 tok) | 201 (256 tok) | **1.20x** | Zerfoo |
 | DeepSeek R1 1.5B Q4_K_M | deepseek2 | 1.5B | **192.83** | 184.75 | **1.04x** | Zerfoo |
 | Llama 3.2 3B Q4_K_M | llama | 3B | 96.06 | 97.66 | 0.98x | ~Even |
-| Mistral 7B Q4_K_M | mistral | 7B | 11.61 | 46.77 | 0.25x | Ollama |
+| Mistral 7B Q4_K_M | mistral | 7B | **44** | 46.77 | **0.94x** | ~Even |
 
 Zerfoo wins on small models (1B-1.5B). Llama 3.2 3B is at parity. Mistral 7B
-has a known performance regression ([investigation pending](https://github.com/zerfoo/zerfoo/issues)).
+was previously at 11 tok/s due to a performance regression; after the fix it
+runs at 44 tok/s (0.94x Ollama -- near parity).
 Additional architectures (Qwen, Phi, Mixtral, Command-R, Falcon, Mamba, RWKV)
 will be added as GGUF files are acquired and parser compatibility is resolved.

Lines changed: 125 additions & 0 deletions
@@ -0,0 +1,125 @@
---
title: "Granite Time Series"
weight: 2
bookToc: true
---

# Granite Time Series

Zerfoo supports IBM Granite Time Series foundation models for time-series
inference. Three model families are available, each targeting different tasks.

## Model Families

| Model | Parameters | Tasks | Key Feature |
|-------|-----------|-------|-------------|
| **Granite TTM** | 1M-5M | Forecasting | Zero-shot and few-shot forecasting with channel mixing |
| **Granite FlowState** | 2M-10M | Forecasting | Continuous forecasting across arbitrary timescales |
| **Granite TSPulse** | 1M-5M | Anomaly detection, classification, imputation, embedding | Lightweight encoder for multi-task time-series analysis |

## Supported Tasks

- **Forecasting** -- predict future values from historical context (TTM, FlowState)
- **Anomaly detection** -- identify outliers and anomalous patterns (TSPulse)
- **Classification** -- classify time-series segments (TSPulse)
- **Imputation** -- fill in missing values (TSPulse)
- **Embedding** -- extract fixed-size representations for downstream tasks (TSPulse)
## Converting Models

Granite Time Series models are published on HuggingFace in SafeTensors format.
Use the `granite2gguf` converter (part of `zonnx`) to produce GGUF files:

```bash
go install github.com/zerfoo/zonnx/cmd/granite2gguf@latest

# Convert a TTM model
granite2gguf \
  --model ibm-granite/granite-timeseries-ttm-r2 \
  --output granite-ttm-r2.gguf

# Convert a FlowState model
granite2gguf \
  --model ibm-granite/granite-timeseries-flowstate \
  --output granite-flowstate.gguf

# Convert a TSPulse model
granite2gguf \
  --model ibm-granite/granite-timeseries-tspulse \
  --output granite-tspulse.gguf
```

The converter downloads weights from HuggingFace, maps the architecture to GGUF
tensor names, and writes a self-contained `.gguf` file.
## Running Inference

### Forecasting (TTM)

```go
import "github.com/zerfoo/zerfoo/inference/timeseries"

model, err := timeseries.LoadGGUF("granite-ttm-r2.gguf", engine)
if err != nil {
    log.Fatal(err)
}
defer model.Close()

// Input: [batch, channels, context_length]
// Output: [batch, channels, forecast_length]
input := tensor.New[float32](engine, []int{1, 3, 512})
// ... fill input with historical data ...

forecast, err := model.Forecast(ctx, input)
if err != nil {
    log.Fatal(err)
}
fmt.Println("forecast shape:", forecast.Shape())
```
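The `[batch, channels, context_length]` input above can be pictured as a single flat buffer. The sketch below packs a multivariate series into that shape, assuming a row-major layout; `packBCL` is an illustrative helper written for this page, not part of the Zerfoo tensor API:

```go
package main

import "fmt"

// packBCL flattens series[channel][t] into one row-major buffer of shape
// [1, channels, contextLen], truncating or zero-padding each channel to
// contextLen. Illustrative helper; the row-major layout is an assumption,
// and this is not the Zerfoo tensor API.
func packBCL(series [][]float32, contextLen int) []float32 {
	buf := make([]float32, len(series)*contextLen)
	for c, ch := range series {
		for t := 0; t < contextLen && t < len(ch); t++ {
			buf[c*contextLen+t] = ch[t]
		}
	}
	return buf
}

func main() {
	series := [][]float32{{1, 2}, {3, 4}} // 2 channels, 2 timesteps each
	fmt.Println(packBCL(series, 3))       // each channel zero-padded to length 3
}
```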
### Anomaly Detection (TSPulse)

```go
model, err := timeseries.LoadGGUF("granite-tspulse.gguf", engine)
if err != nil {
    log.Fatal(err)
}
defer model.Close()

scores, err := model.DetectAnomalies(ctx, input)
if err != nil {
    log.Fatal(err)
}
// scores: per-timestep anomaly scores
```
### Embedding Extraction (TSPulse)

```go
embeddings, err := model.Embed(ctx, input)
if err != nil {
    log.Fatal(err)
}
// embeddings: [batch, embed_dim] fixed-size representations
```
## Architecture Details

All three model families use a patch-based transformer encoder architecture:

1. **Patching** -- the input time series is segmented into fixed-size patches
2. **Channel mixing** -- multivariate channels are projected into a shared space
3. **Transformer encoder** -- standard multi-head self-attention over patches
4. **Task head** -- a linear projection head specific to the task (forecast, classify, reconstruct, embed)

GGUF metadata stores the model family (`granite-ttm`, `granite-flowstate`,
`granite-tspulse`), context length, forecast length, patch size, and number of
channels. The inference runtime auto-configures based on these fields.
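The patching step is easy to picture in plain Go. `patchSeries` below is an illustrative stand-in for step 1, not the runtime's internal implementation:

```go
package main

import "fmt"

// patchSeries segments a univariate series into non-overlapping fixed-size
// patches, dropping any trailing remainder. Illustrative helper only; the
// runtime's internal patching may differ (e.g. overlap or padding).
func patchSeries(series []float32, patchSize int) [][]float32 {
	n := len(series) / patchSize
	patches := make([][]float32, 0, n)
	for i := 0; i < n; i++ {
		patches = append(patches, series[i*patchSize:(i+1)*patchSize])
	}
	return patches
}

func main() {
	series := make([]float32, 512) // context_length = 512
	for i := range series {
		series[i] = float32(i)
	}
	patches := patchSeries(series, 16) // patch size 16 -> 32 patches
	fmt.Println(len(patches), len(patches[0]))
}
```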
## Model Sources

| Model | HuggingFace Repo |
|-------|-----------------|
| Granite TTM R2 | [ibm-granite/granite-timeseries-ttm-r2](https://huggingface.co/ibm-granite/granite-timeseries-ttm-r2) |
| Granite FlowState | [ibm-granite/granite-timeseries-flowstate](https://huggingface.co/ibm-granite/granite-timeseries-flowstate) |
| Granite TSPulse | [ibm-granite/granite-timeseries-tspulse](https://huggingface.co/ibm-granite/granite-timeseries-tspulse) |
