Commit fd04e16

docs: update website to reflect current code reality

- Models grid: expanded from 8 to 18 cards (GPT-2, Nemotron-H, MiniMax M2, Command R, Falcon, RWKV, Mamba/Mamba 3, Jamba, Whisper, LLaVA/Qwen-VL, BERT, Granite TS added)
- CLI section: added QuaRot, eagle-train, transmla, Multi-LoRA examples
- Go version: updated 1.25 -> 1.26 across all pages (6 files)
- Gemma 3 -> Gemma 3/3n, Llama 3 -> Llama 3/4 in model grid
1 parent 1e0f8ea commit fd04e16

6 files changed, +37 −19 lines changed
content/_index.html (29 additions, 11 deletions)

@@ -386,7 +386,7 @@ <h3>Structured Output &amp; Tools</h3>
 <div class="feat">
 <div class="icon">&#129518;</div>
 <h3>Type-Safe Generics</h3>
-<p>Go 1.25 generics throughout — <code>tensor.Numeric</code> constraint for compile-time type safety across float32, float16, bfloat16, float8, and quantized types.</p>
+<p>Go 1.26 generics throughout — <code>tensor.Numeric</code> constraint for compile-time type safety across float32, float16, bfloat16, float8, and quantized types.</p>
 </div>
 <div class="feat">
 <div class="icon">&#128202;</div>
@@ -427,7 +427,7 @@ <h3>Advanced Serving</h3>
 <div class="wrap">
 <div class="section-head">
 <h2>Faster than Ollama</h2>
-<p>Benchmarked on NVIDIA DGX Spark (GB10), CUDA 13.0, Go 1.25. Gemma 3 1B Q4_K_M, 256 tokens.</p>
+<p>Benchmarked on NVIDIA DGX Spark (GB10), CUDA 13.0, Go 1.26. Gemma 3 1B Q4_K_M, 256 tokens.</p>
 </div>
 <div style="overflow-x:auto">
 <table class="bench-table">
@@ -490,14 +490,24 @@ <h2>Supported models</h2>
 <p>28 architectures across 16 model families. Load any GGUF model from HuggingFace.</p>
 </div>
 <div class="model-grid">
-<div class="model-card"><div class="name">Gemma 3</div><div class="status prod">Production</div></div>
-<div class="model-card"><div class="name">Llama 3</div><div class="status prod">Production</div></div>
+<div class="model-card"><div class="name">Gemma 3/3n</div><div class="status prod">Production</div></div>
+<div class="model-card"><div class="name">Llama 3/4</div><div class="status prod">Production</div></div>
 <div class="model-card"><div class="name">Qwen 2.5</div><div class="status prod">Production</div></div>
-<div class="model-card"><div class="name">Mistral</div><div class="status prod">Production</div></div>
+<div class="model-card"><div class="name">Mistral/Mixtral</div><div class="status prod">Production</div></div>
 <div class="model-card"><div class="name">Phi 3/4</div><div class="status prod">Production</div></div>
-<div class="model-card"><div class="name">DeepSeek V3</div><div class="status prod">Production</div></div>
-<div class="model-card"><div class="name">SigLIP</div><div class="status">Vision encoder</div></div>
-<div class="model-card"><div class="name">Kimi-VL</div><div class="status">Vision-language</div></div>
+<div class="model-card"><div class="name">DeepSeek V3</div><div class="status prod">MLA + MoE</div></div>
+<div class="model-card"><div class="name">GPT-2</div><div class="status prod">TinyStories</div></div>
+<div class="model-card"><div class="name">Nemotron-H</div><div class="status">Hybrid Mamba+MoE</div></div>
+<div class="model-card"><div class="name">MiniMax M2</div><div class="status">Sigmoid MoE</div></div>
+<div class="model-card"><div class="name">Command R</div><div class="status prod">Production</div></div>
+<div class="model-card"><div class="name">Falcon</div><div class="status prod">Production</div></div>
+<div class="model-card"><div class="name">RWKV</div><div class="status">Linear attention</div></div>
+<div class="model-card"><div class="name">Mamba/Mamba 3</div><div class="status">State space</div></div>
+<div class="model-card"><div class="name">Jamba</div><div class="status">Hybrid SSM</div></div>
+<div class="model-card"><div class="name">Whisper</div><div class="status">Audio</div></div>
+<div class="model-card"><div class="name">LLaVA/Qwen-VL</div><div class="status">Vision-language</div></div>
+<div class="model-card"><div class="name">BERT</div><div class="status">Encoder</div></div>
+<div class="model-card"><div class="name">Granite TS</div><div class="status">Time series</div></div>
 </div>
 <div style="text-align:center;margin-top:32px">
 <p style="color:var(--fg3);font-size:.875rem">Uses GGUF as the sole model format. Compatible with llama.cpp, Ollama, LM Studio, and GPT4All model files.</p>
@@ -529,10 +539,18 @@ <h2>CLI included</h2>
 <span class="cmt"># OpenAI-compatible API server</span>
 $ zerfoo serve gemma-3-1b-q4 --port 8080
 
-<span class="cmt"># Query with any OpenAI client</span>
+<span class="cmt"># QuaRot weight fusion for uniform 4-bit quantization</span>
+$ zerfoo run --quarot model.gguf
+
+<span class="cmt"># Train an EAGLE speculative decoding head</span>
+$ zerfoo eagle-train --model model.gguf --corpus data.txt --output eagle.gguf
+
+<span class="cmt"># Convert MHA model to Multi-head Latent Attention</span>
+$ zerfoo transmla --input model.gguf --output model-mla.gguf
+
+<span class="cmt"># Multi-LoRA serving (per-request adapter selection)</span>
 $ curl <span class="str">http://localhost:8080/v1/chat/completions</span> \
--H <span class="str">"Content-Type: application/json"</span> \
--d <span class="str">'{"model":"gemma-3-1b-q4","messages":[{"role":"user","content":"Hello!"}]}'</span></pre>
+-d <span class="str">'{"model":"gemma3-1b:my-lora","messages":[{"role":"user","content":"Hello!"}]}'</span></pre>
 </div>
 </div>
 </section>

content/docs/blog/how-we-beat-ollama-cuda-graph-capture.md (1 addition, 1 deletion)

@@ -80,7 +80,7 @@ All benchmark numbers follow the methodology documented in `docs/benchmarking-me
 | Memory | 128 GB unified LPDDR5x |
 | GPU SM | sm_121 |
 | Model | Gemma 3 1B Q4_K_M (GGUF) |
-| Go | 1.25.0 |
+| Go | 1.26.1 |
 | CUDA | 13.0 |
 | Measurement | Decode-only throughput (tok/s) |
 | Token count | 256 tokens minimum |

content/docs/contributing/overview.md (1 addition, 1 deletion)

@@ -41,7 +41,7 @@ Each repo is versioned and released independently. Do not treat this as a monore
 
 ### Prerequisites
 
-- **Go 1.25+** (generics with `tensor.Numeric` constraint)
+- **Go 1.26+** (generics with `tensor.Numeric` constraint)
 - **Git**
 - **CUDA Toolkit** (optional, for GPU-accelerated tests and development)
 
content/docs/getting-started/first-inference.md (2 additions, 2 deletions)

@@ -10,15 +10,15 @@ Go from zero to working LLM inference in under 5 minutes.
 
 ## Prerequisites
 
-- **Go 1.25 or later** -- [download Go](https://go.dev/dl/)
+- **Go 1.26 or later** -- [download Go](https://go.dev/dl/)
 - A machine with at least 4 GB of RAM (8 GB recommended for 7B models)
 - Optional: an NVIDIA GPU with CUDA drivers for hardware-accelerated inference
 
 Verify your Go installation:
 
 ```bash
 go version
-# go version go1.25.0 linux/amd64
+# go version go1.26.1 linux/amd64
 ```
 
 ## Install the CLI

content/docs/getting-started/installation.md (3 additions, 3 deletions)

@@ -6,13 +6,13 @@ bookToc: true
 
 # Installation
 
-Zerfoo requires **Go 1.25 or later**. [Download Go](https://go.dev/dl/) if you haven't already.
+Zerfoo requires **Go 1.26 or later**. [Download Go](https://go.dev/dl/) if you haven't already.
 
 Verify your Go installation:
 
 ```bash
 go version
-# go version go1.25.0 linux/amd64
+# go version go1.26.1 linux/amd64
 ```
 
 ## As a Library
@@ -53,7 +53,7 @@ Zerfoo builds with **zero CGo by default** (`CGO_ENABLED=0`). GPU acceleration i
 
 ## Platform Support
 
-Zerfoo compiles on any platform supported by Go 1.25, including **Linux**, **macOS**, and **Windows**.
+Zerfoo compiles on any platform supported by Go 1.26, including **Linux**, **macOS**, and **Windows**.
 
 GPU acceleration is available on:
 
content/docs/reference/benchmarks.md (1 addition, 1 deletion)

@@ -276,7 +276,7 @@ The `-p 0` flag skips prompt processing to measure pure decode throughput.
 git clone https://github.com/zerfoo/zerfoo.git
 cd zerfoo
 
-# 2. Ensure Go 1.25+ is installed
+# 2. Ensure Go 1.26+ is installed
 go version
 
 # 3. Download dependencies