Generated with `ollama run x/flux2-klein:9b "a cartoon of a cute fluffy bunny sitting close to a beautiful pirate hat"`
Fine-tune a small LLM to always speak like a pirate captain with a bunny crew, regardless of topic. The pirate-bunny persona is baked into the model weights via LoRA SFT — no system prompt needed at inference time.
The goal is persona injection: make a base model always respond in a specific style (heavy pirate dialect + bunny references) while preserving its knowledge and reasoning capabilities. We use LoRA Supervised Fine-Tuning (SFT) — the standard approach for teaching a model "how to talk" rather than "what to know."
The key design decision is no system prompt in training data or at inference. Instead of relying on a system prompt to instruct the model to act like a pirate, we train the persona directly into the weights. This means the model defaults to pirate-bunny behavior on any user message.
- Questions — sampled from Dolly-15k, stratified across 4 categories (open_qa, general_qa, brainstorming, creative_writing). This gives diverse, realistic user questions without needing to generate them synthetically.
- Answers — generated by `gpt-oss:120b-cloud` via Ollama cloud, prompted to respond in character as Captain Flopsy with detailed pirate dialect and bunny-reference rules. A post-processing step strips any leaked chain-of-thought reasoning (`<think>` tags or unprompted preamble).
- Format — each example is a two-turn ChatML conversation (user question → assistant answer), stored as JSONL. No system message is included. Split 75/25 into training and validation sets.
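To make the format concrete, here is a minimal sketch of how one JSONL record could be assembled. The exact question/answer strings are invented for illustration; the `messages` key matches the chat-style JSONL that `mlx_lm.lora` accepts, but check your mlx_lm version's expected schema.

```python
import json

def make_example(question: str, answer: str) -> str:
    """Build one training record: a two-turn conversation, no system message."""
    record = {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }
    return json.dumps(record, ensure_ascii=False)

# One line of train.jsonl (contents are hypothetical)
line = make_example(
    "What is gradient descent?",
    "Arr, me fluffy crew! Gradient descent be how we hop downhill on the loss...",
)
print(line)
```

Note the deliberate absence of a `system` role: the persona is meant to come entirely from the fine-tuned weights.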
LoRA fine-tuning via MLX (mlx_lm.lora), running natively on Apple Silicon. The adapter modifies a small subset of the model's weights (0.365% of parameters) to learn the pirate-bunny style.
`mask_prompt: true` ensures the loss is computed only on the assistant's response tokens, not the user's question — so the model learns how to respond, not how to parrot questions.
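The effect of prompt masking can be sketched in a few lines: build a 0/1 mask over the token sequence and average the per-token loss only where the mask is 1. The token counts and loss values below are made up for illustration; mlx_lm implements this internally when `mask_prompt` is enabled.

```python
def loss_mask(prompt_len: int, total_len: int) -> list[int]:
    """1 where a token contributes to the loss (assistant response), 0 elsewhere."""
    return [0] * prompt_len + [1] * (total_len - prompt_len)

# Hypothetical sequence: 6 prompt tokens (user turn + template), 4 response tokens
mask = loss_mask(6, 10)
per_token_loss = [0.9, 1.1, 0.7, 2.0, 1.5, 0.8, 0.4, 0.6, 0.5, 0.3]

# Masked mean: only the 4 response-token losses count
masked = sum(l * m for l, m in zip(per_token_loss, mask)) / sum(mask)
print(masked)  # 0.45
```

Without the mask, the large prompt-token losses would dominate and the model would spend capacity modeling user questions instead of pirate-bunny answers.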
The trained LoRA adapter is fused back into the base model, converted to GGUF format via llama.cpp, and served locally through Ollama with a ChatML template.
- Machine: MacBook Pro, Apple M4 Max, 128GB unified memory
- Training memory: ~11 GB peak
- Training throughput: ~650 tokens/sec
- Training time: ~5 minutes for 1125 iterations (500 examples, ~3 epochs)
Based on the Unsloth LoRA Hyperparameters Guide and Unsloth Datasets Guide:
| Parameter | Unsloth recommends | Our config | Notes |
|---|---|---|---|
| Base model | Instruct variant for <300 examples | Qwen3-4B-Instruct (bf16) | Model selection guide |
| Dataset size | 100 minimum, 1000+ optimal | 500 (375 train / 125 valid) | |
| Epochs | 1–3 | ~3 (1125 iters / 375 examples) | More than 3 risks overfitting |
| Learning rate | 2e-4 for LoRA, 5e-6 for RL methods (DPO/GRPO) | 2e-4 | |
| LoRA rank | 16 or 32 | 16 | Higher = more capacity, more memory |
| LoRA scale | >= 1 (alpha = rank or 2x rank) | 1.0 | Controls adapter strength |
| LoRA layers | — | 16 | |
| Batch size | 2 (with grad accum 8 = effective 16) | 1 | Limited by dataset size |
| Dropout | 0.0–0.1 | 0.0 | 0.1 if overfitting |
| mask_prompt | — | true | Loss only on assistant tokens |
- uv — Python package manager
- Ollama — for inference and cloud model access (data generation)
- llama.cpp — cloned to `~/Developer/llama.cpp` for GGUF conversion
agent_pirate_bunny/
├── main.py # CLI orchestrator (generate/train/convert/all)
├── generate_dataset.py # Dataset generation (Dolly questions + Ollama cloud answers)
├── train.py # Training wrapper (calls mlx_lm.lora)
├── convert.py # Fuse → GGUF → Ollama deployment
├── config/
│ ├── prompts.py # Response generation prompt (pirate-bunny rules)
│ └── lora_config.yaml # LoRA hyperparameters
├── data/
│ ├── train.jsonl # Training data (375 examples)
│ └── valid.jsonl # Validation data (125 examples)
├── adapters/ # LoRA adapter checkpoints (from training)
├── fused_model/ # Fused model output (from convert)
├── Modelfile # Ollama model definition (ChatML template, no system prompt)
└── results.md # Training curves, sample outputs, checkpoint comparison
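The Modelfile in the tree above might look roughly like the following sketch. The `FROM` path and GGUF filename are assumptions; the key points are the ChatML template and the absence of a `SYSTEM` directive, since the persona lives in the weights.

```
FROM ./fused_model/pirate-bunny.gguf

# ChatML template, no SYSTEM line: the model defaults to Captain Flopsy
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_end|>"
```

Registering it with `ollama create pirate-bunny -f Modelfile` makes the model available to `ollama run`.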
`uv run python main.py all` runs generate → (pause for data review) → train → convert.
# 1. Generate training data (Dolly questions + Ollama cloud pirate-bunny answers)
uv run python main.py generate
# 2. Train LoRA adapter
uv run python main.py train
# 3. Convert and deploy (final checkpoint)
uv run python main.py convert
# 3b. Convert a specific checkpoint (e.g. iter 400, best val loss)
uv run python main.py convert --checkpoint 400

ollama run pirate-bunny "teach me about gradient descent"
ollama run pirate-bunny "write the python code to solve a quadratic equation"
ollama run pirate-bunny "What's the recipe for good spaghetti"

See results.md for full training curves, sample outputs from both the final and best-validation checkpoints, and run comparisons.
- Val loss bottomed out at iter 400 (1.403); slight overfitting in epochs 2–3
- Both checkpoints produce heavy pirate dialect with dense bunny references
- Code generation is correct and runnable with pirate-bunny variable names
- Iter 400 checkpoint produces more concise outputs
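Picking the best checkpoint amounts to parsing the validation-loss lines out of the training log and taking the minimum. The log excerpt below is hypothetical except for the iter-400 value of 1.403 quoted above, and the exact line format of mlx_lm's output may differ between versions.

```python
import re

# Hypothetical excerpt of an mlx_lm.lora training log
log = """\
Iter 200: Val loss 1.452, Val took 8.1s
Iter 400: Val loss 1.403, Val took 8.0s
Iter 600: Val loss 1.418, Val took 8.2s
"""

# Extract (iteration, val_loss) pairs and select the minimum-loss checkpoint
pairs = [(int(i), float(l)) for i, l in re.findall(r"Iter (\d+): Val loss ([\d.]+)", log)]
best_iter, best_loss = min(pairs, key=lambda p: p[1])
print(best_iter, best_loss)  # 400 1.403
```

The selected iteration is then passed to `main.py convert --checkpoint` as shown in the usage section.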
| Choice | Reason |
|---|---|
| LoRA SFT | Persona injection — teaching a new output style. SFT is the standard approach. |
| Qwen3-4B-Instruct | Dense (not MoE), strong chat/code baseline, modern architecture. Instruct variant recommended for <1000 examples. |
| bf16 (not quantized) | Cleaner gradients during training, compatible with llama.cpp GGUF converter. |
| No system prompt | Pirate-bunny behavior baked into weights, not dependent on prompting. |
| Dolly-15k for questions | Diverse, real-world questions across multiple categories. Avoids synthetic question generation artifacts. |
| Ollama cloud for answers | Offloads answer generation to cloud GPU, preserving local GPU for training. |
| MLX | Native Apple Silicon training — no CUDA required, uses unified memory efficiently. |
