Advanced reasoning models for Open WebUI using Adaptive Branching Monte Carlo Tree Search (AB-MCTS) and Multi-Model collaboration.
This project implements Sakana AI's AB-MCTS algorithm and a Multi-Model collaboration system, both integrated with Open WebUI as selectable AI models for advanced reasoning and decision-making.
- AB-MCTS Pipeline: Advanced tree search with LLM-as-judge quality evaluation
  - Multi-criterion evaluation (accuracy, completeness, clarity, relevance)
  - Configurable criterion weights
  - Support for 1-2 judge models for consensus
  - Real-time tree visualization
- Multi-Model Pipeline: Multi-model collaboration for comprehensive answers
- OpenAI-Compatible API: Native integration with Open WebUI's model system
- Real-time Monitoring: Prometheus metrics and Grafana dashboards
- Experiment Logging: SQLite + JSONL run tracking for research and analysis
- Interactive Dashboard: Configure models, judges, and visualize search trees
```
┌─────────────────────────────────────────────────────────────┐
│                    Open WebUI Interface                     │
├─────────────────────────────────────────────────────────────┤
│  Model Selection:                                           │
│  ┌─────────────────┐      ┌─────────────────────┐           │
│  │     ab-mcts     │      │     multi-model     │           │
│  │                 │      │                     │           │
│  │ • Tree Search   │      │ • Collaboration     │           │
│  │ • Deep Analysis │      │ • Multi-perspective │           │
│  │ • Best Quality  │      │ • Comprehensive     │           │
│  └─────────────────┘      └─────────────────────┘           │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│              Model Integration Service (8098)               │
│                    OpenAI-Compatible API                    │
└─────────────────────────────────────────────────────────────┘
                               │
               ┌───────────────┴───────────────┐
               ▼                               ▼
  ┌─────────────────────────┐     ┌─────────────────────────┐
  │     AB-MCTS Service     │     │   Multi-Model Service   │
  │       (port 8094)       │     │       (port 8090)       │
  │                         │     │                         │
  │ • TreeQuest Algorithm   │     │ • Direct Collaboration  │
  │ • Thompson Sampling     │     │ • Model Voting          │
  │ • Anti-Hallucination    │     │ • Synthesis             │
  └─────────────────────────┘     └─────────────────────────┘
               │                               │
               └───────────────┬───────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                           Ollama                            │
│                 Local LLM Inference Engine                  │
└─────────────────────────────────────────────────────────────┘
```
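The Thompson Sampling step in the AB-MCTS box above can be illustrated with a minimal Beta-Bernoulli bandit over candidate models. This is a sketch of the idea only; class and method names are hypothetical and this is not the TreeQuest implementation:

```python
import random

class ThompsonSampler:
    """Pick which model (or branch) to expand next, balancing
    exploration and exploitation via posterior sampling."""

    def __init__(self, arms):
        # One Beta(1, 1) prior per arm: [alpha, beta]
        self.stats = {arm: [1.0, 1.0] for arm in arms}

    def pick(self):
        # Sample a plausible success rate per arm; expand the best sample
        samples = {arm: random.betavariate(a, b)
                   for arm, (a, b) in self.stats.items()}
        return max(samples, key=samples.get)

    def update(self, arm, reward):
        # reward in [0, 1], e.g. a judge quality score for the new node
        a, b = self.stats[arm]
        self.stats[arm] = [a + reward, b + (1 - reward)]

random.seed(0)
sampler = ThompsonSampler(["llama3.2:latest", "qwen2.5:latest"])
for _ in range(50):
    arm = sampler.pick()
    # Pretend one model scores consistently higher on this query
    reward = 0.8 if arm == "llama3.2:latest" else 0.4
    sampler.update(arm, reward)
# After enough iterations the sampler concentrates on the higher-reward model.
```

Over time the posterior for the stronger model tightens around its higher score, so it is expanded more often while the weaker model still gets occasional exploratory picks.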
```
openwebui-setup/
├── README.md                # This file
├── docker-compose.yml       # Docker orchestration
├── Dockerfile               # Container definition
├── requirements.txt         # Python dependencies
├── backend/
│   ├── api/
│   │   └── main.py          # Management API (port 8095)
│   ├── services/
│   │   ├── proper_treequest_ab_mcts_service.py  # AB-MCTS (port 8094)
│   │   ├── proper_multi_model_service.py        # Multi-Model (port 8090)
│   │   ├── experiment_logger.py                 # Run logging
│   │   └── config_manager.py                    # Configuration
│   ├── model_integration.py        # OpenAI-compatible model API (8098)
│   └── openwebui_integration.py    # Tool endpoints (8097)
├── interfaces/
│   ├── dashboard.html       # Management dashboard
│   └── idiots_guide.html    # Setup guide
└── logs/                    # Experiment logs and runs
```
- Docker and Docker Compose
- Ollama running locally (port 11434)
- Recommended models: llama3.2:latest, qwen2.5:latest, deepseek-r1:1.5b
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/openwebui-setup.git
  cd openwebui-setup
  ```

- Pull Ollama models:

  ```bash
  ollama pull llama3.2:latest
  ollama pull qwen2.5:latest
  ollama pull deepseek-r1:1.5b
  ```

- Start all services:

  ```bash
  docker-compose up -d
  ```

- Verify services:

  ```bash
  docker-compose ps
  ```

  All services should show "Up" status.
- Open Open WebUI at http://localhost:3000
- Add the model provider:
  - Click your profile → Settings
  - Go to Connections
  - Click + Add Connection
  - Select OpenAI
  - API Base URL: http://model-integration:8098
  - API Key: dummy-key (any value works)
  - Click Verify Connection → it should show "✓ Connected"
  - Click Save
- Select a model:
  - Start a new chat
  - Click the model dropdown
  - Select either:
    - ab-mcts - Advanced tree search reasoning
    - multi-model - Collaborative AI
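The same models can also be called programmatically through the OpenAI-compatible endpoint. A minimal sketch, assuming you call the service from the host (inside the Docker network the base URL is http://model-integration:8098 instead); the `build_chat_request` and `send_chat` helper names are illustrative, not part of this project:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8098"  # model-integration service, per the port table

def build_chat_request(model: str, prompt: str) -> dict:
    # Standard OpenAI chat-completions payload
    return {
        "model": model,  # "ab-mcts" or "multi-model"
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def send_chat(payload: dict) -> dict:
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer dummy-key"},  # any key works
    )
    # Generous timeout: AB-MCTS queries can take 30-120 s
    with urllib.request.urlopen(req, timeout=180) as resp:
        return json.load(resp)

payload = build_chat_request("ab-mcts", "Prove that the square root of 2 is irrational")
# reply = send_chat(payload)  # requires the stack to be running
# print(reply["choices"][0]["message"]["content"])
```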
The ab-mcts model is best for:
- Complex problem solving
- Multi-step reasoning
- Strategic planning
- Mathematical proofs
- Decision trees
Example queries:
- "Design a distributed caching system for a social media platform"
- "Prove that the square root of 2 is irrational"
- "What's the optimal strategy for a two-player game where..."
Note: Responses may take 30-120 seconds due to tree search exploration.
The multi-model model is best for:
- Comprehensive analysis
- Multiple perspectives
- Research questions
- Balanced viewpoints
- Faster responses
Example queries:
- "Compare microservices vs monolithic architectures"
- "Analyze the pros and cons of remote work"
- "Explain quantum computing to different audiences"
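The collaboration behind these answers includes the "model voting" step from the architecture diagram. A toy sketch of majority voting over candidate answers (illustrative only, not the service's actual code, which also synthesizes a combined response):

```python
from collections import Counter

def majority_vote(answers: dict) -> str:
    """Each model contributes an answer; normalize and pick the majority."""
    normalized = [a.strip().lower() for a in answers.values()]
    winner, _count = Counter(normalized).most_common(1)[0]
    return winner

answers = {
    "llama3.2:latest": "Microservices",
    "qwen2.5:latest": "microservices ",
    "deepseek-r1:1.5b": "Monolith",
}
print(majority_vote(answers))  # -> "microservices"
```

Real answers are free-form text rather than short labels, so the actual service relies on LLM-driven synthesis rather than exact-match voting; the sketch only shows the consensus idea.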
| Service | Port | Description |
|---|---|---|
| Open WebUI | 3000 | Main chat interface |
| Model Integration | 8098 | OpenAI-compatible model API |
| AB-MCTS Service | 8094 | TreeQuest AB-MCTS implementation |
| Multi-Model Service | 8090 | Multi-model collaboration |
| Backend API | 8095 | Management dashboard API |
| MCP Server | 8096 | Model Context Protocol bridge |
| Prometheus | 9090 | Metrics collection |
| Grafana | 3001 | Dashboards and visualization |
| HTTP Server | 8081 | Static interfaces |
Access the interactive dashboard at http://localhost:8081/dashboard.html
Features:
- Model Selection: Choose which Ollama models power each service
- Judge Configuration: Select 1-2 LLMs to evaluate solution quality
- Criterion Weights: Adjust importance of accuracy, completeness, clarity, and relevance
- Search Parameters: Configure iterations and max depth
- Tree Visualization: View AB-MCTS search trees (Sakana AI style)
- Run History: Browse past queries and their exploration trees
Configure via Dashboard or API:
```bash
curl -X POST http://localhost:8094/params/update \
  -H "Content-Type: application/json" \
  -d '{
    "iterations": 20,
    "max_depth": 5
  }'
```

Parameters:
- iterations: Number of search iterations (1-100, default: 20)
  - Higher = better quality, slower response
  - Recommended: 10-20 for most queries
- max_depth: Maximum tree depth (1-20, default: 5)
  - Higher = deeper reasoning, slower response
  - Recommended: 3-5 for most queries
AB-MCTS uses LLM judges to evaluate solution quality on 4 criteria:
Criteria:
- Accuracy: Is it factually correct?
- Completeness: Does it fully answer the question?
- Clarity: Is it well-explained and understandable?
- Relevance: Is it on-topic and addresses the query?
Configuration:
```bash
# Set judge models (1-2 recommended for consensus)
curl -X POST http://localhost:8094/judges/update \
  -H "Content-Type: application/json" \
  -d '{"judge_models": ["qwen3:0.6b"]}'

# Adjust criterion weights (auto-normalizes to 100%)
curl -X POST http://localhost:8094/weights/update \
  -H "Content-Type: application/json" \
  -d '{
    "weights": {
      "accuracy": 0.4,
      "completeness": 0.3,
      "clarity": 0.2,
      "relevance": 0.1
    }
  }'
```

Notes:
- Using 2 judges provides consensus and reduces bias
- Weights persist across restarts
- All settings are managed in the dashboard UI
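Conceptually, judge aggregation reduces to a normalized weighted average with consensus across judges. A minimal sketch of that arithmetic, assuming hypothetical function names (the real service obtains per-criterion scores by prompting the judge LLMs):

```python
def normalize(weights: dict) -> dict:
    # Mirrors the "auto-normalizes to 100%" behavior described above
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}

def node_score(judge_scores: list, weights: dict) -> float:
    """Weighted score per judge, then average across judges for consensus."""
    w = normalize(weights)
    per_judge = [sum(w[c] * s[c] for c in w) for s in judge_scores]
    return sum(per_judge) / len(per_judge)

weights = {"accuracy": 0.4, "completeness": 0.3, "clarity": 0.2, "relevance": 0.1}
judges = [
    {"accuracy": 0.9, "completeness": 0.8, "clarity": 0.7, "relevance": 1.0},
    {"accuracy": 0.8, "completeness": 0.9, "clarity": 0.8, "relevance": 0.9},
]
print(round(node_score(judges, weights), 3))  # -> 0.84
```

With two judges, a single judge's bias on any one criterion is halved in the final score, which is why the notes above recommend a second judge.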
Update which Ollama models each service uses:
```bash
# AB-MCTS models
curl -X POST http://localhost:8094/models/update \
  -H "Content-Type: application/json" \
  -d '{"models": ["llama3.2:latest", "qwen2.5:latest"]}'

# Multi-Model models
curl -X POST http://localhost:8090/models/update \
  -H "Content-Type: application/json" \
  -d '{"models": ["llama3.2:latest", "qwen2.5:latest", "deepseek-r1:1.5b"]}'
```

Access Prometheus at http://localhost:9090
Key metrics:
- `model_integration_requests_total` - Total requests by model
- `model_integration_success_total` - Successful responses
- `model_integration_failures_total` - Failed responses
- `model_integration_latency_seconds` - Response time histogram
- `model_integration_active_queries` - Current active queries
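These metrics support standard PromQL queries in Prometheus or Grafana; for example (assuming the latency metric is exported as a Prometheus histogram with `_bucket` series):

```promql
# p95 response time over the last 5 minutes
histogram_quantile(0.95, rate(model_integration_latency_seconds_bucket[5m]))

# Overall success rate over the last 5 minutes
sum(rate(model_integration_success_total[5m]))
  / sum(rate(model_integration_requests_total[5m]))
```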
Access Grafana at http://localhost:3001 (credentials: admin/admin)
Pre-configured dashboards:
- Request rates and success rates
- Latency percentiles (p50, p95, p99)
- Active query monitoring
- Error rates by type
- Service health status
All AB-MCTS runs are logged with complete search tree data:
- `logs/runs.db` - SQLite index
- `logs/runs/YYYYMMDD/run_<id>.jsonl` - Event stream per run
- `logs/selected_models_abmcts.json` - Persisted configuration
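A run's JSONL event stream can be replayed with a few lines of Python. The event schema below (`type` and `score` fields) is an assumption for illustration, not the logger's documented format:

```python
import json
from pathlib import Path

def load_run(path: Path) -> list:
    """Parse one JSONL run file: one JSON event per line."""
    with path.open() as f:
        return [json.loads(line) for line in f if line.strip()]

def best_node(events: list) -> dict:
    """Return the highest-scoring node event (field names are assumptions)."""
    nodes = [e for e in events if e.get("type") == "node"]
    return max(nodes, key=lambda e: e.get("score", 0.0))

# Demo with a tiny synthetic run file:
demo = Path("demo_run.jsonl")
demo.write_text(
    '{"type": "node", "id": 1, "score": 0.7}\n'
    '{"type": "node", "id": 2, "score": 0.9}\n'
)
events = load_run(demo)
print(best_node(events)["id"])  # -> 2
demo.unlink()
```

For aggregate analysis across many runs, querying the `logs/runs.db` SQLite index is the more convenient entry point; the JSONL files hold the full per-node detail.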
View in Dashboard:
- Go to http://localhost:8081/dashboard.html
- Click the "Research Explorer" tab
- Click any run to view:
- Full hierarchical search tree visualization (Sakana AI style)
- Per-node quality scores and judge evaluations
- Model performance across iterations
- Complete response text for each node
Tree Visualization Features:
- D3.js interactive tree graph
- Color-coded by model and quality
- Zoom and pan navigation
- Click nodes to see full details
- Shows parent-child relationships
- Identifies best solution path
API Access:
- List runs: GET http://localhost:8094/runs?limit=50
- Run details: GET http://localhost:8094/runs/{run_id}
- Tree data: GET http://localhost:8094/runs/{run_id}/tree
Check model integration service:
```bash
curl http://localhost:8098/health
curl http://localhost:8098/v1/models
```

Verify the Open WebUI connection:
- Settings → Connections → the connection should show "✓ Connected"
- Try refreshing the page
- Check the browser console for errors
Reduce AB-MCTS iterations:
```bash
curl -X POST http://localhost:8095/api/config \
  -H "Content-Type: application/json" \
  -d '{"ab_mcts_iterations": 10, "ab_mcts_max_depth": 3}'
```

Use faster Ollama models:

```bash
ollama pull llama3.2:1b  # Smaller, faster model
```

Check Ollama performance:

```bash
time curl http://localhost:11434/api/generate \
  -d '{"model":"llama3.2:latest","prompt":"test","stream":false}'
```

Check all services are running:
```bash
docker-compose ps
```

View service logs:

```bash
docker logs model-integration
docker logs ab-mcts-service
docker logs multi-model-service
```

Restart services:

```bash
docker-compose restart
```

Known issues:
- Timeouts: AB-MCTS can take 30-120 s on complex queries (streaming keeps the UI responsive)
- Verbosity: AB-MCTS responses can be lengthy (length controls are planned)
- Quality drift: Occasional hallucinations (stricter output validation is planned)
OpenAI-Compatible Endpoints:
- GET /v1/models - List available models
- POST /v1/chat/completions - Chat completions
Management Endpoints:
- GET /health - Health check
- GET /metrics - Prometheus metrics
- GET /performance - Performance statistics
- GET /config - Current configuration
- POST /config - Update configuration
AB-MCTS Service (port 8094):
- POST /query - Run AB-MCTS query
  - Body: {"query": "...", "iterations": 20, "max_depth": 5}
- GET /models - List available models
- POST /models/update - Update model selection
- GET /health - Health check
- GET /metrics - Prometheus metrics
Multi-Model Service (port 8090):
- POST /query - Run multi-model query
  - Body: {"query": "..."}
- GET /models - List available models
- POST /models/update - Update model selection
- GET /health - Health check
- GET /metrics - Prometheus metrics
- Scientific Data Enrichment Tool - Chemistry and materials science enrichment for Open WebUI (separate tool)
MIT License - See LICENSE file for details.
- Sakana AI for AB-MCTS research and TreeQuest
- Open WebUI for the chat interface
- Ollama for local LLM inference
- Prometheus & Grafana for observability
- ARCHITECTURE.md - Detailed architecture and design
- API_REFERENCE.md - Complete API documentation
- DEPLOYMENT.md - Production deployment guide
- docs/research/RESEARCH_GUIDE.md - Research and analysis guide