The new frame_embeddings table must be created before using the multi-pipeline system.
Option A: Using virtual environment (recommended)
cd backend
source .venv/bin/activate # or your venv path
alembic upgrade head
deactivate
Option B: Using Docker/Docker Compose
docker-compose exec backend alembic upgrade head
Option C: Manual SQL (if alembic unavailable)
CREATE TABLE frame_embeddings (
    id SERIAL PRIMARY KEY,
    frame_id INTEGER NOT NULL REFERENCES frames(id) ON DELETE CASCADE,
    pipeline_id VARCHAR(100) NOT NULL,
    embedding TEXT NOT NULL,
    model_version VARCHAR(100),
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    CONSTRAINT uq_frame_embedding_pipeline UNIQUE (frame_id, pipeline_id)
);
CREATE INDEX ix_frame_embeddings_frame_id ON frame_embeddings(frame_id);
CREATE INDEX ix_frame_embeddings_pipeline_id ON frame_embeddings(pipeline_id);
Start the backend and verify pipelines are registered:
cd backend
uvicorn app.main:app --reload
Then in another terminal:
curl http://localhost:8000/api/vision/pipelines
Expected output:
{
  "pipelines": [
    {
      "id": "clip_vitb32",
      "name": "CLIP ViT-B/32 (Standard)",
      "model_id": "ViT-B-32",
      "input_resolution": 224,
      "device": "mps", // or "cpu"
      "dtype": "float32",
      "version": "2.24.0",
      "loaded": false // true after first use
    },
    {
      "id": "openclip_vitl14",
      "name": "OpenCLIP ViT-L/14 (Enhanced)",
      "model_id": "ViT-L-14",
      "input_resolution": 224,
      "device": "mps",
      "dtype": "float32",
      "version": "2.24.0",
      "loaded": false
    }
  ]
}
Analyze a frame with the enhanced pipeline:
# Replace YOUR_TOKEN with admin or moderator token
# Replace 1 with an actual frame ID from your database
curl -X POST http://localhost:8000/api/vision/analyze \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_TOKEN" \
-d '{
"frame_id": 1,
"pipeline_id": "openclip_vitl14",
"force": false
}'
Expected response:
{
  "status": "success",
  "frame_id": 1,
  "pipeline_id": "openclip_vitl14",
  "embedding": [0.123, 0.456, ...], // 768-dim for ViT-L/14
  "embedding_dimension": 768,
  "attributes": [
    {
      "attribute": "time_of_day",
      "value": "day",
      "confidence": 0.92,
      "is_verified": false
    },
    // ... more attributes
  ],
  "cached": false,
  "embed_time": 0.45,
  "attribute_time": 0.12
}
cd frontend
npm run dev
Visit http://localhost:3000 and:
- Open Settings panel
- Scroll to "Vision Pipelines" section
- Verify both pipelines are listed
- Click "Load / warm up models" if needed
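The curl checks earlier in this guide can also be scripted. The following is a minimal sketch, using only Python's standard library, that builds the same POST to /api/vision/analyze. The helper name build_analyze_request is hypothetical; the URL, headers, and JSON fields are copied from the curl example. It only constructs the request, so nothing is sent until urlopen is called.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl examples


def build_analyze_request(frame_id, pipeline_id, token, force=False):
    """Build (but do not send) the POST request for /api/vision/analyze.

    Hypothetical helper: it mirrors the curl example and nothing more.
    """
    body = json.dumps(
        {"frame_id": frame_id, "pipeline_id": pipeline_id, "force": force}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/vision/analyze",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_analyze_request(1, "openclip_vitl14", "YOUR_TOKEN")
# To actually send it (requires a running backend and a valid token):
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
#     print(result["cached"], result["embedding_dimension"])
```

Keeping request construction separate from sending makes the payload easy to inspect before pointing it at a live backend.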
Add to backend/.env:
# Standard CLIP (existing)
CLIP_MODEL_NAME=ViT-B-32
CLIP_PRETRAINED=openai
# Enhanced CLIP (new)
ENHANCED_CLIP_MODEL_NAME=ViT-L-14
ENHANCED_CLIP_PRETRAINED=laion2b_s32b_b82k
ENHANCED_CLIP_BATCH_SIZE=4
CLIP ViT-B/32 (Standard):
- Fast inference (~100ms on MPS)
- 512-dimensional embeddings
- Good for real-time analysis
- Lower memory usage (~350MB)
OpenCLIP ViT-L/14 (Enhanced):
- Better accuracy (~5-10% improvement)
- 768-dimensional embeddings
- Slower inference (~300ms on MPS)
- Higher memory usage (~900MB)
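Because the two pipelines produce vectors of different sizes (512 vs. 768 dimensions), embeddings are only comparable when they come from the same pipeline. The sketch below illustrates one way to enforce that when computing similarity. The function cosine_similarity is a hypothetical helper for illustration, not part of the backend.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two embeddings produced by the same pipeline."""
    if len(a) != len(b):
        # e.g. a 512-dim ViT-B/32 vector against a 768-dim ViT-L/14 vector
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)
```

Even when dimensions happen to match, vectors from different models live in different embedding spaces, so filtering by pipeline_id before comparing is the safer habit.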
The system automatically detects and uses:
- MPS (Metal Performance Shaders) on Apple Silicon Macs
- CUDA if NVIDIA GPU is available
- CPU as fallback
Check logs for device selection:
INFO:app.services.vision_pipelines.clip_vitb32:Loaded CLIP ViT-B/32 pipeline: ViT-B-32 (openai) on mps
INFO:app.services.vision_pipelines.openclip_vitl14:Loaded OpenCLIP ViT-L/14 pipeline: ViT-L-14 (laion2b_s32b_b82k) on mps
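The device preference described above can be sketched as a small pure function. This is an illustration under stated assumptions, not the backend's actual code: pick_device is a hypothetical name, and the order of the checks simply follows the list above.

```python
def pick_device(mps_available, cuda_available):
    """Return a device name following the list above; CPU is the fallback."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


# With PyTorch installed, the two flags would come from
# torch.backends.mps.is_available() and torch.cuda.is_available().
print(pick_device(mps_available=False, cuda_available=False))  # prints "cpu"
```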
from app.services.vision_pipelines import get_pipeline, list_pipelines
from app.services import vision_service
from app.db import get_db
from PIL import Image

# List available pipelines
for pipeline in list_pipelines():
    print(f"{pipeline.id}: {pipeline.name} (loaded={pipeline.loaded})")

# Get a specific pipeline
pipeline = get_pipeline("openclip_vitl14")

# Embed an image directly
image = Image.open("frame.jpg")
result = pipeline.embed_image(image)
print(f"Embedding dimension: {len(result.embedding)}")

# Score attributes
scores = pipeline.score_attributes(image=image)
for score in scores:
    print(f"{score.attribute}={score.value} ({score.confidence:.2f})")

# Full frame analysis with caching
session = next(get_db())
result = vision_service.analyze_frame(
    frame_id=123,
    pipeline_id="openclip_vitl14",
    force=False,  # use cache if available
    session=session,
)
print(f"Cached: {result['cached']}")
print(f"Embedding dim: {result['embedding_dimension']}")
- Create pipeline class in backend/app/services/vision_pipelines/:
from .base import VisionPipeline, PipelineMetadata, EmbeddingResult, AttributeScore


class MyCustomPipeline(VisionPipeline):
    def get_metadata(self) -> PipelineMetadata:
        return PipelineMetadata(
            id="my_pipeline",
            name="My Custom Pipeline",
            model_id="custom-model-v1",
            input_resolution=384,
            device="cuda",
            dtype="float32",
            loaded=True,
        )

    def embed_image(self, image):
        # Your embedding logic here
        pass

    def score_attributes(self, image=None, embedding=None, session=None):
        # Your attribute scoring logic here
        pass

    def status(self):
        return {"loaded": True, "device": "cuda"}
- Register in __init__.py:
from .my_custom import MyCustomPipeline


def _auto_register_pipelines():
    # ... existing registrations ...
    try:
        custom = MyCustomPipeline()
        register_pipeline(custom)
    except Exception as e:
        logger.error("Failed to register custom pipeline: %s", e)
- Restart the backend; the pipeline is then automatically available at /api/vision/pipelines
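To make the registration step concrete, here is a minimal, self-contained sketch of how a registry behind register_pipeline, get_pipeline, and list_pipelines could work. The dictionary storage, the stub classes, and the choice to return metadata from list_pipelines are all assumptions for illustration; the real module may differ.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)
_REGISTRY = {}  # pipeline id -> pipeline instance (assumed storage)


def register_pipeline(pipeline):
    """Register a pipeline under the id from its metadata."""
    _REGISTRY[pipeline.get_metadata().id] = pipeline


def get_pipeline(pipeline_id):
    """Look up a pipeline by exact (case-sensitive) id."""
    try:
        return _REGISTRY[pipeline_id]
    except KeyError:
        raise KeyError(f"unknown pipeline: {pipeline_id!r}") from None


def list_pipelines():
    """Return metadata for every registered pipeline."""
    return [p.get_metadata() for p in _REGISTRY.values()]


class StubPipeline:
    """Stand-in for a VisionPipeline subclass; only metadata is implemented."""

    def get_metadata(self):
        return SimpleNamespace(id="my_pipeline", name="My Custom Pipeline", loaded=True)


class BrokenPipeline:
    """Simulates a pipeline whose constructor fails (e.g. missing weights)."""

    def __init__(self):
        raise RuntimeError("model weights not found")


# Each registration is isolated so one failure does not block the others.
for factory in (StubPipeline, BrokenPipeline):
    try:
        register_pipeline(factory())
    except Exception as exc:
        logger.error("Failed to register %s: %s", factory.__name__, exc)

print([m.id for m in list_pipelines()])  # prints ['my_pipeline']
```

Wrapping each registration in its own try/except is what lets the standard pipeline stay available even if an optional one fails to construct.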
Issue: "loaded": false in pipeline status
Solution:
- Check import errors in logs
- Verify open-clip-torch is installed: pip list | grep open-clip
- Try manual warmup:
curl -X POST http://localhost:8000/api/models/vision/warmup -H "Authorization: Bearer YOUR_TOKEN"
Issue: Pipeline shows "device": "cpu" on Mac
Solutions:
- Ensure macOS 12.3+ and Python 3.8+
- Check PyTorch MPS: python3 -c "import torch; print(torch.backends.mps.is_available())"
- Reinstall PyTorch with MPS support
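A slightly fuller diagnostic can help decide which of these fixes applies. The sketch below is a hypothetical helper (mps_diagnostics is not part of the backend) that separates "PyTorch was built without MPS" from "MPS is built in but unavailable at runtime", using torch.backends.mps.is_built() and is_available(). It degrades gracefully when PyTorch is missing or too old to have the mps module.

```python
def mps_diagnostics():
    """Report why PyTorch may be falling back to CPU on a Mac."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "mps_built": False, "mps_available": False}
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch releases
    return {
        "torch_installed": True,
        "mps_built": bool(mps and mps.is_built()),
        "mps_available": bool(mps and mps.is_available()),
    }


print(mps_diagnostics())
```

If mps_built is False, reinstalling PyTorch is the likely fix; if it is True but mps_available is False, the macOS version or hardware is the more probable cause.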
Issue: "cached": false on every call
Check:
- Database migration ran: SELECT COUNT(*) FROM frame_embeddings;
- No errors in store_frame_embedding() logs
- Pipeline ID is exactly "clip_vitb32" or "openclip_vitl14" (case-sensitive)
Issue: ModuleNotFoundError: No module named 'app.services.vision_pipelines'
Solutions:
- Restart backend server to pick up new modules
- Check __init__.py files exist in package directories
- Verify Python path includes backend directory
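A quick way to test the last two points is to ask Python whether it can resolve the module from the current working directory. This is a small sketch using importlib.util.find_spec from the standard library; can_import is a hypothetical helper name.

```python
import importlib.util


def can_import(module_name):
    """Return True if module_name resolves on the current sys.path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the name is malformed.
        return False


print(can_import("json"))  # prints True
print(can_import("missing_pkg_for_demo.vision_pipelines"))  # prints False
```

Run from the backend directory, can_import("app.services.vision_pipelines") returning False points to a missing __init__.py or a path problem rather than a bug inside the module. Note that find_spec imports parent packages while resolving a dotted name.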
For analyzing multiple frames, use the enhanced pipeline's batch method:
from PIL import Image
from app.services.vision_pipelines import get_pipeline

pipeline = get_pipeline("openclip_vitl14")
images = [Image.open(f"frame_{i}.jpg") for i in range(10)]
# Process 4 at a time (configurable)
results = pipeline.embed_images_batch(images, batch_size=4)
- First analysis: ~500ms (model load + inference)
- Cached analysis: ~5ms (database lookup)
- Force recompute only when needed (new model version, updated prototypes)
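The cache-or-compute behaviour behind these numbers can be sketched as follows. This is an illustration only: it uses an in-memory SQLite table instead of the PostgreSQL table from the migration, assumes embeddings are stored as JSON text, and assumes that a changed model_version is treated as a cache miss. The names analyze_with_cache and fake_embed are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE frame_embeddings (
        id INTEGER PRIMARY KEY,
        frame_id INTEGER NOT NULL,
        pipeline_id TEXT NOT NULL,
        embedding TEXT NOT NULL,
        model_version TEXT,
        UNIQUE (frame_id, pipeline_id)
    )
    """
)


def analyze_with_cache(frame_id, pipeline_id, model_version, embed, force=False):
    """Serve a stored embedding when possible, otherwise compute and store it."""
    row = None
    if not force:
        row = conn.execute(
            "SELECT embedding FROM frame_embeddings "
            "WHERE frame_id = ? AND pipeline_id = ? AND model_version = ?",
            (frame_id, pipeline_id, model_version),
        ).fetchone()
    if row is not None:
        return {"cached": True, "embedding": json.loads(row[0])}
    vector = embed()  # the expensive model call
    # The UNIQUE constraint keeps one row per (frame_id, pipeline_id).
    conn.execute(
        "INSERT OR REPLACE INTO frame_embeddings "
        "(frame_id, pipeline_id, embedding, model_version) VALUES (?, ?, ?, ?)",
        (frame_id, pipeline_id, json.dumps(vector), model_version),
    )
    return {"cached": False, "embedding": vector}


def fake_embed():
    """Stand-in for pipeline.embed_image(); returns a tiny fixed vector."""
    return [0.5, 0.25]


first = analyze_with_cache(1, "openclip_vitl14", "2.24.0", fake_embed)
second = analyze_with_cache(1, "openclip_vitl14", "2.24.0", fake_embed)
forced = analyze_with_cache(1, "openclip_vitl14", "2.24.0", fake_embed, force=True)
print(first["cached"], second["cached"], forced["cached"])  # prints False True False
```

Because the lookup matches pipeline_id exactly, a differently cased id would also miss the cache, which is the same symptom as "cached": false on every call.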
Both models loaded simultaneously: ~1.3GB GPU memory
To reduce usage:
- Use only standard pipeline (set in localStorage)
- Lazy loading means enhanced model only loads when first used
- Models automatically unload when process restarts
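Lazy loading, as mentioned above, can be illustrated with a small wrapper that defers the expensive load until first use. This is a generic sketch, not the backend's implementation; LazyModel and fake_loader are hypothetical names.

```python
class LazyModel:
    """Defer an expensive load until the model is first needed."""

    def __init__(self, loader):
        self._loader = loader  # zero-argument callable that builds the model
        self._model = None

    @property
    def loaded(self):
        return self._model is not None

    def get(self):
        if self._model is None:
            self._model = self._loader()  # runs once, on first use
        return self._model


load_calls = []


def fake_loader():
    """Stand-in for loading ViT-L/14 weights; records each invocation."""
    load_calls.append("ViT-L-14")
    return object()


enhanced = LazyModel(fake_loader)
print(enhanced.loaded)  # prints False: nothing loaded at startup
enhanced.get()
enhanced.get()
print(enhanced.loaded, len(load_calls))  # prints True 1
```

With this pattern, a deployment that never requests the enhanced pipeline never pays its memory cost.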
- Frontend integration: Connect FrameEditModal to use selected pipeline
- Batch jobs: Add Celery task for multi-frame enhanced analysis
- Trained heads: Add linear classifier on top of embeddings
- More pipelines: SigLIP, DINOv2, etc.
See walkthrough.md for complete implementation details.