OMNI

Speak anywhere. Act everywhere.

One AI brain. Every device. Infinite capabilities.

Omni is a multi-client AI agent hub that lets you speak to one intelligent agent from any device — web dashboard, mobile, Chrome extension, desktop, or smart glasses — and have it act across all of them simultaneously.

Demo Video · Architecture · Getting Started · Blog Post

The Problem

AI assistants today live in text boxes on single screens. You can't speak to your AI while wearing safety glasses on a factory floor. You can't add new capabilities without waiting for the next software update. You can't switch devices mid-thought and pick up where you left off.

Every AI assistant is an island.

The Solution

Omni connects one AI brain to every device you own. Speak from your phone, see results on your dashboard, trigger actions on your desktop — all in one continuous conversation.

One voice, every device — Web, mobile, Chrome extension, desktop tray app, ESP32 glasses
MCP Plugin Store — Install new agent capabilities in one click, like an app store for AI skills
GenUI — Agent renders live charts, tables, code blocks, and cards on your dashboard while speaking to you
Agent Personas — Switch between specialized AI personalities (analyst, coder, researcher) with distinct voices and skills
Browser Control — Tell your agent to scrape a website, fill a form, or extract data — all by voice
Cross-Client Actions — Say "save this to my dashboard" from your phone → it appears on your desktop instantly

Demo

🎥 Watch the 4-minute demo video → (coming soon)

Highlights

Moment	What Happens
Voice + GenUI	Ask about stock performance → agent speaks the answer while a chart renders live on the dashboard
Persona Switch	"Switch to Atlas" → voice changes instantly → ask for code → code block renders → "Execute it" → runs in sandbox
MCP Plugin Toggle	Enable Brave Search with one click → agent immediately searches the web → disable it → agent falls back gracefully
Cross-Client	Point phone camera at an object → agent describes it → "Saved to your dashboard" → switch to desktop → it's there

Architecture

3-Layer Capability-Based Agent Routing

                    ┌────────────────────────────────────────────┐
                    │              OMNI HUB (Cloud Run)          │
                    │                                            │
                    │  ┌──────────────────────────────────────┐  │
                    │  │        Root Agent "omni_root"        │  │
                    │  │         tools: [plan_task]            │  │
                    │  ├──────────────────────────────────────┤  │
                    │  │                                      │  │
                    │  │  LAYER 1 — Persona Pool              │  │
                    │  │  ┌──────────┐  ┌──────────┐         │  │
                    │  │  │assistant │  │  coder   │         │  │
                    │  │  │search,web│  │code,sandbox│       │  │
                    │  │  └──────────┘  └──────────┘         │  │
                    │  │  ┌──────────┐  ┌──────────┐         │  │
                    │  │  │researcher│  │ analyst  │         │  │
                    │  │  │search,kb │  │data,code │         │  │
                    │  │  └──────────┘  └──────────┘         │  │
                    │  │  ┌──────────┐                       │  │
                    │  │  │ creative │                       │  │
                    │  │  │media,art │                       │  │
                    │  │  └──────────┘                       │  │
                    │  │                                      │  │
                    │  │  LAYER 2 — TaskArchitect             │  │
                    │  │  (plan_task FunctionTool)            │  │
                    │  │  Decomposes complex multi-step tasks │  │
                    │  │                                      │  │
                    │  │  LAYER 3 — device_agent              │  │
                    │  │  Cross-client orchestration (T3)     │  │
                    │  │                                      │  │
                    │  ├──────────────────────────────────────┤  │
                    │  │  MCP Plugin System (Dynamic T2 Tools)│  │
                    │  │  Capability-tagged per persona       │  │
                    │  └──────────────────────────────────────┘  │
                    └────────────────┬───────────────────────────┘
                                    │
                 ┌──────────────────┼──────────────────┐
                 │     Raw WebSocket (Binary audio     │
                 │        + JSON control)              │
                 │                  │                   │
        ┌────────┴───┐  ┌──────────┴──┐  ┌────────────┴──┐
        │ Web        │  │ Mobile      │  │ Chrome        │
        │ Dashboard  │  │ PWA         │  │ Extension     │
        │ (React)    │  │ (Camera)    │  │ (Voice)       │
        └────────────┘  └─────────────┘  └───────────────┘
        ┌─────────────┐  ┌─────────────┐
        │ Desktop     │  │ ESP32       │
        │ Tray App    │  │ Glasses     │
        │ (Python)    │  │ (Protocol)  │
        └─────────────┘  └─────────────┘

Key Design Decisions

Decision	Choice	Why
Audio Transport	Binary WebSocket frames	33% smaller than base64-in-JSON, lower latency
Audio Pipeline	AudioWorklet (not ScriptProcessor)	Runs on separate thread, zero main-thread jank
Agent Framework	Google ADK with `AgentTool` pattern	Root uses `run_live()` for bidi audio; personas wrapped as `AgentTool` (runs `Runner.run_async()` + `generateContent` API internally)
Plugin System	MCP (Model Context Protocol)	Open standard, 10,000+ community tools
GenUI	Agent returns structured JSON → React renders	Audio + visual output simultaneously
Session Persistence	Vertex AI Agent Engine Sessions	Survives Cloud Run restarts, Google-managed

Tech Stack

Backend

Component	Technology
Runtime	Python 3.12+
Package Manager	uv
API Server	FastAPI + Uvicorn
Agent Framework	Google ADK v0.5+
Audio Model	`gemini-live-2.5-flash-native-audio` (root only)
Text Model	`gemini-2.5-flash` (persona agents via AgentTool)
GenUI Model	`gemini-2.5-flash-lite` (genui persona override)
Code Execution	E2B Sandbox + Agent Engine Code Execution

Frontend

Component	Technology
Framework	React 19 (JavaScript)
Build Tool	Vite
Styling	Tailwind CSS 4 + shadcn/ui
State	Zustand
Charts	Recharts
Icons	Lucide React
Toasts	Sonner

Google Cloud Services (16+)

Category	Services
Vertex AI	Gemini Live API, ADK, Grounding (Google Search + Maps), Agent Engine (Sessions + Memory Bank + Code Execution), Gen AI Evaluation, Imagen 4
Infrastructure	Cloud Run, Firestore, Firebase Auth, Cloud Storage, Secret Manager, Artifact Registry
Observability	Cloud Logging, Cloud Monitoring, Cloud Trace
DevOps	Cloud Build, Terraform

Getting Started

Prerequisites

Python 3.12+
Node.js 20+
uv (Python package manager)
Google Cloud account with billing enabled
Google Cloud CLI (gcloud)

1. Clone the repository

git clone https://github.com/omanandswami2005/omni-agent-hub-with-gemini-live.git
cd omni-agent-hub-with-gemini-live

2. Set up environment variables

cp .env.example .env
# Edit .env with your API keys:
#   GOOGLE_CLOUD_PROJECT=your-project-id
#   GOOGLE_CLOUD_LOCATION=us-central1
#   E2B_API_KEY=your-e2b-key

3. Start the backend

cd backend
uv sync
uv run uvicorn main:app --reload --port 8000

4. Start the frontend

cd frontend
pnpm install
pnpm run dev

5. Open the dashboard

Navigate to http://localhost:5173 — click "Sign in with Google" and start talking.

Deploy to Google Cloud

One-command deploy

cd deploy
terraform init
terraform apply

Manual deploy

# Build and push container
gcloud builds submit --tag gcr.io/$PROJECT_ID/omni-backend

# Deploy to Cloud Run
gcloud run deploy omni-backend \
  --image gcr.io/$PROJECT_ID/omni-backend \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --min-instances 1 \
  --set-env-vars "GOOGLE_CLOUD_PROJECT=$PROJECT_ID"

Agent Personas

Omni ships with 5 specialized AI personas. Each persona declares capabilities and receives only the tools that match via the ToolCapability tag system:

Persona	Role	Voice	Capabilities	Tools Received
assistant	General Assistant	Puck	search, web, knowledge, communication, media	google_search, rag_query, notification_sender + matched MCP
coder	Code & Debug	Kore	code_execution, sandbox, search, web	google_search, code_execution, e2b_sandbox + matched MCP
researcher	Deep Research	Aoede	search, web, knowledge	google_search + matched MCP
analyst	Data & Finance	Charon	code_execution, sandbox, search, data, web	google_search, code_execution, e2b_sandbox + matched MCP
creative	Content Creation	Fenrir	creative, media	imagen, creative_tools + matched MCP

Switch personas by voice: "Switch to coder" — or click the persona panel on the dashboard.

Create custom personas with capability tags: "Create a new persona called Chef with knowledge and search capabilities."

MCP Plugin Store

Omni's agent capabilities are extensible at runtime through the PluginRegistry — a unified plugin system supporting MCP servers (local, remote, and OAuth-secured), native Python modules, and E2B sandboxes. Enable/disable any plugin with a single toggle — the agent adapts instantly. No restart required.

Extension Points

Method	What	How
Native plugin	Python function tools	Drop `.py` in `backend/app/plugins/` with a `MANIFEST` → auto-discovered
MCP server	Stdio, HTTP, or OAuth MCP servers	Drop `.json` in `backend/app/mcps/` → auto-discovered at startup
OAuth MCP	Remote servers with OAuth 2.0	Set `kind: "mcp_oauth"` with `url` + `oauth` config → one-click connect in UI
Runtime API	Register via HTTP	`POST /api/v1/plugins/register` → persisted to `mcps/` JSON + live immediately

Built-in Catalog

Plugin	What It Does	Kind
Brave Search	Web search via Brave API	MCP Stdio
GitHub	Repo management, issues, PRs	MCP Stdio
Slack	Send/read messages	MCP Stdio
Notion	Read/write Notion pages (official)	MCP OAuth
Playwright	Browser automation and scraping	MCP Stdio
Filesystem	Read/write sandboxed files	MCP Stdio
E2B Sandbox	Code execution (100+ languages)	E2B
Wikipedia	Encyclopedia lookups	Native Plugin
RAG Documents	Upload & search documents	Native Plugin
Notification Sender	Webhook/log notifications	Native Plugin

Add a New MCP Server (3 ways)

1. JSON config file (recommended)

# Copy the template
cp backend/app/mcps/TEMPLATE.json backend/app/mcps/my-server.json
# Edit id, name, command, args, tags — restart backend

2. API registration (runtime, no restart)

curl -X POST http://localhost:8000/api/v1/plugins/register \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "my-server",
    "name": "My MCP Server",
    "kind": "mcp_stdio",
    "command": "npx",
    "args": ["-y", "@my-org/mcp-server"],
    "tags": ["search", "web"]
  }'
# Immediately available in catalog — persisted to mcps/my-server.json

3. Native Python plugin

cp backend/app/plugins/TEMPLATE.py backend/app/plugins/my_plugin.py
# Edit MANIFEST, implement tool functions — restart backend

Create your own plugin in 5 minutes. See TEMPLATE.py or TEMPLATE.json.

54 tests cover plugin lifecycle, tool discovery, T3 proxy tools, and hardening fixes.

Project Structure

omni-agent-hub-with-gemini-live/
├── backend/                    # Python FastAPI + ADK
│   ├── app/
│   │   ├── agents/             # Agent definitions
│   │   │   ├── root_agent.py   # Root orchestrator (3-layer routing)
│   │   │   ├── agent_factory.py # Capability-based per-persona builder
│   │   │   ├── cross_client_agent.py # device_agent (Layer 3)
│   │   │   └── task_planner_tool.py  # TaskArchitect plan_task tool (Layer 2)
│   │   ├── api/                # REST + WebSocket endpoints
│   │   │   ├── ws_live.py      # /ws/live (audio) + /ws/chat (text)
│   │   │   ├── plugins.py      # Plugin catalog, toggle, schemas
│   │   │   └── init.py         # Bootstrap endpoint (single-trip load)
│   │   ├── tools/              # T1 core tools
│   │   │   └── cross_client.py # Cross-device action tools
│   │   ├── services/           # Business logic singletons
│   │   │   ├── plugin_registry.py   # T2 plugin lifecycle (MCP+native+E2B)
│   │   │   ├── tool_registry.py     # Per-persona T1+T2+T3 via capability matching
│   │   │   ├── connection_manager.py # WS registry + capability storage
│   │   │   ├── mcp_manager.py       # Backward-compat wrapper
│   │   │   └── event_bus.py         # Dashboard event fan-out
│   │   ├── plugins/            # Auto-discovered native plugins
│   │   │   ├── notification_sender.py # Example native plugin
│   │   │   └── TEMPLATE.py     # Developer template (copy to create plugins)
│   │   ├── mcps/               # MCP server configs (JSON auto-discovery)
│   │   │   ├── TEMPLATE.json   # Copy to add a new MCP server
│   │   │   ├── brave-search.json # Brave Search, GitHub, Slack, etc.
│   │   │   └── ...             # Drop JSON here → auto-discovered at startup
│   │   ├── models/             # Pydantic schemas
│   │   ├── middleware/         # Auth, CORS
│   │   └── utils/              # Logging, errors
│   ├── scripts/                # MCP test servers
│   ├── tests/                  # 54 pytest tests
│   │   └── test_services/      # Plugin registry, tool registry, bug fixes
│   └── cli/                    # CLI client
│       └── omni_cli.py         # Text-only agent in terminal
├── dashboard/                  # React 19 + Vite
│   ├── src/
│   │   ├── components/         # UI components (shadcn/ui)
│   │   ├── pages/              # Dashboard, Personas, Plugins, Sessions...
│   │   ├── stores/             # Zustand state stores
│   │   ├── hooks/              # useWebSocket, useAudioPipeline, etc.
│   │   └── lib/                # Utilities
│   └── index.html
├── Docs/                       # Architecture & planning
│   ├── UNIFIED_MULTI_CLIENT_ARCHITECTURE.md  # Master architecture doc
│   ├── DEVELOPMENT_CHECKLIST.md
│   └── ...
├── deploy/
│   ├── terraform/              # Cloud Run + Firestore + GCS + Secret Manager
│   └── scripts/
├── .env.example
└── README.md

How It Works

Voice Pipeline

User speaks → Mic → AudioWorklet (16kHz PCM)
  → Binary WebSocket frame → FastAPI backend
    → ADK LiveRequestQueue → Gemini Live API
      → Agent processes (tools, grounding, GenUI)
    ← ADK response (audio + text + structured data)
  ← Binary frame (24kHz PCM) + JSON frames (transcript, GenUI, status)
← AudioWorklet playback → Speaker

Latency target: < 500ms end-to-end

Cross-Client Actions

Phone camera → captures image → sends via WebSocket
  → Agent: "This is a book about machine learning"
  → Agent: "I've saved the analysis to your dashboard"
  → WebSocket push to dashboard client
    → GenUI card appears with image + analysis

All clients share the same session via Agent Engine Sessions

GenUI Flow

User: "Show me Tesla's stock performance"

Agent response:
  Audio: "Tesla has been on an upward trend..."
  GenUI: {
    type: "chart",
    chartType: "line",
    data: [...],
    title: "Tesla (TSLA) — 12 Month Performance"
  }

Dashboard renders Recharts component inline in chat
while audio plays simultaneously

Judging Criteria Alignment

Criterion	Weight	Our Score Target	Key Features
Innovation & Multimodal UX	40%	5/5	Multi-client hub, GenUI, cross-client actions, voice personas, MCP plugin store, browser control
Technical Implementation	30%	5/5	14 ADK features, 16+ GCP services, Agent Engine, binary audio transport, AudioWorklet pipeline
Demo & Presentation	30%	5/5	4-min scripted video with "wow" moments, architecture diagram, Cloud deployment proof
Bonus	+1.0	+1.0	Blog post (+0.6), Terraform deploy (+0.2), GDG membership (+0.2)

Blog Post

📝 How I Built a Multi-Device AI Agent Hub with Gemini Live API & Google ADK → (coming soon)

Team

Name	Role
Your Name	Full-stack developer

License

This project is built for the Gemini Live Agent Challenge hackathon.

OMNI — Speak anywhere. Act everywhere.

Built with ❤️ using Google Gemini, ADK, and 16+ Google Cloud services.

Name		Name	Last commit message	Last commit date
Latest commit History 87 Commits
.github/workflows		.github/workflows
.vscode		.vscode
Docs		Docs
backend		backend
blog		blog
chrome-extension(in dev)		chrome-extension(in dev)
cli(in dev)		cli(in dev)
dashboard		dashboard
deploy		deploy
desktop-client		desktop-client
site-docs		site-docs
site		site
smart-glasses		smart-glasses
.editorconfig		.editorconfig
.env.example		.env.example
.gitignore		.gitignore
ARCHITECTURE_IMPLEMENTATION_PLAN.md		ARCHITECTURE_IMPLEMENTATION_PLAN.md
LICENSE		LICENSE
README.md		README.md
README.pdf		README.pdf
architecture.txt		architecture.txt
firebase.json		firebase.json
firestore.indexes.json		firestore.indexes.json
future.features.txt		future.features.txt
future.optimization.txt		future.optimization.txt
mermaidDG.txt		mermaidDG.txt
mkdocs.yml		mkdocs.yml

Folders and files

Latest commit

History

Repository files navigation

OMNI

Speak anywhere. Act everywhere.

The Problem

The Solution

Demo

Highlights

Architecture

3-Layer Capability-Based Agent Routing

Key Design Decisions

Tech Stack

Backend

Frontend

Google Cloud Services (16+)

Getting Started

Prerequisites

1. Clone the repository

2. Set up environment variables

3. Start the backend

4. Start the frontend

5. Open the dashboard

Deploy to Google Cloud

One-command deploy

Manual deploy

Agent Personas

MCP Plugin Store

Extension Points

Built-in Catalog

Add a New MCP Server (3 ways)

Project Structure

How It Works

Voice Pipeline

Cross-Client Actions

GenUI Flow

Judging Criteria Alignment

Blog Post

Team

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages