🛠️ agent-orchestrator × indie devs — product brief
👤 User Persona
Name: Alex
Who: Solo developer / weekend experimenter. Builds side projects, contributes to OSS, tinkers with AI tooling.
Setup: MacBook, 1–3 repos, uses Claude Code or Codex sporadically. Not running a team.
Goal: Ship faster by offloading boring/repetitive coding work to agents while keeping full control over what actually merges.
Budget mindset: Pays for Anthropic/OpenAI API out of pocket. Every dollar matters. Has been surprised by a bill at least once.
Comfort level: Technically strong. Reads READMEs. But doesn't want to babysit infrastructure — they want to point an agent at a problem and come back when it's done.
Pain Points (ranked by severity)
🔴 P0 — "I have no idea what this is costing me"
`CostEstimate` exists in core types but is never shown on the dashboard. There is zero visibility into token spend per session, per task, or in total. For someone paying out of pocket, this is not a UX gap — it's a trust gap. They will stop using the tool after one unexpectedly large bill.
🔴 P0 — "I don't know if the agent is making progress or spinning"
The `working` column just shows a pulsing green dot. There's no signal of what the agent is doing, how long it's been running, or whether it's stuck in a loop. An indie dev can't stare at a terminal 24/7. They need a cheap way to answer "is this going anywhere?" without opening a tmux session.
🟠 P1 — "When the agent gets stuck, I don't know what to do"
The `respond` column flags that an agent needs input, but gives no context. For a production team this is fine — they have context. For an experimenter who kicked off a task 2 hours ago, it's opaque. "Needs input" tells them nothing about why or what to say.
🟠 P1 — "I can't try two different approaches at once"
The system is designed around GitHub issues. One issue = one agent. But an experimenter's mental model is "I want to try approach A and approach B on the same problem and see which one wins." There's no native way to do this. You'd have to hack it with two different issues.
🟡 P2 — "I have to set up GitHub/Linear just to use this"
The default config assumes `github` as the tracker. An indie dev prototyping locally doesn't always have a GitHub issue. They might just have an idea. The onboarding asks for too much infrastructure before you can get value.
🟡 P2 — "Every agent starts from scratch — it has no memory of what worked before"
Sessions are ephemeral by design. But an experimenter who's run 20 sessions on the same codebase has learned things — which prompt patterns work, which directories to avoid, which test commands to run first. None of that is captured. Every session re-discovers the same things.
🟢 P3 — "I can't compare how different prompts performed"
If you run the same task twice with different prompts, there's no way to diff the outcomes side by side. You'd have to manually read two PR diffs. For someone iterating on prompt quality, this is a dead end.
🟢 P3 — "The dashboard doesn't tell me anything useful about completed sessions"
The `done` column just shows "merged" or "exited." There's no summary of what the agent did, how many tokens it used, how long it took, or whether it succeeded cleanly or needed multiple CI retries. Done sessions are a black hole.
💡 Solutions (prioritised)
✅ P0-A — Live cost meter on every session card
What: Show `$0.12` (or `~1.2k tokens`) directly on the session card in the `working`/`pending` columns. Update in real time as the session runs. Show a daily total in the topbar.
Why it matters: `CostEstimate` already exists in core types. This is a plumbing + UI problem, not a new feature. The data is there — it's just not surfaced. Fixing this alone would remove the single biggest reason an indie dev stops trusting the tool.
Discord-worthy change: People will screenshot "$0.08 to fix a CI failure" and share it. Cost transparency is a growth lever.
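As a sketch of the card label, assuming a hypothetical `CostEstimate` shape with `tokens` and `usd` fields (the real field names in core types may differ):

```typescript
// Hypothetical shape — the actual CostEstimate in core types may differ.
interface CostEstimate {
  tokens: number; // total input + output tokens so far
  usd: number;    // estimated spend in dollars
}

// Render the session-card label, e.g. "$0.12 (~1.2k tokens)".
function formatCost(c: CostEstimate): string {
  const tokens =
    c.tokens >= 1000
      ? `~${(c.tokens / 1000).toFixed(1)}k tokens`
      : `${c.tokens} tokens`;
  return `$${c.usd.toFixed(2)} (${tokens})`;
}
```

The daily topbar total would just sum these estimates across the day's sessions.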
✅ P0-B — Agent activity summary (not just a green dot)
What: Show a rolling 1-line summary of what the agent last did — e.g. `running tests... 3 failing` or `editing src/auth/middleware.ts` — updated every ~30s. Add a session duration clock.
Why it matters: Replaces the "is it stuck?" anxiety with observable progress. The agent's own status updates (which Claude Code already emits) can be parsed and surfaced without any additional instrumentation.
Bonus: If the agent hasn't emitted a new status update in >5 min, auto-promote the card to `respond` with a "may be stuck" label.
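A minimal sketch of the stuck check, assuming the orchestrator records a timestamp for each status line it parses; the threshold constant and all names here are illustrative:

```typescript
// Promote the card to `respond` after 5 minutes of silence (assumed threshold).
const STUCK_AFTER_MS = 5 * 60 * 1000;

// Hypothetical per-session activity record.
interface SessionActivity {
  lastStatus: string;   // e.g. "running tests... 3 failing"
  lastStatusAt: number; // epoch ms of the last emitted status line
}

// True when the agent has gone quiet for longer than the threshold.
function maybeStuck(s: SessionActivity, now: number = Date.now()): boolean {
  return now - s.lastStatusAt > STUCK_AFTER_MS;
}
```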
✅ P1-A — Stuck reason + suggested reply
What: When a session lands in `respond`, show why it's waiting — parsed from the agent's last output. Pre-fill a suggested reply based on common patterns (missing env var, unclear requirement, permission issue). One-click send.
Why it matters: Reduces the cognitive load of unblocking. An experimenter who doesn't have full context on what the agent was doing shouldn't have to read a full terminal scroll to figure out what to say.
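The pattern table behind the suggested reply could start small and grow from observed sessions; every pattern, reason, and reply below is hypothetical:

```typescript
// Illustrative pattern table: agent output → stuck reason → canned reply.
const STUCK_PATTERNS: Array<{
  pattern: RegExp;
  reason: string;
  suggestedReply: string;
}> = [
  {
    pattern: /(env var|environment variable).*(not set|missing|undefined)/i,
    reason: "missing env var",
    suggestedReply: "Use a placeholder value and note it in the PR description.",
  },
  {
    pattern: /(permission denied|EACCES)/i,
    reason: "permission issue",
    suggestedReply: "Skip that path and list it as a follow-up.",
  },
];

// Classify the agent's last output; null means "unknown — show raw output".
function classifyStuck(lastOutput: string) {
  return STUCK_PATTERNS.find((p) => p.pattern.test(lastOutput)) ?? null;
}
```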
✅ P1-B — Experiment mode: fork a session with a different prompt
What: Add a "try again differently" button on any session. It re-spawns the same task in a new worktree but prompts you to edit the starting prompt first. Both sessions appear side-by-side in the board. A "compare" view diffs the two PRs.
Why it matters: This is the killer feature for experimenters. No other tool does this. It turns agent-orchestrator from a task runner into a genuine experimentation platform. The mental shift is: "I'm not just running tasks, I'm finding the best way to run tasks."
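One way to model the fork, assuming sessions carry a `taskId`, `prompt`, and `worktree` (all hypothetical names): the fork keeps the task, swaps the prompt, and gets its own worktree so both attempts run in isolation.

```typescript
// Hypothetical session shape — field names are illustrative.
interface Session {
  id: string;
  taskId: string;
  prompt: string;
  worktree: string;
}

// "Try again differently": same task, edited prompt, fresh worktree.
function forkSession(base: Session, newPrompt: string, nextId: string): Session {
  return {
    id: nextId,
    taskId: base.taskId, // same task → appears beside the original on the board
    prompt: newPrompt,   // the edited starting prompt
    worktree: `${base.worktree}-fork-${nextId}`, // isolated checkout for approach B
  };
}
```

The "compare" view then only needs the two session ids sharing a `taskId` to find the PRs to diff.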
✅ P1-C — Soft budget cap per session
What: Config option `maxCostUsd: 0.50` per session. When the session approaches the cap, the agent pauses and asks: "I've spent $0.45 — continue?" One click to extend, one click to kill.
Why it matters: Removes the fear of runaway spend. An indie dev will run agents more aggressively if they know there's a ceiling. This is an autonomy-enabler, not a restriction.
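The cap check itself is a few lines. `maxCostUsd` matches the config key proposed above; the 90% warning threshold is an assumption:

```typescript
// Pause and ask when 90% of the cap is spent (assumed threshold).
const WARN_AT = 0.9;

// True when the session should pause and ask "continue?".
function shouldPauseForBudget(spentUsd: number, maxCostUsd: number): boolean {
  return spentUsd >= maxCostUsd * WARN_AT;
}
```

With the example cap of $0.50, the pause fires at $0.45 — exactly the prompt quoted above.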
✅ P2-A — Free-text task mode (no GitHub issue required)
What: Allow `ao spawn "refactor the auth module to use JWT"` without a linked issue. The agent creates its own branch, does the work, and opens a PR. The dashboard shows it with a `[no issue]` badge.
Why it matters: Removes the biggest onboarding friction for solo devs. You shouldn't need a GitHub account and a project management setup to try a tool.
✅ P2-B — Codebase memory across sessions
What: A lightweight `.ao-memory.md` file written to the repo root after each session. Contains: what the agent learned about the codebase, which commands work, which files are sensitive, common failure patterns. New sessions read this file as context.
Why it matters: Compounds value over time. After 10 sessions, new agents are faster and make fewer mistakes. This is the indie dev equivalent of an onboarded team member — the tool gets better the more you use it.
✅ P3-A — Session post-mortem card
What: When a session moves to `done`, replace the blank card with a summary: files changed, tests added/fixed, tokens used, cost, time taken, CI retries needed. Collapsed by default, expandable.
Why it matters: Turns completed sessions into learning material. An experimenter can look back and understand "the JWT task cost $0.22 and needed 2 CI retries" — that's useful calibration data.
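A sketch of the collapsed one-line summary, with assumed field names:

```typescript
// Hypothetical post-mortem data for a completed session.
interface SessionSummary {
  filesChanged: number;
  testsFixed: number;
  tokens: number;
  costUsd: number;
  durationMin: number;
  ciRetries: number;
}

// Render the collapsed card line, e.g. "4 files · 2 tests fixed · $0.22 · 18m · 2 CI retries".
function renderSummary(s: SessionSummary): string {
  const retries = `${s.ciRetries} CI ${s.ciRetries === 1 ? "retry" : "retries"}`;
  return `${s.filesChanged} files · ${s.testsFixed} tests fixed · $${s.costUsd.toFixed(2)} · ${s.durationMin}m · ${retries}`;
}
```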