🛠️ agent-orchestrator × indie devs — product brief
👤 User Persona
Name: Alex
Who: Solo developer / weekend experimenter. Builds side projects, contributes to OSS, tinkers with AI tooling.
Setup: MacBook, 1–3 repos, uses Claude Code or Codex sporadically. Not running a team.
Goal: Ship faster by offloading boring/repetitive coding work to agents while keeping full control over what actually merges.
Budget mindset: Pays for Anthropic/OpenAI API out of pocket. Every dollar matters. Has been surprised by a bill at least once.
Comfort level: Technically strong. Reads READMEs. But doesn't want to babysit infrastructure — they want to point an agent at a problem and come back when it's done.
Pain Points (ranked by severity)
🔴 P0 — "I have no idea what this is costing me"
`CostEstimate` exists in core types but is never shown on the dashboard. There is zero visibility into token spend per session, per task, or in total. For someone paying out of pocket, this is not a UX gap — it's a trust gap. They will stop using the tool after one unexpectedly large bill.
🔴 P0 — "I don't know if the agent is making progress or spinning"
The `working` column just shows a pulsing green dot. There's no signal of what the agent is doing, how long it's been running, or whether it's stuck in a loop. An indie dev can't stare at a terminal 24/7. They need a cheap way to answer "is this going anywhere?" without opening a tmux session.
🟠 P1 — "When the agent gets stuck, I don't know what to do"
The `respond` column flags that an agent needs input, but gives no context. For a production team this is fine — they have context. For an experimenter who kicked off a task 2 hours ago, it's opaque. "Needs input" tells them nothing about why or what to say.
🟠 P1 — "I can't try two different approaches at once"
The system is designed around GitHub issues. One issue = one agent. But an experimenter's mental model is "I want to try approach A and approach B on the same problem and see which one wins." There's no native way to do this. You'd have to hack it with two different issues.
🟡 P2 — "I have to set up GitHub/Linear just to use this"
The default config assumes `github` as the tracker. An indie dev prototyping locally doesn't always have a GitHub issue. They might just have an idea. The onboarding asks for too much infrastructure before you can get value.
🟡 P2 — "Every agent starts from scratch — it has no memory of what worked before"
Sessions are ephemeral by design. But an experimenter who's run 20 sessions on the same codebase has learned things — which prompt patterns work, which directories to avoid, which test commands to run first. None of that is captured. Every session re-discovers the same things.
🟢 P3 — "I can't compare how different prompts performed"
If you run the same task twice with different prompts, there's no way to diff the outcomes side by side. You'd have to manually read two PR diffs. For someone iterating on prompt quality, this is a dead end.
🟢 P3 — "The dashboard doesn't tell me anything useful about completed sessions"
The `done` column just shows "merged" or "exited." There's no summary of what the agent did, how many tokens it used, how long it took, or whether it succeeded cleanly or needed multiple CI retries. Done sessions are a black hole.
💡 Solutions (prioritised)
✅ P0-A — Live cost meter on every session card
What: Show `$0.12` (or `~1.2k tokens`) directly on the session card in the `working`/`pending` columns. Update in real time as the session runs. Show a daily total in the topbar.
Why it matters: `CostEstimate` already exists in core types. This is a plumbing + UI problem, not a new feature. The data is there — it's just not surfaced. Fixing this alone would remove the single biggest reason an indie dev stops trusting the tool.
Discord-worthy change: People will screenshot "$0.08 to fix a CI failure" and share it. Cost transparency is a growth lever.
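As a sketch of the card label, assuming a hypothetical `CostEstimate` shape with `tokens` and `usd` fields (the real field names in core types may differ):

```typescript
// Hypothetical shape — the actual CostEstimate in core types may differ.
interface CostEstimate {
  tokens: number; // total input + output tokens so far
  usd: number;    // estimated spend in dollars
}

// Render the session-card label, e.g. "$0.12 (~1.2k tokens)".
function formatCost(c: CostEstimate): string {
  const tokens =
    c.tokens >= 1000
      ? `~${(c.tokens / 1000).toFixed(1)}k tokens`
      : `${c.tokens} tokens`;
  return `$${c.usd.toFixed(2)} (${tokens})`;
}
```

The daily topbar total would just sum these estimates across the day's sessions.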
✅ P0-B — Agent activity summary (not just a green dot)
What: Show a rolling 1-line summary of what the agent last did — e.g. `running tests... 3 failing` or `editing src/auth/middleware.ts` — updated every ~30s. Add a session duration clock.
Why it matters: Replaces the "is it stuck?" anxiety with observable progress. The agent's own status updates (which Claude Code already emits) can be parsed and surfaced without any additional instrumentation.
Bonus: If the agent hasn't emitted a new status update in >5 min, auto-promote the card to `respond` with a "may be stuck" label.
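A minimal sketch of the stuck check, assuming the orchestrator records a timestamp for each status line it parses; the threshold constant and all names here are illustrative:

```typescript
// Promote the card to `respond` after 5 minutes of silence (assumed threshold).
const STUCK_AFTER_MS = 5 * 60 * 1000;

// Hypothetical per-session activity record.
interface SessionActivity {
  lastStatus: string;   // e.g. "running tests... 3 failing"
  lastStatusAt: number; // epoch ms of the last emitted status line
}

// True when the agent has gone quiet for longer than the threshold.
function maybeStuck(s: SessionActivity, now: number = Date.now()): boolean {
  return now - s.lastStatusAt > STUCK_AFTER_MS;
}
```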
✅ P1-A — Stuck reason + suggested reply
What: When a session lands in `respond`, show why it's waiting — parsed from the agent's last output. Pre-fill a suggested reply based on common patterns (missing env var, unclear requirement, permission issue). One-click send.
Why it matters: Reduces the cognitive load of unblocking. An experimenter who doesn't have full context on what the agent was doing shouldn't have to read a full terminal scroll to figure out what to say.
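The pattern table behind the suggested reply could start small and grow from observed sessions; every pattern, reason, and reply below is hypothetical:

```typescript
// Illustrative pattern table: agent output → stuck reason → canned reply.
const STUCK_PATTERNS: Array<{
  pattern: RegExp;
  reason: string;
  suggestedReply: string;
}> = [
  {
    pattern: /(env var|environment variable).*(not set|missing|undefined)/i,
    reason: "missing env var",
    suggestedReply: "Use a placeholder value and note it in the PR description.",
  },
  {
    pattern: /(permission denied|EACCES)/i,
    reason: "permission issue",
    suggestedReply: "Skip that path and list it as a follow-up.",
  },
];

// Classify the agent's last output; null means "unknown — show raw output".
function classifyStuck(lastOutput: string) {
  return STUCK_PATTERNS.find((p) => p.pattern.test(lastOutput)) ?? null;
}
```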
✅ P1-B — Experiment mode: fork a session with a different prompt
What: Add a "try again differently" button on any session. It re-spawns the same task in a new worktree but prompts you to edit the starting prompt first. Both sessions appear side-by-side in the board. A "compare" view diffs the two PRs.
Why it matters: This is the killer feature for experimenters. No other tool does this. It turns agent-orchestrator from a task runner into a genuine experimentation platform. The mental shift is: "I'm not just running tasks, I'm finding the best way to run tasks."
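One way to model the fork, assuming sessions carry a `taskId`, `prompt`, and `worktree` (all hypothetical names): the fork keeps the task, swaps the prompt, and gets its own worktree so both attempts run in isolation.

```typescript
// Hypothetical session shape — field names are illustrative.
interface Session {
  id: string;
  taskId: string;
  prompt: string;
  worktree: string;
}

// "Try again differently": same task, edited prompt, fresh worktree.
function forkSession(base: Session, newPrompt: string, nextId: string): Session {
  return {
    id: nextId,
    taskId: base.taskId, // same task → appears beside the original on the board
    prompt: newPrompt,   // the edited starting prompt
    worktree: `${base.worktree}-fork-${nextId}`, // isolated checkout for approach B
  };
}
```

The "compare" view then only needs the two session ids sharing a `taskId` to find the PRs to diff.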
✅ P1-C — Soft budget cap per session
What: Config option `maxCostUsd: 0.50` per session. When the session approaches the cap, the agent pauses and asks: "I've spent $0.45 — continue?" One click to extend, one click to kill.
Why it matters: Removes the fear of runaway spend. An indie dev will run agents more aggressively if they know there's a ceiling. This is an autonomy-enabler, not a restriction.
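The cap check itself is a few lines. `maxCostUsd` matches the config key proposed above; the 90% warning threshold is an assumption:

```typescript
// Pause and ask when 90% of the cap is spent (assumed threshold).
const WARN_AT = 0.9;

// True when the session should pause and ask "continue?".
function shouldPauseForBudget(spentUsd: number, maxCostUsd: number): boolean {
  return spentUsd >= maxCostUsd * WARN_AT;
}
```

With the example cap of $0.50, the pause fires at $0.45 — exactly the prompt quoted above.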
✅ P2-A — Free-text task mode (no GitHub issue required)
What: Allow `ao spawn "refactor the auth module to use JWT"` without a linked issue. The agent creates its own branch, does the work, and opens a PR. The dashboard shows it with a `[no issue]` badge.
Why it matters: Removes the biggest onboarding friction for solo devs. You shouldn't need a GitHub account and a project management setup to try a tool.
✅ P2-B — Codebase memory across sessions
What: A lightweight `.ao-memory.md` file written to the repo root after each session. Contains: what the agent learned about the codebase, which commands work, which files are sensitive, common failure patterns. New sessions read this file as context.
Why it matters: Compounds value over time. After 10 sessions, new agents are faster and make fewer mistakes. This is the indie dev equivalent of an onboarded team member — the tool gets better the more you use it.
✅ P3-A — Session post-mortem card
What: When a session moves to `done`, replace the blank card with a summary: files changed, tests added/fixed, tokens used, cost, time taken, CI retries needed. Collapsed by default, expandable.
Why it matters: Turns completed sessions into learning material. An experimenter can look back and understand "the JWT task cost $0.22 and needed 2 CI retries" — that's useful calibration data.
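A sketch of the collapsed one-line summary, with assumed field names:

```typescript
// Hypothetical post-mortem data for a completed session.
interface SessionSummary {
  filesChanged: number;
  testsFixed: number;
  tokens: number;
  costUsd: number;
  durationMin: number;
  ciRetries: number;
}

// Render the collapsed card line, e.g. "4 files · 2 tests fixed · $0.22 · 18m · 2 CI retries".
function renderSummary(s: SessionSummary): string {
  const retries = `${s.ciRetries} CI ${s.ciRetries === 1 ? "retry" : "retries"}`;
  return `${s.filesChanged} files · ${s.testsFixed} tests fixed · $${s.costUsd.toFixed(2)} · ${s.durationMin}m · ${retries}`;
}
```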