PaperBananaFX is a JavaFX desktop demo that orchestrates the PaperBanana figure-generation workflow in real time. PaperBanana targets a specific bottleneck in the "AI scientist" workflow: converting method text and figure captions into publication-ready illustrations that stay semantically faithful to the method and conform stylistically to modern research-paper aesthetics.
PaperBanana is framed as more than generic image generation. The core workflow is a five-agent pipeline:
- Retriever selects structurally relevant reference examples.
- Planner converts method text and captions into a detailed figure description.
- Stylist injects academic visual design norms.
- Visualizer renders candidate figures from the plan.
- Critic evaluates outputs and drives iterative revisions.
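A minimal sketch of how the five agents could be chained, assuming hypothetical Java interfaces (Retriever, Planner, Stylist, Visualizer, Critic) plus illustrative Critique, PipelineResult, and FigurePipeline types. None of these names or signatures come from PaperBanana or PaperBananaFX. Retrieval, planning, and styling run once; the Visualizer and Critic then alternate until the Critic accepts a draft or an assumed round budget (maxRounds) runs out.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical agent contracts; illustrative only, not the project's API.
interface Retriever { List<String> retrieve(String methodText, String caption); }
interface Planner { String plan(String methodText, String caption, List<String> references); }
interface Stylist { String style(String plan); }
interface Visualizer { String render(String styledPlan, String critiqueFeedback); }
interface Critic { Critique review(String figure, String methodText); }

// Critic verdict: accept the draft, or return feedback for the next revision.
record Critique(boolean accepted, String feedback) {}

// Final draft, number of render/review rounds used, and whether the Critic accepted it.
record PipelineResult(String figure, int rounds, boolean accepted) {}

final class FigurePipeline {
    private final Retriever retriever;
    private final Planner planner;
    private final Stylist stylist;
    private final Visualizer visualizer;
    private final Critic critic;

    FigurePipeline(Retriever retriever, Planner planner, Stylist stylist,
                   Visualizer visualizer, Critic critic) {
        this.retriever = Objects.requireNonNull(retriever);
        this.planner = Objects.requireNonNull(planner);
        this.stylist = Objects.requireNonNull(stylist);
        this.visualizer = Objects.requireNonNull(visualizer);
        this.critic = Objects.requireNonNull(critic);
    }

    PipelineResult run(String methodText, String caption, int maxRounds) {
        if (maxRounds < 1) {
            throw new IllegalArgumentException("maxRounds must be >= 1");
        }
        // Retriever -> Planner -> Stylist run once to produce a styled figure plan.
        List<String> references = retriever.retrieve(methodText, caption);
        String styledPlan = stylist.style(planner.plan(methodText, caption, references));

        // Visualizer <-> Critic loop: render, review, and revise with the latest feedback.
        String feedback = "";
        String figure = "";
        for (int round = 1; round <= maxRounds; round++) {
            figure = visualizer.render(styledPlan, feedback);
            Critique critique = critic.review(figure, methodText);
            if (critique.accepted()) {
                return new PipelineResult(figure, round, true);
            }
            feedback = critique.feedback();
        }
        // Budget exhausted: return the last draft, marked as not accepted.
        return new PipelineResult(figure, maxRounds, false);
    }
}
```

Because each agent is an interface, a deterministic mock and a model-backed implementation could be swapped in without touching the loop itself.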
The PaperBanana paper reports evaluation on PaperBananaBench, a benchmark curated from NeurIPS 2025 methodology diagrams with 292 test cases and 292 reference cases. Against a strong Nano-Banana-Pro baseline, the reported gains are:
- Faithfulness: +2.8%
- Conciseness: +37.2%
- Readability: +12.9%
- Aesthetics: +6.6%
- Overall score: +17.0%
The same planning logic is also extended to statistical plots, where executable code is argued to be more faithful than pure image generation for dense quantitative visuals.
PaperBananaFX visualizes this pipeline live with:
- Agent state transitions (Retriever, Planner, Stylist, Visualizer, Critic, Chat, Speech)
- Intermediate plan/style events and iterative draft previews
- Critique loop with acceptance/revision behavior
- Telemetry charts (latency, confidence, critique severity) updated every 250ms
- Voice toggles (STT/TTS/Voice-to-Voice) through config-driven provider adapters
- Export actions for PNG snapshot, SVG-like scene serialization, and a PDF-bundle placeholder zip
- Session persistence (session.json, events.jsonl, telemetry.jsonl) for replay
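The events.jsonl file implies an append-only log with one JSON object per line. A dependency-free sketch of that format, assuming a hypothetical SessionEvent shape (timestampMs, agent, type, message) and a hypothetical JsonlEventLog class; the project's real schema and serializer are not documented here and may differ. JSON string escaping is hand-rolled only to avoid a library dependency.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical event shape; field names are assumptions, not the real events.jsonl schema.
record SessionEvent(long timestampMs, String agent, String type, String message) {
    // Serializes the event as a single-line JSON object.
    String toJsonLine() {
        return "{\"timestampMs\":" + timestampMs
                + ",\"agent\":\"" + escape(agent) + "\""
                + ",\"type\":\"" + escape(type) + "\""
                + ",\"message\":\"" + escape(message) + "\"}";
    }

    // Escapes quotes, backslashes and control characters so each event stays on one line.
    private static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char ch : s.toCharArray()) {
            switch (ch) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (ch < 0x20) {
                        sb.append(String.format("\\u%04x", (int) ch));
                    } else {
                        sb.append(ch);
                    }
                }
            }
        }
        return sb.toString();
    }
}

// Hypothetical append-only log writing <sessionDir>/events.jsonl.
final class JsonlEventLog {
    private final Path file;

    JsonlEventLog(Path sessionDir) throws IOException {
        Files.createDirectories(sessionDir);
        this.file = sessionDir.resolve("events.jsonl");
    }

    // Appends one line per event; synchronized so concurrent agents do not interleave writes.
    synchronized void append(SessionEvent event) throws IOException {
        Files.writeString(file, event.toJsonLine() + System.lineSeparator(),
                StandardCharsets.UTF_8, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Returns the raw JSON lines in write order, for a replay loader to parse.
    List<String> replayLines() throws IOException {
        if (!Files.exists(file)) {
            return List.of();
        }
        return Files.readAllLines(file, StandardCharsets.UTF_8).stream()
                .filter(line -> !line.isBlank())
                .toList();
    }
}
```

Appending one line per event keeps writes cheap and lets a replay loader stream events back in their original order; a production version would more likely use a JSON library than manual escaping.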
The implementation is organized around:
- JavaFX Service + Task: PaperBananaRealtimeService runs the reactive and deliberative loops
- JavaFX ScheduledService: TelemetryService emits periodic metrics snapshots
- Event-driven UI: EventBus publishes AppEvent updates to the JavaFX thread
- FXML/controller split: main-view.fxml + MainController
- Swappable speech adapters: SttGateway and TtsGateway with runtime factory wiring
- Provider-backed visualizer/critic with fallback behavior
- Disk persistence + replay loader: SessionPersistenceService and SessionReplayLoader
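The EventBus entry describes delivering updates from background work onto the JavaFX thread. A dependency-free sketch of that pattern, using hypothetical SimpleEventBus and UiEvent types rather than the project's EventBus and AppEvent: the bus takes an Executor for UI dispatch, so a JavaFX app could pass Platform::runLater while tests pass Runnable::run for synchronous delivery.

```java
import java.util.List;
import java.util.Objects;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

// Hypothetical event payload; the real AppEvent fields are not documented here.
record UiEvent(String source, String kind, String payload) {}

// Minimal publish/subscribe bus that hands every delivery to a UI dispatcher.
final class SimpleEventBus {
    // Copy-on-write so worker threads can publish while listeners subscribe or unsubscribe.
    private final List<Consumer<UiEvent>> listeners = new CopyOnWriteArrayList<>();
    private final Executor uiDispatcher;

    // In a JavaFX app this could be Platform::runLater; tests can use Runnable::run.
    SimpleEventBus(Executor uiDispatcher) {
        this.uiDispatcher = Objects.requireNonNull(uiDispatcher);
    }

    // Registers a listener and returns a handle that removes it again.
    Runnable subscribe(Consumer<UiEvent> listener) {
        listeners.add(Objects.requireNonNull(listener));
        return () -> listeners.remove(listener);
    }

    // Safe to call from any thread; each listener runs wherever uiDispatcher runs it.
    void publish(UiEvent event) {
        for (Consumer<UiEvent> listener : listeners) {
            uiDispatcher.execute(() -> listener.accept(event));
        }
    }
}
```

Injecting the dispatcher keeps the bus testable without starting the JavaFX toolkit, while still guaranteeing that UI listeners run on the FX application thread in the real app.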
Configure src/main/resources/com/ainpm/paperbananafx/app-config.json:
- speechProvider: mock or provider
- modelProvider: mock or provider
- baseUrl: OpenAI-compatible API base URL
- apiKeyEnvVar: name of the environment variable that holds the API key
- modelName: chat model used by the provider-backed visualizer/critic and STT/TTS adapters
Environment variables can override this file:
- PAPERBANANAFX_SPEECH_PROVIDER
- PAPERBANANAFX_MODEL_PROVIDER
- PAPERBANANAFX_BASE_URL
- PAPERBANANAFX_API_KEY_ENV
- PAPERBANANAFX_MODEL_NAME
- PAPERBANANAFX_TIMEOUT_MS
- PAPERBANANAFX_MAX_RETRIES
- PAPERBANANAFX_STORAGE_DIR
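A sketch of the override order, assuming that an environment variable wins over app-config.json whenever it is set and non-blank. ConfigResolver is hypothetical, and so are the config key names timeoutMs, maxRetries, and storageDir (only their environment variable names are listed above). The environment map is a parameter so callers can pass System.getenv() and tests can pass a fixed map; providerModeReady mirrors the rule that real model calls need provider mode plus an API key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical resolver: environment variables override app-config.json values.
final class ConfigResolver {
    private ConfigResolver() {}

    // Config key -> environment variable. Env names are from the README;
    // timeoutMs, maxRetries and storageDir are assumed key names.
    private static final Map<String, String> ENV_OVERRIDES = Map.of(
            "speechProvider", "PAPERBANANAFX_SPEECH_PROVIDER",
            "modelProvider", "PAPERBANANAFX_MODEL_PROVIDER",
            "baseUrl", "PAPERBANANAFX_BASE_URL",
            "apiKeyEnvVar", "PAPERBANANAFX_API_KEY_ENV",
            "modelName", "PAPERBANANAFX_MODEL_NAME",
            "timeoutMs", "PAPERBANANAFX_TIMEOUT_MS",
            "maxRetries", "PAPERBANANAFX_MAX_RETRIES",
            "storageDir", "PAPERBANANAFX_STORAGE_DIR");

    // Returns a copy of the file values with every non-blank env override applied.
    static Map<String, String> resolve(Map<String, String> fileValues, Map<String, String> env) {
        Map<String, String> resolved = new LinkedHashMap<>(fileValues);
        ENV_OVERRIDES.forEach((key, envName) -> {
            String value = env.get(envName);
            if (value != null && !value.isBlank()) {
                resolved.put(key, value.trim());
            }
        });
        return resolved;
    }

    // True when modelProvider is "provider" and the env var named by apiKeyEnvVar holds a key.
    static boolean providerModeReady(Map<String, String> config, Map<String, String> env) {
        String keyVar = config.get("apiKeyEnvVar");
        String apiKey = keyVar == null ? null : env.get(keyVar);
        return "provider".equals(config.get("modelProvider"))
                && apiKey != null && !apiKey.isBlank();
    }
}
```

Note that apiKeyEnvVar holds the name of a variable, not the key itself, so the secret never has to live in the checked-in config file.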
Run the app and the tests with the Maven wrapper (Windows):
- .\mvnw.cmd clean javafx:run
- .\mvnw.cmd test
Runtime notes:
- When provider mode is enabled and API credentials are present, the STT/TTS adapters and the visualizer/critic make real model calls.
- If provider calls fail, the system falls back to deterministic local behavior to keep the UI live.
- Session history is written to %APPDATA%/PaperBananaFX/sessions/<sessionId>/ by default for replay and loading.
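The fallback rule can be sketched as retry-then-degrade. ProviderFallback and its Outcome record are hypothetical, and maxRetries is assumed to count retries after the first attempt (at most maxRetries + 1 provider calls), which may not match how PAPERBANANAFX_MAX_RETRIES is actually interpreted.

```java
import java.util.function.Supplier;

// Hypothetical helper: try the provider, retry on failure, then fall back to local behavior.
final class ProviderFallback {
    private ProviderFallback() {}

    // Result value, whether it came from the provider, and how many provider calls were made.
    record Outcome<T>(T value, boolean fromProvider, int attempts) {}

    static <T> Outcome<T> call(Supplier<T> providerCall, Supplier<T> localFallback, int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        for (int attempt = 1; attempt <= maxRetries + 1; attempt++) {
            try {
                return new Outcome<>(providerCall.get(), true, attempt);
            } catch (RuntimeException e) {
                // Swallowed so the UI stays live; a real client would log and back off here.
            }
        }
        // All provider attempts failed: use the deterministic local result instead.
        return new Outcome<>(localFallback.get(), false, maxRetries + 1);
    }
}
```

Reporting fromProvider alongside the value lets the UI or telemetry flag when a result came from the local fallback rather than a real model call.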
