ai-in-pm/PaperBananaFX

PaperBananaFX

[Screenshot: PaperBananaFX interface]

PaperBananaFX is a JavaFX desktop orchestration demo for a real-time AI workflow. It targets a specific bottleneck in the "AI scientist" workflow: converting method text and figure captions into publication-ready illustrations with semantic fidelity to the method and stylistic conformity to modern research-paper aesthetics.

PaperBanana research context

PaperBanana, the research system this demo is built around, is framed as more than generic image generation. Its core workflow is a five-agent pipeline:

  • Retriever selects structurally relevant reference examples.
  • Planner converts method text and captions into a detailed figure description.
  • Stylist injects academic visual design norms.
  • Visualizer renders candidate figures from the plan.
  • Critic evaluates outputs and drives iterative revisions.
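
The stage order above can be sketched as a simple chain of transformations. This is a minimal illustration only, not the project's implementation: the `Stage` record, the `tag` helper, and the string-based "draft" are hypothetical stand-ins, and the real Critic loops back for revisions rather than running once.

```java
import java.util.List;
import java.util.function.UnaryOperator;

public class PipelineSketch {
    /** Hypothetical stage type: an agent name plus a transformation of the working draft. */
    record Stage(String name, UnaryOperator<String> step) {}

    /** Placeholder stage that only records that it ran (assumption, not real agent logic). */
    static Stage tag(String name) {
        return new Stage(name, draft -> draft + " > " + name);
    }

    // Stage order taken from the README list; the stage bodies are placeholders.
    static final List<Stage> STAGES = List.of(
            tag("Retriever"), tag("Planner"), tag("Stylist"),
            tag("Visualizer"), tag("Critic"));

    /** Runs every stage once, in order, feeding each stage the previous output. */
    static String run(String methodText) {
        String draft = methodText;
        for (Stage stage : STAGES) {
            draft = stage.step().apply(draft);
        }
        return draft;
    }

    public static void main(String[] args) {
        System.out.println(run("method text"));
    }
}
```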

The paper reports evaluation on PaperBananaBench, curated from NeurIPS 2025 methodology diagrams, with 292 test cases and 292 reference cases. Against a strong Nano-Banana-Pro baseline, reported gains are:

  • Faithfulness: +2.8%
  • Conciseness: +37.2%
  • Readability: +12.9%
  • Aesthetics: +6.6%
  • Overall score: +17.0%

The same planning logic is also extended to statistical plots, where executable code is argued to be more faithful than pure image generation for dense quantitative visuals.

What this MVP shows live

  • Agent state transitions (Retriever, Planner, Stylist, Visualizer, Critic, Chat, Speech)
  • Intermediate plan/style events and iterative draft previews
  • Critique loop with acceptance/revision behavior
  • Telemetry charts (latency, confidence, critique severity) updated every 250 ms
  • Voice toggles (STT/TTS/Voice-to-Voice) through config-driven provider adapters
  • Export actions for PNG snapshot, SVG-like scene serialization, and a PDF-bundle placeholder zip
  • Session persistence (session.json, events.jsonl, telemetry.jsonl) for replay
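
The critique loop with acceptance/revision behavior listed above could look roughly like the following. This is a sketch under stated assumptions: the severity threshold, the revision cap, and the "each revision halves severity" rule are invented for illustration and are not taken from the project.

```java
public class CritiqueLoopSketch {
    /** Hypothetical outcome: how many revisions ran and whether the draft was accepted. */
    record Result(int revisions, boolean accepted) {}

    /**
     * Revises until critique severity drops below the acceptance threshold
     * or the revision budget is exhausted.
     */
    static Result refine(double initialSeverity, double acceptBelow, int maxRevisions) {
        double severity = initialSeverity;
        int revisions = 0;
        while (severity >= acceptBelow && revisions < maxRevisions) {
            severity *= 0.5; // placeholder: assume each revision halves the severity
            revisions++;
        }
        return new Result(revisions, severity < acceptBelow);
    }

    public static void main(String[] args) {
        System.out.println(refine(0.8, 0.25, 5));
    }
}
```

Capping the number of revisions keeps the loop bounded even when the critic never accepts a draft, which matters for a UI that has to stay responsive.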

Architecture highlights

  • JavaFX Service + Task: PaperBananaRealtimeService runs reactive and deliberative loops
  • JavaFX ScheduledService: TelemetryService emits periodic metrics snapshots
  • Event-driven UI: EventBus publishes AppEvent updates to the JavaFX thread
  • FXML/controller split: main-view.fxml + MainController
  • Swappable speech adapters: SttGateway and TtsGateway with runtime factory wiring
  • Provider-backed visualizer/critic with fallback behavior
  • Disk persistence + replay loader: SessionPersistenceService and SessionReplayLoader
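
To illustrate the event-driven UI item above, here is a minimal event bus that hands every event to a single delivery executor. It is a sketch, not the project's `EventBus`: the fields of `AppEvent` and the `subscribe`/`publish` method names are assumptions. The example runs on plain Java with a same-thread executor; in a JavaFX application one would typically pass `Platform::runLater` as the executor so that listeners run on the JavaFX Application Thread.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

public class EventBusSketch {
    /** Hypothetical event shape; the real AppEvent may carry different fields. */
    record AppEvent(String agent, String state) {}

    static final class EventBus {
        private final Executor uiExecutor;
        // Copy-on-write list so publishing is safe while listeners are being added.
        private final List<Consumer<AppEvent>> listeners = new CopyOnWriteArrayList<>();

        EventBus(Executor uiExecutor) {
            this.uiExecutor = uiExecutor;
        }

        void subscribe(Consumer<AppEvent> listener) {
            listeners.add(listener);
        }

        /** Delivers the event to every listener via the configured executor. */
        void publish(AppEvent event) {
            for (Consumer<AppEvent> listener : listeners) {
                uiExecutor.execute(() -> listener.accept(event));
            }
        }
    }

    public static void main(String[] args) {
        EventBus bus = new EventBus(Runnable::run); // same-thread delivery for the demo
        bus.subscribe(e -> System.out.println(e.agent() + " -> " + e.state()));
        bus.publish(new AppEvent("Planner", "RUNNING"));
    }
}
```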

Provider configuration

Configure src/main/resources/com/ainpm/paperbananafx/app-config.json:

  • speechProvider: mock or provider
  • modelProvider: mock or provider
  • baseUrl: OpenAI-compatible API base URL
  • apiKeyEnvVar: environment variable name for API key
  • modelName: model name used by the provider-backed visualizer, critic, STT, and TTS adapters

Environment variables can override this file:

  • PAPERBANANAFX_SPEECH_PROVIDER
  • PAPERBANANAFX_MODEL_PROVIDER
  • PAPERBANANAFX_BASE_URL
  • PAPERBANANAFX_API_KEY_ENV
  • PAPERBANANAFX_MODEL_NAME
  • PAPERBANANAFX_TIMEOUT_MS
  • PAPERBANANAFX_MAX_RETRIES
  • PAPERBANANAFX_STORAGE_DIR
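
The override rule above, together with the indirection in apiKeyEnvVar (the setting names the variable that holds the key, not the key itself), can be sketched as follows. The class and method names are hypothetical, and treating a blank variable as unset is an assumption; the project may resolve values differently.

```java
import java.util.Map;

public class ConfigOverrideSketch {
    /** Returns the environment value when set and non-blank (assumption), else the file value. */
    static String resolve(Map<String, String> env, String envKey, String fileValue) {
        String value = env.get(envKey);
        return (value == null || value.isBlank()) ? fileValue : value;
    }

    /**
     * Two-step lookup: first decide which variable name holds the API key,
     * then read that variable. Returns null when no key is available.
     */
    static String resolveApiKey(Map<String, String> env, String fileApiKeyEnvVar) {
        String varName = resolve(env, "PAPERBANANAFX_API_KEY_ENV", fileApiKeyEnvVar);
        return varName == null ? null : env.get(varName);
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        // "mock" stands in for the value read from app-config.json.
        System.out.println(resolve(env, "PAPERBANANAFX_MODEL_PROVIDER", "mock"));
    }
}
```

Keeping only the variable name in the config file means the secret itself never has to be committed alongside the application resources.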

Run

.\mvnw.cmd clean javafx:run

Test

.\mvnw.cmd test

Notes

  • When provider mode is enabled (speechProvider or modelProvider set to provider) and API credentials are present, the STT/TTS adapters and the visualizer/critic make real model calls.
  • If provider calls fail, the system falls back to deterministic local behavior to keep the UI live.
  • Session history is written to %APPDATA%/PaperBananaFX/sessions/<sessionId>/ by default for replay/loading.
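
The fallback behavior described in the notes above can be illustrated with a small retry-then-fallback helper. This is a sketch only: the `withFallback` name, the retry semantics (one initial attempt plus `maxRetries` retries), and the use of unchecked exceptions to signal failure are assumptions, not the project's actual logic.

```java
import java.util.function.Supplier;

public class FallbackSketch {
    /**
     * Tries the provider call up to (1 + maxRetries) times; if every attempt
     * throws, returns the deterministic local result instead.
     */
    static <T> T withFallback(Supplier<T> providerCall, int maxRetries, Supplier<T> localFallback) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return providerCall.get();
            } catch (RuntimeException e) {
                // Swallowed on purpose in this sketch; real code would log the failure.
            }
        }
        return localFallback.get();
    }

    public static void main(String[] args) {
        String critique = FallbackSketch.<String>withFallback(
                () -> { throw new IllegalStateException("provider unavailable"); },
                2,
                () -> "local deterministic critique");
        System.out.println(critique);
    }
}
```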