A Python tool for collecting and analyzing discussions from Discourse-based forums using LLM-powered analysis.
This tool automates the collection of forum data from Discourse forums (which provide JSON representations of pages) and uses Claude AI to analyze discussions, identify common problems, and extract insights. While initially built to analyze Shopify's webhook forum, it works with any publicly accessible Discourse installation.
New to this tool? It's recommended to read the Glossary to understand key terminology.
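The JSON representations mentioned above can be reached by adding `.json` to a Discourse page path. The following is a minimal sketch of how such URLs can be built, assuming the standard Discourse routes for category listings (`/c/<slug>/<id>.json`) and topics (`/t/<id>.json`). The function names and the `forum.example.com` host are hypothetical and are not taken from this tool's source.

```python
from urllib.parse import urljoin


def category_json_url(base_url: str, slug: str, category_id: int, page: int = 0) -> str:
    """Build the JSON URL for one page of a category's topic list.

    Assumes the standard Discourse route /c/<slug>/<id>.json?page=<n>.
    """
    base = base_url.rstrip("/") + "/"
    return urljoin(base, f"c/{slug}/{category_id}.json?page={page}")


def topic_json_url(base_url: str, topic_id: int) -> str:
    """Build the JSON URL for a single topic and its posts.

    Assumes the standard Discourse route /t/<id>.json.
    """
    base = base_url.rstrip("/") + "/"
    return urljoin(base, f"t/{topic_id}.json")


# Placeholder host and IDs, for illustration only.
print(category_json_url("https://forum.example.com", "webhooks", 25))
print(topic_json_url("https://forum.example.com", 66))
```

Fetching either URL with any HTTP client returns the same content as the HTML page, but as structured JSON.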
- Automated scraping via Discourse JSON endpoints
- Rate-limited HTTP client with retry logic
- Checkpoint-based recovery for interrupted operations
- Incremental updates (collect only new content)
- SQLite storage with SQLAlchemy ORM
- Problem extraction from discussion threads
- Automatic categorization by topic type
- Severity assessment (critical, high, medium, low)
- Theme identification across multiple discussions
- Natural language query interface
- Markdown reports with statistics
- Problem theme grouping
- JSON and CSV export options
- Python 3.10 or higher
- Anthropic API key (for LLM analysis features)
# Install from PyPI
pip install forum-analyzer
# Or install from source
git clone https://github.com/leggetter/discourse-forum-analyzer.git
cd discourse-forum-analyzer
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -e .
Create a new directory for your analysis project and initialize it:
mkdir my-forum-analysis
cd my-forum-analysis
forum-analyzer init
The init command will interactively prompt you for:
- Discourse forum URL
- Category path (e.g., 't' or 'c')
- Category ID (with helpful hints; slug fetched automatically)
- Anthropic API key (optional, can be added later)
This creates a project structure:
my-forum-analysis/
├── config.yaml # Your configuration
├── forum.db # SQLite database (created on first collect)
├── checkpoints/ # Recovery checkpoints
├── exports/ # Analysis reports
└── logs/ # Application logs
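The `checkpoints/` directory in the structure above holds recovery state. As a rough illustration of how checkpoint-based recovery can work in general, the sketch below records the last completed page in a JSON file and resumes from the next one. The file name, the JSON fields, and both function names are assumptions made for this example; the tool's actual checkpoint format may differ.

```python
import json
from pathlib import Path


def save_checkpoint(path: Path, category_id: int, last_page: int) -> None:
    """Record the last fully processed page (hypothetical checkpoint format)."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"category_id": category_id, "last_page": last_page}))
    # Renaming over the old file avoids leaving a half-written checkpoint behind.
    tmp.replace(path)


def load_resume_page(path: Path) -> int:
    """Return the page to resume from, or 0 when no checkpoint exists."""
    if not path.exists():
        return 0
    return json.loads(path.read_text())["last_page"] + 1
```

A collector built this way would call `save_checkpoint` after each page is stored and `load_resume_page` on startup, so an interrupted run repeats at most one page.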
The recommended workflow ensures the most accurate and relevant analysis by first discovering themes from your specific data.
# 1. Collect forum data (initializes database automatically)
forum-analyzer collect
# 2. Discover natural categories from the data
forum-analyzer themes discover --min-topics 3
# 3. Analyze all topics using the discovered categories
forum-analyzer llm-analyze
# 4. Ask questions about your analysis
forum-analyzer ask "What are the main authentication issues?"
You can work with multiple forum analysis projects by using the --dir flag:
# Initialize a new project in a specific directory
mkdir shopify-webhooks
forum-analyzer --dir shopify-webhooks init
# Collect data for that project
forum-analyzer --dir shopify-webhooks collect
# Or use environment variable
export FORUM_ANALYZER_DIR=./shopify-webhooks
forum-analyzer collect
A full list of commands and their options is available below.
# Initialize a new project in the current directory
forum-analyzer init
# Initialize in a specific directory
forum-analyzer --dir ./my-project init
# Overwrite existing configuration
forum-analyzer init --force
# Collect from the category in your config
forum-analyzer collect
# Collect from a specific category
forum-analyzer collect --category-id 25
# Collect with a page limit (for testing)
forum-analyzer collect --page-limit 2
# Collect from a different project directory
forum-analyzer --dir ./my-project collect
# Fetch only new/updated content
forum-analyzer update
# View collection status and statistics
forum-analyzer status
# Discover common themes (minimum 3 topics per theme)
forum-analyzer themes discover
# Analyze more topics for better pattern discovery
forum-analyzer themes discover --context-limit 100
# List themes already discovered
forum-analyzer themes list
# Delete all themes (prompts for confirmation)
forum-analyzer themes clean
# Analyze all unanalyzed topics
forum-analyzer llm-analyze
# Re-analyze topics that have already been analyzed
forum-analyzer llm-analyze --force
# Analyze a specific topic by its ID
forum-analyzer llm-analyze --topic-id 66
# Ask questions about the analyzed data
forum-analyzer ask "What are the most common authentication issues?"
# Clear all collection checkpoints
forum-analyzer clear-checkpoints

┌─────────────────────┐
│  Discourse Forum    │
│  (JSON endpoints)   │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   Rate-Limited      │
│   HTTP Client       │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   Checkpoint        │
│   Manager           │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  SQLite Database    │
│  (SQLAlchemy)       │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐     ┌──────────────┐
│   LLM Analyzer      │────▶│  Claude API  │
└──────────┬──────────┘     └──────────────┘
           │
           ▼
┌─────────────────────┐
│  Reports & Themes   │
└─────────────────────┘
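To make the "Rate-Limited HTTP Client" stage of the diagram more concrete, here is a minimal sketch of rate limiting combined with retry and exponential backoff. It is synchronous and uses only the standard library, whereas the tool itself uses async httpx, so treat it as an illustration of the technique and not as the tool's implementation. `RateLimiter`, `fetch_with_retry`, and their parameters are hypothetical names.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimiter:
    """Enforce a minimum interval between calls (e.g. 1 request per second)."""

    def __init__(self, rate_per_sec: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = 1.0 / rate_per_sec
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous call

    def wait(self) -> None:
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_with_retry(fetch: Callable[[], T], limiter: RateLimiter,
                     retries: int = 3, backoff: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch() under the rate limit, retrying OSError with backoff."""
    for attempt in range(retries + 1):
        limiter.wait()
        try:
            return fetch()
        except OSError:
            if attempt == retries:
                raise
            # Wait backoff, 2*backoff, 4*backoff, ... between attempts.
            sleep(backoff * 2 ** attempt)
    raise AssertionError("unreachable")
```

The clock and sleep functions are injectable so the behavior can be tested without real delays; in normal use the defaults apply.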
- Language: Python 3.10+
- Database: SQLite with SQLAlchemy
- HTTP: httpx (async)
- LLM: Claude API (Anthropic)
- CLI: Click
- Config: Pydantic + YAML
discourse-forum-analyzer/
├── src/forum_analyzer/
│ ├── analyzer/ # LLM analysis
│ ├── collector/ # Data collection
│ ├── config/
│ └── cli.py
├── config/
│ └── cli.py
├── examples/
│ └── shopify-webhooks/
└── tests/
The schema is managed by SQLAlchemy models and is split into three categories:
- Forum Data Tables: categories, topics, posts, users
- Analysis Tables: llm_analysis, problem_themes
- Operational Tables: checkpoints, fetch_history
The schema auto-migrates when using LLM analysis features.
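Because the data lives in SQLite, it can also be inspected directly. The sketch below uses Python's built-in `sqlite3` module (not SQLAlchemy) against an in-memory database to show the kind of query the `topics` and `llm_analysis` tables allow. Only the two table names come from the list above; every column name and every sample row is an assumption made up for illustration, so check the real schema in `forum.db` before adapting it.

```python
import sqlite3

# In-memory stand-in for forum.db; column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE topics (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE llm_analysis (
    topic_id INTEGER REFERENCES topics(id),
    category TEXT,
    severity TEXT
);
""")

# Made-up sample rows, not real analysis results.
conn.executemany("INSERT INTO topics VALUES (?, ?)", [
    (1, "Webhook not firing"),
    (2, "Signature mismatch"),
    (3, "Duplicate events"),
])
conn.executemany("INSERT INTO llm_analysis VALUES (?, ?, ?)", [
    (1, "webhook_delivery", "critical"),
    (2, "authentication", "high"),
    (3, "webhook_delivery", "medium"),
])


def severity_counts(db: sqlite3.Connection) -> dict:
    """Count analyzed topics per severity level."""
    rows = db.execute(
        "SELECT severity, COUNT(*) FROM llm_analysis GROUP BY severity"
    ).fetchall()
    return dict(rows)


def titles_in_category(db: sqlite3.Connection, category: str) -> list:
    """List topic titles whose analysis row has the given category."""
    rows = db.execute(
        "SELECT t.title FROM topics t "
        "JOIN llm_analysis a ON a.topic_id = t.id "
        "WHERE a.category = ? ORDER BY t.id",
        (category,),
    ).fetchall()
    return [title for (title,) in rows]
```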
This tool was demonstrated by analyzing Shopify's webhook discussions.
- Topics: 271
- Posts: 1,201
- Users: 324
- Date Range: September 2024 - October 2025
Example analysis results:
- 15 distinct problem themes identified
- 18 critical issues found
- Top issue: Configuration challenges (25.1% of topics)
See the complete example analysis: examples/shopify-webhooks/LLM_ANALYSIS_REPORT.md
pytest
black src/ tests/
isort src/ tests/
flake8 src/ tests/
mypy src/
Rate Limiting
- Adjust rate_limit in config.yaml (default: 1 req/sec).
This section is for maintainers who need to publish new versions of the package to PyPI.
- PyPI Account: Create an account at pypi.org
- API Token: Generate an API token from your PyPI account settings
- Build Tools: Install required packages:
pip install build twine
Store your PyPI API token in ~/.pypirc:
[pypi]
username = __token__
password = pypi-YOUR-API-TOKEN-HERE
- Update Version: Bump the version in pyproject.toml:
version = "0.2.0" # Update this line
- Clean Previous Builds:
rm -rf dist/ build/ *.egg-info
- Build Distribution:
python -m build
- Upload to PyPI:
twine upload dist/*
- Verify Upload:
pip install --upgrade forum-analyzer
forum-analyzer --version
- Patch (0.1.0 → 0.1.1): Bug fixes, documentation updates
- Minor (0.1.0 → 0.2.0): New features, backward-compatible changes
- Major (0.1.0 → 1.0.0): Breaking changes, major redesigns
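The three bump types above follow the usual semantic versioning arithmetic: the chosen part is incremented and every part to its right resets to zero. A small sketch, where `bump_version` is a hypothetical helper and not part of this package:

```python
def bump_version(version: str, part: str) -> str:
    """Return the next version string for a 'major', 'minor', or 'patch' bump."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump_version("0.1.0", "patch"))  # 0.1.1
print(bump_version("0.1.0", "minor"))  # 0.2.0
print(bump_version("0.1.0", "major"))  # 1.0.0
```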
- Package Name: forum-analyzer
- PyPI URL: https://pypi.org/project/forum-analyzer/
- Repository: https://github.com/leggetter/discourse-forum-analyzer
Database Locked
- Only one instance can run at a time.
- Clear stale checkpoints: forum-analyzer clear-checkpoints.
LLM Analysis Errors
- Verify your Anthropic API key is valid and has credit.
- Use the --limit flag for testing with smaller datasets.
- Fork the repository
- Create a feature branch
- Make changes with tests
- Submit a pull request
MIT License - See LICENSE file for details.
Understanding the terminology used in this tool:
Category
A top-level organizational unit in Discourse forums (e.g., "Webhooks & Events").
Topic
A discussion thread within a category.
Post
An individual message within a topic. The first post is the topic starter; subsequent posts are replies.
Classification
The LLM-assigned type of problem or discussion in a topic (e.g., "webhook_delivery", "authentication").
Theme
A higher-level pattern grouping multiple related topics (e.g., "Webhook Delivery Failures").
Severity
The urgency/impact level assigned to a topic (critical, high, medium, low).
Collection
The process of downloading forum data (forum-analyzer collect).
Analysis
The process of using the LLM to extract insights from topics (forum-analyzer llm-analyze).
Theme Identification
The process of grouping topics into common patterns (forum-analyzer themes discover).
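The relationship between these terms can be sketched in a few lines of Python. In the tool, themes are discovered by the LLM; the example below replaces that step with a plain grouping by classification, purely to show how a minimum-topics threshold (such as the 3 used by `themes discover`) filters out small groups and how the four severity levels can be ordered. All names here (`AnalyzedTopic`, `group_by_classification`, `most_severe`) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Severity levels from most to least urgent, as listed in the glossary.
SEVERITY_ORDER = ["critical", "high", "medium", "low"]


@dataclass
class AnalyzedTopic:
    topic_id: int
    classification: str  # e.g. "webhook_delivery"
    severity: str        # one of SEVERITY_ORDER


def group_by_classification(topics, min_topics=3):
    """Group topics by classification, keeping groups with enough members.

    A stand-in for LLM-based theme discovery, for illustration only.
    """
    groups = defaultdict(list)
    for topic in topics:
        groups[topic.classification].append(topic)
    return {name: members for name, members in groups.items()
            if len(members) >= min_topics}


def most_severe(topics):
    """Return the topic with the highest severity in a non-empty list."""
    return min(topics, key=lambda topic: SEVERITY_ORDER.index(topic.severity))
```

With three "webhook_delivery" topics and one "authentication" topic, only the first group survives the default threshold.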