Discourse Forum Analyzer

A Python tool for collecting and analyzing discussions from Discourse-based forums using LLM-powered analysis.

Overview

This tool automates the collection of forum data from Discourse forums (which provide JSON representations of pages) and uses Claude AI to analyze discussions, identify common problems, and extract insights. While initially built to analyze Shopify's webhook forum, it works with any publicly accessible Discourse installation.
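
To illustrate the JSON representations mentioned above: Discourse serves a JSON version of a category or topic page when `.json` is appended to its path. The sketch below only builds such URLs. The helper names, the `page` query parameter default, and the example forum URL are assumptions for illustration and are not taken from this tool's source.

```python
from urllib.parse import urljoin


def category_json_url(base_url: str, slug: str, category_id: int, page: int = 0) -> str:
    """Build the JSON URL for one page of a category's topic list."""
    base = base_url.rstrip("/") + "/"
    return urljoin(base, f"c/{slug}/{category_id}.json?page={page}")


def topic_json_url(base_url: str, topic_id: int) -> str:
    """Build the JSON URL for a single topic and its posts."""
    base = base_url.rstrip("/") + "/"
    return urljoin(base, f"t/{topic_id}.json")
```

Both helpers are pure string functions, so they can be checked without touching the network.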

New to this tool? Start with the Glossary at the end of this document to learn the key terminology.

Features

Data Collection

Automated scraping via Discourse JSON endpoints
Rate-limited HTTP client with retry logic
Checkpoint-based recovery for interrupted operations
Incremental updates (collect only new content)
SQLite storage with SQLAlchemy ORM
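
The rate limiting and retry behavior listed above could look roughly like the following. This is a minimal synchronous sketch using only the standard library, not the tool's actual httpx-based client. `RateLimiter`, `fetch_with_retry`, and the exponential backoff policy are all hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimiter:
    """Space calls so that at most `rate` of them happen per second."""

    def __init__(self, rate: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> None:
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


def fetch_with_retry(fetch: Callable[[], T], limiter: RateLimiter,
                     max_attempts: int = 3, backoff: float = 1.0,
                     sleep=time.sleep) -> T:
    """Call `fetch`, retrying on OSError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        limiter.wait()
        try:
            return fetch()
        except OSError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(backoff * 2 ** (attempt - 1))
```

The clock and sleep functions are injectable so the behavior can be tested without real delays.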

LLM Analysis

Problem extraction from discussion threads
Automatic categorization by topic type
Severity assessment (critical, high, medium, low)
Theme identification across multiple discussions
Natural language query interface

Reporting

Markdown reports with statistics
Problem theme grouping
JSON and CSV export options
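
As a sketch of what JSON and CSV export can involve, the functions below serialize a list of analysis rows with the standard library. The function names and the row fields (`topic_id`, `severity`) are assumptions for illustration. The tool's real export format is not documented in this README.

```python
import csv
import io
import json


def export_json(rows: list[dict]) -> str:
    """Serialize rows as pretty-printed JSON."""
    return json.dumps(rows, indent=2)


def export_csv(rows: list[dict]) -> str:
    """Serialize rows as CSV, using the first row's keys as the header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```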

Requirements

Python 3.10 or higher
Anthropic API key (for LLM analysis features)

Installation

From PyPI (Recommended)

pip install forum-analyzer

From Source (Development)

git clone https://github.com/leggetter/discourse-forum-analyzer.git
cd discourse-forum-analyzer

python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

pip install -e .

Quick Start

1. Initialize a New Project

Create a new directory for your analysis project and initialize it:

mkdir my-forum-analysis
cd my-forum-analysis
forum-analyzer init

The init command will interactively prompt you for:

Discourse forum URL
Category path (e.g., 't' or 'c')
Category ID (with helpful hints; slug fetched automatically)
Anthropic API key (optional, can be added later)

This creates a project structure:

my-forum-analysis/
├── config.yaml          # Your configuration
├── forum.db            # SQLite database (created on first collect)
├── checkpoints/        # Recovery checkpoints
├── exports/            # Analysis reports
└── logs/               # Application logs

2. Recommended Workflow

The recommended workflow discovers themes from your own data first, so that the later analysis uses categories that are relevant to your forum.

# 1. Collect forum data (initializes database automatically)
forum-analyzer collect

# 2. Discover natural categories from the data
forum-analyzer themes discover --min-topics 3

# 3. Analyze all topics using the discovered categories
forum-analyzer llm-analyze

# 4. Ask questions about your analysis
forum-analyzer ask "What are the main authentication issues?"

Working with Multiple Projects

You can work with multiple forum analysis projects by using the --dir flag:

# Initialize a new project in a specific directory
mkdir shopify-webhooks
forum-analyzer --dir shopify-webhooks init

# Collect data for that project
forum-analyzer --dir shopify-webhooks collect

# Or use environment variable
export FORUM_ANALYZER_DIR=./shopify-webhooks
forum-analyzer collect
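
One way to picture how a project directory might be chosen from the `--dir` flag and the `FORUM_ANALYZER_DIR` environment variable is sketched below. The precedence shown (flag first, then environment variable, then the current directory) is an assumption, and `resolve_project_dir` is a hypothetical helper rather than part of the tool.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_project_dir(dir_flag: Optional[str] = None,
                        env: Mapping[str, str] = os.environ) -> Path:
    """Pick the project directory: --dir flag, then env var, then cwd."""
    if dir_flag:
        return Path(dir_flag)
    env_dir = env.get("FORUM_ANALYZER_DIR")
    if env_dir:
        return Path(env_dir)
    return Path.cwd()
```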

Usage

All Commands

A full list of commands and their options is available below.

Project Initialization

# Initialize a new project in the current directory
forum-analyzer init

# Initialize in a specific directory
forum-analyzer --dir ./my-project init

# Overwrite existing configuration
forum-analyzer init --force

Data Collection

# Collect from the category in your config
forum-analyzer collect

# Collect from a specific category
forum-analyzer collect --category-id 25

# Collect with a page limit (for testing)
forum-analyzer collect --page-limit 2

# Collect from a different project directory
forum-analyzer --dir ./my-project collect

Incremental Updates

# Fetch only new/updated content
forum-analyzer update
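
An incremental update needs some way to decide which topics have changed since the last collection. The sketch below compares a stored timestamp per topic with the `last_posted_at` value from a freshly fetched topic list. The function names and the shape of `stored` are hypothetical, and the tool may use a different criterion.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def topics_needing_update(stored: dict[int, str], fetched: list[dict]) -> list[int]:
    """Return IDs of topics that are new or have newer posts than stored."""
    stale = []
    for topic in fetched:
        known = stored.get(topic["id"])
        if known is None or parse_ts(topic["last_posted_at"]) > parse_ts(known):
            stale.append(topic["id"])
    return stale
```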

Status

# View collection status and statistics
forum-analyzer status

Theme Management

# Discover common themes (minimum 3 topics per theme)
forum-analyzer themes discover

# Analyze more topics for better pattern discovery
forum-analyzer themes discover --context-limit 100

# List themes already discovered
forum-analyzer themes list

# Delete all themes (prompts for confirmation)
forum-analyzer themes clean
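
The minimum-topics threshold can be pictured as a simple count filter. The sketch below assumes each topic has already been assigned a candidate theme and keeps only themes with enough topics. `themes_with_enough_topics` and the `assignments` mapping are hypothetical, and how the tool actually proposes themes with the LLM is not shown here.

```python
from collections import Counter


def themes_with_enough_topics(assignments: dict[int, str],
                              min_topics: int = 3) -> dict[str, int]:
    """Count topics per theme and drop themes below the threshold."""
    counts = Counter(assignments.values())
    return {theme: n for theme, n in counts.most_common() if n >= min_topics}
```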

Topic Analysis

# Analyze all unanalyzed topics
forum-analyzer llm-analyze

# Re-analyze topics that have already been analyzed
forum-analyzer llm-analyze --force

# Analyze a specific topic by its ID
forum-analyzer llm-analyze --topic-id 66

Querying

# Ask questions about the analyzed data
forum-analyzer ask "What are the most common authentication issues?"

Maintenance

# Clear all collection checkpoints
forum-analyzer clear-checkpoints
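
For context, a collection checkpoint can be as simple as a small file recording where to resume. The sketch below stores the next page number for a category as JSON and clears it on request. The file layout and function names are assumptions, and the tool's own checkpoint format may differ.

```python
import json
from pathlib import Path


def save_checkpoint(path: Path, category_id: int, next_page: int) -> None:
    """Write the checkpoint via a temporary file, then rename it into place."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"category_id": category_id, "next_page": next_page}))
    tmp.replace(path)


def load_checkpoint(path: Path, category_id: int) -> int:
    """Return the page to resume from, or 0 if there is no matching checkpoint."""
    if not path.exists():
        return 0
    data = json.loads(path.read_text())
    return data["next_page"] if data.get("category_id") == category_id else 0


def clear_checkpoint(path: Path) -> None:
    """Delete the checkpoint file if it exists."""
    path.unlink(missing_ok=True)
```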

Technical Details

Architecture

┌─────────────────────┐
│  Discourse Forum    │
│  (JSON endpoints)   │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   Rate-Limited      │
│   HTTP Client       │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   Checkpoint        │
│   Manager           │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   SQLite Database   │
│   (SQLAlchemy)      │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐     ┌──────────────┐
│   LLM Analyzer      │────▶│  Claude API  │
└──────────┬──────────┘     └──────────────┘
           │
           ▼
┌─────────────────────┐
│  Reports & Themes   │
└─────────────────────┘

Technology Stack

Language: Python 3.10+
Database: SQLite with SQLAlchemy
HTTP: httpx (async)
LLM: Claude API (Anthropic)
CLI: Click
Config: Pydantic + YAML

Project Structure

discourse-forum-analyzer/
├── src/forum_analyzer/
│   ├── analyzer/              # LLM analysis
│   ├── collector/             # Data collection
│   ├── config/
│   └── cli.py
├── examples/
│   └── shopify-webhooks/
└── tests/

Database Schema

The schema is managed by SQLAlchemy models and is split into three categories:

Forum Data Tables: categories, topics, posts, users
Analysis Tables: llm_analysis, problem_themes
Operational Tables: checkpoints, fetch_history

The schema is migrated automatically when LLM analysis features are used.
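
To make the split between forum data and analysis tables concrete, here is a minimal sketch using the standard library's `sqlite3` module instead of SQLAlchemy. The table names `topics`, `posts`, and `llm_analysis` come from the list above, but every column shown is an assumption and the real models contain more.

```python
import sqlite3

# Hypothetical, heavily reduced schema: column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS topics (
    id INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL,
    title TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS posts (
    id INTEGER PRIMARY KEY,
    topic_id INTEGER NOT NULL REFERENCES topics(id),
    body TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS llm_analysis (
    topic_id INTEGER PRIMARY KEY REFERENCES topics(id),
    classification TEXT,
    severity TEXT CHECK (severity IN ('critical', 'high', 'medium', 'low'))
);
"""


def open_database(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create any missing tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def unanalyzed_topic_ids(conn: sqlite3.Connection) -> list[int]:
    """Return IDs of topics that have no row in llm_analysis yet."""
    rows = conn.execute(
        "SELECT t.id FROM topics t "
        "LEFT JOIN llm_analysis a ON a.topic_id = t.id "
        "WHERE a.topic_id IS NULL ORDER BY t.id"
    )
    return [row[0] for row in rows]
```

Keeping analysis results in their own table means forum data can be re-collected without discarding earlier analysis.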

Example Application: Shopify Developer Forum

This tool was demonstrated by analyzing Shopify's webhook discussions. The collected dataset contained:

Topics: 271
Posts: 1,201
Users: 324
Date Range: September 2024 - October 2025

Example analysis results:

15 distinct problem themes identified
18 critical issues found
Top issue: Configuration challenges (25.1% of topics)
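
As a quick sanity check on the figures above, and assuming the 25.1% share is taken over all 271 collected topics, the top issue corresponds to roughly 68 topics. That count is derived here and is not stated in the report. The helper names are hypothetical.

```python
def topics_for_share(total_topics: int, share_percent: float) -> int:
    """Convert a percentage share into an approximate topic count."""
    return round(total_topics * share_percent / 100)


def share_of_topics(count: int, total_topics: int) -> float:
    """Convert a topic count into a percentage, rounded to one decimal."""
    return round(100 * count / total_topics, 1)


top_issue_topics = topics_for_share(271, 25.1)
```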

See the complete example analysis: examples/shopify-webhooks/LLM_ANALYSIS_REPORT.md

Development

Running Tests

pytest

Code Quality

black src/ tests/
isort src/ tests/
flake8 src/ tests/
mypy src/

Troubleshooting

Rate Limiting

Adjust rate_limit in config.yaml (default: 1 req/sec).

Publishing to PyPI

This section is for maintainers who need to publish new versions of the package to PyPI.

Prerequisites

PyPI Account: Create an account at pypi.org
API Token: Generate an API token from your PyPI account settings
Build Tools: Install required packages:
```
pip install build twine
```

Set Up API Token

Store your PyPI API token in ~/.pypirc:

[pypi]
username = __token__
password = pypi-YOUR-API-TOKEN-HERE

Build and Publish

Update Version: Bump the version in pyproject.toml
```
version = "0.2.0"  # Update this line
```
Clean Previous Builds:
```
rm -rf dist/ build/ *.egg-info
```
Build Distribution:
```
python -m build
```
Upload to PyPI:
```
twine upload dist/*
```

Verify Upload:

pip install --upgrade forum-analyzer
forum-analyzer --version

Version Bumping Strategy

Patch (0.1.0 → 0.1.1): Bug fixes, documentation updates
Minor (0.1.0 → 0.2.0): New features, backward-compatible changes
Major (0.1.0 → 1.0.0): Breaking changes, major redesigns
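
The bumping rules above can be expressed as a small function. This is an illustrative sketch for plain `MAJOR.MINOR.PATCH` strings only. `bump_version` is a hypothetical helper and not part of the package or its release tooling.

```python
def bump_version(version: str, part: str) -> str:
    """Return `version` with the given part bumped and lower parts reset."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```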

Package Information

Package Name: forum-analyzer
PyPI URL: https://pypi.org/project/forum-analyzer/
Repository: https://github.com/leggetter/discourse-forum-analyzer

Database Locked

Only one instance can run at a time.
Clear stale checkpoints: forum-analyzer clear-checkpoints.

LLM Analysis Errors

Verify your Anthropic API key is valid and has credit.
Use the --limit flag for testing with smaller datasets.

Contributing

Fork the repository
Create a feature branch
Make changes with tests
Submit a pull request

License

MIT License - See LICENSE file for details.

Appendix: Glossary

Understanding the terminology used in this tool:

Discourse Forum Terms

Category
A top-level organizational unit in Discourse forums (e.g., "Webhooks & Events").

Topic
A discussion thread within a category.

Post
An individual message within a topic. The first post is the topic starter; subsequent posts are replies.

Analysis Terms

Classification
The LLM-assigned type of problem or discussion in a topic (e.g., "webhook_delivery", "authentication").

Theme
A higher-level pattern grouping multiple related topics (e.g., "Webhook Delivery Failures").

Severity
The urgency/impact level assigned to a topic (critical, high, medium, low).

Workflow Terms

Collection
The process of downloading forum data (forum-analyzer collect).

Analysis
The process of using the LLM to extract insights from topics (forum-analyzer llm-analyze).

Theme Identification
The process of grouping topics into common patterns (forum-analyzer themes discover).
