Ads.txt / App-ads.txt Line Checker

Validate ads.txt and app-ads.txt partner declarations at scale with a Streamlit-powered, multi-threaded auditing workflow for AdOps, monetization, and supply-path verification.

Important

This project is currently implemented as a Streamlit application (not a packaged Python library), with reusable validation logic embedded in app.py.

Features

Bulk domain validation using ThreadPoolExecutor for concurrent fetch and comparison workflows.
Dual-target mode for both:
- ads.txt (web inventory)
- app-ads.txt (in-app inventory)
Resilient URL normalization that accepts raw hostnames and protocol-prefixed inputs.
Multi-protocol retrieval strategy (https first, then http) with redirect handling.
SSL fallback mode (verify=False retry path) for non-compliant publisher endpoints.
Soft-404 detection by identifying HTML payloads served at text-file endpoints.
Rule-based matching engine supporting:
- Domain + Publisher ID matching
- Optional Account Type (DIRECT / RESELLER) enforcement
- Partial match diagnostics for type mismatch scenarios
Result quality triaging with dedicated statuses (Valid, Partially matched, Not found, Error, System Error).
Analyst-friendly filtering for error-only workflows and category-based triage.
Two report layouts:
- Standard vertical table (row per reference check)
- Horizontal aggregated matrix (domain-centric review)
CSV export (UTF-8 BOM) for spreadsheet compatibility and downstream ops reporting.
Session-persistent results and input ordering via Streamlit session_state.
Dark-theme UI with custom styling and color-coded status semantics.
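
The URL normalization and https-then-http ordering listed above can be sketched as follows. This is a minimal illustration, not the code in app.py; the function names normalize_host and candidate_urls are hypothetical.

```python
from urllib.parse import urlparse


def normalize_host(raw: str) -> str:
    """Reduce a raw hostname or a full URL to a bare lowercase host."""
    text = raw.strip()
    if not text:
        return ""
    # urlparse only fills in the network location when a scheme or a
    # leading "//" is present, so bare hostnames get a "//" prefix.
    if "://" not in text:
        text = "//" + text
    return (urlparse(text).hostname or "").lower()


def candidate_urls(raw: str, filename: str = "ads.txt") -> list:
    """Build the URLs to try in order: https first, then http."""
    host = normalize_host(raw)
    if not host:
        return []
    return [f"{scheme}://{host}/{filename}" for scheme in ("https", "http")]
```

Both "example.com" and " https://Example.com/path " reduce to the same host, so either input style produces the same pair of candidate URLs.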

Tip

Use Errors / Warnings Only mode with selective filters to reduce noise when auditing large publisher batches.

Tech Stack & Architecture

Core Stack

Language: Python 3.8+
UI Runtime: Streamlit
Data Processing: pandas
HTTP Client: requests
Concurrency: concurrent.futures.ThreadPoolExecutor
Parsing Utilities: urllib.parse
Static Assets: PNG icon under icons/

Project Structure

.
├── app.py                # Streamlit UI + validation core logic
├── requirements.txt      # Runtime dependencies
├── LICENSE               # GNU AGPL v3 license
├── README.md             # Project documentation
└── icons/
    └── icon.png          # Application favicon/logo

Key Design Decisions

UI-first architecture
- The app is intentionally single-file (app.py) to simplify distribution and onboarding.
- Business logic and presentation are colocated for rapid iteration in Streamlit environments.
Network fault tolerance over strict TLS verification
- The fetch layer retries across protocol variants (https, then http) and falls back to an unverified SSL request (verify=False), so that files on misconfigured publisher endpoints can still be retrieved.
Deterministic output ordering
- Results are post-sorted using the original target input sequence for analyst predictability.
Operationally pragmatic matching model
- A reference line is treated as valid when its domain and publisher ID both match. Account type is optional; when a reference line supplies one, it is enforced as well, and a mismatch is reported as a partial match.
Built-in anti-noise controls
- Error classification and view filtering are first-class to support high-volume incident triage.
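
The fault-tolerant fetch strategy described above might look like the sketch below. It is an assumption-laden illustration rather than the app.py implementation: fetch_with_fallback and ATTEMPTS are hypothetical names, the position of the unverified retry in the attempt order is assumed, and get is an injected callable with the same call shape as requests.get so the logic can be exercised without network access.

```python
from typing import Callable, Optional, Tuple

# Assumed attempt order: verified https, plain http, then an unverified
# https retry as the last resort (the verify=False fallback path).
ATTEMPTS = (("https", True), ("http", True), ("https", False))


def fetch_with_fallback(
    host: str,
    filename: str,
    get: Callable[..., object],
    timeout: int = 15,
) -> Tuple[Optional[str], str]:
    """Return (body, note); body is None when every attempt fails."""
    last_error = "no attempt made"
    for scheme, verify in ATTEMPTS:
        url = f"{scheme}://{host}/{filename}"
        try:
            resp = get(url, timeout=timeout, verify=verify, allow_redirects=True)
        except Exception as exc:  # network, TLS, or timeout failure
            last_error = f"{url}: {exc}"
            continue
        if resp.status_code != 200:
            last_error = f"{url}: HTTP {resp.status_code}"
            continue
        note = "ok" if verify else "ok (TLS verification skipped)"
        return resp.text, note
    return None, last_error
```

Passing requests.get as the get argument would perform real requests; a stub that returns an object with status_code and text attributes is enough for offline checks.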

flowchart TD
    A[User Inputs Target Domains and Reference Lines] --> B[Normalize and Parse Input]
    B --> C[Spawn ThreadPoolExecutor Jobs]
    C --> D[Fetch ads.txt or app-ads.txt]
    D --> E{Response Valid?}
    E -->|No| F[Emit Error Rows for All References]
    E -->|Yes| G[Parse File Records]
    G --> H[Match Domain and ID]
    H --> I{Type Provided?}
    I -->|No| J[Mark Valid on Domain+ID]
    I -->|Yes and Match| K[Mark Full Match]
    I -->|Yes and Mismatch| L[Mark Partial Match]
    J --> M[Aggregate Results]
    K --> M
    L --> M
    F --> M
    M --> N[Apply View and Error Filters]
    N --> O[Render DataFrame and Enable CSV Export]
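
The matching branch of the flowchart (parse records, match domain and ID, then check the optional type) can be sketched as below. The function names parse_records and classify are hypothetical, and exact, case-sensitive comparison of publisher IDs is an assumption; app.py may differ on both points.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, str, Optional[str]]  # (domain, publisher_id, account_type)


def parse_records(body: str) -> List[Record]:
    """Parse ads.txt content into records, skipping comments and variables."""
    records = []
    for raw in body.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 2 or not fields[0] or not fields[1]:
            continue  # blank lines, variables such as contact=..., bad rows
        acct = fields[2].upper() if len(fields) > 2 and fields[2] else None
        records.append((fields[0].lower(), fields[1], acct))
    return records


def classify(records, domain, pub_id, account_type=None) -> str:
    """Return 'Valid', 'Partially matched', or 'Not found' for one reference."""
    hits = [r for r in records if r[0] == domain.lower() and r[1] == pub_id]
    if not hits:
        return "Not found"
    if account_type is None:
        return "Valid"  # domain + ID is sufficient when no type is given
    if any(r[2] == account_type.upper() for r in hits):
        return "Valid"
    return "Partially matched"  # domain + ID found, type differs
```

For a file containing "appnexus.com, 1234, DIRECT", a reference of "appnexus.com, 1234, RESELLER" would come back as "Partially matched", while the same reference without a type would be "Valid".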

Note

Thread pool concurrency is currently capped in code (MAX_WORKERS = 5) to balance throughput against the load placed on publisher endpoints.
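
A capped thread pool combined with input-order sorting could be sketched as follows. Only MAX_WORKERS = 5 comes from the project; audit_all and the check_domain callback are hypothetical names, and collapsing duplicate domains is a simplification made for this example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_WORKERS = 5  # cap stated in the note above


def audit_all(domains, check_domain):
    """Run check_domain(domain) concurrently; return results in input order."""
    order = {}
    for index, domain in enumerate(domains):
        order.setdefault(domain, index)  # duplicates keep their first position
    results = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        futures = {pool.submit(check_domain, d): d for d in order}
        for future in as_completed(futures):
            domain = futures[future]
            try:
                results.append((domain, future.result()))
            except Exception as exc:  # one failing domain must not stop the batch
                results.append((domain, f"System Error: {exc}"))
    # Completion order is nondeterministic, so re-sort by input position.
    results.sort(key=lambda item: order[item[0]])
    return results
```

Sorting after collection keeps the output stable from run to run even though individual fetches finish in a different order each time.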

Getting Started

Prerequisites

Python 3.8+
pip (or pip3)
Network access to target publisher domains

Optional but recommended:

Virtual environment (venv, virtualenv, or conda)

Installation

Clone the repository.

git clone https://github.com/<your-org-or-user>/Ads.txt-App-ads.txt-line-Checker.git
cd Ads.txt-App-ads.txt-line-Checker

Create and activate a virtual environment.

python -m venv .venv
source .venv/bin/activate

Install dependencies.

pip install --upgrade pip
pip install -r requirements.txt

Start the Streamlit application.

streamlit run app.py

Open the local endpoint displayed by Streamlit (typically http://localhost:8501).

Warning

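If your environment intercepts SSL traffic (for example, a corporate proxy that re-signs certificates), some remote validations may fail with SSL-related errors or report lines as missing even though they are present.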

Testing

This repository does not currently ship a formal tests/ suite. For functional validation, use a layered check strategy:

Dependency sanity check

python -m pip check

Static syntax validation

python -m py_compile app.py

Runtime smoke test

streamlit run app.py

Manual scenario checks in UI
- Empty inputs trigger a warning.
- A reference line with an invalid format is ignored.
- A domain returning HTML at /ads.txt is flagged as an error.
- A mismatch between the expected and observed account type appears as a partial match.

Caution

Because endpoint availability is external and time-dependent, deterministic integration tests require controlled fixtures or mocked HTTP responses.
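
As one way to get deterministic checks without touching the network, the soft-404 scenario can be tested against mocked responses. The sketch below is hypothetical throughout: is_soft_404 is an illustrative heuristic, not the function in app.py, and the Mock objects only imitate the attributes of a requests response.

```python
import unittest
from unittest.mock import Mock


def is_soft_404(response) -> bool:
    """Illustrative heuristic: HTML served where a text file is expected."""
    content_type = response.headers.get("Content-Type", "").lower()
    head = response.text.lstrip()[:500].lower()
    return "text/html" in content_type or head.startswith(
        ("<!doctype html", "<html")
    )


class SoftNotFoundTest(unittest.TestCase):
    def test_html_payload_is_flagged(self):
        resp = Mock(
            status_code=200,
            headers={"Content-Type": "text/html"},
            text="<html><body>Not found</body></html>",
        )
        self.assertTrue(is_soft_404(resp))

    def test_plain_text_passes(self):
        resp = Mock(
            status_code=200,
            headers={"Content-Type": "text/plain"},
            text="google.com, pub-1234567890, DIRECT\n",
        )
        self.assertFalse(is_soft_404(resp))
```

Saved as a test module, this can be run with python -m unittest, and the same pattern extends to the other manual scenarios.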

Deployment

Production Deployment Guidelines

Run behind a process manager and reverse proxy for resilience.
Terminate HTTPS (TLS) at the proxy layer.
Restrict egress where required by policy while preserving target domain access.

Minimal Build/Run Pattern

pip install -r requirements.txt
streamlit run app.py --server.port 8501 --server.address 0.0.0.0

Containerization (Example)

Note

No Docker assets are currently committed; the snippet below is a recommended baseline.

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]

CI/CD Integration Recommendations

Add a pipeline stage for:
- dependency install
- py_compile validation
- optional linting (ruff/flake8)
Gate merges on successful smoke checks.
Publish container artifacts for immutable deployments.

Usage

1) Launch the application

streamlit run app.py  # starts the interactive validator UI

2) Provide targets and references

# Target Websites (one host per line)
example.com
news-site.org
app.publisher.net

# Reference Lines: domain, publisher_id, account_type(optional)
google.com, pub-1234567890, DIRECT
appnexus.com, 1234, RESELLER
rubiconproject.com, 5678
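
The reference-line format shown above could be parsed along these lines. Reference and parse_references are hypothetical names, and treating an unrecognized account type as an invalid line is an assumption made for this sketch.

```python
from typing import List, NamedTuple, Optional

VALID_TYPES = {"DIRECT", "RESELLER"}


class Reference(NamedTuple):
    domain: str
    publisher_id: str
    account_type: Optional[str]  # None when the third field is omitted


def parse_references(text: str) -> List[Reference]:
    """Parse 'domain, publisher_id[, account_type]' lines; skip invalid ones."""
    refs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 2 or not parts[0] or not parts[1]:
            continue  # invalid format: ignored rather than raised
        acct = parts[2].upper() if len(parts) > 2 and parts[2] else None
        if acct is not None and acct not in VALID_TYPES:
            continue  # assumption: an unknown account type invalidates the line
        refs.append(Reference(parts[0].lower(), parts[1], acct))
    return refs
```

Lines without an account type yield account_type=None, which a matcher can use to decide whether the type should be enforced.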

3) Run validation and export

1. Choose file mode (ads.txt or app-ads.txt).
2. Choose output scope (all results or errors only).
3. Click "Start Validation".
4. Review table output and download CSV report.
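
For the CSV download in step 4, the UTF-8 BOM export can be illustrated with the standard library alone. The helper name results_to_csv_bytes and the column names in the usage note are hypothetical; app.py uses pandas for its data handling and may build the file differently.

```python
import csv
import io


def results_to_csv_bytes(rows, fieldnames):
    """Serialize result rows to CSV bytes prefixed with a UTF-8 BOM."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    # "utf-8-sig" prepends the BOM (EF BB BF) that spreadsheet tools
    # such as Excel use to detect UTF-8 content.
    return buffer.getvalue().encode("utf-8-sig")
```

Calling results_to_csv_bytes([{"domain": "example.com", "status": "Valid"}], ["domain", "status"]) returns bytes that start with the BOM. With pandas, encoding the output of DataFrame.to_csv(index=False) as "utf-8-sig" should give an equivalent result.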

Result Semantics

Valid: expected mapping was found.
Partially matched: domain+ID found, but account type mismatch.
Not found: no domain+ID pair matched.
Error / System Error: fetch or runtime issue.

Tip

Use horizontal mode for executive summary reporting, and vertical mode for deep per-line diagnostics.
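
The difference between the two layouts comes down to a pivot: the vertical table has one row per (domain, reference) check, while the horizontal matrix has one row per domain with a column per reference. A minimal sketch in plain Python, with the hypothetical helper to_horizontal and an assumed row shape of (domain, reference, status):

```python
def to_horizontal(rows):
    """Pivot (domain, reference, status) rows into one dict per domain."""
    matrix = {}
    for domain, reference, status in rows:
        # One entry per domain; each reference line becomes a column.
        matrix.setdefault(domain, {"domain": domain})[reference] = status
    return list(matrix.values())
```

Domains keep their first-seen order because Python dictionaries preserve insertion order. In a pandas-based implementation, DataFrame.pivot would be one way to achieve the same reshaping.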

Configuration

This project is primarily configured through runtime UI choices and a few hard-coded execution defaults.

Runtime Options in UI

File Type: ads.txt or app-ads.txt
Output View: full output or warning/error-only output
Layout: vertical or horizontal aggregated
Error Filters: category-level toggles when warning/error-only mode is enabled

Hard-Coded Operational Parameters (`app.py`)

MAX_WORKERS = 5
- Controls concurrent domain processing.
timeout=15
- HTTP request timeout in seconds.
time.sleep(random.uniform(0.5, 1.5))
- Per-request jitter to reduce burst pressure.
LIVE_UA = "Mozilla/5.0 ..."
- User-Agent used for remote fetch requests.

Environment Variables

No environment variables are currently required.
When deploying, set the host and port through Streamlit CLI flags or a Streamlit config file.

Streamlit Server Flags (Example)

streamlit run app.py --server.port 8501 --server.address 0.0.0.0

Important

For enterprise deployment, externalize concurrency and timeout parameters to environment variables to improve operational control.
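
One possible shape for that externalization is sketched below. The variable names ADS_CHECKER_MAX_WORKERS and ADS_CHECKER_TIMEOUT and the helper env_int are hypothetical; only the defaults of 5 and 15 come from the current hard-coded values.

```python
import os


def env_int(name: str, default: int, minimum: int = 1) -> int:
    """Read an integer from the environment, falling back on bad input."""
    raw = os.environ.get(name, "")
    try:
        value = int(raw)
    except ValueError:
        return default  # unset, empty, or non-numeric
    return value if value >= minimum else default


# Hypothetical variable names; defaults mirror the hard-coded values.
MAX_WORKERS = env_int("ADS_CHECKER_MAX_WORKERS", 5)
REQUEST_TIMEOUT = env_int("ADS_CHECKER_TIMEOUT", 15)
```

Falling back to the default on invalid input keeps a mistyped variable from taking the application down at startup, at the cost of hiding the typo; logging the fallback would be a reasonable addition.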

License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). See LICENSE for full terms.

Contacts & Community Support

Support the Project

If you find this tool useful, consider leaving a star on GitHub or supporting the author directly.
