Validate ads.txt and app-ads.txt partner declarations at scale with a Streamlit-powered, multi-threaded auditing workflow for AdOps, monetization, and supply-path verification.
Important
This project is currently implemented as a Streamlit application (not a packaged Python library), with reusable validation logic embedded in app.py.
- Features
- Tech Stack & Architecture
- Getting Started
- Testing
- Deployment
- Usage
- Configuration
- License
- Contacts & Community Support
- Bulk domain validation using `ThreadPoolExecutor` for concurrent fetch and comparison workflows.
- Dual-target mode for both:
  - `ads.txt` (web inventory)
  - `app-ads.txt` (in-app inventory)
- Resilient URL normalization that accepts raw hostnames and protocol-prefixed inputs.
- Multi-protocol retrieval strategy (`https` first, then `http`) with redirect handling.
- SSL fallback mode (`verify=False` retry path) for non-compliant publisher endpoints.
- Soft-404 detection by identifying HTML payloads served at text-file endpoints.
- Rule-based matching engine supporting:
  - Domain + Publisher ID matching
  - Optional Account Type (`DIRECT`/`RESELLER`) enforcement
  - Partial match diagnostics for type mismatch scenarios
- Result quality triaging with dedicated statuses (`Valid`, `Partially matched`, `Not found`, `Error`, `System Error`).
- Analyst-friendly filtering for error-only workflows and category-based triage.
- Two report layouts:
  - Standard vertical table (row per reference check)
  - Horizontal aggregated matrix (domain-centric review)
- CSV export (`UTF-8 BOM`) for spreadsheet compatibility and downstream ops reporting.
- Session-persistent results and input ordering via Streamlit `session_state`.
- Dark-theme UI with custom styling and color-coded status semantics.
Tip
Use Errors / Warnings Only mode with selective filters to reduce noise when auditing large publisher batches.
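The URL normalization and `https`-then-`http` ordering listed among the features could look roughly like the sketch below. It is an illustration only, not the code in app.py: `normalize_host` and `candidate_urls` are hypothetical names, and dropping any port or path from the input is an assumption.

```python
from urllib.parse import urlparse


def normalize_host(raw):
    """Return a bare lowercase hostname from a raw hostname or a URL-like input."""
    text = raw.strip()
    if not text:
        return ""
    # urlparse only fills in the network location when "//" is present,
    # so bare hostnames get a scheme-relative prefix first.
    if "://" not in text:
        text = "//" + text
    parsed = urlparse(text)
    # .hostname lowercases the host and drops any port (assumed acceptable here).
    return (parsed.hostname or "").lower()


def candidate_urls(raw, filename="ads.txt"):
    """List the URLs to try, in order: https first, then http."""
    host = normalize_host(raw)
    if not host:
        return []
    return [f"https://{host}/{filename}", f"http://{host}/{filename}"]
```

For example, `candidate_urls("https://Example.com/page", "app-ads.txt")` yields the `https` URL for `example.com/app-ads.txt` followed by its `http` counterpart.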
- Language: Python 3.8+
- UI Runtime: Streamlit
- Data Processing: pandas
- HTTP Client: requests
- Concurrency: `concurrent.futures.ThreadPoolExecutor`
- Parsing Utilities: `urllib.parse`
- Static Assets: PNG icon under `icons/`
.
├── app.py # Streamlit UI + validation core logic
├── requirements.txt # Runtime dependencies
├── LICENSE # GNU AGPL v3 license
├── README.md # Project documentation
└── icons/
    └── icon.png # Application favicon/logo
- UI-first architecture
  - The app is intentionally single-file (`app.py`) to simplify distribution and onboarding.
  - Business logic and presentation are colocated for rapid iteration in Streamlit environments.
- Network fault tolerance over strict purity
  - The fetch layer retries across protocol variants and falls back to bypassing SSL verification, maximizing recoverability against real-world publisher misconfiguration.
- Deterministic output ordering
  - Results are re-sorted into the original target input order for analyst predictability.
- Operationally pragmatic matching model
  - A reference line counts as valid when its domain and publisher ID are found; account type is optional and is strictly enforced only when provided.
- Built-in anti-noise controls
  - Error classification and view filtering are first-class to support high-volume incident triage.
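The matching model described above can be sketched as a small pure function. This is a hedged illustration rather than the logic in app.py: `parse_record` and `classify` are hypothetical names, and case-insensitive domains with exact-match publisher IDs are assumptions.

```python
def parse_record(line):
    """Split an ads.txt-style line into (domain, publisher_id, account_type or None)."""
    content = line.split("#", 1)[0]  # drop trailing comments
    fields = [field.strip() for field in content.split(",")]
    if len(fields) < 2 or not fields[0] or not fields[1]:
        return None  # comments, blank lines, and variable lines have no domain+ID pair
    account_type = fields[2].upper() if len(fields) > 2 and fields[2] else None
    return fields[0].lower(), fields[1], account_type


def classify(reference, file_lines):
    """Return 'Valid', 'Partially matched', 'Not found', or None for an unusable reference."""
    expected = parse_record(reference)
    if expected is None:
        return None  # invalid reference lines are skipped
    domain, publisher_id, wanted_type = expected
    type_mismatch_seen = False
    for line in file_lines:
        record = parse_record(line)
        if record is None or record[:2] != (domain, publisher_id):
            continue
        # Domain and ID match; the type is only enforced when the reference supplies one.
        if wanted_type is None or record[2] == wanted_type:
            return "Valid"
        type_mismatch_seen = True
    return "Partially matched" if type_mismatch_seen else "Not found"
```

Scanning every line before settling on `Partially matched` matters because a file can list the same domain and ID under both account types.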
flowchart TD
A[User Inputs Target Domains and Reference Lines] --> B[Normalize and Parse Input]
B --> C[Spawn ThreadPoolExecutor Jobs]
C --> D[Fetch ads.txt or app-ads.txt]
D --> E{Response Valid?}
E -->|No| F[Emit Error Rows for All References]
E -->|Yes| G[Parse File Records]
G --> H[Match Domain and ID]
H --> I{Type Provided?}
I -->|No| J[Mark Valid on Domain+ID]
I -->|Yes and Match| K[Mark Full Match]
I -->|Yes and Mismatch| L[Mark Partial Match]
J --> M[Aggregate Results]
K --> M
L --> M
F --> M
M --> N[Apply View and Error Filters]
N --> O[Render DataFrame and Enable CSV Export]
Note
Thread pool concurrency is currently capped in code (MAX_WORKERS = 5) to balance throughput and endpoint friendliness.
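The fan-out step of the flowchart, together with the worker cap and the restored input order, might be sketched as follows. This is a minimal sketch under stated assumptions: `audit_domains` and the injected `check_domain` callable are hypothetical, and mapping unexpected exceptions to a `System Error` string is an assumption about how failures surface.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_WORKERS = 5  # same cap as noted above


def audit_domains(domains, check_domain):
    """Run check_domain for each domain concurrently; return results in input order."""
    results = {}
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        futures = {pool.submit(check_domain, domain): domain for domain in domains}
        for future in as_completed(futures):
            domain = futures[future]
            try:
                results[domain] = future.result()
            except Exception as exc:  # broad on purpose: one bad domain must not stop the batch
                results[domain] = f"System Error: {exc}"
    # Futures complete in arbitrary order, so rebuild the list from the input sequence.
    return [(domain, results[domain]) for domain in domains]
```

Keying the results by domain and rebuilding the list at the end keeps the output order independent of which fetch finishes first.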
- Python 3.8+
- `pip` (or `pip3`)
- Network access to target publisher domains
Optional but recommended:
- Virtual environment (`venv`, `virtualenv`, or `conda`)
- Clone the repository.
    git clone https://github.com/<your-org-or-user>/Ads.txt-App-ads.txt-line-Checker.git
    cd Ads.txt-App-ads.txt-line-Checker
- Create and activate a virtual environment.
    python -m venv .venv
    source .venv/bin/activate
- Install dependencies.
    pip install --upgrade pip
    pip install -r requirements.txt
- Start the Streamlit application.
    streamlit run app.py
- Open the local endpoint displayed by Streamlit (typically `http://localhost:8501`).
Warning
If your environment intercepts SSL traffic, some remote validations may return false negatives or SSL-related errors.
This repository does not currently ship a formal tests/ suite. For functional validation, use a layered check strategy:
- Dependency sanity check
    python -m pip check
- Static syntax validation
    python -m py_compile app.py
- Runtime smoke test
    streamlit run app.py
- Manual scenario checks in UI
  - Empty inputs trigger warning.
  - Invalid reference format is ignored.
  - Domain returning HTML at `/ads.txt` is flagged as error.
  - Mismatch between expected and observed account type appears as partial match.
Caution
Because endpoint availability is external and time-dependent, deterministic integration tests require controlled fixtures or mocked HTTP responses.
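One way to get such deterministic checks is to inject the HTTP call and replace it with `unittest.mock` objects. The sketch below assumes a hypothetical `fetch_ads_file` helper that takes a `requests.get`-like callable; it is not the function in app.py, and the HTML heuristic is a simplification.

```python
from unittest.mock import Mock


def looks_like_html(body):
    """Heuristic soft-404 check: does a text-file endpoint return an HTML page?"""
    head = body.lstrip()[:512].lower()
    return head.startswith(("<!doctype html", "<html")) or "<body" in head


def fetch_ads_file(host, http_get, filename="ads.txt"):
    """Try https, then http, through the injected http_get; return (text, error)."""
    last_error = "no attempt made"
    for scheme in ("https", "http"):
        url = f"{scheme}://{host}/{filename}"
        try:
            response = http_get(url, timeout=15)
        except Exception as exc:  # broad for the sketch; real code would narrow this
            last_error = f"{type(exc).__name__}: {exc}"
            continue
        if response.status_code != 200:
            last_error = f"HTTP {response.status_code}"
            continue
        if looks_like_html(response.text):
            return None, "Soft 404: HTML served instead of a text file"
        return response.text, None
    return None, last_error


def test_falls_back_to_http():
    ok = Mock(status_code=200, text="google.com, pub-1, DIRECT\n")
    getter = Mock(side_effect=[ConnectionError("tls failure"), ok])
    assert fetch_ads_file("example.com", getter) == (ok.text, None)
    assert getter.call_count == 2


def test_soft_404_is_reported():
    page = Mock(status_code=200, text="<!DOCTYPE html><html><body>Oops</body></html>")
    body, error = fetch_ads_file("example.com", Mock(return_value=page))
    assert body is None and error.startswith("Soft 404")
```

Because no real network call is made, both tests give the same result on every run, with pytest or when called directly.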
- Run behind a process manager and reverse proxy for resilience.
- Use `https` termination at the proxy layer.
- Restrict egress where required by policy while preserving target domain access.
pip install -r requirements.txt
streamlit run app.py --server.port 8501 --server.address 0.0.0.0
Note
No Docker assets are currently committed; the snippet below is a recommended baseline.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
- Add a pipeline stage for:
  - dependency install
  - `py_compile` validation
  - optional linting (`ruff`/`flake8`)
- Gate merges on successful smoke checks.
- Publish container artifacts for immutable deployments.
streamlit run app.py # starts the interactive validator UI

# Target Websites (one host per line)
example.com
news-site.org
app.publisher.net

# Reference Lines: domain, publisher_id, account_type(optional)
google.com, pub-1234567890, DIRECT
appnexus.com, 1234, RESELLER
rubiconproject.com, 5678

1. Choose file mode (ads.txt or app-ads.txt).
2. Choose output scope (all results or errors only).
3. Click "Start Validation".
4. Review table output and download CSV report.
- `Valid`: expected mapping was found.
- `Partially matched`: domain+ID found, but account type mismatch.
- `Not found`: no domain+ID pair matched.
- `Error`/`System Error`: fetch or runtime issue.
Tip
Use horizontal mode for executive summary reporting, and vertical mode for deep per-line diagnostics.
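To make the difference between the two layouts concrete, the sketch below pivots vertical rows (one per reference check) into a domain-centric matrix using only the standard library. The row shape `(domain, reference, status)` and the name `to_horizontal` are assumptions for illustration; the app itself builds its tables with pandas.

```python
def to_horizontal(rows):
    """Pivot (domain, reference, status) rows into a header row plus one row per domain."""
    columns = []  # reference lines, in first-seen order
    matrix = {}   # domain -> {reference: status}; dicts keep insertion order
    for domain, reference, status in rows:
        if reference not in columns:
            columns.append(reference)
        matrix.setdefault(domain, {})[reference] = status
    header = ["Domain"] + columns
    body = [
        [domain] + [cells.get(reference, "") for reference in columns]
        for domain, cells in matrix.items()
    ]
    return [header] + body


vertical = [
    ("example.com", "google.com, pub-1234567890, DIRECT", "Valid"),
    ("example.com", "appnexus.com, 1234, RESELLER", "Partially matched"),
    ("news-site.org", "google.com, pub-1234567890, DIRECT", "Not found"),
]
for row in to_horizontal(vertical):
    print(row)
```

Each domain becomes a single row, and a reference that was never checked for that domain is left as an empty cell.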
This project is primarily configured through runtime UI choices and a few hard-coded execution defaults.
- File Type: `ads.txt` or `app-ads.txt`
- Output View: full output or warning/error-only output
- Layout: vertical or horizontal aggregated
- Error Filters: category-level toggles when warning/error-only mode is enabled
- `MAX_WORKERS = 5`
  - Controls concurrent domain processing.
- `timeout=15`
  - HTTP request timeout in seconds.
- `time.sleep(random.uniform(0.5, 1.5))`
  - Per-request jitter to reduce burst pressure.
- `LIVE_UA = "Mozilla/5.0 ..."`
  - User-Agent used for remote fetch requests.
- No mandatory environment variables are currently required.
- If you deploy with Streamlit server flags, manage host/port via CLI args or Streamlit config.
streamlit run app.py --server.port 8501 --server.address 0.0.0.0
Important
For enterprise deployment, externalize concurrency and timeout parameters to environment variables to improve operational control.
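A minimal sketch of that externalization is shown below. The variable names `ADS_CHECKER_MAX_WORKERS` and `ADS_CHECKER_TIMEOUT` and the `env_int` helper are hypothetical and are not read by the current app.py; the defaults mirror the hard-coded values listed above.

```python
import os


def env_int(name, default, minimum=1):
    """Read a positive integer from the environment, falling back to a default."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default  # ignore malformed values rather than crash at startup
    return max(value, minimum)


# Hypothetical variable names; defaults match the current hard-coded settings.
MAX_WORKERS = env_int("ADS_CHECKER_MAX_WORKERS", 5)
REQUEST_TIMEOUT = env_int("ADS_CHECKER_TIMEOUT", 15)
```

With this in place, a deployment could launch the app as `ADS_CHECKER_MAX_WORKERS=10 streamlit run app.py` without editing the source.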
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). See LICENSE for full terms.
If you find this tool useful, consider leaving a star on GitHub or supporting the author directly.