OstinUA/Validate-and-Optimize-app-ads.txt-ads.txt-file

1. Title and Description

Ads.txt Inspector Logging Library

A production-focused Python logging and diagnostics library for validating, tracing, and operationalizing ads.txt / app-ads.txt quality workflows.


Note

The current repository ships a Streamlit application (app.py) plus modular analysis/render components. The "logging library" language in this README reflects the diagnostics-centric core (inspector/analyzer.py) and log rendering pipeline (inspector/render.py).

Important

This project is optimized for operational debugging and compliance validation of seller declaration files. It is not a generic application logger for arbitrary services.

2. Table of Contents

  • Title and Description
  • Features
  • Tech Stack & Architecture
  • Getting Started
  • Testing
  • Deployment
  • Usage
  • Configuration
  • License
  • Contacts & Community Support

3. Features

  • Deterministic line-by-line parsing for ads.txt-style declarations.
  • Structured anomaly classification for:
    • invalid syntax,
    • unsupported relationship type,
    • duplicate seller records.
  • Built-in diagnostics output with warning/error messages tied to source line numbers.
  • Composite duplicate keying strategy (domain + publisher_id + relationship_type) for strict de-duplication.
  • Session-driven operational workflow via Streamlit for iterative cleanup.
  • HTML-based log panel rendering with severity-specific visual cues.
  • Templated metric rendering (templates/metrics.html) for standardized UI telemetry blocks.
  • Style isolation in assets/css/styles.css and static frontend scaffolding in index.html.
  • Download-ready optimized output generation for post-validation publishing.
  • Extensible architecture with clear separation between:
    • fetch/orchestration (app.py),
    • parsing/analysis (inspector/analyzer.py),
    • rendering (inspector/render.py).
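
The composite duplicate key listed above can be illustrated with a small sketch. This is not the code in inspector/analyzer.py: the helper names duplicate_key and find_duplicates are hypothetical, and lowercasing the domain and uppercasing the relationship type before comparison is an assumption made for illustration.

```python
from typing import Optional


def duplicate_key(line: str) -> Optional[tuple[str, str, str]]:
    """Return (domain, publisher_id, relationship_type), or None for non-records."""
    body = line.split("#", 1)[0].strip()  # ignore trailing comments
    fields = [field.strip() for field in body.split(",")]
    if len(fields) < 3 or not all(fields[:3]):
        return None
    domain, publisher_id, relationship = fields[:3]
    # Assumption: compare domain and relationship type case-insensitively.
    return (domain.lower(), publisher_id, relationship.upper())


def find_duplicates(text: str) -> list[int]:
    """Return 1-based line numbers of records whose key was already seen."""
    seen = set()
    duplicates = []
    for number, line in enumerate(text.splitlines(), start=1):
        key = duplicate_key(line)
        if key is None:
            continue
        if key in seen:
            duplicates.append(number)
        else:
            seen.add(key)
    return duplicates


sample = (
    "google.com, pub-123, DIRECT, f08c47fec0942fa0\n"
    "Google.com, pub-123, direct\n"
    "google.com, pub-123, RESELLER"
)
print(find_duplicates(sample))  # [2]
```

Because the key uses only the first three fields, two records that differ only in the optional fourth field are treated as duplicates in this sketch.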

Tip

For production auditing, run validation first, then remove duplicates, then export. Retain the original source files for traceability.

4. Tech Stack & Architecture

Core Languages, Frameworks, and Dependencies

  • Language: Python 3.10+
  • UI Runtime: Streamlit
  • HTTP Fetching: cloudscraper
  • URL Handling: urllib.parse
  • Template/CSS Asset Strategy: file-based HTML/CSS loading
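
As a rough illustration of the URL-handling step, the sketch below reduces free-form user input to a bare host with urllib.parse and builds the file URL from it. The function names normalize_domain and build_file_url are hypothetical and do not describe how clean_url in this repository is implemented. Always using https is also an assumption.

```python
from urllib.parse import urlsplit


def normalize_domain(raw: str) -> str:
    """Reduce user input such as ' https://Example.com/page ' to 'example.com'."""
    raw = raw.strip()
    if "://" not in raw:
        raw = "//" + raw  # lets urlsplit treat a bare host as the network location
    return (urlsplit(raw).hostname or "").lower()


def build_file_url(domain: str, file_type: str = "ads.txt") -> str:
    """Build the conventional root-level URL for ads.txt or app-ads.txt."""
    return f"https://{domain}/{file_type}"


domain = normalize_domain(" https://Example.com/some/page ")
print(build_file_url(domain, "app-ads.txt"))  # https://example.com/app-ads.txt
```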

Note

The architecture intentionally uses thin orchestration in app.py with reusable analysis/render modules to reduce cognitive load and support future unit testing.

Project Structure

Expand full repository tree (relevant runtime files)
.
├── app.py
├── assets
│   ├── css
│   │   └── styles.css
│   └── js
│       └── app.js
├── index.html
├── inspector
│   ├── __init__.py
│   ├── analyzer.py
│   └── render.py
├── templates
│   ├── metrics.html
│   └── result_header.html
├── requirements.txt
├── README.md
└── LICENSE

Key Design Decisions

  • Parser purity: analyze_text operates purely on input text and returns deterministic metadata/stats.
  • Diagnostic-first workflow: each line is classified into semantic states (neutral, valid, error, duplicate) before any mutation.
  • Separation of concerns: analysis logic and presentation rendering are split into dedicated modules.
  • Template-backed UI fragments: reusable HTML snippets reduce coupling and simplify UI iteration.
  • Operational ergonomics: one-click actions (Comment Out Errors, Remove Duplicates) optimize manual remediation.
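
A remediation action such as Comment Out Errors could work roughly as follows. This is a minimal sketch, not the application's code: comment_out_lines is a hypothetical helper, and it assumes the caller already has the 1-based numbers of the lines flagged as errors.

```python
def comment_out_lines(text: str, flagged: set[int], prefix: str = "# ") -> str:
    """Prefix flagged lines with a comment marker, leaving all other lines untouched."""
    output = []
    for number, line in enumerate(text.splitlines(), start=1):
        already_comment = line.lstrip().startswith("#")
        if number in flagged and not already_comment:
            output.append(prefix + line)
        else:
            output.append(line)
    return "\n".join(output)


original = "google.com, pub-123, DIRECT\ninvalid,line\n# already a comment"
print(comment_out_lines(original, {2, 3}))
```

Commenting out instead of deleting keeps the line count stable, so line numbers in earlier diagnostics still point at the same lines after the transformation.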
Architecture and logging pipeline diagram (Mermaid)
flowchart TD
    A[Domain + File Type Input] --> B[Remote Fetch via cloudscraper]
    B --> C[Raw ads.txt Content]
    C --> D[analyze_text]
    D --> E[parse_line_data per line]
    E --> F{Validation Rules}
    F -->|Valid| G[Structured Record]
    F -->|Invalid| H[Error Log Entry]
    F -->|Duplicate| I[Warning Log Entry]
    G --> J[Metrics Aggregation]
    H --> K[Rendered Log Panel]
    I --> K
    J --> L[Rendered Metrics Template]
    C --> M[Optional Transform Actions]
    M --> N[Processed Content]
    N --> O[Download Optimized File]

5. Getting Started

Prerequisites

  • Python 3.10 or newer
  • pip (latest stable recommended)
  • Git CLI
  • Optional: Docker (for containerized deployment)

Installation

git clone https://github.com/OstinUA/Validate-and-Optimize-app-ads.txt-ads.txt-file.git
cd Validate-and-Optimize-app-ads.txt-ads.txt-file
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt

Run the application:

streamlit run app.py

Warning

Some publisher endpoints may be WAF-protected or rate-limited. A fetch failure therefore does not necessarily mean the target file is missing or malformed.
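
To make the distinction concrete, the sketch below maps an HTTP status code to a likely cause. It performs no network I/O. The function name describe_fetch_failure and the specific mapping are assumptions for illustration, not behavior of app.py or cloudscraper.

```python
def describe_fetch_failure(status_code: int) -> str:
    """Suggest a likely cause for a failed fetch based on the HTTP status code."""
    if status_code in (401, 403):
        return "blocked: possibly a WAF or bot-protection rule"
    if status_code == 429:
        return "rate-limited: retry later with backoff"
    if status_code == 404:
        return "not found: the domain may not publish this file"
    if 500 <= status_code < 600:
        return "server error: the publisher endpoint is failing"
    return "unexpected status: inspect the response manually"


for code in (403, 429, 404, 503):
    print(code, describe_fetch_failure(code))
```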

Troubleshooting and alternative setup paths

Common Setup Issues

  • streamlit: command not found

    • Ensure the virtual environment is activated.
    • Reinstall dependencies with pip install -r requirements.txt.
  • TLS/network fetch errors

    • Validate outbound network access and DNS resolution.
    • Test with known reachable domains.
  • Dependency resolution problems

    • Upgrade pip, then retry installation.
    • Pin packages manually in a lockfile if your environment requires deterministic builds.

Source-Only Execution

If you only need the parser, open a Python shell in the repository root and import from inspector.analyzer without launching Streamlit.

6. Testing

This repository does not currently include a dedicated tests/ suite. Until one exists, the following commands serve as syntax and startup smoke checks:

python -m compileall app.py inspector
python -m py_compile app.py
streamlit run app.py

Recommended local quality checks:

# Optional linter (install first)
ruff check app.py inspector

# Optional formatter check
ruff format --check app.py inspector

Caution

UI behavior validation is primarily manual at this stage. If you need CI-grade confidence, add unit tests for clean_url, parse_line_data, and analyze_text first.
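
A first test module might look like the sketch below. It relies only on what this README states about analyze_text: it returns four values, it is deterministic, and duplicates or invalid lines produce diagnostics. The class name is hypothetical. The equality check assumes the returned values are plain comparable containers, which the README does not confirm.

```python
import unittest

try:
    from inspector.analyzer import analyze_text
except ImportError:  # running outside the repository checkout
    analyze_text = None


@unittest.skipIf(analyze_text is None, "inspector.analyzer is not importable")
class AnalyzeTextSmokeTest(unittest.TestCase):
    SAMPLE = "example.com, pub-1, DIRECT\nexample.com, pub-1, DIRECT\ninvalid,line"

    def test_returns_four_values(self):
        self.assertEqual(len(analyze_text(self.SAMPLE)), 4)

    def test_is_deterministic(self):
        # Assumes the results compare by value (lists, dicts, tuples).
        self.assertEqual(analyze_text(self.SAMPLE), analyze_text(self.SAMPLE))

    def test_reports_diagnostics(self):
        *_, warnings = analyze_text(self.SAMPLE)
        self.assertGreaterEqual(len(warnings), 1)
```

Saved under a tests/ directory, this can be run from the repository root with python -m unittest discover.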

7. Deployment

Production Deployment Guidelines

  • Keep configuration immutable per environment.
  • Run behind a reverse proxy if exposing publicly.
  • Monitor request latency and remote fetch failure rates.
  • Preserve original input artifacts for auditability.

Build and Runtime

pip install -r requirements.txt
streamlit run app.py --server.port 8501 --server.address 0.0.0.0

Containerization Example

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
CI/CD integration checklist
  • Install dependencies from requirements.txt.
  • Execute syntax checks and optional linting.
  • Run import smoke tests for inspector.analyzer and inspector.render.
  • Publish container image or deploy app artifact.
  • Gate production rollout on successful pipeline status.
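
The import smoke test in the checklist can be as small as the sketch below. The helper name smoke_import is hypothetical. A real pipeline step would exit with a non-zero status when any import fails.

```python
import importlib


def smoke_import(module_names):
    """Map each module name to None on success or to the error text on failure."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # any import-time failure should fail the check
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


report = smoke_import(["inspector.analyzer", "inspector.render"])
failed = {name: error for name, error in report.items() if error}
print("failed imports:", failed or "none")
```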

8. Usage

Basic Usage

Launch Streamlit:

streamlit run app.py

Programmatic parser usage:

from inspector.analyzer import clean_url, analyze_text

raw_text = """
google.com, pub-123, DIRECT, f08c47fec0942fa0
google.com, pub-123, DIRECT, f08c47fec0942fa0
invalid,line
""".strip()

domain = clean_url("https://example.com")  # normalize user input
lines_meta, records, stats, warnings = analyze_text(raw_text)  # run diagnostics pipeline

print(domain)
print(stats)
print(warnings)

Note

The returned warnings payload is the canonical diagnostics stream and can be forwarded to any external logging backend if needed.

Advanced Usage: integrating diagnostics into custom logging sinks
from inspector.analyzer import analyze_text
import json

content = "example.com, pub-1, DIRECT\nexample.com, pub-1, DIRECT"
_, _, _, warnings = analyze_text(content)

for event in warnings:
    # Replace print with your structured logger (e.g., structlog, ELK shipper, OpenTelemetry exporter)
    print(json.dumps({"severity": event["type"], "message": event["msg"]}))
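
As one concrete sink, the events can be forwarded to Python's standard logging module. The type and msg keys come from the example above. The severity values "error" and "warning", the helper name forward_events, and the logger name are assumptions. The sample events are hand-written stand-ins, not real analyzer output.

```python
import logging

# Assumed severity strings; anything else falls back to INFO.
LEVELS = {"error": logging.ERROR, "warning": logging.WARNING}


def forward_events(events, logger):
    """Log each diagnostics event at the level matching its severity."""
    for event in events:
        severity = str(event.get("type", "")).lower()
        logger.log(LEVELS.get(severity, logging.INFO), event.get("msg", ""))


logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
sample_events = [
    {"type": "error", "msg": "Line 3: invalid syntax"},
    {"type": "warning", "msg": "Line 2: duplicate record"},
]
forward_events(sample_events, logging.getLogger("adstxt.diagnostics"))
```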

Custom Formatters

  • Severity-to-color mapping can be adjusted in assets/css/styles.css.
  • HTML wrapper output can be extended in inspector/render.py and templates/*.html.
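
For orientation, a severity-aware HTML wrapper can be sketched as below. This is not the code in inspector/render.py: the function name render_log_panel and the CSS class names log-panel, log-line, and log-<severity> are hypothetical and would need to match whatever assets/css/styles.css actually defines.

```python
from html import escape


def render_log_panel(events):
    """Render diagnostics events as HTML, one div per event, with a class per severity."""
    rows = []
    for event in events:
        severity = escape(str(event.get("type", "info")), quote=True)
        message = escape(str(event.get("msg", "")))  # never inject raw file content
        rows.append(f'<div class="log-line log-{severity}">{message}</div>')
    return '<div class="log-panel">\n' + "\n".join(rows) + "\n</div>"


print(render_log_panel([{"type": "error", "msg": "Line 3: <invalid> syntax"}]))
```

Escaping matters here because the messages may quote content fetched from remote files.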

Edge Cases

  • Empty lines and comments are treated as neutral lines.
  • Records with fewer than three comma-separated fields are treated as syntax errors.
  • Relationship types outside DIRECT/RESELLER are treated as invalid.
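
The three rules above can be expressed as a small classifier. This is a minimal sketch, not the repository's parse_line_data: the function name classify_line is hypothetical, VALID_TYPES mirrors the name mentioned under Configuration, and matching the relationship type case-insensitively is an assumption.

```python
VALID_TYPES = {"DIRECT", "RESELLER"}


def classify_line(line: str) -> str:
    """Classify a line as 'neutral', 'error', or 'valid' per the edge-case rules."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return "neutral"  # empty lines and comments
    body = stripped.split("#", 1)[0]  # ignore trailing comments
    fields = [field.strip() for field in body.split(",")]
    if len(fields) < 3:
        return "error"  # fewer than three comma-separated fields
    if fields[2].upper() not in VALID_TYPES:
        return "error"  # unsupported relationship type
    return "valid"


for example in ("", "# comment", "invalid,line", "a.com, pub-1, OTHER", "a.com, pub-1, DIRECT"):
    print(repr(example), "->", classify_line(example))
```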

9. Configuration

Runtime Configuration Surfaces

  • Streamlit page metadata in app.py.
  • Supported file type options (app-ads.txt, ads.txt).
  • Request timeout on remote fetch.
  • Allowed relationship types in inspector/analyzer.py via VALID_TYPES.

Environment Variables

No environment variables are required by default.

Tip

For production hardening, externalize timeout, default file type, and endpoint policies through environment variables and load them at startup.

Suggested exhaustive configuration schema (example)
app:
  page_title: "Ads.txt Inspector"
  layout: "wide"

network:
  request_timeout_seconds: 15
  user_agent_profile: "chrome_windows"

validation:
  allowed_relationship_types:
    - DIRECT
    - RESELLER
  duplicate_key_fields:
    - domain
    - publisher_id
    - relationship_type

output:
  default_download_filename: "app-ads-optimized.txt"
  include_warning_stream: true

logging:
  emit_json: false
  include_line_numbers: true
  severity_levels:
    error: "error"
    warning: "warning"

Example .env extension:

REQUEST_TIMEOUT_SECONDS=15
DEFAULT_FILE_TYPE=app-ads.txt
STREAMLIT_SERVER_PORT=8501
STREAMLIT_SERVER_ADDRESS=0.0.0.0
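
Loading such variables at startup could look like the sketch below. The application does not currently read these variables. The Settings class and load_settings function are hypothetical, and the defaults are taken from the example values above.

```python
import os
from dataclasses import dataclass
from typing import Mapping

SUPPORTED_FILE_TYPES = ("ads.txt", "app-ads.txt")


@dataclass(frozen=True)
class Settings:
    request_timeout_seconds: int = 15
    default_file_type: str = "app-ads.txt"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment mapping, falling back to defaults."""
    timeout = int(env.get("REQUEST_TIMEOUT_SECONDS", Settings.request_timeout_seconds))
    if timeout <= 0:
        raise ValueError("REQUEST_TIMEOUT_SECONDS must be positive")
    file_type = env.get("DEFAULT_FILE_TYPE", Settings.default_file_type)
    if file_type not in SUPPORTED_FILE_TYPES:
        raise ValueError(f"DEFAULT_FILE_TYPE must be one of {SUPPORTED_FILE_TYPES}")
    return Settings(request_timeout_seconds=timeout, default_file_type=file_type)


# An explicit mapping keeps the example independent of the real environment.
print(load_settings({"REQUEST_TIMEOUT_SECONDS": "30", "DEFAULT_FILE_TYPE": "ads.txt"}))
```

Validating at startup means a misconfigured deployment fails immediately, not on the first fetch.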

10. License

This project is licensed under the Apache License 2.0. See LICENSE for full legal terms and usage rights.

11. Contacts & Community Support

Support the Project

Patreon Ko-fi Boosty YouTube Telegram

If you find this tool useful, consider leaving a star on GitHub or supporting the author directly.

About

🔧 Streamlit app for validating and optimizing ads.txt and app-ads.txt files. Auto-fix duplicates and syntax errors, detect malformed records, analyze account types and certification usage. Multi-source ingestion with CSV/JSON export for AdOps compliance workflows.
