This submission fully complies with the official Warm-up Task specification.
- PaddleOCR-VL used to extract text/layout from PDFs (`src/warmup.py`)
- Content converted to Markdown (`output/extracted.md`)
- ERNIE-generated HTML website (`output/index.html`, root `index.html`), fully self-contained
- Deployed on GitHub Pages: https://itzamil.github.io/ernie-ai-warmup-pdf-to-website/
- Google Colab used only as OCR debugging helper. Final repo works offline.
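One way to sanity-check the "fully self-contained" claim is to scan the generated HTML for resources loaded from a remote origin. The sketch below is not part of this repo; `ExternalRefFinder`, `find_external_refs`, and the sample markup are hypothetical names introduced for illustration, and it uses only the Python standard library.

```python
from html.parser import HTMLParser

# Attributes that can pull a resource in from another origin.
URL_ATTRS = {"src", "href", "poster", "data"}
REMOTE_PREFIXES = ("http://", "https://", "//")


class ExternalRefFinder(HTMLParser):
    """Collect (tag, url) pairs for resources loaded from a remote origin."""

    def __init__(self):
        super().__init__()
        self.external = []

    def handle_starttag(self, tag, attrs):
        # Plain <a href> links are navigation, not page resources, so skip them.
        if tag == "a":
            return
        for name, value in attrs:
            if name in URL_ATTRS and value and value.startswith(REMOTE_PREFIXES):
                self.external.append((tag, value))


def find_external_refs(html_text):
    """Return every remote resource reference found in html_text."""
    finder = ExternalRefFinder()
    finder.feed(html_text)
    finder.close()
    return finder.external


# Hypothetical sample page: one remote stylesheet, one local image, one link.
sample = (
    '<html><head><link rel="stylesheet" href="https://cdn.example.com/a.css">'
    "<style>body{}</style></head>"
    '<body><a href="https://example.com">x</a><img src="img/p.png"></body></html>'
)
print(find_external_refs(sample))
```

An empty result for the text of `output/index.html` would suggest the page loads without network access. The check only looks at tag attributes, so it would miss URLs referenced inside inline CSS or scripts.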
This project completes the ERNIE AI Developer Challenge Warm-up Task on Devpost:
- Convert PDF to images
- OCR → extract text → Markdown (`output/extracted.md`)
- Generate static website from Markdown
- Deploy with GitHub Pages ✅ LIVE: https://itzamil.github.io/ernie-ai-warmup-pdf-to-website/
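The Markdown → static website step can be illustrated with a minimal, standard-library-only sketch. This is not the code in `src/generate_website.py` (which uses ERNIE to generate the HTML); `markdown_to_html_body`, `build_page`, and `PAGE_TEMPLATE` are hypothetical names, and only `#` headings, `- ` bullets, and plain paragraphs are handled.

```python
import html

# Single-file page with inline CSS, so no external assets are needed.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>{title}</title>
<style>body {{ max-width: 48rem; margin: 2rem auto; font-family: sans-serif; }}</style>
</head>
<body>
{body}
</body>
</html>
"""


def markdown_to_html_body(markdown_text):
    """Convert a small Markdown subset (headings, bullets, paragraphs) to HTML."""
    parts = []
    paragraph = []
    in_list = False

    def flush_paragraph():
        # Emit consecutive text lines as one <p>, escaping HTML metacharacters.
        if paragraph:
            parts.append("<p>" + html.escape(" ".join(paragraph)) + "</p>")
            paragraph.clear()

    def close_list():
        nonlocal in_list
        if in_list:
            parts.append("</ul>")
            in_list = False

    for raw in markdown_text.splitlines():
        line = raw.strip()
        if not line:
            flush_paragraph()
            close_list()
        elif line.startswith("#"):
            flush_paragraph()
            close_list()
            # Heading level is the number of leading '#', capped at 6.
            level = min(len(line) - len(line.lstrip("#")), 6)
            text = line.lstrip("#").strip()
            parts.append(f"<h{level}>{html.escape(text)}</h{level}>")
        elif line.startswith("- "):
            flush_paragraph()
            if not in_list:
                parts.append("<ul>")
                in_list = True
            parts.append("<li>" + html.escape(line[2:].strip()) + "</li>")
        else:
            close_list()
            paragraph.append(line)

    flush_paragraph()
    close_list()
    return "\n".join(parts)


def build_page(markdown_text, title="Extracted document"):
    """Wrap the converted body in a complete, self-contained HTML page."""
    return PAGE_TEMPLATE.format(
        title=html.escape(title),
        body=markdown_to_html_body(markdown_text),
    )


if __name__ == "__main__":
    print(build_page("# Title\n\nHello <world>\n\n- first\n- second\n"))
```

To mirror the pipeline, the string passed to `build_page` would be the contents of `output/extracted.md`, and the returned page would be written to `output/index.html`. A full Markdown library would be the better choice for real documents; the subset here just keeps the sketch dependency-free.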
- `sample.pdf` – Source document
- `output/extracted.md` – OCR Markdown output
- `output/index.html` – Generated HTML
- `src/warmup.py` – PaddleOCR extraction
- `src/generate_website.py` – Markdown → HTML
Supporting Colab used for OCR debugging:
Project fully self-contained in repo. No external dependencies required.