itzAmil/ernie-ai-warmup-pdf-to-website

ERNIE Warm-up: PDF → Website

Compliance Statement

This submission fully complies with the official Warm-up Task specification.

  • PaddleOCR-VL extracts text and layout from PDFs (src/warmup.py).
  • The extracted content is converted to Markdown (output/extracted.md).
  • The ERNIE-generated HTML website (output/index.html and root index.html) is fully self-contained.
  • The site is deployed on GitHub Pages: https://itzamil.github.io/ernie-ai-warmup-pdf-to-website/
  • Google Colab was used only as an OCR debugging helper. The final repo works offline.
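
One way to sanity-check the "fully self-contained" claim is to scan the generated HTML for resources that would be fetched from the network. The snippet below is a minimal sketch using only the Python standard library; the names `ExternalResourceFinder` and `find_external_resources` are hypothetical and are not part of this repo, and the list of tag/attribute pairs is a simplifying assumption rather than a complete audit.

```python
from html.parser import HTMLParser


class ExternalResourceFinder(HTMLParser):
    """Collect URLs of resources a page would load from the network.

    Hypothetical helper for illustration; not part of the repository.
    """

    # (tag, attribute) pairs assumed to trigger a fetch on page load.
    # Ordinary <a href="..."> links are deliberately ignored.
    FETCHING_ATTRS = {
        ("script", "src"),
        ("img", "src"),
        ("iframe", "src"),
        ("link", "href"),
    }

    def __init__(self):
        super().__init__()
        self.external = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) in self.FETCHING_ATTRS and value:
                # Absolute and protocol-relative URLs point off the page.
                if value.startswith(("http://", "https://", "//")):
                    self.external.append(value)


def find_external_resources(html_text):
    """Return the external resource URLs found in html_text."""
    finder = ExternalResourceFinder()
    finder.feed(html_text)
    return finder.external
```

An empty list from `find_external_resources(open("index.html", encoding="utf-8").read())` would suggest the page loads without network access, under the assumptions above.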

This project completes the ERNIE AI Developer Challenge Warm-up Task on Devpost:

  1. Convert PDF to images
  2. OCR → extract text → Markdown (output/extracted.md)
  3. Generate static website from Markdown
  4. Deploy with GitHub Pages ✅ LIVE: https://itzamil.github.io/ernie-ai-warmup-pdf-to-website/
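
The first three steps above can be sketched as a small driver that writes the same output files the repo lists. This is an illustrative outline only, not the code in src/warmup.py: `run_pipeline`, `pages_to_markdown`, and the three callables (`render_pages`, `ocr_page`, `build_html`) are hypothetical names, and the "one section per page" Markdown layout is an assumption. The rendering and OCR stages are passed in as functions so the sketch does not guess at any PaddleOCR API.

```python
from pathlib import Path


def pages_to_markdown(page_texts):
    """Join per-page OCR text into one Markdown document (assumed layout)."""
    sections = []
    for number, text in enumerate(page_texts, start=1):
        sections.append(f"## Page {number}\n\n{text.strip()}")
    return "\n\n".join(sections) + "\n"


def run_pipeline(pdf_path, out_dir, render_pages, ocr_page, build_html):
    """Run steps 1-3 and return the path of the generated index.html.

    render_pages(pdf_path) -> list of page images   (step 1, hypothetical)
    ocr_page(image)        -> extracted text        (step 2, hypothetical)
    build_html(markdown)   -> HTML string           (step 3, hypothetical)
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    images = render_pages(pdf_path)
    markdown = pages_to_markdown([ocr_page(image) for image in images])
    (out / "extracted.md").write_text(markdown, encoding="utf-8")

    index_path = out / "index.html"
    index_path.write_text(build_html(markdown), encoding="utf-8")
    # Step 4 (GitHub Pages) is a repository setting, not code: the
    # generated page is served once it is committed to the published branch.
    return index_path
```

Keeping each stage behind a plain function makes it possible to swap the OCR backend or test the file handling with stub functions.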

Structure

  • sample.pdf – Source document
  • output/extracted.md – OCR Markdown output
  • output/index.html – Generated HTML
  • src/warmup.py – PaddleOCR extraction
  • src/generate_website.py – Markdown → HTML
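
To illustrate the Markdown → HTML step, here is a minimal sketch of a converter that emits a single page with inline CSS and no external assets. It is not the contents of src/generate_website.py (which, per the compliance statement, produces ERNIE-generated HTML); the function name `markdown_to_html`, the handling of only headings and paragraphs, and the inline style are all assumptions made for the example.

```python
import html
import re


def markdown_to_html(markdown_text, title="Document"):
    """Convert a small Markdown subset (headings, paragraphs) to one HTML page."""
    body = []
    paragraph = []

    def flush_paragraph():
        # Emit the buffered lines as one <p>, escaping any raw HTML.
        if paragraph:
            body.append("<p>" + html.escape(" ".join(paragraph)) + "</p>")
            paragraph.clear()

    for line in markdown_text.splitlines():
        stripped = line.strip()
        heading = re.match(r"(#{1,6})\s+(.*)", stripped)
        if not stripped:
            flush_paragraph()          # blank line ends a paragraph
        elif heading:
            flush_paragraph()
            level = len(heading.group(1))
            text = html.escape(heading.group(2).strip())
            body.append(f"<h{level}>{text}</h{level}>")
        else:
            paragraph.append(stripped)
    flush_paragraph()

    # Inline CSS keeps the page self-contained (no stylesheet download).
    style = "body{max-width:48rem;margin:2rem auto;font-family:sans-serif}"
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="en"><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title>"
        f"<style>{style}</style></head>\n"
        "<body>\n" + "\n".join(body) + "\n</body></html>\n"
    )
```

A possible use, assuming the paths from the structure list: read output/extracted.md, pass its text to `markdown_to_html`, and write the result to output/index.html.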

OCR Notebook (Optional)

A supporting Colab notebook was used for OCR debugging.

The project is fully self-contained in the repo. No external dependencies are required.

About

Warm-up task for the ERNIE AI Developer Challenge on Devpost: convert a research PDF to a static website with OCR and GitHub Pages.
