hey-yulee/spec-driven-docx-template-fill

spec-driven-docx-template-fill

Spec-driven DOCX template filling for institution-issued forms.
Keep the original Word layout, fill only the intended content, and verify the output.

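In short: a DOCX filling framework for Word templates issued by an organization or school. The goal is to change only the content without breaking the original template formatting, and to make the result analyzable, verifiable, and reusable.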

Why this project

Many real-world Word automation tasks are not about generating a new document from scratch. Instead, they start from a fixed template, such as:

  • a school training report
  • a hospital internship form
  • a company internal form
  • a government-style submission template

In these cases, users usually need three things at the same time:

  • preserve the original template layout
  • fill content according to intent
  • keep the process explainable and verifiable

This project focuses on that workflow.

Core features

  • Fill DOCX templates from template_spec.json
  • Preserve formatting by cloning paragraph styles from the template or a reference document
  • Support multiple locator strategies: paraId, text_anchor, bookmark, content_control, xpath
  • Keep analysis, filling, and verification as separate steps
  • Support workflows that pair a blank template with a filled reference document for long-form sections
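To make the locator idea concrete, here is a minimal sketch of two of the strategies, paraId and text_anchor, written against the raw WordprocessingML of word/document.xml using only the standard library. It is not this package's implementation: locate_paragraph and paragraph_text are hypothetical names, and the bookmark, content_control, and xpath strategies are deliberately left out.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Standard OOXML namespace URIs; w14:paraId is the paragraph ID Word 2010+ writes.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W14 = "http://schemas.microsoft.com/office/word/2010/wordml"
SUPPORTED = {"paraId", "text_anchor"}  # hypothetical subset covered by this sketch


def paragraph_text(p: ET.Element) -> str:
    """Join every w:t in a paragraph; Word often splits text across several runs."""
    return "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))


def locate_paragraph(root: ET.Element, strategy: str, value: str) -> Optional[ET.Element]:
    """Return the first w:p matching the strategy, or None if nothing matches."""
    if strategy not in SUPPORTED:
        raise ValueError(f"strategy not covered by this sketch: {strategy}")
    for p in root.iter(f"{{{W}}}p"):
        if strategy == "paraId" and p.get(f"{{{W14}}}paraId") == value:
            return p
        if strategy == "text_anchor" and value in paragraph_text(p):
            return p
    return None


# Tiny inline document: the second paragraph's text is split across two runs.
SAMPLE = (
    f'<w:document xmlns:w="{W}" xmlns:w14="{W14}"><w:body>'
    '<w:p w14:paraId="1A2B3C4D"><w:r><w:t>Name:</w:t></w:r></w:p>'
    '<w:p w14:paraId="5E6F7081"><w:r><w:t xml:space="preserve">Training </w:t></w:r>'
    "<w:r><w:t>summary</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
root = ET.fromstring(SAMPLE)
print(paragraph_text(locate_paragraph(root, "text_anchor", "Training summary")))
```

Matching on the joined paragraph text rather than on individual w:t elements is what lets a text anchor survive Word's habit of splitting a phrase across runs.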

Why reference templates matter

Long fields in real Word forms often do not exist as ready-made paragraphs in the blank template. Instead, the blank template contains only a heading or a placeholder, while a filled example of the same form shows:

  • how many paragraphs the section should use
  • what indentation and spacing look like
  • which run style should be cloned

That is why this project supports a reference_source in addition to template_source.

The two inputs therefore play different roles:

  • template_source is the base of the output document
  • reference_source supplies the styles and the paragraph structure of long fields
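The cloning idea behind reference_source can be sketched as follows, assuming that the reference paragraph's w:pPr (indentation, spacing) and its first run's w:rPr are what should be copied onto each generated paragraph. This is an illustrative standard-library sketch, not the package's renderer; clone_paragraphs and replace_placeholder are hypothetical names.

```python
import copy
import xml.etree.ElementTree as ET
from typing import List

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def q(tag: str) -> str:
    """Qualify a WordprocessingML tag or attribute name with the w: namespace."""
    return f"{{{W}}}{tag}"


def clone_paragraphs(reference_p: ET.Element, texts: List[str]) -> List[ET.Element]:
    """Build one w:p per text, copying the reference pPr and the first run's rPr."""
    ppr = reference_p.find(q("pPr"))
    first_run = reference_p.find(q("r"))
    rpr = first_run.find(q("rPr")) if first_run is not None else None
    paragraphs = []
    for text in texts:
        p = ET.Element(q("p"))
        if ppr is not None:
            p.append(copy.deepcopy(ppr))  # indentation, spacing, alignment
        r = ET.SubElement(p, q("r"))
        if rpr is not None:
            r.append(copy.deepcopy(rpr))  # font, size, and other run properties
        t = ET.SubElement(r, q("t"))
        t.text = text
        t.set(XML_SPACE, "preserve")  # keep leading/trailing spaces in w:t
        paragraphs.append(p)
    return paragraphs


def replace_placeholder(body: ET.Element, placeholder: ET.Element,
                        reference_p: ET.Element, texts: List[str]) -> None:
    """Swap a placeholder paragraph in body for the cloned paragraphs, in order."""
    index = list(body).index(placeholder)
    body.remove(placeholder)
    for offset, p in enumerate(clone_paragraphs(reference_p, texts)):
        body.insert(index + offset, p)


# Reference paragraph (taken from a filled sample) and a blank-template body.
reference = ET.fromstring(
    f'<w:p xmlns:w="{W}"><w:pPr><w:ind w:firstLine="480"/></w:pPr>'
    '<w:r><w:rPr><w:sz w:val="24"/></w:rPr><w:t>Sample text</w:t></w:r></w:p>'
)
body = ET.fromstring(
    f'<w:body xmlns:w="{W}"><w:p><w:r><w:t>Summary</w:t></w:r></w:p>'
    "<w:p><w:r><w:t>[SUMMARY]</w:t></w:r></w:p></w:body>"
)
replace_placeholder(body, body[1], reference, ["First paragraph.", "Second paragraph."])
print(["".join(t.text for t in p.iter(q("t"))) for p in body])
```

Deep-copying the property elements, rather than referencing them, keeps each generated paragraph independent, so later edits to one paragraph's formatting do not leak into its siblings.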

Installation

pip install -e ".[dev]"

CLI

docx-template-analyze --spec examples/minimal_demo/template_spec.json --output analysis.json
docx-template-fill examples/minimal_demo/template_spec.json examples/minimal_demo/fill_input.json --output output.docx --report render-report.json
docx-template-verify examples/minimal_demo/template_spec.json --output-docx output.docx --render-report render-report.json
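The three commands form a pipeline: analyze, fill, then verify. A small Python driver could chain them as sketched below. It uses only the arguments shown above and assumes each command is on PATH after installation and exits with a non-zero status on failure; build_pipeline and run_pipeline are hypothetical helpers, not part of the package.

```python
import subprocess
from typing import List


def build_pipeline(spec: str, fill_input: str, output_docx: str,
                   report: str, analysis: str = "analysis.json") -> List[List[str]]:
    """Return the analyze, fill, and verify commands, using only the flags shown above."""
    return [
        ["docx-template-analyze", "--spec", spec, "--output", analysis],
        ["docx-template-fill", spec, fill_input, "--output", output_docx, "--report", report],
        ["docx-template-verify", spec, "--output-docx", output_docx, "--render-report", report],
    ]


def run_pipeline(commands: List[List[str]]) -> None:
    """Run each step in order; check=True raises CalledProcessError on a non-zero exit."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run_pipeline(build_pipeline("examples/minimal_demo/template_spec.json", "examples/minimal_demo/fill_input.json", "output.docx", "render-report.json"))` mirrors the three commands above and stops at the first failing step, so verification never runs against a stale output.docx.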

Quick start

1. Minimal demo

docx-template-fill examples/minimal_demo/template_spec.json examples/minimal_demo/fill_input.json --output examples/minimal_demo/output.docx --report examples/minimal_demo/render-report.json
docx-template-verify examples/minimal_demo/template_spec.json --output-docx examples/minimal_demo/output.docx --render-report examples/minimal_demo/render-report.json
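Alongside docx-template-verify, a quick independent spot check can confirm that filled values actually appear in the output document. The sketch below reads word/document.xml with the standard library; spot_check is a hypothetical helper, and the {{...}} placeholder pattern is an assumption, since the demo's placeholder syntax is not documented here.

```python
import re
import xml.etree.ElementTree as ET
import zipfile
from typing import BinaryIO, Iterable, List, Union

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
# Assumed placeholder syntax, e.g. {{student_name}}; adjust to your template.
DEFAULT_LEFTOVER = r"\{\{.*?\}\}"


def docx_paragraphs(source: Union[str, BinaryIO]) -> List[str]:
    """Return the text of every w:p in word/document.xml (a .docx is a ZIP archive)."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return ["".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
            for p in root.iter(f"{{{W}}}p")]


def spot_check(source: Union[str, BinaryIO], expected: Iterable[str],
               leftover_pattern: str = DEFAULT_LEFTOVER) -> List[str]:
    """List problems: expected values not found, and placeholders still present."""
    full_text = "\n".join(docx_paragraphs(source))
    problems = [f"missing value: {v!r}" for v in expected if v not in full_text]
    problems += [f"leftover placeholder: {m}" for m in re.findall(leftover_pattern, full_text)]
    return problems
```

An empty list means every listed value was found and no placeholder remains. Headers, footers, and footnotes are stored in other parts of the archive and are not scanned by this sketch.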

2. Huashang-style sanitized demo

python examples/huashang_demo/resolve_and_fill.py HV-2026-017 "Digital Vital Sign Observation" "Clinical Practice Design" --output examples/huashang_demo/output.docx

Or fill from explicit JSON:

docx-template-fill examples/huashang_demo/template_spec.json examples/huashang_demo/fill_input.json --output examples/huashang_demo/output.docx --report examples/huashang_demo/render-report.json

Public API

The main Python entry points are:

  • docx_template_fill.analyze_template
  • docx_template_fill.fill_template
  • docx_template_fill.verify_spec
  • docx_template_fill.verify_rendered_output

See:

Repository layout

src/docx_template_fill/      # core package
tests/                       # pytest suite
docs/                        # public docs
examples/minimal_demo/       # smallest runnable demo
examples/huashang_demo/      # sanitized school-style demo
scripts/generate_demo_assets.py

What is intentionally not included

  • no real school templates
  • no real student or teacher data
  • no real internal spreadsheets
  • no claim of full DOCX editing coverage

This repository open-sources the framework only, not any private institutional materials.

Current scope

v0.1.0 intentionally focuses on:

  • spec-driven field mapping
  • style-preserving paragraph rendering
  • output verification
  • reusable demos

It does not promise full support for every DOCX feature or every possible Word control pattern yet.

About

Spec-driven DOCX template filling for institution-issued Word forms, with style-preserving rendering, reference-template cloning, and output verification.
