[ICLR 2026] Discrete Diffusion Trajectory Alignment via Stepwise Decomposition

Jiaqi Han* $^1$, Austin Wang* $^2$, Minkai Xu $^1$, Wenda Chu $^2$, Meihua Dang $^1$, Haotian Ye $^1$, Huayu Chen $^3$, Yisong Yue $^2$, Stefano Ermon $^1$

$^1$ Stanford University $^2$ Caltech $^3$ Tsinghua University

🎯 Overview

We propose SDPO, a general preference optimization method for trajectory alignment of discrete diffusion models. The key idea is to decompose the problem into a set of stepwise alignment objectives by matching the per-step factorized posterior. This framework enables efficient diffusion optimization, is compatible with arbitrary reward functions, and yields an equivalent optimal solution under an additive factorization of the trajectory reward.

Experiments across multiple domains including DNA sequence design, protein inverse folding, and language modeling consistently demonstrate the superiority of our approach.
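To make the stepwise decomposition concrete, here is a minimal, illustrative sketch of a DPO-style objective applied at a single denoising step, with per-step losses summed over the trajectory. All names and numbers are hypothetical for illustration; see the paper and the code in the subfolders for the actual objective.

```python
import math

def stepwise_dpo_loss(logp_w, logp_l, logp_w_ref, logp_l_ref, beta=0.1):
    """DPO-style loss at one denoising step (illustrative only).

    logp_w / logp_l: log-probability of the preferred / dispreferred
    step transition under the model being optimized.
    logp_w_ref / logp_l_ref: the same quantities under the frozen
    reference model. beta scales the implicit reward margin.
    """
    margin = beta * ((logp_w - logp_w_ref) - (logp_l - logp_l_ref))
    # -log sigmoid(margin): small when the model favors the preferred step
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Summing the per-step losses approximates trajectory-level alignment
# under an additive factorization of the trajectory reward.
per_step_logps = [(-1.2, -2.0, -1.5, -1.8), (-0.9, -1.4, -1.0, -1.1)]
total_loss = sum(stepwise_dpo_loss(*step) for step in per_step_logps)
```

Increasing the model's preference margin at any step lowers that step's loss independently of the others, which is what makes the per-step objectives cheap to optimize.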

Please give us a star ⭐ if you find our work interesting!


⭐ DNA Sequence Design Experiments

Our goal here is to optimize the activity of regulatory DNA sequences such that they drive gene expression in specific cell types, a critical task for cell and gene therapy.

We provide the source code for the DNA experiments in the SDPO_dna/ folder. Please refer to SDPO_dna/README.md for detailed instructions.

⭐ Protein Inverse Folding Experiments

Given a pretrained inverse folding model that generates sequences conditioned on the backbone’s conformation (3D structure), our goal is to optimize the stability of these generated sequences.

The code and instructions are in the SDPO_protein/ folder. Please refer to SDPO_protein/README.md for detailed instructions.

⭐ Language Modeling Experiments

We also apply our approach to a large-scale discrete diffusion model for natural language, demonstrating its efficacy for preference optimization of large diffusion language models. We employ LLaDA-8B-Instruct as the reference model.

The code is provided in the SDPO_llada/ folder.

To launch the experiments:

```shell
# cd SDPO_llada/
# pip install -r requirements.txt
ngpu=4
devices=0,1,2,3
CUDA_VISIBLE_DEVICES=${devices} \
accelerate launch \
    --config_file accelerate_configs/deepspeed_zero3.yaml \
    --num_processes=${ngpu} \
    --main_process_port=29521 \
    run.py \
    config_llada.yaml
```

Acknowledgment

We use the data and checkpoints from the DRAKES repository for the DNA and protein experiments. The code for the language modeling experiments builds heavily upon SimPO. We sincerely thank the authors for open-sourcing their codebases.

📌 Citation

Please consider citing our work if you find it useful:

```bibtex
@inproceedings{
    han2026discrete,
    title={Discrete Diffusion Trajectory Alignment via Stepwise Decomposition},
    author={Jiaqi Han and Austin Wang and Minkai Xu and Wenda Chu and Meihua Dang and Haotian Ye and Huayu Chen and Yisong Yue and Stefano Ermon},
    booktitle={The Fourteenth International Conference on Learning Representations},
    year={2026},
    url={https://openreview.net/forum?id=h9b5h69v3p}
}
```

🧩 Contact

If you have any questions, feel free to contact me at:

Jiaqi Han: jiaqihan@stanford.edu
