Commit dc8d903

Add ernie image (#13432)
* Add ERNIE-Image
* Update doc
* Update doc
* Change from Custom-Attention to Diffusers Style Attention
* Change from Custom-Attention to Diffusers Style Attention
* Make compatible with SGLang
* Optimize the loading and offload strategy of the PE module
* Update doc files and config-related content
* Fix issues raised in official feedback
* Optimize code based on official suggestions
* Update code
* update
* update
* Apply style fixes
* update
* update
* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
1 parent 5a9a941 commit dc8d903

14 files changed

Lines changed: 1184 additions & 0 deletions

docs/source/en/_toctree.yml

Lines changed: 4 additions & 0 deletions
@@ -350,6 +350,8 @@
   title: DiTTransformer2DModel
 - local: api/models/easyanimate_transformer3d
   title: EasyAnimateTransformer3DModel
+- local: api/models/ernie_image_transformer2d
+  title: ErnieImageTransformer2DModel
 - local: api/models/flux2_transformer
   title: Flux2Transformer2DModel
 - local: api/models/flux_transformer
@@ -534,6 +536,8 @@
   title: DiT
 - local: api/pipelines/easyanimate
   title: EasyAnimate
+- local: api/pipelines/ernie_image
+  title: ERNIE-Image
 - local: api/pipelines/flux
   title: Flux
 - local: api/pipelines/flux2
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+<!--Copyright 2025 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# ErnieImageTransformer2DModel
+
+A Transformer model for image-like data from [ERNIE-Image](https://huggingface.co/baidu/ERNIE-Image).
+
+A Transformer model for image-like data from [ERNIE-Image-Turbo](https://huggingface.co/baidu/ERNIE-Image-Turbo).
+
+## ErnieImageTransformer2DModel
+
+[[autodoc]] ErnieImageTransformer2DModel
Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
+<!--Copyright 2025 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# ERNIE-Image
+
+<div class="flex flex-wrap space-x-1">
+  <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+</div>
+
+[ERNIE-Image](https://huggingface.co/baidu/ERNIE-Image) is a powerful and highly efficient image generation model with 8B parameters. Two models are currently released:
+
+|Model|Hugging Face|
+|---|---|
+|ERNIE-Image|https://huggingface.co/baidu/ERNIE-Image|
+|ERNIE-Image-Turbo|https://huggingface.co/baidu/ERNIE-Image-Turbo|
+
+## ERNIE-Image
+
+ERNIE-Image pairs a relatively compact architecture with solid instruction-following capability, emphasizing parameter efficiency. Built on an 8B DiT backbone, it is comparable in some scenarios to larger (20B+) models. It delivers relatively stable performance in instruction understanding and execution and in text generation (e.g., English, Chinese, and Japanese).
+
+## ERNIE-Image-Turbo
+
+ERNIE-Image-Turbo is a distilled variant of ERNIE-Image that requires only 8 NFEs (number of function evaluations), offering a more efficient alternative whose quality is comparable to the full model in certain cases.
+
+## ErnieImagePipeline
+
+Use [`ErnieImagePipeline`] to generate images from text prompts. The pipeline enables the Prompt Enhancer (PE) by default. PE rewrites the user's raw prompt to improve output quality, though it may reduce instruction-following accuracy.
+
+We provide a pretrained 3B-parameter PE model; however, using larger language models (e.g., Gemini or ChatGPT) for prompt enhancement may yield better results. The system prompt template is available at https://huggingface.co/baidu/ERNIE-Image/blob/main/pe/chat_template.jinja.
+
+If you prefer not to use PE, set `use_pe=False`.
+
+```python
+import torch
+from diffusers import ErnieImagePipeline
+from diffusers.utils import load_image
+
+pipe = ErnieImagePipeline.from_pretrained("baidu/ERNIE-Image", torch_dtype=torch.bfloat16)
+pipe.to("cuda")
+# If you are running low on GPU VRAM, replace the line above with CPU offloading:
+# pipe.enable_model_cpu_offload()
+
+prompt = "一只黑白相间的中华田园犬"  # "A black-and-white Chinese rural dog"
+images = pipe(
+    prompt=prompt,
+    height=1024,
+    width=1024,
+    num_inference_steps=50,
+    guidance_scale=4.0,
+    generator=torch.Generator("cuda").manual_seed(42),
+    use_pe=True,
+).images
+images[0].save("ernie-image-output.png")
+```
+
+```python
+import torch
+from diffusers import ErnieImagePipeline
+from diffusers.utils import load_image
+
+pipe = ErnieImagePipeline.from_pretrained("baidu/ERNIE-Image-Turbo", torch_dtype=torch.bfloat16)
+pipe.to("cuda")
+# If you are running low on GPU VRAM, replace the line above with CPU offloading:
+# pipe.enable_model_cpu_offload()
+
+prompt = "一只黑白相间的中华田园犬"  # "A black-and-white Chinese rural dog"
+images = pipe(
+    prompt=prompt,
+    height=1024,
+    width=1024,
+    num_inference_steps=8,
+    guidance_scale=1.0,
+    generator=torch.Generator("cuda").manual_seed(42),
+    use_pe=True,
+).images
+images[0].save("ernie-image-turbo-output.png")
+```
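
Reviewer note: the two documented calls differ only in checkpoint, step count, and guidance scale. The sketch below keeps those settings in one place. `SAMPLING_PRESETS`, `sampling_kwargs`, and `estimated_nfe` are hypothetical helpers, not part of diffusers. The NFE estimate assumes classifier-free guidance costs two transformer passes per step whenever `guidance_scale > 1`, which this commit's documentation does not confirm.

```python
# Hypothetical helpers; the numeric values are copied from the two documented examples.
SAMPLING_PRESETS = {
    "baidu/ERNIE-Image": {"num_inference_steps": 50, "guidance_scale": 4.0},
    "baidu/ERNIE-Image-Turbo": {"num_inference_steps": 8, "guidance_scale": 1.0},
}


def sampling_kwargs(repo_id, use_pe=True, height=1024, width=1024):
    """Build the keyword arguments for the pipeline call for one checkpoint."""
    preset = SAMPLING_PRESETS[repo_id]
    return {"height": height, "width": width, "use_pe": use_pe, **preset}


def estimated_nfe(repo_id):
    """Estimate function evaluations.

    Assumption (not confirmed by the source): guidance_scale > 1 doubles
    the number of transformer passes per step.
    """
    preset = SAMPLING_PRESETS[repo_id]
    passes_per_step = 2 if preset["guidance_scale"] > 1.0 else 1
    return preset["num_inference_steps"] * passes_per_step


turbo = sampling_kwargs("baidu/ERNIE-Image-Turbo", use_pe=False)
print(turbo)
```

With a loaded pipeline, the result could be passed as `pipe(prompt=prompt, **turbo).images`, which also shows the `use_pe=False` path that the documented examples leave out.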

src/diffusers/__init__.py

Lines changed: 4 additions & 0 deletions
@@ -235,6 +235,7 @@
 "CosmosTransformer3DModel",
 "DiTTransformer2DModel",
 "EasyAnimateTransformer3DModel",
+"ErnieImageTransformer2DModel",
 "Flux2Transformer2DModel",
 "FluxControlNetModel",
 "FluxMultiControlNetModel",
@@ -527,6 +528,7 @@
 "EasyAnimateControlPipeline",
 "EasyAnimateInpaintPipeline",
 "EasyAnimatePipeline",
+"ErnieImagePipeline",
 "Flux2KleinKVPipeline",
 "Flux2KleinPipeline",
 "Flux2Pipeline",
@@ -1037,6 +1039,7 @@
 CosmosTransformer3DModel,
 DiTTransformer2DModel,
 EasyAnimateTransformer3DModel,
+ErnieImageTransformer2DModel,
 Flux2Transformer2DModel,
 FluxControlNetModel,
 FluxMultiControlNetModel,
@@ -1304,6 +1307,7 @@
 EasyAnimateControlPipeline,
 EasyAnimateInpaintPipeline,
 EasyAnimatePipeline,
+ErnieImagePipeline,
 Flux2KleinKVPipeline,
 Flux2KleinPipeline,
 Flux2Pipeline,

src/diffusers/models/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -101,6 +101,7 @@
 _import_structure["transformers.transformer_cogview4"] = ["CogView4Transformer2DModel"]
 _import_structure["transformers.transformer_cosmos"] = ["CosmosTransformer3DModel"]
 _import_structure["transformers.transformer_easyanimate"] = ["EasyAnimateTransformer3DModel"]
+_import_structure["transformers.transformer_ernie_image"] = ["ErnieImageTransformer2DModel"]
 _import_structure["transformers.transformer_flux"] = ["FluxTransformer2DModel"]
 _import_structure["transformers.transformer_flux2"] = ["Flux2Transformer2DModel"]
 _import_structure["transformers.transformer_glm_image"] = ["GlmImageTransformer2DModel"]
@@ -219,6 +220,7 @@
 DiTTransformer2DModel,
 DualTransformer2DModel,
 EasyAnimateTransformer3DModel,
+ErnieImageTransformer2DModel,
 Flux2Transformer2DModel,
 FluxTransformer2DModel,
 GlmImageTransformer2DModel,

src/diffusers/models/transformers/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@
 from .transformer_cogview4 import CogView4Transformer2DModel
 from .transformer_cosmos import CosmosTransformer3DModel
 from .transformer_easyanimate import EasyAnimateTransformer3DModel
+from .transformer_ernie_image import ErnieImageTransformer2DModel
 from .transformer_flux import FluxTransformer2DModel
 from .transformer_flux2 import Flux2Transformer2DModel
 from .transformer_glm_image import GlmImageTransformer2DModel
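
Reviewer note: the `_import_structure` entries above map a submodule path to the names it exports, so a class is imported only when first accessed. The sketch below illustrates that general pattern under stated assumptions. `LazyNamespace` is a hypothetical, simplified stand-in and not the library's own lazy-loading helper, and stdlib modules take the place of diffusers submodules so the example runs anywhere.

```python
import importlib
import types


class LazyNamespace(types.ModuleType):
    """Resolve exported names to their defining module on first attribute access."""

    def __init__(self, name, import_structure):
        super().__init__(name)
        # Invert {module: [names]} into {name: module} for fast lookup.
        self._name_to_module = {
            attr: module for module, attrs in import_structure.items() for attr in attrs
        }
        self.__all__ = list(self._name_to_module)

    def __getattr__(self, attr):
        # Reached only when normal lookup fails, i.e. before the name is cached.
        name_to_module = self.__dict__.get("_name_to_module", {})
        if attr not in name_to_module:
            raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}")
        module = importlib.import_module(name_to_module[attr])
        value = getattr(module, attr)
        setattr(self, attr, value)  # cache so later lookups skip __getattr__
        return value


# Stand-in structure: stdlib modules replace the real submodules.
_import_structure = {}
_import_structure["json"] = ["dumps", "loads"]
_import_structure["fractions"] = ["Fraction"]

lazy = LazyNamespace("lazy_demo", _import_structure)
print(lazy.Fraction(1, 2))
```

Under this pattern, registering a new model takes one extra dictionary entry plus the matching eager import for type checkers, which is the shape of the changes in the three `__init__.py` hunks.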
