Add modular pipeline for HunyuanVideo 1.5#13389
Conversation
hey guys @yiyixuxu @sayakpaul would greatly appreciate a review!

@claude could you do an initial review here?

I'll analyze this and get back to you.

@sayakpaul looks like the Claude bot run failed on this one

friendly ping @sayakpaul @yiyixuxu, awaiting review for modular HunyuanVideo 1.5. Would greatly appreciate a review!

Hey guys, this is one more of my contributions to modular diffusers @yiyixuxu @sayakpaul
@akshan-main |
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yiyixuxu I think these tests are always failing; they have nothing to do with my PR. Anyway, I was wondering if you'd like me to work on something else within or outside modular diffusers. I'll be glad to contribute to or test out any pipelines.

@akshan-main

@claude can you help to do a review here? Please pay additional attention to:
Claude finished @yiyixuxu's task in 6m 50s (View job)

Review: Modular HunyuanVideo 1.5 Pipeline

Overall the implementation follows the modular pipeline conventions well: block structure, file layout, guider abstraction, and denoise loop patterns are all correct and consistent with wan/flux2/qwen. Parity is verified at MAD 0.000000 for both T2V and I2V, which is great. Below are the findings, organized by severity.

Issues to fix

1. Missing top-level exports. `HunyuanVideo15Blocks` and `HunyuanVideo15Image2VideoBlocks` are exported from the local module but not re-exported from the top-level package, so they cannot be imported from the library root.

2. Test model path uses a personal repo. The tests load `pretrained_model_name_or_path = "akshan-main/tiny-hunyuanvideo1_5-modular-pipe"`. Per modular.md gotcha #9, tiny test models must live under the `hf-internal-testing` organization, not a personal namespace.

3. The input is declared at line 102 as `InputParam.template("num_images_per_prompt", name="num_videos_per_prompt")`; the renamed `num_videos_per_prompt` parameter should be checked for consistent handling wherever downstream blocks consume it.

4. Encoder accepts pre-computed embeddings (modular.md gotcha #4). Per modular.md, encoder blocks should not accept pre-computed embeddings as inputs; the Wan text encoder, for comparison, only accepts raw prompts.

Suggestions (non-blocking)

5. The block accesses `components.video_processor`, but it is not declared in `expected_components`.

6. Many of the block docstrings are missing or incomplete.

7. Code duplication between the T2V and I2V denoisers.

Block-by-block audit summary
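On point 7 above, one common way to remove duplication between two denoisers is a shared base class holding the loop, with the T2V and I2V variants overriding only latent preparation. The sketch below is purely illustrative: the class names, method names, and the update rule are placeholders, not the PR's actual code.

```python
class DenoiseLoopBase:
    """Illustrative sketch: the shared denoise loop lives in one place;
    subclasses differ only in how the initial latents are prepared."""

    def prepare_latents(self):
        raise NotImplementedError

    def run(self, timesteps):
        latents = self.prepare_latents()
        for t in timesteps:
            # placeholder update standing in for the real denoise step
            latents = [x - 0.1 * t for x in latents]
        return latents


class T2VDenoise(DenoiseLoopBase):
    def prepare_latents(self):
        return [0.0] * 4  # start from pure noise (illustrative)


class I2VDenoise(DenoiseLoopBase):
    def prepare_latents(self):
        return [1.0] * 4  # noise plus image conditioning (illustrative)


print(T2VDenoise().run([1.0]))
print(I2VDenoise().run([1.0]))
```

The loop itself is written once, so a fix to the shared logic automatically covers both variants.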
@yiyixuxu will address 3 and 4. Skipping 1, since earlier feedback on the LTX PR was to only export AutoBlocks from the top level. For 2, can `akshan-main/tiny-hunyuanvideo1_5-modular-pipe` be transferred?
Suggested change (export-list entries):

```
"HunyuanVideo15Blocks",
"HunyuanVideo15Image2VideoBlocks",
```

Suggested change (matching imports):

```
HunyuanVideo15Blocks,
HunyuanVideo15Image2VideoBlocks,
```
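The two suggestions above quote the export-list entries and the matching imports for the new block classes. Schematically, a lazy-import registry in a package `__init__.py` works like the sketch below; the module key is illustrative, not the PR's actual path.

```python
# Simplified sketch of the lazy-import registry pattern used in a package
# __init__.py; the module key below is illustrative.
_import_structure = {
    "modular_pipeline_hunyuanvideo15": [
        "HunyuanVideo15Blocks",
        "HunyuanVideo15Image2VideoBlocks",
    ],
}

# Every name listed here becomes importable from the package root.
public_names = {name for names in _import_structure.values() for name in names}
print(sorted(public_names))
```

A class missing from this table is importable only via its submodule path, which is what issue 1 in the review flags.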
```python
@property
def expected_components(self) -> list[ComponentSpec]:
    return [ComponentSpec("transformer", HunyuanVideo15Transformer3DModel)]
```

indeed missing a `video_processor` here #13389 (comment)
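A sketch of the fix this comment points at: declare the video processor alongside the transformer. Stand-in classes are used so the snippet runs on its own; the real block would use diffusers' actual `ComponentSpec` and component classes.

```python
from dataclasses import dataclass


@dataclass
class ComponentSpec:  # stand-in for diffusers' ComponentSpec
    name: str
    type_hint: type


class HunyuanVideo15Transformer3DModel:  # stand-in model class
    pass


class VideoProcessor:  # stand-in processor class
    pass


class DecodeBlock:
    @property
    def expected_components(self) -> list[ComponentSpec]:
        # declare every component the block touches, including video_processor
        return [
            ComponentSpec("transformer", HunyuanVideo15Transformer3DModel),
            ComponentSpec("video_processor", VideoProcessor),
        ]


print([spec.name for spec in DecodeBlock().expected_components])
```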
@akshan-main sounds good, let's try to address #6 too

on it

What does this PR do?

Adds modular pipeline blocks for HunyuanVideo 1.5 with both text-to-video (`HunyuanVideo15Blocks`) and image-to-video (`HunyuanVideo15Image2VideoBlocks`) variants. Parity verified on Colab G4 GPU:

HunyuanVideo15Pipeline
- hv15_t2v_standard.mp4
- hv15_t2v_modular.mp4
- T2V reproduction code

HunyuanVideo15ImageToVideoPipeline
- hv15_i2v_standard.mp4
- hv15_i2v_modular.mp4
- I2V reproduction code

Addresses #13295 (HunyuanVideo 1.5 contribution)
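For reference, the MAD figure quoted for parity is just the mean absolute difference between the standard and modular pipelines' outputs, computed over the flattened frames. A minimal stdlib sketch (the sample values are made up):

```python
def mad(a, b):
    """Mean absolute difference between two flattened frame buffers."""
    assert len(a) == len(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# identical outputs from the two pipelines give the reported MAD of 0.0
standard = [0.1, 0.5, -0.3, 0.8]
modular = [0.1, 0.5, -0.3, 0.8]
print(mad(standard, modular))  # → 0.0
```

A MAD of exactly 0.0 means the two pipelines produced bit-identical outputs under the same seed and inputs.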
Before submitting
Who can review?
@sayakpaul @yiyixuxu @asomoza