v0.2.4 is here! Thanks to everyone who contributed to this release.
Major Updates
In addition to a broad set of bug fixes and stability improvements, v0.2.4 brings several major updates:
- Profiling and observability improvements
  Added a rollout trace timeline viewer and W&B reporting for dynamic ITL / TTFT percentile metrics.
- Router stack unified on sgl-router
  Consolidated the router stack onto sgl-router and removed slime-router.
- Expanded multimodal and model support
  Improved support for GLM-4.6V / GLM4V, Multimodal OPD, and Qwen3.5-related workflows.
Other Notable Changes
- Fixed CUDA IPC cache leaks during weight updates
- Fixed SP/CP gradient inflation in FLA layers
What's Changed
- feat: add GLM-4.6V MoE VL bridge with CP support by @zhuzilin in #1715
- fix: resolve rope_theta from rope_parameters dict in HF config validation by @zhuzilin in #1720
- [docker] patches for glm4.6v, kimi k2.5 and dsa cp only by @zhuzilin in #1722
- Fix CUDA IPC cache leaks during weight updates by @zhuzilin in #1731
- [docker] update megatron by @zhuzilin in #1729
- [docker] Fix IndexCache with mla model by @zhuzilin in #1736
- [slime-router] support pd disaggregation and remove radix tree middleware by @zhuzilin in #1735
- Fix glm4v megatron bridge by @zhuzilin in #1738
- [docker] update sglang patch by @zhuzilin in #1743
- feat: GLM4V multimodal support improvements by @zhuzilin in #1745
- feat: placeholder worker type, metrics router, and GPQA letter range by @zhuzilin in #1746
- always enable_metrics and remove dp context by @zhuzilin in #1747
- fix: resolve SP/CP gradient inflation in FLA (linear attention) layers by @zhuzilin in #1748
- Update MTP example configs, rename GLM-4.5 to GLM-4.7, clean scripts by @zhuzilin in #1749
- Support qwen3.5 loss mask for multi-turn SFT by @huang3eng in #1742
- fix: propagate moe_token_dispatcher_type in bridge model provider by @nanjiangwill in #1737
- fix: resolve rope_theta from rope_parameters in DeepseekV32Bridge by @stevewx in #1734
- chore: translate remaining Chinese comments to English by @WangHong-yang in #1726
- feat: add Qwen3.5-4B model support by @shihaohou in #1721
- fix: http_utils. disable system proxy for internal SGLang httpx clients by @DongzhuoranZhou in #1714
- fix: auto-detect GPUs in qwen3-4b script by @ailuntz in #1700
- fix: quote $MOE_LAYER_FREQ by @lawrence-harmonic in #1689
- disable router health_check and allow prompt_data is None by @zhuzilin in #1751
- small fix on qwen3-235b-a22b launch script by @Zhuohao-Li in #1719
- sync internal bugfix by @zhuzilin in #1765
- Fix uploading sglang metrics to wandb by @zhuzilin in #1768
- use zhuzilin/sgl-router for sglang-router by @zhuzilin in #1770
- [docker] update sgl-router by @zhuzilin in #1772
- [Multimodal] Add Multimodal OPD support by @coding-famer in #1760
- refactor: remove slime router by @zhuzilin in #1773
- Add rollout trace timeline viewer by @zhuzilin in #1776
- [Fix] Fix duplicate Megatron LR scheduler resume when optimizer state is not loaded by @kaysonyu in #1775
- Support FP8 conversion for Qwen3.5 by @peterjc123 in #1769
- fix typo by @albaNnaksqr in #1759
- [Fix]Fix some bugs/clean up by @coding-famer in #1756
- (fix):not have encoder_only attr cause run failed by @wangyufak in #1741
New Contributors
- @stevewx made their first contribution in #1734
- @WangHong-yang made their first contribution in #1726
- @shihaohou made their first contribution in #1721
- @DongzhuoranZhou made their first contribution in #1714
- @ailuntz made their first contribution in #1700
- @peterjc123 made their first contribution in #1769
- @albaNnaksqr made their first contribution in #1759
- @wangyufak made their first contribution in #1741
Full Changelog: v0.2.3...v0.2.4