Revision 2/3 · jun chen, 07/26/2025 05:18 PM
From softmax to context parallel
A brief introduction to sequence parallelism schemes for training ultra-long-context models
deepspeed-zero3 talk
Updated by jun chen 19 days ago · 3 revisions