jun chen, 07/26/2025 05:18 PM
From softmax to context parallel
A brief introduction to sequence-parallel schemes for training models with ultra-long contexts