System
an archive of posts with this tag
Date | Title |
---|---|
Aug 13, 2024 | Mamba-1 Training with Support for Variable-Length Sequences |
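Jul 03, 2024 | Efficient Training of Large Models: From Infra to Framework Optimization |
Jul 03, 2024 | An Analysis of Computation-Communication Overlap, Prompted by a Ring-Attention Performance Issue |
Jun 29, 2024 | Reflections Prompted by InternLM-7B Failing to Converge When Trained on the A800 Platform |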
Apr 19, 2024 | Understanding the Workload Characteristics of Large Language Model Development |