Publications
2024
- [SOSP] LoongGen: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism. In the 30th ACM Symposium on Operating Systems Principles (SOSP 2024), 2024.
- [arXiv] LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism. arXiv preprint arXiv:2406.18485, 2024.
- [ICS] Ymir: A Scheduler for Foundation Model Fine-tuning Workloads in Datacenters. In Proceedings of the 38th ACM International Conference on Supercomputing, 2024.
- [ICS] AutoSched: An Adaptive Self-configured Framework for Scheduling Deep Learning Training Workloads. In Proceedings of the 38th ACM International Conference on Supercomputing, 2024.
- [IWQoS] Lins: Reducing Communication Overhead of ZeRO for Efficient LLM Training, 2024.
- [ASPLOS] Centauri: Enabling Efficient Scheduling for Communication-Computation Overlap in Large Model Training via Communication Partitioning. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024.
- [arXiv] InternLM2 Technical Report. arXiv preprint arXiv:2403.17297, 2024.
- [arXiv] InternEvo: Efficient Long-sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding. arXiv preprint arXiv:2401.09149, 2024.
- [WWW] FedDSE: Distribution-aware Sub-model Extraction for Federated Learning over Resource-constrained Devices. In Proceedings of the ACM on Web Conference 2024, 2024.
- [TC] UniSched: A Unified Scheduler for Deep Learning Training Jobs with Different User Demands. IEEE Transactions on Computers, 2024.
- [CSUR] Deep Learning Workload Scheduling in GPU Datacenters: A Survey. ACM Computing Surveys, 2024.
- [NSDI] Characterization of Large Language Model Development in the Datacenter. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 2024), Santa Clara, CA, April 15-17, 2024.
2023
- [ASPLOS] Lucid: A Non-intrusive, Scalable and Interpretable Scheduler for Deep Learning Training Jobs. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (ASPLOS 2023), Vancouver, BC, Canada, March 25-29, 2023.
- [OSDI] Hydro: Surrogate-Based Hyperparameter Tuning Service in Datacenters. In 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2023), Boston, MA, USA, July 10-12, 2023.
- [arXiv] Boosting Distributed Full-graph GNN Training with Asynchronous One-bit Communication. arXiv preprint arXiv:2303.01277, 2023.
2022
- [TBD] GradientFlow: Optimizing Network Performance for Large-Scale Distributed DNN Training. IEEE Transactions on Big Data, 2022.
- [TPDS] Astraea: A Fair Deep Learning Scheduler for Multi-Tenant GPU Clusters. IEEE Transactions on Parallel and Distributed Systems, 2022.
- [SoCC] Titan: A Scheduler for Foundation Model Fine-tuning Workloads. In Proceedings of the 13th Symposium on Cloud Computing (SoCC 2022), San Francisco, California, November 7-11, 2022.
- [ATC] Primo: Practical Learning-Augmented Systems with Interpretable Models. In 2022 USENIX Annual Technical Conference (USENIX ATC 2022), Carlsbad, CA, USA, July 11-13, 2022.
2021
- [SoCC] Chronus: A Novel Deadline-aware Scheduler for Deep Learning Training Jobs. In SoCC '21: ACM Symposium on Cloud Computing, Seattle, WA, USA, November 1-4, 2021.
- [SC] Characterization and Prediction of Deep Learning Workloads in Large-Scale GPU Datacenters. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2021), St. Louis, Missouri, USA, November 14-19, 2021.
2020
- [TBD] GraphMP: I/O-Efficient Big Graph Analytics on a Single Commodity Machine. IEEE Transactions on Big Data, 2020.
- [ICDCS] Elan: Towards Generic and Efficient Elastic Training for Deep Learning. In 40th IEEE International Conference on Distributed Computing Systems (ICDCS 2020), Singapore, November 29 - December 1, 2020.
2019
- [arXiv] Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes. CoRR, 2019.