
Jiarui Fang (方佳瑞)

fangjiarui123 AT gmail.com

I am currently a principal software engineer at Tencent, where I direct a team building software for Large Language Models and Diffusion Transformers (LLMs/DiTs) on heterogeneous hardware platforms, including NVIDIA GPUs, Intel Gaudi, and domestic NPUs. I initiated the open-source project xDiT, a scalable inference engine for Diffusion Transformers (DiTs) on multi-GPU systems.

Over a 20-month period from 2022 to 2023, I gained valuable experience at two startups. I was a technical partner at LightYearAI, joining as the first technical member upon the company's inception, where I led a team of 20+ members responsible for data curation for the pre-training of large language models. The company was acquired by Meituan Inc. just three months after its founding. Prior to that, I was the CTO of an LLM-systems startup dedicated to open-source AI infrastructure, where I led ColossalAI, a training framework for large language models.

Before that, I was a senior engineer at WeChat AI, Tencent, where my work focused on improving the efficiency of online and offline AI applications through innovative parallel computing techniques. I also contributed to core modules of the WeChat app, including the WeChat Input Method Engine and the WeChat Translation System. At WeChat AI, I initiated two popular open-source projects: TurboTransformers, a fast runtime for transformer inference that outperformed FasterTransformer at the time, and PatrickStar, an efficient parallel training framework for large language models built on advanced CPU-GPU offloading techniques. I'm proud that PatrickStar became a core component of ColossalAI and contributed to the training of the Tencent Hunyuan LLM; it has also been adopted as a key feature in Alibaba Cloud's Pai-Megatron-Patch. In 2021, I was deeply honored to receive the company's highest individual award in recognition of my contributions to open-source collaboration.

I received my Ph.D. in Computer Science from Tsinghua University in 2019, advised by Prof. Guangwen Yang and Prof. Haohuan Fu. My research focused on applying High-Performance Computing (HPC) to scientific applications; my doctoral dissertation is titled "Parallel Deep Learning Training System on Sunway TaihuLight." From 2018 to 2019, I was a visiting scholar at the University of California, Davis, under the supervision of Prof. Cho-Jui Hsieh.

I enjoy sports; my hobbies include jogging, football, swimming, and table tennis. You can follow me on Zhihu (知乎) and GitHub.