Dacheng Li

I am a CS PhD student in EECS at UC Berkeley, working with Prof. Ion Stoica and Prof. Joseph Gonzalez, and affiliated with Sky and BAIR. I obtained my master's degree in Machine Learning at CMU with Prof. Eric Xing and Prof. Hao Zhang. I obtained my undergraduate degree, with a double major in Computer Science and Mathematics, at UC San Diego with Prof. Zhuowen Tu.

I study Machine Learning in the context of modeling performance, scaling, system efficiency, framework usability, and theoretical support. My goal is to develop and support performant models at scale, and to provide easy-to-use frameworks that facilitate ML deployment in the real world. I usually work on topics around LLMs (long-context, RAG, LoRA, fairness, open-source, etc.), privacy-preserving serving, and large-scale ML algorithms.

Also check out my girlfriend's webpage. She is a great CS PhD student at UW.

Google Scholar  /  GitHub  /  Resume  /  SoP  /  Twitter

  • 2023-08 Joined Google as a student researcher, working on LLM evaluation with Zizhao Zhang.
  • 2023-06 Released LongChat, a series of long-context models and evaluation toolkits.
  • 2023-06 Our official Vicuna paper, "Judging LLM-as-a-judge with MT-Bench and Chatbot Arena", is publicly available.
  • 2023-04 Released FastChat-T5.
  • 2023-01 Our paper "MPCFormer: fast, performant and private Transformer inference with MPC" is accepted at ICLR 2023 as a spotlight.
  • 2022-12 Our proposal "A Faster and More Accurate Secure Model Serving Framework on the Cloud" is accepted at Amazon Research Awards.
  • 2022-10 Our paper "AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness" is accepted at NeurIPS 2022.
  • 2021-03 Our paper "Dual Contradistinctive Generative AutoEncoder" is accepted at CVPR 2021.
  • Does language modeling lead to intelligence? If so, why? Given that both amount to the ability to predict the next state, what is the connection between a language model and a world model? More recently, Sora seems to be a pure diffusion model (without autoregression) that is internally a world model. Why?
  • What is the goal of robustness of AI systems, or computer systems in general? I have found that, in practice, we need a statistical guarantee (e.g. 99.99%) rather than a formally provable guarantee (100%).
  • Awards