Gaoling School of Artificial Intelligence and Ant Group co-launch LLaDA-MoE model series

ruc.edu.cn | September 26, 2025

The research team from the Gaoling School of Artificial Intelligence at Renmin University of China (RUC), in collaboration with Ant Group, officially unveiled the LLaDA-MoE series of models. The new models creatively integrate masked diffusion mechanisms with dynamic sparse activation strategies, marking a significant milestone in the evolution of diffusion-based language models, from dense computation to sparse and efficient architectures.

As a core driving force of artificial intelligence, large language models (LLMs) are profoundly reshaping applications in natural language understanding, intelligent decision-making systems and everyday human-computer interaction. However, traditional autoregressive models, which generate text sequentially token by token, face inherent limitations: slow decoding, difficulty modeling bidirectional dependencies and inefficiency with long sequences.

The research team, led by Wen Jirong and Li Chongxuan, professors at RUC’s Gaoling School of Artificial Intelligence, made a major breakthrough by challenging the long-held assumption that "large language models must be autoregressive". Based on theoretical insights and scaling law exploration, the team proposed that the essence of linguistic intelligence is not bound to autoregressive structures but lies in accurately modeling natural language distributions.

Building on this philosophy, the team and Ant Group jointly released LLaDA 8B in February, the world’s first dialogue-capable diffusion large language model. Since its release, the LLaDA series has attracted widespread attention across academia, industry and the open-source community, witnessing over 400,000 downloads in a single month. Research institutions and companies such as NVIDIA, Apple, UCLA, and ByteDance have cited LLaDA in follow-up work or used it as a benchmark model.

gaoling_副本.png

The LLaDA-MoE instruction-tuned model is compared with other diffusion and autoregressive language models across key metrics on various tasks. [Photo/ruc.edu.cn]

Over the following months, RUC and Ant Group continued their collaboration, introducing successive models. The joint team has now taken a further step by extending the diffusion language model architecture from dense to Mixture-of-Experts (MoE), releasing the first native diffusion language model with an MoE architecture — LLaDA-MoE. This marks a dual breakthrough in efficiency and intelligence, ushering in a new era of sparse, parallel and high-performance language modeling.

By achieving deep integration between diffusion language modeling and sparse MoE structures, LLaDA-MoE overcomes long-standing bottlenecks in expressiveness and scalability for non-autoregressive models. Through the combination of mask-based denoising and expert routing mechanisms, it delivers high-quality text generation with minimal computational cost. This innovation establishes a new paradigm of "efficient intelligence" for diffusion-based architectures, paving the way for future scaling and exploration of AI’s upper limits.

Links for the models:

https://huggingface.co/inclusionAI/LLaDA-MoE-7B-A1B-Base

https://huggingface.co/inclusionAI/LLaDA-MoE-7B-A1B-Instruct

RENMIN UNIVERSITY of CHINA

General Information

History

Leadership

Schools and Departments

RUC at a Glance

Flagship Cultural Brands

Highlights

Updates

Undergraduate

Graduate

International Students

Joint-degree Programs

Non-degree Programs

Scholarships

Humanities and Social Sciences

Sciences and Engineering

Partnerships

Confucius Institutes

Research Centers

Overview

Careers

Visas

Venues

Events

Gallery

Dining and Accommodation

Sciences and Engineering

Gaoling School of Artificial Intelligence and Ant Group co-launch LLaDA-MoE model series

ruc.edu.cn | September 26, 2025

Quick Links

About RUC

Venues

Admissions

Careers

Contact Us

International Students Office
Hong Kong, Macao and Taiwan Affairs Office

Tel: 86-10-82509597
E-mail: international@ruc.edu.cn

京公网安备110402430004号 京ICP备05007162号-1

Copyright © Renmin University of China. All rights reserved. Presented by China Daily.

RENMIN UNIVERSITY of CHINA

Gaoling School of Artificial Intelligence and Ant Group co-launch LLaDA-MoE model series

ruc.edu.cn | September 26, 2025

Quick Links

Contact Us

Tel: 86-10-82509597 E-mail: international@ruc.edu.cn

Copyright © var oTime = new Date(); document.write(oTime.getFullYear()); Renmin University of China. All rights reserved. Presented by China Daily.

Tel: 86-10-82509597
E-mail: international@ruc.edu.cn

Copyright © Renmin University of China. All rights reserved. Presented by China Daily.