RUC team releases DeepAnalyze, advancing the AI for Data Science paradigm

A research team led by Professor Fan Ju from the School of Information and the Key Laboratory of Data Engineering and Knowledge Engineering at Renmin University of China (RUC) has officially released DeepAnalyze, an agentic large language model designed for data science. The project aims to shift from traditional "tool-based analytics" to agent-driven data science. The model weights, codebase, training data and related research are fully open-sourced, with the preprint available on arXiv.

As data becomes ubiquitous, enabling AI systems to autonomously handle complex data science tasks has become a major goal in intelligent system development. Traditional tools rely on fixed workflows and are limited to narrow, single-step tasks. DeepAnalyze is intended as a step toward next-generation data-intelligent systems that can execute complex workflows independently.

To achieve this, the team introduced a curriculum-style agentic training method that simulates how humans learn. Through progressive training in real environments, the model evolves from single-function competence to compound capabilities. The team also proposed a data-centric trajectory synthesis framework that automatically constructs more than 500,000 samples of data science reasoning and environment interaction. These trajectories guide the model in learning effective problem-solving strategies. DeepAnalyze can autonomously perform tasks similar to those of a human data scientist. These include full-process data analysis (from data preparation and exploration to modeling, visualization and insight extraction) and the generation of in-depth research reports, across unstructured, semi-structured and structured data formats such as TXT, JSON, CSV and Excel.

Key links:

Project website: https://ruc-deepanalyze.github.io

Paper link: https://arxiv.org/pdf/2510.16872

GitHub repository: https://github.com/ruc-datalab/DeepAnalyze

Model weights: https://huggingface.co/RUC-DataLab/DeepAnalyze-8B

Dataset: https://huggingface.co/datasets/RUC-DataLab/DataScience-Instruct-500K


Key contributors:

Zhang Shaolei is a faculty member at RUC's School of Information and a core contributor to RUC-DataLab. He earned his PhD from the Institute of Computing Technology, Chinese Academy of Sciences, and works on large language models, multimodal AI and AI for Data Science. He has published more than 30 papers at leading conferences including NeurIPS, ACL and ICLR, and serves as an area chair for ACL ARR.

Profile: https://github.com/zhangshaolei1998

Fan Ju, professor and doctoral supervisor at RUC and team leader of RUC-DataLab, focuses on data governance technologies and intelligent database systems. He has published more than 60 papers in top international journals and conferences in computer science, and has received several major academic recognitions including the ICDE 2025 Best Paper Runner-Up, the ACM SIGMOD Research Highlight Award and the ACM China Rising Award.

