A research team led by Professor Fan Ju from the School of Information and the Key Laboratory of Data Engineering and Knowledge Engineering at Renmin University of China (RUC) has officially released DeepAnalyze, an agentic large language model designed for data science. The project aims to shift from traditional "tool-based analytics" to agent-driven data science. The model weights, codebase, training data and related research are fully open-sourced, with the preprint available on arXiv.

As data becomes ubiquitous, enabling AI systems to autonomously handle complex data science tasks has become a major goal in intelligent system development. Traditional tools rely on fixed workflows and are limited to narrow, single-step tasks. DeepAnalyze instead targets next-generation data-intelligent systems capable of independently executing complex workflows.
To achieve this, the team introduced a curriculum-style agentic training method that mimics how humans learn. Through progressive training in real environments, the model advances from single-skill competence to compound capabilities. The team also proposed a data-centric trajectory synthesis framework, which automatically constructs more than 500,000 samples of data science reasoning and environment interaction. These trajectories guide the model in learning effective problem-solving strategies. DeepAnalyze can autonomously perform tasks comparable to those of a human data scientist. These include full-process data analysis (from data preparation and exploration to modeling, visualization and insight extraction) and the generation of in-depth research reports across unstructured, semi-structured and structured data formats such as TXT, JSON, CSV and Excel.
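
The curriculum idea, moving from single-skill tasks to compound ones, can be illustrated with a toy sketch. It is not the team's training code. The `Trajectory` class, the skill labels and the `curriculum_stages` helper are all hypothetical names made up for this example, and counting skills is only an assumed stand-in for however difficulty is actually measured.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Trajectory:
    """Hypothetical training sample: a task plus the skills it exercises."""
    task: str
    skills: frozenset


def curriculum_stages(samples):
    """Group samples into stages ordered from fewest to most required skills."""
    # sorted() is stable, so samples with equal skill counts keep their input order.
    ordered = sorted(samples, key=lambda s: len(s.skills))
    return [list(group) for _, group in groupby(ordered, key=lambda s: len(s.skills))]


# Illustrative samples only; the task names and skill labels are invented.
samples = [
    Trajectory("clean, model and report on sales.csv", frozenset({"prep", "model", "report"})),
    Trajectory("plot column histogram", frozenset({"visualize"})),
    Trajectory("clean and explore logs.json", frozenset({"prep", "explore"})),
    Trajectory("fill missing values", frozenset({"prep"})),
]

stages = curriculum_stages(samples)
for i, stage in enumerate(stages, start=1):
    print(f"stage {i}: {[t.task for t in stage]}")
```

In this sketch, earlier stages hold tasks that exercise one skill and later stages hold tasks that combine several, which mirrors the progression from single-function competence to compound capabilities described above.
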
Key links:
Project website: https://ruc-deepanalyze.github.io
Paper link: https://arxiv.org/pdf/2510.16872
GitHub repository: https://github.com/ruc-datalab/DeepAnalyze
Model weights: https://huggingface.co/RUC-DataLab/DeepAnalyze-8B
Dataset: https://huggingface.co/datasets/RUC-DataLab/DataScience-Instruct-500K
Key contributors:

Zhang Shaolei is a faculty member at RUC's School of Information and a core contributor to RUC-DataLab. He earned his PhD from the Institute of Computing Technology, Chinese Academy of Sciences, and works on large language models, multimodal AI and AI for Data Science. He has published more than 30 papers at leading conferences including NeurIPS, ACL and ICLR, and serves as an area chair for ACL ARR.
Profile: https://github.com/zhangshaolei1998

Fan Ju is a professor and doctoral supervisor at RUC and the team leader of RUC-DataLab. He focuses on data governance technologies and intelligent database systems. He has published more than 60 papers in top international journals and conferences in computer science, and has received several major academic recognitions, including the ICDE 2025 Best Paper Runner-Up, the ACM SIGMOD Research Highlight Award and the ACM China Rising Star Award.