DAI 2024 - AI Agent Day

Regular Session

9:00-12:15, Dec. 19, 2024, UTC+8. Seminar Room: Level 5, AMRA.

    Keynote

    Building Knowledgeable Agents by Reinforcement Learning from Language Models
    9:00-9:45
    Yang Yu
    About the Presenter
    Yang Yu is a Professor at the School of Artificial Intelligence, Nanjing University. His research focuses on artificial intelligence, machine learning, and reinforcement learning. He has published over 100 papers in top-tier journals and conferences. His work has received significant recognition, including the Best Paper Award at the 2024 International Conference on Distributed Artificial Intelligence and four other international paper awards. He has also won three international algorithm competition championships, including the OpenAI Transfer Reinforcement Learning Competition.
    Abstract
    Large language models (LLMs) have demonstrated a remarkable capacity for encoding and reasoning over vast amounts of knowledge, but their potential for decision-making remains limited due to their lack of inherent agency. While many existing approaches attempt to patch LLMs to create knowledgeable agents, an alternative path is to center the design on reinforcement learning (RL). In this talk, I will explore this paradigm and present our efforts to integrate LLM knowledge into RL systems, particularly offline RL. By leveraging the knowledge-rich representations of LLMs, we enable RL agents to tackle tasks beyond their original interactive training scope while maintaining the adaptive capabilities of traditional RL systems, such as learning from feedback data. This approach opens up new opportunities for building knowledgeable agents that combine the knowledge power of LLMs with the decision-making flexibility of RL, offering a potential pathway to scalable and continually improving intelligent systems.

    Regular Talks

  1. Hammer: Robust Function-Calling for On-Device Language Models via Function Masking 10:15-10:35
    Qiqiang Lin, Muning Wen, Qiuying Peng, Guanyu Nie, Junwei Liao, Jun Wang, Xiaoyun Mo,
    Jiamu Zhou, Cheng Cheng, Yin Zhao, Jun Wang, Weinan Zhang
    Qiuying Peng
  2. AgentBoard: An Analytical Evaluation Board of Multi-Turn LLM Agents (NeurIPS’24) 10:35-10:55
    Chang Ma, Junlei Zhang, Zhihao Zhu, Cheng Yang, Yujiu Yang, Yaohui Jin,
    Zhenzhong Lan, Lingpeng Kong, Junxian He
    Chang Ma
  3. TWOSOME: An Efficient Online Framework to Align LLMs with Embodied Environments via Reinforcement Learning (ICLR’24) 10:55-11:15
    Weihao Tan, Wentao Zhang, Shanqi Liu, Longtao Zheng, Xinrun Wang, Bo An
    Wentao Zhang
  4. Reinforcing LLM Agents via Policy Optimization with Action Decomposition (NeurIPS’24) 11:15-11:35
    Muning Wen, Ziyu Wan, Weinan Zhang, Jun Wang, Ying Wen
    Muning Wen
  5. Vision-Language-Action Models for Robot Manipulation 11:35-11:55
    Yifan Zhong
  6. TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision (SIGIR’24) 11:55-12:15
    Ruiwen Zhou, Yingxuan Yang, Muning Wen, Ying Wen, Wenhao Wang, Chunling Xi,
    Guoqiang Xu, Yong Yu, Weinan Zhang
    Yingxuan Yang

LLM Reasoning Forum

13:40-16:40, Dec. 19, 2024, UTC+8. Seminar Room: Level 5, AMRA.

    Advancing and Evaluating Agentic Reasoning of LLMs
    13:40-14:20
    Junxian He
    About the Presenter
    Junxian He is an assistant professor in the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology. He received his PhD degree from the Language Technologies Institute at Carnegie Mellon University. He serves as an area chair for ICLR, ACL, and EMNLP. His recent research focuses on complex reasoning/planning, mechanistic interpretation, and multimodal understanding of large language models.
    Abstract
    Complex reasoning is one of the most critical abilities for LLM agents, enabling them to reason, plan, and make predictions. In this talk, I will cover our recent research on advancing the reasoning abilities of LLM agents across various reasoning and planning tasks. First, I will introduce B-STaR, a self-improving algorithm that balances exploration and exploitation to achieve scalable improvements for self-taught reasoners. Next, I will discuss our research on non-myopic generation, which enhances the performance of language models across multiple agentic scenarios at inference time. Lastly, I will talk about our evaluation framework, AgentBoard, which evaluates LLMs as agents in a fine-grained manner.

    Agent K: Towards High-Performing Autonomous Data Science Agents
    14:20-15:00
    Alex Maraval
    About the Presenter
    Alex Maraval is a Senior ML Engineer at Huawei Noah's Ark Lab in London. He graduated from EPFL (Lausanne, Switzerland) in Mathematics and completed a Master's degree in Machine Learning at Imperial College London. He joined the Decision Making and Reasoning Team at the London Research Centre in 2020, where he started working on variational inference and reinforcement learning, with a focus on Gaussian processes and Bayesian optimization (BO). Alex has contributed to a multitude of projects, including research on high-dimensional BO on structured spaces and BO on graphs, among others. He has contributed to several publications at top-tier conferences, including the state-of-the-art algorithm HEBO, and is the first author of Meta-Learning for BO with Transformer Neural Processes, published at NeurIPS 2023. More recently, Alex has been focusing on LLM-related projects. His research directions include building specialized agents, extending RAG techniques, researching more performant optimizers, and improving fine-tuning.
    Abstract
    Data science has long been essential, driving ongoing efforts to create agents capable of tackling complex tasks autonomously. While many such agents exist, they are often limited in scope, lacking end-to-end automation or falling short in performance. In this talk, we'll explore Agent K, a new, fully autonomous agent that achieves both complete end-to-end automation and high-level performance. We created a benchmark based on real Kaggle competitions to evaluate its capabilities. Agent K achieved a 92.5% automation success rate in testing, verified through unit tests, and demonstrated its ability to handle multimodal tasks across diverse domains. Agent K consistently ranked in the top 38% among human competitors in these same competitions. Additionally, Agent K earned two bronze medals in featured competitions, making it an (unofficial) Kaggle Expert. Furthermore, it achieved results comparable to six gold, three silver, and seven bronze medals across all competition types, an unofficial skill set similar to Grandmaster-level medal performance.

    Unlocking RL Potential: The Power of Generative Models in Complex Environments
    15:20-16:00
    Zhongwen Xu
    About the Presenter
    Dr. Zhongwen Xu is a Principal Scientist at Tencent, where his research focuses on deep reinforcement learning and generative models. He earned his Ph.D. from the University of Technology Sydney under the supervision of Prof. Yi Yang. He received his Bachelor's degree from Zhejiang University in 2013, where he was advised by Prof. Yueting Zhuang and Prof. Fei Wu. Before joining Tencent, Dr. Xu served as a Principal Scientist at Sea AI Lab, an Adjunct Assistant Professor at the National University of Singapore, and a Senior Research Scientist at DeepMind.
    Abstract
    This talk explores how generative models, including large language models (LLMs) and video generation models, are transforming reinforcement learning (RL) in complex 3D game environments. We focus on two key advancements: First, we demonstrate how learned world simulator models, derived from environment dynamics, significantly reduce the data needed to train effective RL agents and facilitate generalization on unseen environments. Second, we introduce a novel LLM-driven approach that streamlines the development process, enabling the creation of agents capable of playing thousands of 3D games with unprecedented efficiency.

    Improve Multi-step Reasoning for LLMs with Deliberate Planning
    16:00-16:40
    Chaojie Wang
    About the Presenter
    Dr. Chaojie Wang is a Research Scientist at Skywork AI 2050, which focuses on accelerating the realization of Artificial General Intelligence (AGI). Before that, Chaojie obtained his Ph.D. from Xidian University in 2021 and worked as a Research Fellow at Nanyang Technological University until 2023. Chaojie has published more than 30 papers in top AI conferences and journals, such as T-PAMI, NeurIPS, and ICML, and leads a research team of nearly 10 members focusing on generative models, large language models (LLMs), and reinforcement learning from human feedback (RLHF) technologies. In 2021, as the first contributor, Chaojie won the championship of the L2RPN-2021 international competition. He is currently focused on the development and implementation of Skywork's super app.
    Abstract
    Large Language Models (LLMs) trained on vast corpora of text data have demonstrated impressive capabilities across various natural language tasks. However, the auto-regressive generation process makes LLMs prone to producing errors, hallucinations, and inconsistent statements when performing multi-step reasoning, especially when solving mathematical problems and generating code. In this talk, we will share our practical experiences in enhancing the multi-step reasoning capabilities of Skywork-o1 through cutting-edge alignment techniques, specifically Reinforcement Learning from Human Feedback (RLHF) and tree-search-based planning methods. Additionally, we will discuss the opportunities and challenges that may be encountered in the future.