Pengcheng Hu
-
BEng (University of Victoria, 2021)
Topic
Integration of Model Predictive Control and Reinforcement Learning for Dynamic Systems with Application to Robot Manipulators
Department of Mechanical Engineering
Date & location
-
Monday, November 25, 2024
-
10:00 A.M.
-
Engineering Office Wing
-
Room 430
Reviewers
Supervisory Committee
-
Dr. Yang Shi, Department of Mechanical Engineering, University of Victoria (Supervisor)
-
Dr. Daniela Constantinescu, Department of Mechanical Engineering, UVic (Member)
External Examiner
-
Dr. Kui Wu, Department of Computer Science, University of Victoria
Chair of Oral Examination
-
Dr. Elisabeth Gugl, Department of Economics, UVic
Abstract
The last decade has witnessed great progress in the development of reinforcement learning (RL) across many applications, such as games and autonomous driving. RL is effective in solving control problems for complex systems whose dynamics are difficult to model accurately. In an RL algorithm, the agent learns the optimal policy from measurement samples gathered through interactions with the environment. Obtaining the optimal policy, however, requires collecting a sufficiently large number of samples, which is challenging in real-world applications such as robotics and surgery. To tackle this problem, model predictive control-based RL (MPC-based RL) has been proposed to improve sample efficiency. In an MPC-based RL algorithm, a model is learned from collected samples, the learned model and MPC are used to predict trajectories over a specified prediction horizon, and an action is obtained through the RL algorithm by maximizing the cumulative reward. This thesis is devoted to the investigation of MPC-based RL design and its application to robot manipulators.
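To make the data flow concrete, the following is a minimal Python/NumPy sketch of the generic MPC-based RL loop described above: fit a dynamics model from collected samples, then plan over a finite prediction horizon with the learned model and take the action that maximizes the predicted cumulative reward. The random-shooting planner, the least-squares linear model, and the double-integrator environment are illustrative assumptions, not the specific design studied in the thesis.

# Minimal sketch of the MPC-based RL loop: learn a model from samples,
# plan with it over a horizon, and pick the reward-maximizing first action.
# All names and the toy environment are illustrative assumptions.
import numpy as np

def collect_samples(step_fn, x0, n_steps, rng):
    """Roll out random actions to gather (state, action, next_state) tuples."""
    data, x = [], x0
    for _ in range(n_steps):
        u = rng.uniform(-1.0, 1.0, size=1)
        x_next = step_fn(x, u)
        data.append((x, u, x_next))
        x = x_next
    return data

def fit_linear_model(data):
    """Least-squares fit of x_next ~ A x + B u from the collected samples
    (state dimension 2, input dimension 1 in this toy example)."""
    X = np.array([np.concatenate([x, u]) for x, u, _ in data])
    Y = np.array([x_next for _, _, x_next in data])
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W[:2].T, W[2:].T  # A, B

def mpc_action(x, A, B, reward_fn, horizon=10, n_candidates=256, rng=None):
    """Random-shooting planner: simulate candidate action sequences with the
    learned model and return the first action of the best sequence."""
    rng = rng or np.random.default_rng()
    best_u0, best_ret = None, -np.inf
    for _ in range(n_candidates):
        u_seq = rng.uniform(-1.0, 1.0, size=(horizon, 1))
        x_pred, ret = x.copy(), 0.0
        for u in u_seq:
            x_pred = A @ x_pred + B @ u
            ret += reward_fn(x_pred, u)
        if ret > best_ret:
            best_ret, best_u0 = ret, u_seq[0]
    return best_u0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy double-integrator stand-in for the true environment dynamics.
    true_step = lambda x, u: np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * u[0]])
    reward = lambda x, u: -float(x @ x) - 0.01 * float(u @ u)  # drive state to origin
    data = collect_samples(true_step, np.array([1.0, 0.0]), 200, rng)
    A, B = fit_linear_model(data)
    x = np.array([1.0, 0.0])
    for _ in range(30):
        x = true_step(x, mpc_action(x, A, B, reward, rng=rng))
    print("final state:", x)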
In Chapter 2, an MPC-based deep RL framework for constrained linear systems with bounded disturbances is proposed. In the proposed framework, a rigid tube-based MPC (RTMPC) method is employed to predict a trajectory by solving the corresponding optimization problem. The predicted trajectory is then stored in a replay buffer in the form of data pairs. Further, the soft actor-critic (SAC) algorithm, with a modified loss function, is applied to update the policy online based on the predicted data pairs. With the proposed framework, the control objectives are achieved without requiring a large number of real samples. Moreover, the proposed framework has computational complexity comparable to that of RTMPC. Finally, numerical simulations and comparisons demonstrate that the proposed framework leads to better control performance.
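As a structural illustration of this framework, the sketch below (Python/NumPy) mimics the Chapter 2 data flow: an MPC-style solve produces a predicted trajectory, the trajectory is unpacked into transition pairs and pushed into a replay buffer, and a policy update consumes those predicted pairs instead of real interaction data. The LQR-like feedback law standing in for RTMPC, the linear policy, and the regression-style update standing in for the modified SAC loss are all simplifying assumptions, not the thesis implementation.

# Structural sketch of the Chapter 2 data flow: predicted MPC trajectories
# feed a replay buffer, and the policy is updated from those predicted pairs.
import numpy as np
from collections import deque

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # nominal linear model
B = np.array([[0.0], [0.1]])
K = np.array([[-3.0, -2.5]])             # stand-in feedback law for the RTMPC solve

def predict_trajectory(x0, horizon=20):
    """Stand-in for the RTMPC solve: roll the nominal model under the feedback
    law and return (state, action, reward, next_state) data pairs."""
    transitions, x = [], x0
    for _ in range(horizon):
        u = K @ x
        x_next = A @ x + B @ u
        r = -float(x @ x) - 0.01 * float(u @ u)
        transitions.append((x, u, r, x_next))
        x = x_next
    return transitions

buffer = deque(maxlen=10_000)            # replay buffer of predicted data pairs
W = np.zeros((1, 2))                     # linear policy u = W x (stand-in for the SAC actor)
lr = 1e-2

for episode in range(50):
    x0 = rng.uniform(-1.0, 1.0, size=2)
    buffer.extend(predict_trajectory(x0))

    # Sample a mini-batch of predicted pairs and take a gradient step on a
    # simplified surrogate loss (regression toward the MPC actions), standing
    # in for the modified SAC update used in the thesis.
    batch = [buffer[i] for i in rng.integers(len(buffer), size=64)]
    grad = np.zeros_like(W)
    for x, u, _, _ in batch:
        grad += ((W @ x - u)[:, None] * x[None, :]) / len(batch)
    W -= lr * grad

print("learned policy gain:", W)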
In Chapter 3, we investigate the application of three classes of methods to robot manipulators. First, we apply an MPC-based RL algorithm, a nonlinear MPC (NMPC) method, and two model-free RL algorithms to the regulation problem for a 2-degree-of-freedom manipulator system, and compare their control and training performance. Second, training and control performance evaluations of the model-free RL algorithm and the MPC-based RL algorithm are provided: the MPC-based RL algorithm shows better training performance in terms of sample efficiency and total return, but poorer control performance. Third, simulation studies compare the training performance of the MPC-based RL algorithm with that of two model-free RL algorithms; for the twelve-dimensional system considered, the MPC-based RL algorithm exhibits poorer training performance than the model-free RL algorithms. In Chapter 4, conclusions and future work are summarized.