Reusing the subtasks learned by hierarchical reinforcement learning algorithms in similar learning tasks is a current research focus in reinforcement learning. In control systems, however, these subtasks are often hard to reuse because they depend on the system's parameters. To address this problem, the paper proposes a hierarchical Option algorithm based on qualitative actions. A qualitative action describes what the optimal actions for the same state have in common across systems whose parameter values differ. The algorithm also builds a hierarchy of subtasks in which the low-level subtasks shield the high-level subtasks from the influence of the system parameters. The proposed algorithm is applied to inverted pendulum control. Once the high-level subtasks have been learned, the algorithm needs only a small amount of additional learning to control inverted pendulum systems with a range of different parameter values.
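The Python sketch below illustrates the layering idea in general terms. It is not the paper's algorithm. It assumes a hypothetical torque-driven pendulum with dynamics θ'' = (g/l) sin θ + u/(m l²). A hand-written rule stands in for the learned high-level subtask, and a crude grid search stands in for the low-level learning, so no reinforcement learning or Options appear. The three-valued qualitative action set and the names `high_level_policy`, `low_level_subtask`, `balances`, and `calibrate` are illustrative assumptions.

```python
"""Hypothetical sketch: a parameter-free high-level policy emits qualitative
actions; a parameter-specific low-level subtask turns them into torques."""
import math
from dataclasses import dataclass

G = 9.81  # gravitational acceleration, m/s^2


@dataclass
class Pendulum:
    mass: float    # kg (hypothetical system parameter)
    length: float  # m (hypothetical system parameter)


def high_level_policy(theta: float, omega: float,
                      c: float = 0.5, deadband: float = 1e-3) -> int:
    """Qualitative action: -1 = negative torque, 0 = none, +1 = positive torque.
    Depends only on the state, never on mass or length, so it is reusable."""
    s = theta + c * omega  # sign tells which way to push back toward theta = 0
    if abs(s) < deadband:
        return 0
    return -1 if s > 0 else 1


def low_level_subtask(q: int, magnitude: float) -> float:
    """Map a qualitative action to a concrete torque. Only `magnitude` depends
    on the system, so only it is re-learned for each new pendulum."""
    return q * magnitude


def balances(p: Pendulum, magnitude: float, theta0: float = 0.1,
             dt: float = 0.01, steps: int = 500, limit: float = 0.5) -> bool:
    """Simulate theta'' = (g/l) sin(theta) + u / (m l^2) with semi-implicit
    Euler; succeed if |theta| stays within `limit` radians for all steps."""
    theta, omega = theta0, 0.0
    for _ in range(steps):
        u = low_level_subtask(high_level_policy(theta, omega), magnitude)
        alpha = (G / p.length) * math.sin(theta) + u / (p.mass * p.length ** 2)
        omega += alpha * dt
        theta += omega * dt
        if abs(theta) > limit:
            return False
    return True


def calibrate(p: Pendulum, candidates=(0.5, 1.0, 2.0, 4.0, 8.0, 16.0)):
    """Stand-in for the 'small amount of learning': return the smallest torque
    magnitude that keeps this pendulum upright, or None if none works."""
    for magnitude in candidates:
        if balances(p, magnitude):
            return magnitude
    return None


if __name__ == "__main__":
    light, heavy = Pendulum(1.0, 1.0), Pendulum(2.0, 1.5)
    # The same high_level_policy is used for both systems; only the
    # low-level torque magnitude is re-tuned.
    print(calibrate(light), calibrate(heavy))
```

Both pendulums share `high_level_policy` unchanged. Only the magnitude used by `low_level_subtask` is re-tuned, and the heavier, longer pendulum needs a larger one. This is a toy analogue of the abstract's claim that the low-level subtasks absorb the effect of the system parameters.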