To address the conflict between model-driven path planning and the diversity of people's cognitive preferences in spatial-cognition-oriented navigation, this paper proposes an interactive-learning path planning method based on hierarchical reinforcement learning. The method converts the optimal-path criterion into an instantaneous reward for each turning decision at an intersection, and uses two stages, pre-learning and real-time learning, to efficiently find the optimal path policy, that is, the policy with the largest total reward. The pre-learning stage automatically discovers sub-goal nodes and constructs subtasks that contain locally optimal policies. The real-time learning stage uses predefined policies to update Q-values efficiently and then traces the optimal path from the Q-values. Experiments show that the method achieves sufficiently good real-time performance and path optimality.
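As a rough illustration of two ideas in the abstract (rewards attached to decisions at intersections, and tracing a path from Q-values), the following is a minimal sketch and not the paper's algorithm. It is flat rather than hierarchical: it omits sub-goal discovery and subtasks, and it replaces interactive learning with deterministic sweeps that apply the Q-learning backup at a learning rate of 1. The road network `ROADS`, its reward values, and the function names `learn_q` and `trace_path` are all hypothetical.

```python
# Hypothetical toy road network: node -> {next node: instantaneous reward}.
# Rewards are negative travel costs (made-up values), so the path with the
# largest total reward is the cheapest one.
ROADS = {
    "A": {"B": -1.0, "C": -4.0},
    "B": {"C": -1.0, "D": -5.0},
    "C": {"D": -1.0},
    "D": {},
}


def learn_q(roads, goal, gamma=1.0, sweeps=50):
    """Repeatedly back up Q(s, a) = r + gamma * max Q(s', .) over all edges."""
    q = {s: {a: 0.0 for a in acts} for s, acts in roads.items()}
    for _ in range(sweeps):
        for s, acts in roads.items():
            for nxt, reward in acts.items():
                if nxt == goal:
                    future = 0.0  # the episode ends at the goal
                elif q[nxt]:
                    future = max(q[nxt].values())
                else:
                    future = float("-inf")  # dead end that is not the goal
                q[s][nxt] = reward + gamma * future
    return q


def trace_path(q, start, goal):
    """Follow the highest-valued decision at each node until the goal."""
    path = [start]
    while path[-1] != goal:
        choices = q[path[-1]]
        if not choices or len(path) > len(q):
            raise ValueError("no path to goal found from Q-values")
        path.append(max(choices, key=choices.get))
    return path


if __name__ == "__main__":
    q_table = learn_q(ROADS, goal="D")
    print(trace_path(q_table, "A", "D"))
```

On this toy network the greedy trace from A visits A, B, C, D, with a total reward of -3. In a hierarchical variant, one would expect the sweeps to operate over subtasks between sub-goal nodes rather than over individual edges, but that structure is not shown here.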