To address problems of the hierarchical policy gradient reinforcement learning algorithm (HPGRL), such as its tendency to become trapped in local optima, a hierarchical policy search algorithm (PSO-HPS) is proposed. First, the designer constructs a hierarchy of subtasks following the idea of the classic hierarchical reinforcement learning method MAXQ. Then, through direct interaction with the environment, PSO-HPS uses particle swarms, which have strong global search ability, to evolve the parameterized policies in each composite subtask and thereby obtain an optimized action policy. Finally, experiments on negotiation deadlock resolution verify that PSO-HPS is effective and that its performance is clearly better than that of HPGRL.
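The core idea of evolving a parameterized policy with a particle swarm can be illustrated with a minimal sketch. This is a generic, textbook-style particle swarm optimizer, not the paper's PSO-HPS: it searches a single flat parameter vector and ignores the MAXQ subtask hierarchy. The names `pso_search` and `toy_return`, the swarm size, and the coefficients `w`, `c1`, `c2` are all assumptions made for illustration. In a real setting the fitness function would presumably be the return measured by running the policy in the environment; here a simple quadratic stands in for it.

```python
import random


def pso_search(fitness, dim, n_particles=20, iters=100,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize `fitness` over a `dim`-dimensional parameter vector.

    Generic particle swarm optimization; all defaults are illustrative
    assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    # Each particle is one candidate policy parameter vector.
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Personal bests and the swarm-wide (global) best.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


def toy_return(theta):
    # Hypothetical stand-in for the average return of a policy with
    # parameters `theta`; its maximum (0) is at theta = (0.5, ..., 0.5).
    return -sum((x - 0.5) ** 2 for x in theta)


best_theta, best_fit = pso_search(toy_return, dim=3)
```

Because the swarm only needs fitness evaluations and no gradient of the return, this kind of search avoids following a single local gradient direction, which is the motivation the abstract gives for replacing policy gradient updates. A hierarchical variant would, under the abstract's description, maintain such a search for the policy of each composite subtask.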