Directly extending a single-agent Q-learning cooperation algorithm to a multi-agent system makes the set of state-action pairs grow rapidly, which slows cooperative learning among the agents. To address this problem, a multi-agent cooperative reinforcement learning algorithm based on practical reasoning is proposed. Within the practical reasoning framework, the deliberation process first determines the sub-intention of each individual agent by considering the group intention. Then, in the means-ends reasoning process, Q-learning is used to derive the optimal policy for achieving each sub-intention, thereby realizing the group intention. In the Q-learning step, each agent only needs to update the value function over its own state-action pairs and can ignore the value-function updates of the other agents, which greatly reduces the space complexity of the algorithm and increases the learning speed. Simulation results on the pursuit problem verify the effectiveness of the algorithm.
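The space saving described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `IndependentQAgent`, the helper `table_sizes`, the hyperparameter defaults, and the assumption that a joint learner would index its table by every agent's state and action are all hypothetical choices made for illustration. Each agent runs the standard tabular Q-learning update over its own state-action pairs only, as it would when pursuing a sub-intention assigned during deliberation.

```python
import random


class IndependentQAgent:
    """Tabular Q-learner that stores values only for its own state-action pairs.

    Hypothetical illustration: names and default hyperparameters are assumptions.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.q = {}             # (own_state, own_action) -> estimated value
        self.rng = random.Random(seed)

    def value(self, state, action):
        # Unvisited pairs default to 0.0 without being stored.
        return self.q.get((state, action), 0.0)

    def act(self, state):
        # Epsilon-greedy choice over this agent's own actions.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.value(state, a))

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update; no other agent's values are read or written.
        best_next = max(self.value(next_state, a) for a in self.actions)
        td_target = reward + self.gamma * best_next
        old = self.value(state, action)
        self.q[(state, action)] = old + self.alpha * (td_target - old)


def table_sizes(n_agents, n_states, n_actions):
    """Compare table sizes: one joint table vs. one small table per agent.

    Assumes a joint learner indexes by all agents' states and actions.
    """
    joint = (n_states ** n_agents) * (n_actions ** n_agents)
    independent = n_agents * n_states * n_actions
    return joint, independent
```

For example, under these assumptions, `table_sizes(4, 25, 5)` compares four pursuers with 25 local states and 5 actions each: the joint table would hold 25^4 · 5^4 entries, whereas the four independent tables together hold only 4 · 25 · 5. How each agent's local state and reward are derived from its sub-intention is left open here, since the abstract does not specify it.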