In general-sum games, we take the collective rationality of all agents into account, define a global objective for the agents, and propose a novel multi-agent reinforcement learning (RL) algorithm based on a global policy. In each learning step, all agents commit to selecting the global policy in order to achieve the global goal. We prove that this learning algorithm converges under certain restrictions on the stage games defined by the learned Q-values, and show that its computational time complexity is considerably lower than that of previously developed multi-agent learning algorithms for general-sum games. An example is analyzed to illustrate the algorithm's merits.
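The abstract does not state how the global objective is defined, so the following is only a minimal sketch under a loudly stated assumption: the global objective is taken to be the sum of all agents' learned Q-values, and the global policy is the joint action that maximizes that sum. The function names, the learning rate `alpha`, the discount `gamma`, and the prisoner's-dilemma payoffs are all hypothetical choices made for illustration and are not taken from the paper.

```python
import itertools
from collections import defaultdict


def global_joint_action(q_tables, state, action_sets):
    """Return the joint action maximizing the summed Q-values.

    ASSUMPTION: "global objective" = sum of the agents' Q-values.
    The paper's actual definition may differ.
    """
    joint_actions = itertools.product(*action_sets)
    return max(joint_actions,
               key=lambda a: sum(q[(state, a)] for q in q_tables))


def q_update(q_tables, state, joint_action, rewards, next_state,
             action_sets, alpha=0.1, gamma=0.9):
    """One learning step: every agent bootstraps from the same global joint action."""
    next_joint = global_joint_action(q_tables, next_state, action_sets)
    for q, reward in zip(q_tables, rewards):
        target = reward + gamma * q[(next_state, next_joint)]
        q[(state, joint_action)] += alpha * (target - q[(state, joint_action)])


def run_prisoners_dilemma_demo(sweeps=2000):
    """Train two agents on a repeated single-state game (hypothetical payoffs)."""
    action_sets = [("C", "D"), ("C", "D")]
    payoffs = {
        ("C", "C"): (3, 3),
        ("C", "D"): (0, 5),
        ("D", "C"): (5, 0),
        ("D", "D"): (1, 1),
    }
    state = "s"  # single repeated state
    q_tables = [defaultdict(float) for _ in action_sets]
    for _ in range(sweeps):
        # Sweep over every joint action instead of exploring randomly,
        # which keeps the demo deterministic.
        for joint_action, rewards in payoffs.items():
            q_update(q_tables, state, joint_action, rewards, state, action_sets)
    return q_tables, global_joint_action(q_tables, state, action_sets)
```

Under this assumed objective, selecting the global policy requires only one maximization over joint actions per step, rather than solving for an equilibrium of the stage game. That is one plausible reading of the lower time complexity the abstract claims, not a confirmed account of the paper's argument.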