This paper investigates how to learn optimal action policies in cooperative multiagent systems when the agents' rewards are random variables, and proposes a general two-stage learning algorithm for cooperative multiagent decision processes. The algo