To solve large-scale Markov decision process (MDP) problems under the average criterion, the value function is represented and stored in a more compact form. The value function is approximated by a state-aggregated relative value iteration algorithm, and the convergence of the algorithm is analyzed using the span semi-norm and the contraction mapping principle. The Bellman optimality equation after state aggregation is given. Convergence of the algorithm is proved under a span contraction condition, and an error estimate is also given.
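The idea of relative value iteration on aggregated states can be illustrated with a minimal sketch. This is not the paper's algorithm as published; it assumes cost minimization, a hard partition of the states into clusters, uniform averaging within each cluster, and cluster 0 as the reference state. The names `aggregated_rvi`, `P`, `c`, and `phi` are hypothetical and chosen only for this example.

```python
import numpy as np

def aggregated_rvi(P, c, phi, n_clusters, tol=1e-10, max_iter=10_000):
    """Relative value iteration on an aggregated state space (illustrative sketch).

    P:   (A, n, n) transition matrices, one per action
    c:   (A, n) one-step costs (assumption: costs are minimized)
    phi: (n,) cluster index of each state; every cluster must be non-empty
    Returns the cluster weights w (with w[0] == 0) and a gain estimate.
    """
    n = P.shape[1]
    # Phi lifts cluster weights to a piecewise-constant value function on states.
    Phi = np.zeros((n, n_clusters))
    Phi[np.arange(n), phi] = 1.0
    # D averages state values uniformly within each cluster (rows sum to 1).
    D = Phi.T / Phi.sum(axis=0)[:, None]

    w = np.zeros(n_clusters)
    gain = 0.0
    for _ in range(max_iter):
        h = Phi @ w                      # value function on the original states
        Th = np.min(c + P @ h, axis=0)   # Bellman operator applied to h
        Tw = D @ Th                      # project back onto the clusters
        gain = Tw[0]                     # value at the reference cluster
        w_new = Tw - gain                # relative step keeps w[0] == 0
        diff = w_new - w
        if diff.max() - diff.min() < tol:  # stop when the span of the update is small
            return w_new, gain
        w = w_new
    return w, gain

# Tiny example: one action, every row of P equal to pi, so the exact gain is pi @ c.
pi = np.array([0.1, 0.2, 0.3, 0.4])
P_demo = np.tile(pi, (1, 4, 1))
c_demo = np.array([[1.0, 2.0, 3.0, 4.0]])
w_exact, gain_exact = aggregated_rvi(P_demo, c_demo, np.arange(4), 4)
w_agg, gain_agg = aggregated_rvi(P_demo, c_demo, np.array([0, 0, 1, 1]), 2)
```

In this toy problem the run without aggregation recovers the exact gain of 3.0, while the two-cluster run gives approximately 2.9. The gap is the kind of approximation error that an error estimate for the aggregated method is meant to bound. Because the averaging and lifting steps do not increase the span semi-norm, the composite update remains a span contraction whenever the Bellman operator is one, which is why the stopping rule is written in terms of the span.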