To improve the adaptability and robustness of traffic control systems, reinforcement learning is applied so that the traffic control model can learn from experience. Signal timing optimization at a single intersection is studied under both fixed-cycle and variable-cycle modes. A reward function for the equal-saturation optimization objective is constructed, and offline Q-learning models are established for two optimization objectives: equal saturation and minimum delay. Discretizing the traffic flows resolves the state-dimension explosion problem. Numerical examples analyzing the solution structure and the distribution of optimal solutions of the four offline Q-learning models show that, compared with online Q-learning, offline Q-learning is better suited to intersection signal timing optimization. Following an "offline learning, online application" approach, the fixed-cycle minimum-delay offline Q-learning model is compared with the Webster fixed-cycle model; overall, the former yields lower average vehicle delay and lower cumulative delay.
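The core ideas above — discretized flows as states, candidate timings as actions, and a reward built from the equal-saturation objective — can be illustrated with a minimal offline tabular Q-learning sketch. This is a hypothetical toy, not the paper's actual model: the flow levels, candidate green splits, saturation flow, and learning parameters are all illustrative assumptions, and a two-phase intersection is assumed for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

N_LEVELS = 5                          # discretized flow levels per approach (assumed)
SPLITS = np.linspace(0.3, 0.7, 9)     # candidate green ratios for phase 1 (assumed)
SAT_FLOW = 1800.0                     # saturation flow rate, veh/h (assumed)

def reward(q1, q2, split):
    """Equal-saturation objective: penalize the gap between the
    degrees of saturation of the two phases."""
    x1 = q1 / (SAT_FLOW * split)          # degree of saturation, phase 1
    x2 = q2 / (SAT_FLOW * (1.0 - split))  # degree of saturation, phase 2
    return -abs(x1 - x2)

# Q-table indexed by (flow level 1, flow level 2, action)
Q = np.zeros((N_LEVELS, N_LEVELS, len(SPLITS)))
alpha, eps = 0.1, 0.1                 # learning rate and exploration rate
# gamma = 0: each timing decision is treated as a one-shot choice here

for episode in range(20000):
    # sample a discretized demand state, as if replaying logged (offline) data
    s1, s2 = rng.integers(N_LEVELS), rng.integers(N_LEVELS)
    q1, q2 = 200.0 + 300.0 * s1, 200.0 + 300.0 * s2   # veh/h per level (assumed)
    if rng.random() < eps:
        a = rng.integers(len(SPLITS))          # explore
    else:
        a = int(np.argmax(Q[s1, s2]))          # exploit
    r = reward(q1, q2, SPLITS[a])
    Q[s1, s2, a] += alpha * (r - Q[s1, s2, a])  # Q-update with gamma = 0

# greedy policy for a balanced-demand state: the learned split should be 0.5
best = SPLITS[int(np.argmax(Q[2, 2]))]
print(round(best, 2))
```

Because the reward is deterministic here, the Q-values simply converge toward the per-action rewards, and for equal demand on both approaches the greedy split settles at the symmetric value; the paper's actual models additionally cover variable-cycle operation and a minimum-delay objective.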