Excerpt from the paper
First, the traditional concept of equal degree of saturation in green time allocation is extended, and a graded equal degree of saturation is proposed. On this basis, a reward function is constructed for the graded equal-saturation objective, and four offline Q-learning signal timing optimization models are established under two modes, fixed cycle and variable cycle. Compared with online Q-learning models, offline Q-learning models are better suited to signal timing optimization at intersections. In addition, the variable-cycle offline Q-learning models reveal the structure of the solution and the distribution of the optimal solutions, which traditional timing theory does not provide. The results of the numerical example show that the optimal solution is unique in fixed-cycle mode. In variable-cycle mode the optimal solution is not unique and the optimal solutions form a band; those of the graded-reward model are more concentrated than those of the ungraded-reward model.
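As a rough illustration of the fixed-cycle case, the following minimal sketch trains a tabular Q-learning agent to split the green time of a two-phase intersection so that both phases reach the same degree of saturation. It is not the paper's model: the graded reward and the four models are not reproduced, and the flows, saturation flows, cycle length, lost time, action set, and learning parameters are all assumed values chosen for illustration. "Offline" is interpreted here as learning against a computed traffic model rather than live traffic, and the agent sweeps every state-action pair instead of exploring randomly, which is a simplification. The degree of saturation uses the standard definition x_i = q_i C / (s_i g_i).

```python
# Illustrative sketch only: every number and name here is an assumption,
# not a value or identifier taken from the paper.
CYCLE = 60                      # fixed cycle length (s)
LOST = 10                       # total lost time per cycle (s)
G_TOTAL = CYCLE - LOST          # effective green shared by the two phases (s)
G_MIN = 10                      # minimum green per phase (s)
FLOW = (600.0, 400.0)           # arrival flow per phase (veh/h)
SAT_FLOW = (1800.0, 1800.0)     # saturation flow per phase (veh/h)
ACTIONS = (-1, 0, 1)            # change applied to the phase-1 green (s)
ALPHA, GAMMA, SWEEPS = 0.5, 0.9, 500


def saturation(g1):
    """Degree of saturation x_i = q_i * C / (s_i * g_i) for both phases."""
    greens = (g1, G_TOTAL - g1)
    return tuple(q * CYCLE / (s * g) for q, s, g in zip(FLOW, SAT_FLOW, greens))


def reward(g1):
    """Ungraded reward: the closer the two saturations, the higher the reward."""
    x1, x2 = saturation(g1)
    return -abs(x1 - x2)


def step(g1, action):
    """Apply an action, clip to the feasible greens, return (next state, reward)."""
    nxt = min(max(g1 + action, G_MIN), G_TOTAL - G_MIN)
    return nxt, reward(nxt)


def train():
    """Q-learning updates swept over all state-action pairs of the model."""
    states = range(G_MIN, G_TOTAL - G_MIN + 1)
    q = {(s, a): 0.0 for s in states for a in ACTIONS}
    for _sweep in range(SWEEPS):
        for s in states:
            for a in ACTIONS:
                nxt, r = step(s, a)
                target = r + GAMMA * max(q[(nxt, b)] for b in ACTIONS)
                q[(s, a)] += ALPHA * (target - q[(s, a)])
    return q


def greedy_split(q, g1, max_steps=100):
    """Follow the greedy policy until it chooses to hold the current split."""
    for _step in range(max_steps):
        action = max(ACTIONS, key=lambda a: q[(g1, a)])
        if action == 0:
            return g1
        g1, _reward = step(g1, action)
    return g1


if __name__ == "__main__":
    table = train()
    g1 = greedy_split(table, G_MIN)
    print("phase greens:", g1, G_TOTAL - g1, "saturations:", saturation(g1))
```

With these assumed inputs the flow ratios are 600/1800 and 400/1800, so equal saturation corresponds to splitting the 50 s of effective green in a 3:2 ratio. Because the cycle is fixed, the state is a single green time and one best split exists. A variable-cycle version would add the cycle length to the state, which is where the abstract reports a band of optimal solutions.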