A model-based approximate λ-policy iteration approach to online evasive path planning and the video

Source: Control Theory and Applications (English Edition) | Citations: 0 | Uploaded by: saintjob

Abstract:

This paper presents a model-based approximate λ-policy iteration approach using temporal differences for optimizing paths online for a pursuit-evasion problem,

Authors:

Greg FODERARO, Vikram RAJU, Silvia FERRARI

Affiliation:

Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC 27708, U.S.A.

Published in:

Control Theory and Applications (English Edition)

Publication date:

2011, Issue 3

Keywords:

Approximate dynamic programming; Reinforcement learning; Path planning; Pursuit eva


Excerpt from the paper

This paper presents a model-based approximate λ-policy iteration approach using temporal differences for optimizing paths online for a pursuit-evasion problem, where an agent must visit several target positions within a region of interest while simultaneously avoiding one or more actively pursuing adversaries. This method is relevant to applications such as robotic path planning, mobile-sensor applications, and path exposure. The methodology utilizes cell decomposition to construct a decision tree and implements a temporal difference-based approximate λ-policy iteration to combine online learning with prior knowledge through modeling, with the objectives of minimizing the risk of being caught by an adversary and maximizing a reward associated with visiting target locations. Online learning and frequent decision tree updates allow the algorithm to quickly adapt to unexpected movements by the adversaries or to dynamic environments. The approach is illustrated through a modified version of the video game Ms. Pac-Man, which is shown to be a benchmark example of the pursuit-evasion problem. The results show that the approach presented in this paper outperforms several other methods as well as most human players.
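The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a hypothetical 3x4 grid of cells, one target cell, one stationary adversary cell, a deterministic movement model, and a tabular value estimate (so the function approximation and the decision tree from the paper are left out). All names and constants (`TARGET`, `ADVERSARY`, `GAMMA`, `LAM`, `HORIZON`, the reward values) are illustrative assumptions. Each sweep picks a greedy policy through the model and then corrects the value estimate with a λ-weighted sum of temporal differences along that policy's trajectory, in the style of λ-policy iteration.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]

# Hypothetical 3x4 cell decomposition; every constant here is an assumption.
ROWS, COLS = 3, 4
TARGET: Cell = (0, 3)      # visiting this cell earns a reward
ADVERSARY: Cell = (1, 3)   # entering this cell means being caught
MOVES: List[Cell] = [(0, 1), (0, -1), (1, 0), (-1, 0)]
GAMMA, LAM, HORIZON = 0.95, 0.5, 20


def step(cell: Cell, move: Cell) -> Tuple[Cell, float, bool]:
    """Deterministic model: (next cell, reward, episode finished)."""
    r, c = cell[0] + move[0], cell[1] + move[1]
    nxt = (r, c) if 0 <= r < ROWS and 0 <= c < COLS else cell
    if nxt == TARGET:
        return nxt, 1.0, True
    if nxt == ADVERSARY:
        return nxt, -1.0, True
    return nxt, -0.04, False  # small cost per move


def greedy_move(J: Dict[Cell, float], cell: Cell) -> Cell:
    """One-step lookahead through the model (policy improvement)."""
    def q(move: Cell) -> float:
        nxt, reward, done = step(cell, move)
        return reward + (0.0 if done else GAMMA * J[nxt])
    return max(MOVES, key=q)


def lambda_policy_iteration(sweeps: int = 30) -> Dict[Cell, float]:
    """Tabular lambda-policy iteration on the toy grid."""
    states = [(r, c) for r in range(ROWS) for c in range(COLS)
              if (r, c) not in (TARGET, ADVERSARY)]
    # Pessimistic start: no return can be lower than -1 / (1 - GAMMA).
    J = {s: -1.0 / (1.0 - GAMMA) for s in states}
    for _ in range(sweeps):
        policy = {s: greedy_move(J, s) for s in states}
        J_new = {}
        for s in states:
            total, weight, cur = 0.0, 1.0, s
            for _ in range(HORIZON):
                nxt, reward, done = step(cur, policy[cur])
                # Temporal difference of the current estimate J.
                delta = reward + (0.0 if done else GAMMA * J[nxt]) - J[cur]
                total += weight * delta
                weight *= GAMMA * LAM
                if done:
                    break
                cur = nxt
            J_new[s] = J[s] + total
        J = J_new
    return J


def plan_path(J: Dict[Cell, float], start: Cell, max_steps: int = 20) -> List[Cell]:
    """Follow the greedy policy from start until the episode ends."""
    path = [start]
    for _ in range(max_steps):
        nxt, _, done = step(path[-1], greedy_move(J, path[-1]))
        path.append(nxt)
        if done:
            break
    return path
```

Calling `plan_path(lambda_policy_iteration(), (2, 0))` returns a list of cells from the bottom-left corner to the target. In this sketch, setting `LAM` to 0 reduces each sweep to a value-iteration step, while values near 1 make each sweep closer to a full policy evaluation. A moving adversary would require re-running the sweeps as the model changes, which this toy example does not attempt.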
