Research on Aircraft Guidance Technology Based on Model-Based Reinforcement Learning
TENG Qinghua, HUI Junpeng, LI Tianren, YANG Ben
(Research & Development Center, China Academy of Launch Vehicle Technology, Beijing 100076, China; Beijing Institute of Space Long March Vehicle, Beijing 100076, China)
Abstract:
For aircraft trajectories planned online under mission demands such as obstacle avoidance and detour flight, and in order to improve guidance performance and adapt to rapidly changing complex scenarios, this paper designs an intelligent guidance method based on iLQR, a model-based reinforcement learning approach that fully exploits the known information in the aircraft model. Compared with model-free reinforcement learning, model-based reinforcement learning offers better interpretability and is easier to train. In single-aircraft guidance simulations, although the average guidance error of the iLQR method during flight is 28.07% larger than that of the TD3 algorithm, its error at the midcourse-to-terminal handover point is reduced to 12.35% of the TD3 algorithm's, a substantial improvement. In the multi-aircraft formation-keeping problem, the iLQR method tracks markedly better than TD3: its average error is no more than 22.67%, and its maximum error no more than 15.44%, of the TD3 algorithm's.
Key words: iLQR algorithm; model-based reinforcement learning; standard trajectory guidance; reinforcement learning guidance; formation keeping
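
This page gives no implementation details, so purely as an illustration of the iLQR idea named in the abstract, the sketch below solves one tracking problem on a toy double-integrator model standing in for the aircraft dynamics. Everything in it is an assumption for illustration: the dynamics f, the quadratic tracking cost with weights Q, R, and Qf, and the omission of the line search and regularization schedule a production iLQR would add.

import numpy as np

def f(x, u, dt=0.1):
    # Toy double-integrator dynamics (a stand-in, NOT the paper's aircraft model):
    # state [px, py, vx, vy], control [ax, ay].
    px, py, vx, vy = x
    ax, ay = u
    return np.array([px + vx * dt, py + vy * dt, vx + ax * dt, vy + ay * dt])

def jacobians(x, u, eps=1e-5):
    # Central-difference linearization of the model: A = df/dx, B = df/du.
    n, m = x.size, u.size
    A, B = np.zeros((n, n)), np.zeros((n, m))
    for i in range(n):
        d = np.zeros(n); d[i] = eps
        A[:, i] = (f(x + d, u) - f(x - d, u)) / (2 * eps)
    for j in range(m):
        d = np.zeros(m); d[j] = eps
        B[:, j] = (f(x, u + d) - f(x, u - d)) / (2 * eps)
    return A, B

def ilqr(x0, x_ref, U, Q, R, Qf, iters=50):
    # Track x_ref under cost 0.5*(x-xr)'Q(x-xr) + 0.5*u'Ru, terminal weight Qf.
    N = U.shape[0]
    for _ in range(iters):
        X = [x0]                                 # roll out the current controls
        for t in range(N):
            X.append(f(X[t], U[t]))
        X = np.array(X)
        Vx, Vxx = Qf @ (X[N] - x_ref[N]), Qf.copy()
        k = np.zeros_like(U)                     # feedforward corrections
        K = np.zeros((N, U.shape[1], x0.size))   # feedback gains
        for t in reversed(range(N)):             # backward (Riccati-like) pass
            A, B = jacobians(X[t], U[t])
            Qx = Q @ (X[t] - x_ref[t]) + A.T @ Vx
            Qu = R @ U[t] + B.T @ Vx
            Qxx = Q + A.T @ Vxx @ A
            Quu = R + B.T @ Vxx @ B
            Qux = B.T @ Vxx @ A
            Quu_inv = np.linalg.inv(Quu + 1e-6 * np.eye(Quu.shape[0]))
            k[t], K[t] = -Quu_inv @ Qu, -Quu_inv @ Qux
            Vx = Qx + K[t].T @ Quu @ k[t] + K[t].T @ Qu + Qux.T @ k[t]
            Vxx = Qxx + K[t].T @ Quu @ K[t] + K[t].T @ Qux + Qux.T @ K[t]
        X_new, U_new = [x0], np.zeros_like(U)    # forward pass with the new policy
        for t in range(N):
            U_new[t] = U[t] + k[t] + K[t] @ (X_new[t] - X[t])
            X_new.append(f(X_new[t], U_new[t]))
        U = U_new
    return np.array(X_new), U

# Example: steer from the origin to hover at (10, 5).
N = 50
x_ref = np.tile([10.0, 5.0, 0.0, 0.0], (N + 1, 1))
X, U = ilqr(np.zeros(4), x_ref, np.zeros((N, 2)),
            Q=np.diag([1.0, 1.0, 0.1, 0.1]), R=0.01 * np.eye(2), Qf=10.0 * np.eye(4))

On a linear model like this the method reduces to plain LQR tracking and converges in a single pass; with nonlinear aircraft dynamics, the repeated linearize-solve-rollout loop is what does the work.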
