Research on Aircraft Guidance Technology Based on Model-Based Reinforcement Learning
TENG Qinghua, HUI Junpeng, LI Tianren, YANG Ben
(Research & Development Center, China Academy of Launch Vehicle Technology, Beijing 100076, China; Beijing Institute of Space Long March Vehicle, Beijing 100076, China)
Abstract:
For aircraft trajectories planned online under mission demands such as obstacle avoidance and detour flight, and in order to improve guidance performance and adapt to rapidly changing complex scenarios, this paper designs an intelligent guidance method based on iLQR, a model-based reinforcement learning approach that fully exploits the known information in the aircraft model. Compared with model-free reinforcement learning, model-based reinforcement learning offers better interpretability and is easier to train. In single-aircraft guidance simulations, although the average guidance error of the iLQR method during flight is 28.07% larger than that of the TD3 algorithm, its error at the midcourse-to-terminal handover point is reduced to 12.35% of the TD3 algorithm's, a substantial improvement. In the multi-aircraft formation-keeping problem, the iLQR method tracks markedly better than TD3: its average error is no more than 22.67%, and its maximum error no more than 15.44%, of the TD3 algorithm's.
Key words: iLQR algorithm; model-based reinforcement learning; standard trajectory guidance; reinforcement learning guidance; formation keeping
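
This page gives no implementation details, so purely as an illustration of the iLQR idea named in the abstract, the sketch below solves one tracking problem on a toy double-integrator model standing in for the aircraft dynamics. Everything in it is an assumption for illustration: the dynamics f, the quadratic tracking cost with weights Q, R, and Qf, and the omission of the line search and regularization schedule a production iLQR would add.

import numpy as np

def f(x, u, dt=0.1):
    # Toy double-integrator dynamics (a stand-in, NOT the paper's aircraft model):
    # state [px, py, vx, vy], control [ax, ay].
    px, py, vx, vy = x
    ax, ay = u
    return np.array([px + vx * dt, py + vy * dt, vx + ax * dt, vy + ay * dt])

def jacobians(x, u, eps=1e-5):
    # Central-difference linearization of the model: A = df/dx, B = df/du.
    n, m = x.size, u.size
    A, B = np.zeros((n, n)), np.zeros((n, m))
    for i in range(n):
        d = np.zeros(n); d[i] = eps
        A[:, i] = (f(x + d, u) - f(x - d, u)) / (2 * eps)
    for j in range(m):
        d = np.zeros(m); d[j] = eps
        B[:, j] = (f(x, u + d) - f(x, u - d)) / (2 * eps)
    return A, B

def ilqr(x0, x_ref, U, Q, R, Qf, iters=50):
    # Track x_ref under cost 0.5*(x-xr)'Q(x-xr) + 0.5*u'Ru, terminal weight Qf.
    N = U.shape[0]
    for _ in range(iters):
        X = [x0]                                 # roll out the current controls
        for t in range(N):
            X.append(f(X[t], U[t]))
        X = np.array(X)
        Vx, Vxx = Qf @ (X[N] - x_ref[N]), Qf.copy()
        k = np.zeros_like(U)                     # feedforward corrections
        K = np.zeros((N, U.shape[1], x0.size))   # feedback gains
        for t in reversed(range(N)):             # backward (Riccati-like) pass
            A, B = jacobians(X[t], U[t])
            Qx = Q @ (X[t] - x_ref[t]) + A.T @ Vx
            Qu = R @ U[t] + B.T @ Vx
            Qxx = Q + A.T @ Vxx @ A
            Quu = R + B.T @ Vxx @ B
            Qux = B.T @ Vxx @ A
            Quu_inv = np.linalg.inv(Quu + 1e-6 * np.eye(Quu.shape[0]))
            k[t], K[t] = -Quu_inv @ Qu, -Quu_inv @ Qux
            Vx = Qx + K[t].T @ Quu @ k[t] + K[t].T @ Qu + Qux.T @ k[t]
            Vxx = Qxx + K[t].T @ Quu @ K[t] + K[t].T @ Qux + Qux.T @ K[t]
        X_new, U_new = [x0], np.zeros_like(U)    # forward pass with the new policy
        for t in range(N):
            U_new[t] = U[t] + k[t] + K[t] @ (X_new[t] - X[t])
            X_new.append(f(X_new[t], U_new[t]))
        U = U_new
    return np.array(X_new), U

# Example: steer from the origin to hover at (10, 5).
N = 50
x_ref = np.tile([10.0, 5.0, 0.0, 0.0], (N + 1, 1))
X, U = ilqr(np.zeros(4), x_ref, np.zeros((N, 2)),
            Q=np.diag([1.0, 1.0, 0.1, 0.1]), R=0.01 * np.eye(2), Qf=10.0 * np.eye(4))

On a linear model like this the method reduces to plain LQR tracking and converges in a single pass; with nonlinear aircraft dynamics, the repeated linearize-solve-rollout loop is what does the work.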
