Patent title: DATA-EFFICIENT HIERARCHICAL REINFORCEMENT LEARNING
Inventors: Honglak Lee, Shixiang Gu, Sergey Levine
Application number: US17050546
Filing date: 2019-05-17
Publication number: US20210187733A1
Publication date: 2021-06-24
Abstract: Training and/or utilizing a hierarchical reinforcement learning (HRL) model for robotic control. The HRL model can include at least a higher-level policy model and a lower-level policy model. Some implementations relate to technique(s) that enable more efficient off-policy training to be utilized in training of the higher-level policy model and/or the lower-level policy model. Some of those implementations utilize off-policy correction, which re-labels higher-level actions of experience data, generated in the past utilizing a previously trained version of the HRL model, with modified higher-level actions. The modified higher-level actions are then utilized to off-policy train the higher-level policy model. This can enable effective off-policy training despite the lower-level policy model being a different version at training time (relative to the version when the experience data was collected).
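The re-labeling step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how a modified higher-level action is chosen, so the selection rule here is an assumption: among a few candidate goals, keep the one under which the current lower-level policy best reproduces the logged lower-level actions (scored by negative squared error). The names `relabel_goal`, `lower_policy`, `num_samples`, and `scale` are hypothetical, and the goal is held fixed across the segment for simplicity.

```python
import random


def relabel_goal(states, actions, old_goal, lower_policy,
                 num_samples=8, scale=0.5, seed=0):
    """Return a modified higher-level action (goal) for one logged segment.

    Assumed rule (not stated in the abstract): choose the candidate goal
    that makes the logged lower-level actions most consistent with the
    CURRENT lower-level policy.
    """
    rng = random.Random(seed)

    # Observed state change over the segment (last state minus first state).
    delta = [end - start for start, end in zip(states[0], states[-1])]

    # Candidates: the logged goal, the observed change, and noisy copies of it.
    candidates = [list(old_goal), delta]
    for _ in range(num_samples):
        candidates.append([d + rng.gauss(0.0, scale) for d in delta])

    def score(goal):
        # Negative squared error between logged actions and the actions the
        # current lower-level policy would take for this goal.
        total = 0.0
        for state, action in zip(states, actions):
            predicted = lower_policy(state, goal)
            total += sum((a - p) ** 2 for a, p in zip(action, predicted))
        return -0.5 * total

    return max(candidates, key=score)


def toy_policy(state, goal):
    # Hypothetical stand-in for the current lower-level policy:
    # it simply outputs the goal as its action.
    return list(goal)


states = [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]]   # c + 1 logged states
actions = [[1.0, 0.0], [1.0, 0.0]]              # c logged lower-level actions
relabeled = relabel_goal(states, actions, old_goal=[5.0, 5.0],
                         lower_policy=toy_policy)
```

In this toy setup the stale goal `[5.0, 5.0]` no longer explains the logged actions, so a different candidate is selected. The re-labeled goal would then replace the stored higher-level action when the higher-level policy model is trained off-policy.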
Applicant: Google LLC
Address: Mountain View CA US
Nationality: US