Exploration Strategies in Deep Reinforcement Learning (Jun 7, 2020)
Author: Lilian Weng
Classic Exploration Strategies
- Applicable to multi-armed bandits (MAB) or tabular RL (a minimal sketch follows this list)
  - ε-greedy
  - UCB (upper confidence bound)
  - Boltzmann (softmax) exploration
  - Thompson sampling
- Applicable to DRL
  - Entropy loss term
  - Noise-based exploration: add noise in various places (observation, action, or even parameter space); see the NoisyNet-style sketch after this list
    - Noisy Networks for Exploration (2017)
      - NoisyNet perturbs the NN weights with noise whose parameters are learned by gradient descent (GD)
    - Parameter Space Noise for Exploration (2017)
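A minimal sketch (mine, not from the post) of two of the classic strategies above for a K-armed bandit; all names (BanditStats, select_epsilon_greedy, select_ucb) are illustrative:

```python
import numpy as np

class BanditStats:
    """Running statistics for a K-armed bandit."""
    def __init__(self, n_arms):
        self.counts = np.zeros(n_arms)   # N(a): times each arm was pulled
        self.values = np.zeros(n_arms)   # Q(a): running mean reward per arm

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

def select_epsilon_greedy(stats, epsilon=0.1):
    # With probability epsilon explore uniformly at random, otherwise exploit.
    if np.random.rand() < epsilon:
        return np.random.randint(len(stats.values))
    return int(np.argmax(stats.values))

def select_ucb(stats, t, c=2.0):
    # UCB1-style rule: Q(a) + c * sqrt(ln t / N(a)); the bonus shrinks as an
    # arm is pulled more often. (In practice, pull every arm once first.)
    bonus = c * np.sqrt(np.log(t + 1) / (stats.counts + 1e-8))
    return int(np.argmax(stats.values + bonus))
```

ε-greedy explores blindly at a fixed rate, while UCB directs exploration toward arms whose value estimates are still uncertain.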
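A rough sketch of the NoisyNet idea, assuming a PyTorch-style module; the initialization constants are placeholders and the factorized-noise trick from the paper is omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Illustrative noisy linear layer in the spirit of NoisyNet.

    Effective weights are w = mu + sigma * eps, with eps resampled every
    forward pass; mu and sigma are trained by gradient descent, so the scale
    of the exploration noise is learned rather than hand-tuned.
    """
    def __init__(self, in_features, out_features, sigma_init=0.017):
        super().__init__()
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features).uniform_(-0.1, 0.1))
        self.weight_sigma = nn.Parameter(torch.full((out_features, in_features), sigma_init))
        self.bias_mu = nn.Parameter(torch.zeros(out_features))
        self.bias_sigma = nn.Parameter(torch.full((out_features,), sigma_init))

    def forward(self, x):
        # Fresh noise on every call; the perturbed weights drive exploration.
        weight = self.weight_mu + self.weight_sigma * torch.randn_like(self.weight_sigma)
        bias = self.bias_mu + self.bias_sigma * torch.randn_like(self.bias_sigma)
        return F.linear(x, weight, bias)
```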
Key Exploration Problems
The Hard-Exploration Problem
Rewards are sparse, or the reward is deceptive
The Noisy-TV Problem
The agent's attention is captured by the spurious, ever-changing states shown on the TV, so it stops exploring the real environment
Intrinsic Rewards as Exploration Bonuses
Use intrinsic rewards as bonuses to tackle the hard-exploration problem
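Concretely, the agent is trained on an augmented reward, roughly r_t = r_extrinsic + β · r_intrinsic; a one-line sketch (the function name and the β value are only illustrative):

```python
def augmented_reward(r_extrinsic, r_intrinsic, beta=0.1):
    # beta trades off exploiting the environment reward vs. chasing the bonus.
    return r_extrinsic + beta * r_intrinsic
```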
Count-based Exploration
Estimate how novel a state is from its visit count (a tabular sketch follows)
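A minimal tabular sketch of a count-based bonus (the β/√N(s) form is one common choice; the names are mine):

```python
from collections import defaultdict
import math

visit_counts = defaultdict(int)   # N(s), keyed by a hashable state

def count_based_bonus(state, beta=0.1):
    visit_counts[state] += 1
    # Rarely visited states earn a larger bonus; it decays as counts grow.
    return beta / math.sqrt(visit_counts[state])
```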
Counting by Density Model
- Density model for pseudo-counts (Bellemare et al., 2016)
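With a density model over states, the true count is replaced by a pseudo-count. A sketch of the relation from the 2016 pseudo-count work, with illustrative names (rho is the model's probability of a state before updating on it, rho_recoding its probability after one more update on that same state):

```python
import math

def pseudo_count(rho, rho_recoding):
    # Solving rho = N/n and rho_recoding = (N + 1)/(n + 1) for N gives:
    return rho * (1.0 - rho_recoding) / (rho_recoding - rho)

def density_bonus(rho, rho_recoding, beta=0.1):
    # Plug the pseudo-count into the usual count-based bonus form; the small
    # constant keeps the bonus finite when the pseudo-count is near zero.
    return beta / math.sqrt(pseudo_count(rho, rho_recoding) + 0.01)
```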
Counting after Hashing
Map high-dimensional states to hash codes (e.g., LSH, SimHash) and count visits per code (a SimHash sketch follows)
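A SimHash-style sketch, assuming raw state vectors are hashed via the sign pattern of a fixed random projection (class name and defaults are mine):

```python
import numpy as np
from collections import defaultdict

class SimHashCounter:
    """Count visits per hash code instead of per raw high-dimensional state."""
    def __init__(self, state_dim, k=16, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.standard_normal((k, state_dim))   # fixed random projection
        self.counts = defaultdict(int)

    def bonus(self, state, beta=0.1):
        # phi(s) = sign(A s): nearby states tend to share the same k-bit code.
        code = tuple(np.sign(self.A @ state).astype(int))
        self.counts[code] += 1
        return beta / np.sqrt(self.counts[code])
```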
Prediction-based Exploration
Use prediction error to gauge how familiar the agent is with the environment (an ICM-style sketch follows the paper list below)
Forward Dynamics
- IAC: Intelligent Adaptive Curiosity (2007)
- ICM: Intrinsic Curiosity Module (2017)
- VIME: Variational Information Maximizing Exploration (2017)
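A rough forward-dynamics sketch (ICM-flavored but heavily simplified: the learned feature encoder and the inverse-dynamics head of ICM are omitted, and all names are mine). The model predicts the next state's features from the current features and action, and its prediction error serves both as the intrinsic reward and as the training loss:

```python
import torch
import torch.nn as nn

class ForwardDynamicsBonus(nn.Module):
    def __init__(self, feat_dim, action_dim, hidden=128):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(feat_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, feat, action, next_feat):
        # `action` is assumed to be a float tensor (e.g., one-hot for discrete actions).
        pred = self.model(torch.cat([feat, action], dim=-1))
        # Large error = unfamiliar transition = large exploration bonus.
        return ((pred - next_feat) ** 2).mean(dim=-1)
```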
Random Networks
The prediction task does not have to be related to the environment; predicting a randomly chosen, task-irrelevant target also works as an exploration signal (an RND-style sketch follows this list)
- DORA (2018)
- RND: Random Network Distillation (2018)
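A rough RND-style sketch, assuming simple MLP target/predictor networks (layer sizes are placeholders). The predictor is trained to match a fixed, randomly initialized target network; its error stays high on rarely seen states:

```python
import torch
import torch.nn as nn

class RNDBonus(nn.Module):
    def __init__(self, obs_dim, embed_dim=64, hidden=128):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, embed_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, embed_dim))
        for p in self.target.parameters():   # the target network is never trained
            p.requires_grad_(False)

    def forward(self, obs):
        with torch.no_grad():
            target_feat = self.target(obs)
        pred_feat = self.predictor(obs)
        # Per-sample squared error: used as the intrinsic reward and as the
        # loss minimized to train the predictor.
        return ((pred_feat - target_feat) ** 2).mean(dim=-1)
```

Since the target depends only on the current observation and is deterministic, the prediction problem has no irreducible noise, unlike forward-dynamics prediction of stochastic transitions.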
Memory-based Exploration
Introduce an external memory to address the problems of the reward-bonus-based exploration methods above
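One common memory-based pattern, sketched very loosely here (my simplification, not any specific published algorithm): keep an episodic buffer of state embeddings and give a larger bonus when the current embedding is far from its nearest neighbors in the buffer, i.e. when the state looks new within the current episode:

```python
import numpy as np

class EpisodicMemoryBonus:
    def __init__(self, k=10):
        self.k = k
        self.memory = []   # embeddings of states visited in the current episode

    def reset(self):       # call at every episode boundary
        self.memory.clear()

    def bonus(self, embedding):
        embedding = np.asarray(embedding, dtype=np.float64)
        if not self.memory:
            score = 1.0    # first state of the episode counts as novel
        else:
            dists = np.linalg.norm(np.stack(self.memory) - embedding, axis=1)
            # Mean distance to the k nearest stored embeddings: far from
            # everything already seen this episode -> larger bonus.
            score = float(np.mean(np.sort(dists)[: self.k]))
        self.memory.append(embedding)
        return score
```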