Exploration Strategies in Deep Reinforcement Learning (Jun 7, 2020)
Author: Lilian Weng
Classic Exploration Strategies
- Applicable to multi-armed bandits (MAB) or tabular RL (a minimal sketch follows this list)
  - ε-greedy
  - UCB (upper confidence bound)
  - Boltzmann (softmax) exploration
  - Thompson sampling
- Applicable to DRL
  - Entropy loss term
  - Noise-based exploration: add noise in various places (observation, action, or even parameter space); see the NoisyNet-style sketch after this list
    - Noisy Networks for Exploration (2017)
      - NoisyNet perturbs the NN weights with noise whose parameters are learned by gradient descent (GD)
    - Parameter Space Noise for Exploration (2017)
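A minimal sketch (mine, not from the post) of two of the classic strategies above for a K-armed bandit; all names (BanditStats, select_epsilon_greedy, select_ucb) are illustrative:

```python
import numpy as np

class BanditStats:
    """Running statistics for a K-armed bandit."""
    def __init__(self, n_arms):
        self.counts = np.zeros(n_arms)   # N(a): times each arm was pulled
        self.values = np.zeros(n_arms)   # Q(a): running mean reward per arm

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

def select_epsilon_greedy(stats, epsilon=0.1):
    # With probability epsilon explore uniformly at random, otherwise exploit.
    if np.random.rand() < epsilon:
        return np.random.randint(len(stats.values))
    return int(np.argmax(stats.values))

def select_ucb(stats, t, c=2.0):
    # UCB1-style rule: Q(a) + c * sqrt(ln t / N(a)); the bonus shrinks as an
    # arm is pulled more often. (In practice, pull every arm once first.)
    bonus = c * np.sqrt(np.log(t + 1) / (stats.counts + 1e-8))
    return int(np.argmax(stats.values + bonus))
```

ε-greedy explores blindly at a fixed rate, while UCB directs exploration toward arms whose value estimates are still uncertain.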
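A rough sketch of the NoisyNet idea, assuming a PyTorch-style module; the initialization constants are placeholders and the factorized-noise trick from the paper is omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Illustrative noisy linear layer in the spirit of NoisyNet.

    Effective weights are w = mu + sigma * eps, with eps resampled every
    forward pass; mu and sigma are trained by gradient descent, so the scale
    of the exploration noise is learned rather than hand-tuned.
    """
    def __init__(self, in_features, out_features, sigma_init=0.017):
        super().__init__()
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features).uniform_(-0.1, 0.1))
        self.weight_sigma = nn.Parameter(torch.full((out_features, in_features), sigma_init))
        self.bias_mu = nn.Parameter(torch.zeros(out_features))
        self.bias_sigma = nn.Parameter(torch.full((out_features,), sigma_init))

    def forward(self, x):
        # Fresh noise on every call; the perturbed weights drive exploration.
        weight = self.weight_mu + self.weight_sigma * torch.randn_like(self.weight_sigma)
        bias = self.bias_mu + self.bias_sigma * torch.randn_like(self.bias_sigma)
        return F.linear(x, weight, bias)
```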
Key Exploration Problems
The Hard-Exploration Problem
Rewards are sparse, or the reward is deceptive
The Noisy-TV Problem
The agent's attention is captured by the spurious, ever-changing states shown on the TV, so it stops exploring the real environment
Intrinsic Rewards as Exploration Bonuses
Use intrinsic rewards as bonuses to tackle the hard-exploration problem
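Concretely, the agent is trained on an augmented reward, roughly r_t = r_extrinsic + β · r_intrinsic; a one-line sketch (the function name and the β value are only illustrative):

```python
def augmented_reward(r_extrinsic, r_intrinsic, beta=0.1):
    # beta trades off exploiting the environment reward vs. chasing the bonus.
    return r_extrinsic + beta * r_intrinsic
```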
Count-based Exploration
Estimate how novel a state is from its visit count (a tabular sketch follows)
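A minimal tabular sketch of a count-based bonus (the β/√N(s) form is one common choice; the names are mine):

```python
from collections import defaultdict
import math

visit_counts = defaultdict(int)   # N(s), keyed by a hashable state

def count_based_bonus(state, beta=0.1):
    visit_counts[state] += 1
    # Rarely visited states earn a larger bonus; it decays as counts grow.
    return beta / math.sqrt(visit_counts[state])
```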
Counting by Density Model
- Density model for pseudo-counts (Bellemare et al., 2016)
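With a density model over states, the true count is replaced by a pseudo-count. A sketch of the relation from the 2016 pseudo-count work, with illustrative names (rho is the model's probability of a state before updating on it, rho_recoding its probability after one more update on that same state):

```python
import math

def pseudo_count(rho, rho_recoding):
    # Solving rho = N/n and rho_recoding = (N + 1)/(n + 1) for N gives:
    return rho * (1.0 - rho_recoding) / (rho_recoding - rho)

def density_bonus(rho, rho_recoding, beta=0.1):
    # Plug the pseudo-count into the usual count-based bonus form; the small
    # constant keeps the bonus finite when the pseudo-count is near zero.
    return beta / math.sqrt(pseudo_count(rho, rho_recoding) + 0.01)
```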
Counting after Hashing
Map high-dimensional states to hash codes (e.g., LSH, SimHash) and count visits per code (a SimHash sketch follows)
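A SimHash-style sketch, assuming raw state vectors are hashed via the sign pattern of a fixed random projection (class name and defaults are mine):

```python
import numpy as np
from collections import defaultdict

class SimHashCounter:
    """Count visits per hash code instead of per raw high-dimensional state."""
    def __init__(self, state_dim, k=16, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.standard_normal((k, state_dim))   # fixed random projection
        self.counts = defaultdict(int)

    def bonus(self, state, beta=0.1):
        # phi(s) = sign(A s): nearby states tend to share the same k-bit code.
        code = tuple(np.sign(self.A @ state).astype(int))
        self.counts[code] += 1
        return beta / np.sqrt(self.counts[code])
```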
Prediction-based Exploration
Use prediction error to gauge how familiar the agent is with the environment (an ICM-style sketch follows the paper list below)
Forward Dynamics
- IAC: Intelligent Adaptive Curiosity (2007)
- ICM: Intrinsic Curiosity Module (2017)
- VIME: Variational Information Maximizing Exploration (2017)
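A rough forward-dynamics sketch (ICM-flavored but heavily simplified: the learned feature encoder and the inverse-dynamics head of ICM are omitted, and all names are mine). The model predicts the next state's features from the current features and action, and its prediction error serves both as the intrinsic reward and as the training loss:

```python
import torch
import torch.nn as nn

class ForwardDynamicsBonus(nn.Module):
    def __init__(self, feat_dim, action_dim, hidden=128):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(feat_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, feat, action, next_feat):
        # `action` is assumed to be a float tensor (e.g., one-hot for discrete actions).
        pred = self.model(torch.cat([feat, action], dim=-1))
        # Large error = unfamiliar transition = large exploration bonus.
        return ((pred - next_feat) ** 2).mean(dim=-1)
```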
Random Networks
The prediction task does not have to be related to the environment; predicting a randomly chosen, task-irrelevant target also works as an exploration signal (an RND-style sketch follows this list)
- DORA (2018)
- RND: Random Network Distillation (2018)
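A rough RND-style sketch, assuming simple MLP target/predictor networks (layer sizes are placeholders). The predictor is trained to match a fixed, randomly initialized target network; its error stays high on rarely seen states:

```python
import torch
import torch.nn as nn

class RNDBonus(nn.Module):
    def __init__(self, obs_dim, embed_dim=64, hidden=128):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, embed_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, embed_dim))
        for p in self.target.parameters():   # the target network is never trained
            p.requires_grad_(False)

    def forward(self, obs):
        with torch.no_grad():
            target_feat = self.target(obs)
        pred_feat = self.predictor(obs)
        # Per-sample squared error: used as the intrinsic reward and as the
        # loss minimized to train the predictor.
        return ((pred_feat - target_feat) ** 2).mean(dim=-1)
```

Since the target depends only on the current observation and is deterministic, the prediction problem has no irreducible noise, unlike forward-dynamics prediction of stochastic transitions.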
Memory-based Exploration
Introduce an external memory to address the problems of the reward-bonus-based exploration methods above
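One common memory-based pattern, sketched very loosely here (my simplification, not any specific published algorithm): keep an episodic buffer of state embeddings and give a larger bonus when the current embedding is far from its nearest neighbors in the buffer, i.e. when the state looks new within the current episode:

```python
import numpy as np

class EpisodicMemoryBonus:
    def __init__(self, k=10):
        self.k = k
        self.memory = []   # embeddings of states visited in the current episode

    def reset(self):       # call at every episode boundary
        self.memory.clear()

    def bonus(self, embedding):
        embedding = np.asarray(embedding, dtype=np.float64)
        if not self.memory:
            score = 1.0    # first state of the episode counts as novel
        else:
            dists = np.linalg.norm(np.stack(self.memory) - embedding, axis=1)
            # Mean distance to the k nearest stored embeddings: far from
            # everything already seen this episode -> larger bonus.
            score = float(np.mean(np.sort(dists)[: self.k]))
        self.memory.append(embedding)
        return score
```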