Exploration Strategies in Reinforcement Learning

Exploration Strategies in Deep Reinforcement Learning (Jun 7, 2020)

Author: Lilian Weng

Classic Exploration Strategies
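The original article opens with classic strategies such as ε-greedy action selection; this outline does not spell them out, so the following is an assumed minimal sketch (the function name and the 0.1 default are illustrative, not from the source):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon take a uniformly random action (explore);
    otherwise take the action with the highest estimated value (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon=0` the choice is purely greedy; annealing ε over training is a common refinement.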

Key Exploration Problems

The Hard-Exploration Problem

The environment's rewards are sparse or deceptive.

The Noisy-TV Problem

The agent's attention is captured by the spurious, ever-changing states on a noisy TV, so it stops exploring the real environment.

Intrinsic Rewards as Exploration Bonuses

Use intrinsic rewards as exploration bonuses to address the hard-exploration problem.

Count-based Exploration

Estimate state novelty from visit counts.

Counting by Density Model

  • Density model(2016)

Counting after Hashing

Map high-dimensional states to hash codes (e.g., LSH or SimHash) and count visits in the hashed space.
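The hashing idea can be combined with a count-based bonus of the form β/√N(φ(s)). A minimal sketch, assuming a SimHash-style sign projection (class name and hyperparameter defaults are illustrative):

```python
import numpy as np

class SimHashCounter:
    """Count-based exploration after hashing: project the state with a
    fixed random Gaussian matrix, keep the k-bit sign pattern as a hash
    code, and count visits per code. The exploration bonus beta/sqrt(n)
    shrinks as a hash bucket is revisited."""
    def __init__(self, state_dim, k=16, beta=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.standard_normal((k, state_dim))  # frozen projection
        self.counts = {}
        self.beta = beta

    def bonus(self, state):
        code = tuple((self.A @ np.asarray(state, dtype=float) > 0).astype(int))
        n = self.counts.get(code, 0) + 1
        self.counts[code] = n
        return self.beta / np.sqrt(n)
```

Nearby states tend to share a code, so the counter generalizes across a continuous state space while remaining cheap to maintain.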

Prediction-based Exploration

Use prediction error to measure how familiar the agent is with the environment.

Forward Dynamics

  • IAC(2007)
  • ICM(2017)
  • VIME(2016)
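The common thread of these forward-dynamics methods is to use the error of a learned model f(s, a) ≈ s' as the intrinsic reward. A deliberately simplified sketch with a linear model on raw states (note: ICM actually predicts in a learned feature space; all names here are illustrative):

```python
import numpy as np

class ForwardDynamicsBonus:
    """Curiosity via forward-model prediction error, simplified: a linear
    model predicts the next state from (state, action); its squared error
    is the intrinsic reward, and each call takes one gradient step on
    0.5 * ||error||^2, so familiar transitions earn shrinking bonuses."""
    def __init__(self, state_dim, action_dim, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((state_dim, state_dim + action_dim)) * 0.1
        self.lr = lr

    def intrinsic_reward(self, s, a, s_next):
        x = np.concatenate([s, a])
        err = self.W @ x - s_next
        reward = float(err @ err)             # high error = novel transition
        self.W -= self.lr * np.outer(err, x)  # gradient step on the model
        return reward
```

Predicting raw pixels is what makes this vulnerable to the noisy-TV problem; ICM's inverse-dynamics features are one fix.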

Random Networks

Use a random prediction task, unrelated to the environment, to aid exploration.
  • DORA(2018)
  • RND(2018)
    • Random Network Distillation
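In Random Network Distillation the prediction target is a fixed, randomly initialized network, so the task is unrelated to environment dynamics and immune to stochastic noise. A minimal sketch with a linear predictor (names and hyperparameters are illustrative):

```python
import numpy as np

class RNDBonus:
    """Random Network Distillation, simplified: a frozen random target
    network defines an arbitrary prediction problem; a predictor is
    trained to match it, and the squared prediction error serves as the
    intrinsic reward. Error stays high on novel states and decays with
    repeated visits."""
    def __init__(self, state_dim, out_dim=8, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.T = rng.standard_normal((out_dim, state_dim))  # frozen target
        self.P = np.zeros((out_dim, state_dim))             # trained predictor
        self.lr = lr

    def bonus(self, s):
        s = np.asarray(s, dtype=float)
        target = np.tanh(self.T @ s)          # fixed random features
        err = self.P @ s - target
        self.P -= self.lr * np.outer(err, s)  # distill target into predictor
        return float(err @ err)
```

Because the target is deterministic, unpredictable environment noise (the noisy TV) cannot keep the bonus high forever.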

Physical Properties

Encourage exploration by discovering hidden physical properties of the environment.

Memory-based Exploration

Introduce external memory to address the problems of the reward-based exploration methods above.

Episodic Memory

  • NGU(2020)
    • Never Give Up
  • Agent57(2020)
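NGU's episodic component rewards states that are far from what the agent has already seen within the current episode. A simplified sketch using mean k-nearest-neighbor distance over raw embeddings (NGU uses learned controllable-state embeddings and a kernel; all names here are stand-ins):

```python
import numpy as np

class EpisodicNovelty:
    """Episodic-memory novelty, simplified: store embeddings of states
    visited this episode and reward a new state by its mean Euclidean
    distance to the k nearest stored embeddings. The memory is reset at
    the start of every episode, so revisits within an episode stop
    paying off even if the state is globally familiar."""
    def __init__(self, k=3):
        self.memory = []
        self.k = k

    def reward(self, emb):
        emb = np.asarray(emb, dtype=float)
        if self.memory:
            dists = sorted(np.linalg.norm(emb - m) for m in self.memory)
            r = float(np.mean(dists[: self.k]))  # mean distance to k-NN
        else:
            r = 1.0  # first state of the episode is maximally novel
        self.memory.append(emb)
        return r
```

NGU multiplies this episodic term by a lifelong (RND-style) modulator; Agent57 adds a bandit over exploration/exploitation policies on top.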

Direct Exploration

  • Go-Explore(2019)
  • DTSIL(2019)
  • policy-based Go-Explore(2020)

Q-Value Exploration

  • Bootstrapped DQN(2016)
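Bootstrapped DQN keeps K Q-value heads and commits to one sampled head per episode, yielding temporally extended, Thompson-sampling-like exploration. A minimal sketch with tabular heads standing in for the shared-torso neural heads (class and method names are illustrative):

```python
import random

class BootstrappedHeads:
    """Bootstrapped Q-learning sketch: maintain K independent Q-value
    heads; at the start of each episode sample one head uniformly and
    act greedily with it for the whole episode. Disagreement between
    heads drives deep, consistent exploration."""
    def __init__(self, n_heads, n_actions, seed=0):
        self.rng = random.Random(seed)
        self.heads = [dict() for _ in range(n_heads)]  # state -> action values
        self.n_actions = n_actions
        self.active = 0

    def begin_episode(self):
        self.active = self.rng.randrange(len(self.heads))

    def act(self, state):
        q = self.heads[self.active].get(state, [0.0] * self.n_actions)
        return max(range(self.n_actions), key=lambda a: q[a])
```

In the original method each head is trained on a bootstrapped subset of the replay data, which is what keeps the heads' estimates diverse.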

Variational Options

  • VIC(2017)
  • VALOR(2018)
  • DIAYN(2018)

Original article