Abstract

This project presents an end-to-end method for training reinforcement learning agents with deep recurrent networks while prioritizing replay memory experiences via importance sampling. The prioritization measure is an energy-spectral-density combination of the advantage function and the TD error. Experiments on the partially observable Atari game Frostbite show that the method outperforms the DRQN of [2], which lacks prioritized experience replay, and converges to the optimal policy faster than the single-agent state of the art in [7].