Abstract

This project presents an end-to-end method for training reinforcement learning agents with deep recurrent networks while prioritizing replay memory experiences via importance sampling. The prioritization measure is an energy-spectral-density combination of the advantage function and the TD error. Experiments on the partially observable Atari game Frostbite show that the method outperforms the DRQN of [2], which lacks prioritized experience replay, and converges to the optimal policy faster than the single-agent state of the art in [7].