Kernel and Moment Based Prediction and Planning
In this talk I will introduce Recurrent Predictive State Policy (RPSP) networks, which combine moment-based predictive models with deep reinforcement learning architectures. A predictive state serves as an equivalent representation of a belief state, so the policy component of an RPSP network can be purely reactive, which simplifies training while still allowing optimal behaviour. We demonstrate the efficacy of RPSP networks under partial observability on a set of robotic control tasks from OpenAI Gym. Empirically, RPSP networks perform well compared with memory-preserving networks such as GRUs, as well as with finite-memory models. This work was done in collaboration with Ahmed Hefny at CMU.
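The split between a recursive state filter and a purely reactive policy can be illustrated with a small sketch. This is not the RPSP implementation: the class names, the tanh update, and the linear policy are all hypothetical stand-ins chosen for brevity, whereas the actual filter in the talk is moment based. The sketch only shows the structural point that all memory lives in the filter state q, and the policy maps the current q to an action without any recurrence of its own.

```python
import numpy as np


class ToyPredictiveStateFilter:
    """Hypothetical stand-in for a predictive state filter (not the RPSP filter)."""

    def __init__(self, state_dim, obs_dim, act_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Assumed parameterisation: one weight matrix over [state, action, observation].
        self.W = rng.normal(scale=0.1, size=(state_dim, state_dim + act_dim + obs_dim))
        self.q0 = np.ones(state_dim) / np.sqrt(state_dim)

    def reset(self):
        return self.q0.copy()

    def update(self, q, action, obs):
        # Recursive update: the new state depends only on the old state,
        # the action just taken, and the observation just received.
        z = np.tanh(self.W @ np.concatenate([q, action, obs]))
        norm = np.linalg.norm(z)
        return z / norm if norm > 0 else self.q0.copy()


class ReactivePolicy:
    """Memoryless policy: the action is a function of the current state only."""

    def __init__(self, state_dim, act_dim, seed=1):
        rng = np.random.default_rng(seed)
        self.K = rng.normal(scale=0.1, size=(act_dim, state_dim))

    def act(self, q):
        return self.K @ q


def rollout(filt, policy, observations):
    """Alternate between acting on the current state and filtering."""
    q = filt.reset()
    actions = []
    for obs in observations:
        a = policy.act(q)            # reactive: no hidden state inside the policy
        actions.append(a)
        q = filt.update(q, a, obs)   # all memory is carried by the filter state
    return np.array(actions), q


if __name__ == "__main__":
    filt = ToyPredictiveStateFilter(state_dim=4, obs_dim=3, act_dim=2)
    policy = ReactivePolicy(state_dim=4, act_dim=2)
    observations = np.zeros((5, 3))  # placeholder observations, not Gym data
    actions, q_final = rollout(filt, policy, observations)
    print(actions.shape, q_final.shape)
```

In a real setting the observations would come from an environment loop rather than a fixed array, and both the filter and the policy parameters would be trained; neither is shown here.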