Actor-Critic Reinforcement Learning for a Lunar Lander
Description
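The objective of this project is to learn a policy that lands a lunar lander smoothly on the ground between two flags, using minimal fuel and without crashing. To this end, an actor-critic method is trained with Generalized Advantage Estimation (GAE): the critic's value estimates are used to compute advantage estimates for the actor's policy gradient. This substantially reduces the variance of the policy gradient estimates at the cost of some bias, which is controlled by the GAE parameter λ (the estimates are unbiased only for λ = 1 or when the critic is exact).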
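As an illustration of the advantage computation described above, here is a minimal NumPy sketch of GAE. It is not taken from the project's code: the function name compute_gae, the default values of gamma and lam, and the array layout (values holding one extra bootstrap entry at the end) are assumptions made for this example.

```python
import numpy as np


def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Hypothetical helper: GAE advantages for one rollout of length T.

    rewards: length T, reward received after each step
    values:  length T + 1, critic estimates V(s_0..s_T); the last entry
             is the bootstrap value of the state after the final step
    dones:   length T, 1.0 where the episode ended at that step, else 0.0
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)

    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    gae = 0.0
    # Walk backwards so each step can reuse the accumulated later terms.
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        # One-step TD error; no bootstrapping across an episode boundary.
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        # Exponentially weighted sum of TD errors with decay gamma * lam.
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae

    # Regression targets for the critic: advantage plus baseline value.
    returns = advantages + values[:-1]
    return advantages, returns
```

In this sketch, lam=0 reduces the advantage to the one-step TD error (lowest variance, most bias), while lam=1 recovers the discounted return minus the value baseline (no added bias, highest variance). The actor would be updated with advantages and the critic regressed toward returns.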