Reinforcement Learning, MDPs and Planning
An increasing availability of rich data over recent years has led to exciting advances in the theory and practice of reinforcement learning. By learning from interactions with the environment, an agent aims to find optimal, or at least sufficiently good, actions for each situation in order to maximize a reward.

Suppose there is a real-time strategy game in which a player can move in multiple directions using two basic actions: move and fire. The player, who controls a tank, aims to destroy the enemy tanks or base while protecting its own base. The player can move the tank in four directions (up, right, left, down) and can fire in the direction the tank last moved. The bases are static. There are also several obstacles: brick walls, lead walls, and water bodies. A tank can fire at any of these obstacles and can pass through water bodies. However, it can only pass through a brick wall after destroying it; a lead wall cannot be destroyed by firing at it.
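The movement and obstacle rules above can be encoded directly as environment dynamics. The sketch below is a minimal, hypothetical encoding (the cell-type constants and function names are assumptions, not part of the problem statement): walls block movement, water does not, and only brick walls are destructible.

```python
# Hypothetical cell types encoding the obstacles described above.
EMPTY, BRICK, LEAD, WATER = 0, 1, 2, 3

def can_enter(cell):
    """A tank may enter empty cells and water bodies; walls block movement."""
    return cell in (EMPTY, WATER)

def fire_at(cell):
    """Firing destroys a brick wall; lead walls and water are unaffected."""
    return EMPTY if cell == BRICK else cell
```

Rules like these define the transition function of the underlying MDP, which the questions below build on.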
a) Propose a learning algorithm to navigate this game using RL, with the goal of defending one's own base and destroying the enemy base and tanks. You should describe the basic components of RL, define appropriate reward functions and value functions, choose appropriate parameters, and specify an action-selection policy.

b) Describe how planning could be integrated to benefit the learning process. You may draw on examples that have been successful at merging planning and learning.
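One standard way to approach both parts is tabular Q-learning with an epsilon-greedy policy (part a) combined with Dyna-Q-style model-based replay (part b). The sketch below is illustrative only: the state encoding, action set, and the +10 reward for destroying an enemy tank are assumptions, not values given in the problem.

```python
import random
from collections import defaultdict

# Action set for the tank game: four moves plus "fire", which shoots in
# the direction of the last move (as the problem statement specifies).
ACTIONS = ["up", "right", "left", "down", "fire"]

def epsilon_greedy(Q, state, epsilon=0.1):
    """Explore with probability epsilon, otherwise act greedily on Q."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_learning_step(Q, s, a, r, s2, alpha=0.1, gamma=0.99):
    """Tabular update: Q(s,a) += alpha * (r + gamma * max_b Q(s',b) - Q(s,a))."""
    best_next = max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def dyna_planning(Q, model, n_steps=10, alpha=0.1, gamma=0.99):
    """Dyna-Q-style planning: replay transitions stored in a learned model
    to get extra value updates out of each real interaction (part b)."""
    for _ in range(n_steps):
        (s, a), (r, s2) = random.choice(list(model.items()))
        q_learning_step(Q, s, a, r, s2, alpha, gamma)

# One real interaction: firing destroys an enemy tank for an assumed +10
# reward, then the recorded transition is replayed by the planner.
Q, model = defaultdict(float), {}
s, a, r, s2 = ("tile_3_4", "enemy_alive"), "fire", 10.0, ("tile_3_4", "enemy_dead")
q_learning_step(Q, s, a, r, s2)
model[(s, a)] = (r, s2)   # record the transition for later planning
dyna_planning(Q, model, n_steps=5)
```

The same update rule serves both learning from real experience and planning from simulated experience; only the source of the transition differs, which is the key idea behind architectures such as Dyna-Q.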