# Frozen Lake Project

## Overview

The agent must cross the frozen lake from Start (S) to Goal (G) by walking over the Frozen (F) surface without falling into any Holes (H). Because the ice is slippery, the agent may not always move in the direction it intends[1].

## Approaches

We identify this as a stochastic planning problem and solve it with two dynamic-programming approaches: Policy Iteration (PI) and Value Iteration (VI).

- **Policy Iteration (PI):** First, we initialize a value function at random. Second, we sweep the value function until it converges under the current policy (policy evaluation). Third, we make the policy greedy with respect to that value function (policy improvement), repeating evaluation and improvement until the policy stops changing. A minimal sketch follows the policy-iteration pseudocode figure below.

- **Value Iteration (VI):** First, we initialize a value function at random. Second, we apply Bellman optimality updates until the value function converges to the optimal one. Third, we extract the greedy policy from that value function. A sketch follows the value-iteration pseudocode figure below.

Our code is based on the following pseudocode[2].

*Figure: Policy iteration pseudocode (from [2])*
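A minimal Python sketch of this loop, assuming the tabular Gym FrozenLake environment, where `env.unwrapped.P` exposes the transition table; the names `policy_iteration`, `gamma`, and `theta` are our own choices, not taken from this repository:

```python
import numpy as np
import gym

def policy_iteration(env, gamma=0.9, theta=1e-8):
    """Policy iteration for a tabular Gym env with a transition table P[s][a]."""
    P = env.unwrapped.P                    # P[s][a] = [(prob, next_s, reward, done), ...]
    n_s, n_a = env.observation_space.n, env.action_space.n
    V = np.zeros(n_s)                      # value function
    policy = np.zeros(n_s, dtype=int)      # start from an arbitrary (all-zeros) policy
    while True:
        # Policy evaluation: sweep V under the current policy until it converges.
        while True:
            delta = 0.0
            for s in range(n_s):
                v = sum(p * (r + gamma * V[s2]) for p, s2, r, _ in P[s][policy[s]])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < theta:
                break
        # Policy improvement: make the policy greedy with respect to V.
        stable = True
        for s in range(n_s):
            q = [sum(p * (r + gamma * V[s2]) for p, s2, r, _ in P[s][a])
                 for a in range(n_a)]
            best = int(np.argmax(q))
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:                         # policy unchanged => optimal
            return policy, V

env = gym.make("FrozenLake-v1")
pi_policy, pi_V = policy_iteration(env)
```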

*Figure: Value iteration pseudocode (from [2])*
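And a matching sketch of value iteration under the same assumptions (`env.unwrapped.P` transition table; function and parameter names are our own):

```python
import numpy as np

def value_iteration(env, gamma=0.9, theta=1e-8):
    """Value iteration: Bellman optimality updates, then greedy policy extraction."""
    P = env.unwrapped.P
    n_s, n_a = env.observation_space.n, env.action_space.n
    V = np.zeros(n_s)
    while True:
        delta = 0.0
        for s in range(n_s):
            # Bellman optimality update: back up the best action's expected return.
            v = max(sum(p * (r + gamma * V[s2]) for p, s2, r, _ in P[s][a])
                    for a in range(n_a))
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            break
    # Extract the greedy policy from the (near-)optimal value function.
    policy = np.array([
        int(np.argmax([sum(p * (r + gamma * V[s2]) for p, s2, r, _ in P[s][a])
                       for a in range(n_a)]))
        for s in range(n_s)
    ])
    return policy, V
```

Both functions return a deterministic policy as an array of actions indexed by state, together with the final value function.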

## Results

- **Policy Iteration (PI):**

  *Figure: Policy of Policy Iteration*

  *Figure: Value function of Policy Iteration*

- **Value Iteration (VI):**

  *Figure: Policy of Value Iteration*

  *Figure: Value function of Value Iteration*

We run 50 trials. In each trial we compute the value function and the corresponding policy, run the agent with that policy for 100 episodes, and count the episodes in which it reaches the goal without falling into a hole. A sketch of this evaluation loop follows.
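This is a minimal sketch of one trial, assuming the classic Gym step API that returns a 4-tuple (newer Gymnasium versions return `(obs, info)` from `reset` and a 5-tuple from `step`); the function name `run_trial` is ours:

```python
def run_trial(env, policy, episodes=100):
    """Count episodes in which the policy reaches the goal without falling in."""
    successes = 0
    for _ in range(episodes):
        s = env.reset()
        done, reward = False, 0.0
        while not done:
            s, reward, done, _ = env.step(int(policy[s]))
        if reward > 0:       # FrozenLake gives reward 1 only on reaching G
            successes += 1
    return successes
```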

Successful episodes (out of 100) across 50 trials:

|      | Policy Iteration | Value Iteration |
|------|------------------|-----------------|
| mean | 78.46            | 78.94           |
| std  | 3.45354          | 3.683443        |
| min  | 70               | 67              |
| max  | 86               | 88              |

## References

1. Frozen Lake, OpenAI Gym environment.

2. Sutton, R. S., and Barto, A. G. *Reinforcement Learning: An Introduction*.