# Frozen Lake Project

## Overview

The agent must cross the frozen lake from Start (S) to Goal (G) by walking over the Frozen (F) surface without falling into any Holes (H). Because the ice is slippery, the agent may not always move in the direction it intends[1].

## Approaches

We identify this as a stochastic planning problem and solve it with two dynamic-programming approaches: Policy Iteration (PI) and Value Iteration (VI).

- **Policy Iteration (PI):** First, we initialize a value function at random. Second, we sweep the value function until it converges under the current policy (policy evaluation). Third, we make the policy greedy with respect to that value function (policy improvement), repeating evaluation and improvement until the policy stops changing. A minimal sketch follows the policy-iteration pseudocode figure below.

- **Value Iteration (VI):** First, we initialize a value function at random. Second, we apply Bellman optimality updates until the value function converges to the optimal one. Third, we extract the greedy policy from that value function. A sketch follows the value-iteration pseudocode figure below.

Our code is based on the following pseudocode[2].

*Figure: Policy iteration pseudocode (from [2])*
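A minimal Python sketch of this loop, assuming the tabular Gym FrozenLake environment, where `env.unwrapped.P` exposes the transition table; the names `policy_iteration`, `gamma`, and `theta` are our own choices, not taken from this repository:

```python
import numpy as np
import gym

def policy_iteration(env, gamma=0.9, theta=1e-8):
    """Policy iteration for a tabular Gym env with a transition table P[s][a]."""
    P = env.unwrapped.P                    # P[s][a] = [(prob, next_s, reward, done), ...]
    n_s, n_a = env.observation_space.n, env.action_space.n
    V = np.zeros(n_s)                      # value function
    policy = np.zeros(n_s, dtype=int)      # start from an arbitrary (all-zeros) policy
    while True:
        # Policy evaluation: sweep V under the current policy until it converges.
        while True:
            delta = 0.0
            for s in range(n_s):
                v = sum(p * (r + gamma * V[s2]) for p, s2, r, _ in P[s][policy[s]])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < theta:
                break
        # Policy improvement: make the policy greedy with respect to V.
        stable = True
        for s in range(n_s):
            q = [sum(p * (r + gamma * V[s2]) for p, s2, r, _ in P[s][a])
                 for a in range(n_a)]
            best = int(np.argmax(q))
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:                         # policy unchanged => optimal
            return policy, V

env = gym.make("FrozenLake-v1")
pi_policy, pi_V = policy_iteration(env)
```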

*Figure: Value iteration pseudocode (from [2])*
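And a matching sketch of value iteration under the same assumptions (`env.unwrapped.P` transition table; function and parameter names are our own):

```python
import numpy as np

def value_iteration(env, gamma=0.9, theta=1e-8):
    """Value iteration: Bellman optimality updates, then greedy policy extraction."""
    P = env.unwrapped.P
    n_s, n_a = env.observation_space.n, env.action_space.n
    V = np.zeros(n_s)
    while True:
        delta = 0.0
        for s in range(n_s):
            # Bellman optimality update: back up the best action's expected return.
            v = max(sum(p * (r + gamma * V[s2]) for p, s2, r, _ in P[s][a])
                    for a in range(n_a))
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            break
    # Extract the greedy policy from the (near-)optimal value function.
    policy = np.array([
        int(np.argmax([sum(p * (r + gamma * V[s2]) for p, s2, r, _ in P[s][a])
                       for a in range(n_a)]))
        for s in range(n_s)
    ])
    return policy, V
```

Both functions return a deterministic policy as an array of actions indexed by state, together with the final value function.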

## Results

- **Policy Iteration (PI):**

  *Figure: Policy of Policy Iteration*

  *Figure: Value function of Policy Iteration*

- **Value Iteration (VI):**

  *Figure: Policy of Value Iteration*

  *Figure: Value function of Value Iteration*

We run 50 trials. In each trial we compute the value function and the corresponding policy, run the agent with that policy for 100 episodes, and count the episodes in which it reaches the goal without falling into a hole. A sketch of this evaluation loop follows.
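This is a minimal sketch of one trial, assuming the classic Gym step API that returns a 4-tuple (newer Gymnasium versions return `(obs, info)` from `reset` and a 5-tuple from `step`); the function name `run_trial` is ours:

```python
def run_trial(env, policy, episodes=100):
    """Count episodes in which the policy reaches the goal without falling in."""
    successes = 0
    for _ in range(episodes):
        s = env.reset()
        done, reward = False, 0.0
        while not done:
            s, reward, done, _ = env.step(int(policy[s]))
        if reward > 0:       # FrozenLake gives reward 1 only on reaching G
            successes += 1
    return successes
```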

Successful episodes (out of 100) across 50 trials:

|      | Policy Iteration | Value Iteration |
|------|------------------|-----------------|
| mean | 78.46            | 78.94           |
| std  | 3.45354          | 3.683443        |
| min  | 70               | 67              |
| max  | 86               | 88              |

## References

1. Frozen Lake, OpenAI Gym environment.

2. Sutton, R. S., and Barto, A. G. *Reinforcement Learning: An Introduction*.