Sarsa lambda implementation #2

Matyyas · 2020-06-01T20:44:04Z

Thank you for the super cool repo 👍

I have one question regarding the Sarsa agent implementation. In the official pseudo-algorithm of Sarsa lambda (slide 29), the Q values and the eligibility traces are updated at each step for every state-action pair of the environment.

If I understood your code correctly, it seems to me that you only update the state-action pair of the current step.

    N[idx1] += 1
    E[idx1] += 1

    alpha = 1.0 / N[idx1]
    delta = reward + self.gamma * Q2 - Q1
    Q += alpha * delta * E
    E *= self.gamma * self.lmbd

Was your implementation written with this difference in mind?

Thanks a lot @hartikainen

hartikainen · 2020-06-01T21:21:27Z

Hey @Matyyas. Glad to hear you've found the repo useful. To be honest, it's been so long since I touched this repository that I can't recall exactly what my thinking there was. But it's likely that I was not aware of such a difference, and had I been, I probably would not have deviated from the official algorithm 🙂 Nice to hear you caught this though! Let me know what the difference is if you end up trying out both ways.

Matyyas · 2020-06-03T17:42:28Z

Aha 3 years is a bit of time 😅

Actually, you did implement the "official" version too; it was an error on my part 🙏
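For anyone landing here with the same doubt, here is a minimal sketch of why the snippet from the first comment already touches every state-action pair. It assumes `Q`, `E` and `N` are NumPy arrays of the same shape (that is not shown in the snippet itself), so `Q += alpha * delta * E` and `E *= gamma * lmbd` are whole-array operations. The function names, the `idx1`/`idx2` tuple indices and the toy 3x2 table are hypothetical, and terminal-state handling is left out.

```python
import numpy as np


def sarsa_lambda_step(Q, E, N, idx1, idx2, reward, gamma, lmbd):
    """One backward-view Sarsa(lambda) step, vectorized (updates arrays in place)."""
    N[idx1] += 1
    E[idx1] += 1  # only the visited pair gets its trace bumped

    alpha = 1.0 / N[idx1]
    delta = reward + gamma * Q[idx2] - Q[idx1]

    Q += alpha * delta * E  # whole array: every pair with a nonzero trace moves
    E *= gamma * lmbd       # whole array: every trace decays


def sarsa_lambda_step_loop(Q, E, N, idx1, idx2, reward, gamma, lmbd):
    """Same step, with the 'for all s, a' loop written out explicitly."""
    N[idx1] += 1
    E[idx1] += 1

    alpha = 1.0 / N[idx1]
    delta = reward + gamma * Q[idx2] - Q[idx1]

    for idx in np.ndindex(Q.shape):
        Q[idx] += alpha * delta * E[idx]
        E[idx] *= gamma * lmbd


def run(step_fn):
    """Run two toy transitions on a 3-state, 2-action table and return Q."""
    Q, E, N = np.zeros((3, 2)), np.zeros((3, 2)), np.zeros((3, 2))
    # Step 1: visit (0, 0), no reward, so delta is 0 and only the trace changes.
    step_fn(Q, E, N, (0, 0), (1, 1), reward=0.0, gamma=1.0, lmbd=0.5)
    # Step 2: visit (1, 1) with reward 1; the earlier pair (0, 0) is updated too.
    step_fn(Q, E, N, (1, 1), (2, 0), reward=1.0, gamma=1.0, lmbd=0.5)
    return Q


Q_vec = run(sarsa_lambda_step)
Q_loop = run(sarsa_lambda_step_loop)
```

In this toy run the pair visited one step earlier still carries a decayed trace, so it receives part of the second step's update even though only `E[idx1]` was incremented explicitly.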

Matyyas changed the title ~~Sarsa lambda~~ Sarsa lambda implementation Jun 1, 2020
