Sarsa lambda implementation #2

Open
Matyyas opened this issue Jun 1, 2020 · 2 comments

Comments

@Matyyas

Matyyas commented Jun 1, 2020

Hi @hartikainen,

Thank you for the super cool repo 👍

I have one question regarding the Sarsa agent implementation. In the official pseudo-algorithm of Sarsa lambda (slide 29), the Q values and the eligibility traces are updated at each step for every state-action pair of the environment.

If I understood your code correctly, it seems to me that you only update the state-action pair of the current step.

```python
N[idx1] += 1
E[idx1] += 1

alpha = 1.0 / N[idx1]
delta = reward + self.gamma * Q2 - Q1
Q += alpha * delta * E
E *= self.gamma * self.lmbd
```

Were you aware of this difference when you wrote your implementation?

Thanks a lot @hartikainen
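
For reference, the backward-view Sarsa(λ) step under discussion is usually written as below. This is standard textbook notation rather than a quote from the slides, and it assumes accumulating eligibility traces $E$ and a generic step size $\alpha$:

$$
\begin{aligned}
\delta_t &= R_{t+1} + \gamma\, Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \\
E(S_t, A_t) &\leftarrow E(S_t, A_t) + 1 \\
Q(s, a) &\leftarrow Q(s, a) + \alpha\, \delta_t\, E(s, a) \quad \text{for all } s, a \\
E(s, a) &\leftarrow \gamma \lambda\, E(s, a) \quad \text{for all } s, a
\end{aligned}
$$

The last two lines are the "every state-action pair" updates the question refers to. The quoted snippet uses `alpha = 1.0 / N[idx1]` in place of a fixed $\alpha$.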

Matyyas changed the title from "Sarsa lambda" to "Sarsa lambda implementation" on Jun 1, 2020
@hartikainen
Owner

Hey @Matyyas. Glad to hear you've found the repo useful. To be honest, it's been so long since I touched this repository that I can't recall exactly what my thinking there was. But it's likely that I was not aware of such a difference, and had I been, I probably wouldn't have implemented it differently 🙂 Nice to hear you caught this though! Let me know what the difference is if you end up trying out both ways.

@Matyyas
Author

Matyyas commented Jun 3, 2020

Haha, 3 years is quite a while 😅

Actually, you did implement the "official" version too; it was an error on my part 🙏
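
A likely reason the snippet already matches the slide, sketched below under assumptions: if `Q` and `E` are NumPy arrays indexed by state-action pair (the thread does not show their definitions), then `Q += alpha * delta * E` and `E *= ...` are whole-array operations and touch every pair at once. The function names, the `(3, 2)` table shape, and the reading of `Q1`/`Q2` as `Q[idx1]`/`Q[idx2]` are all hypothetical and chosen for illustration only.

```python
import numpy as np


def sarsa_lambda_step_vectorized(Q, E, N, idx1, idx2, reward, gamma, lmbd):
    """One Sarsa(lambda) step written like the quoted snippet (arrays updated in place)."""
    N[idx1] += 1
    E[idx1] += 1  # only the visited pair gets its trace bumped
    alpha = 1.0 / N[idx1]
    # Assumption: Q1 and Q2 in the snippet are Q[idx1] and Q[idx2].
    delta = reward + gamma * Q[idx2] - Q[idx1]
    Q += alpha * delta * E  # whole-array update: every (state, action) entry
    E *= gamma * lmbd       # whole-array decay of all traces
    return Q, E


def sarsa_lambda_step_loop(Q, E, N, idx1, idx2, reward, gamma, lmbd):
    """The same step with an explicit loop over all state-action pairs."""
    N[idx1] += 1
    E[idx1] += 1
    alpha = 1.0 / N[idx1]
    delta = reward + gamma * Q[idx2] - Q[idx1]  # computed once, before any update
    for idx in np.ndindex(Q.shape):
        Q[idx] += alpha * delta * E[idx]
        E[idx] *= gamma * lmbd
    return Q, E


# Hypothetical toy table: 3 states x 2 actions, with non-zero traces everywhere.
Q0 = np.arange(6, dtype=float).reshape(3, 2)
E0 = np.full((3, 2), 0.5)
N0 = np.zeros((3, 2), dtype=int)
step = dict(idx1=(0, 1), idx2=(2, 0), reward=1.0, gamma=0.9, lmbd=0.8)

Q_vec, E_vec = sarsa_lambda_step_vectorized(Q0.copy(), E0.copy(), N0.copy(), **step)
Q_loop, E_loop = sarsa_lambda_step_loop(Q0.copy(), E0.copy(), N0.copy(), **step)

print("same Q:", np.allclose(Q_vec, Q_loop))
print("same E:", np.allclose(E_vec, E_loop))
print("entries of Q changed:", int(np.count_nonzero(Q_vec != Q0)), "of", Q0.size)
```

Only `N[idx1]` and `E[idx1]` are indexed with the current pair; the two array-wide lines do the per-pair work that the slide expresses as a loop.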
