Non-trivial games #209

Open · StepHaze opened this issue Nov 2, 2023 · 6 comments

Comments

StepHaze commented Nov 2, 2023

Is it possible to create a great player using AlphaZero.jl for non-trivial games like chess, go, or shogi?
Or is it only good for simple games like connect4 and mancala?


smart-fr commented Nov 4, 2023

Yes, it is possible. Have a look at the game ReKtaNG on https://rektang.com/.
The solo mode is powered by various AlphaZero.jl agents with different levels of training, for an adaptive game difficulty.


StepHaze commented Nov 7, 2023

> Yes, it is possible. Have a look at the game ReKtaNG on https://rektang.com/. The solo mode is powered by various AlphaZero.jl agents with different levels of training, for an adaptive game difficulty.

Is it really a complex game like chess, go, or shogi, with an unlimited number of possible positions?


jonathan-laurent commented Nov 7, 2023

The more relevant number for measuring game complexity when it comes to AlphaZero is the size of the action space, not the total number of positions. Also, strictly speaking, Chess and Go do have a finite (although very large) state space.

When learning from scratch, AlphaZero's training time is going to depend strongly on the size of the action space. This is to be expected, since AlphaZero discovers new moves by randomly exploring actions. Therefore, if all you have is a single GPU, you are going to be limited to action spaces much smaller than the ones occurring in Chess and Go. AlphaZero.jl could likely tackle such games given enough cloud-computing credit, but honestly, it has not primarily been designed with this goal in mind.

A more realistic way to use AlphaZero.jl with games of complexity similar to Chess or Go is to bootstrap it with a policy that is already decent. Such a policy could be learned using supervised learning from human games, for example. In this case, AlphaZero.jl could probably enable you to turn a decent policy into a really good one without an insane amount of compute. I've been interested in seeing someone try this out for a while.
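
To make the idea concrete, here is a minimal sketch of what such a supervised pretraining step could look like, written with Flux. The state and action encodings and the `load_human_games` helper are hypothetical placeholders, and none of this is AlphaZero.jl's built-in API:

```julia
# Hypothetical sketch: pretrain a policy network on human games with Flux
# before handing it to self-play. `load_human_games` and the encodings
# below are placeholders, not part of AlphaZero.jl.
using Flux

const NUM_ACTIONS = 4672            # e.g. a chess-like move encoding (assumption)
const STATE_DIM   = 8 * 8 * 12      # flattened board planes (assumption)

policy = Chain(Dense(STATE_DIM => 512, relu), Dense(512 => NUM_ACTIONS))

# Each sample: (state vector, index of the move the human actually played).
data = load_human_games()            # hypothetical loader

opt = Flux.setup(Adam(1e-3), policy)
for epoch in 1:10
    for (s, a) in data
        grads = Flux.gradient(policy) do m
            Flux.logitcrossentropy(m(s), Flux.onehot(a, 1:NUM_ACTIONS))
        end
        Flux.update!(opt, policy, grads[1])
    end
end
# The resulting weights would then initialize the network used for self-play,
# so exploration starts from a decent policy rather than a random one.
```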

More generally, what you have to keep in mind when thinking about AlphaZero is that learning policies from scratch via random exploration is insanely wasteful and expensive compute-wise. It has the advantage of generality and of being able to scale to and leverage huge amounts of compute. However, solving games such as Chess or Go with AlphaZero from scratch requires a staggering amount of compute that is inaccessible to most (and will likely stay that way for a long time).

The honest truth is that outside a small number of games with tiny action spaces (Connect Four, Othello, Reversi...), applying AlphaZero naively to learn a policy from scratch over the full action space is going to be overly expensive for most people. In fact, even with Google-scale compute capabilities, many games are still going to be completely out of reach. The only way out if you are interested in those games is to combine AlphaZero with other methods and/or leverage game-specific knowledge to bootstrap from decent policies and engineer more tractable action spaces. But doing so requires work and insight and will never be automated by a push-button library.

@smart-fr How large exactly is the action space of ReKtaNG? Did you use any game-specific trick to make AlphaZero more tractable for your game?


smart-fr commented Nov 7, 2023

Thank you for your interest in ReKtaNG. Don't hesitate to try it out, you'll love it 😊 (and even more when it's fully "gamified" in the future).

The size of the action space is 2048, between go (362 according to this source) and chess (4672 according to this source).

But indeed, I use a pretty simple heuristic to filter out most theoretically possible actions at each turn: an AlphaZero.jl agent retains only the 128 most "impactful" legal actions. All legal actions from a given piece have the same level of impact, which is a function of the piece's centrality on the board and its perimeter in contact with opposing pieces.

This way I could start from scratch with a totally naïve agent and, despite a quite large action space, train it up to a fairly good level with just a few iterations (fewer than 10). After that, I continued training for a total of 62 iterations, and observed improvements in the agent's level at each iteration where the network was updated; the last such improvement happened at iteration 50.
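
For illustration, here is a rough sketch of that kind of filter. The helpers `legal_actions`, `piece_of`, `centrality`, and `contact_perimeter` are hypothetical (this is not ReKtaNG's actual code), and combining the two impact terms as a plain sum is an assumption:

```julia
# Hypothetical sketch of the filtering heuristic described above: score each
# action by the "impact" of the piece it moves, then keep at most `cap` actions.
function filtered_actions(state; cap = 128)
    actions = legal_actions(state)            # full legal action list (hypothetical helper)
    impact(a) = begin
        p = piece_of(state, a)                # piece moved by action `a`
        centrality(state, p) + contact_perimeter(state, p)
    end
    sort!(actions; by = impact, rev = true)   # most impactful first
    return actions[1:min(cap, length(actions))]
end
```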

@jonathan-laurent

@smart-fr This is a very good example of using game-specific knowledge to make AlphaZero tractable when a naive application wouldn't be. Thanks!

@A-Cepheus

Yes, just like in Gomoku: even if the board is very large, the legal actions can be limited to the squares within two cells of the stones already played.
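
For example, here is a small sketch of that restriction; the board encoding (0 = empty, 1 and 2 = the two players) is an assumption:

```julia
# Sketch: on a large Gomoku board, only consider empty squares within
# Chebyshev distance 2 of a stone already on the board.
function candidate_moves(board::AbstractMatrix{Int}; radius = 2)
    n, m = size(board)
    moves = Tuple{Int,Int}[]
    for i in 1:n, j in 1:m
        board[i, j] == 0 || continue
        near = any(board[x, y] != 0
                   for x in max(1, i - radius):min(n, i + radius),
                       y in max(1, j - radius):min(m, j + radius))
        near && push!(moves, (i, j))
    end
    isempty(moves) && push!(moves, (cld(n, 2), cld(m, 2)))  # empty board: open in the center
    return moves
end
```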
