Train for maia:2000 maia:2100 maia:2200 #19
Comments
Making higher-rated Maia models is somewhat challenging, as there are not enough games for us to use the same sampling methods we used for the published models. We have done some experiments with different strategies, but the results have been unsatisfactory. Also, as the ratings increase, the players' moves approach those of Stockfish/Leela. I'd also like to note that the goal of our project is not to create the best chess engine; we are trying to create models that can be used for learning/teaching. |
Thank you reidmcy for your feedback. I understand the argument about not having enough data for training. In any case, if it were possible to train them for the proposed Elos, keep in mind that they would also be of great value. Leela and Stockfish are 3500+ engines, and a human-like engine of a similar strength could be very valuable for training purposes. 2000-2500 are the more common Elos in OTB amateur chess, which is why it would be valuable to have human-like engines at those ratings. Thanks for your great work! |
We need about 12 million games, with good endgames so not fast time controls. Mixed player ratings was one of the first things we tried after the first paper, it gave much weaker results. |
Do we have any performance values using maia9 for example with 16, 32, 64, 128, etc. tree search values instead of 1? |
Figure 5 of the paper shows 10 rollouts, but we did others too with similar results |
If the number of games is a problem for >1900, why not use approximate conversions and also import games from chess.com, FICS and ICC (the latter two go back 25 years, although ratings may have deflated or inflated over that time) into the training set? It might need a bit of tweaking to get it right, but I suspect the drop-off in games between 1900 and 2000 is small enough that adding games from other servers should at least get you to 2000, if not 2100. However, it does raise a bigger question, which I think is an interesting one and may explain why you don't get such good results above 1900, and is worth investigating for both AI and chess. The higher the rating, the more the player balances intuition with calculation. A master will say things like "I feel this is the right move" or "I'm not worried about that move", particularly at faster time controls, where use of the clock is a big factor. When it does get a little more complicated they calculate, often 2 or 3 moves ahead (and as the time control increases, conscious calculation becomes more of a factor). By using depth 1 and a neural net, you're simulating which move would be played if calculating ahead were disallowed (although taking tactical patterns into account), and as you go higher this won't predict the move made when calculation would have taken place. This works well to simulate sub-2000 play, but not so much above that, I believe. Thus, to make a representative player at 2000 and above (and there is a lot of interest here from improving players looking for a sparring/training opponent), you need an engine that balances choosing moves by intuition with calculating when necessary. Using 'unnecessary' (or always-to-depth) calculation will cause evaluation of moves/positions the human wouldn't consider (even if it were finding objectively the best move).
This might be impossible to train because humans don't really understand intuition well and the data where the player calculated or not isn't available except by guessing on how much clock time was used (not always an accurate indicator of tactical complexity), but it might be possible via some heuristic which sets the depth based on how tactical the position is and balances the tree / depth with the skill level being represented. |
I did some experimenting with lowering the quality standards and didn't get good results. Using games from outside Lichess is tricky, as most other servers don't have free archives available. I also can't go violating the terms of use for the sites with a scraper, even if the data are available, since I'm doing this as a part of my PhD. I think your second point is interesting, and we have had a student looking into something similar since May; we frame it as more of an inverse RL task. I do also think your concerns about depth of search are a bit off; the neural network could be doing some kind of search internally. In fact, the model is designed to extract information sequentially, so it is almost certainly doing some kind of search. So depth 1 search doesn't mean the same thing to Maia/Leela as to Stockfish or a human. |
It's possible to download games from chess.com (by player at least; I'm not sure about by rating), and FICS has a database at ficsgames.org. FICS' database goes back more than 20 years. The numbers are smaller, but if you add it to the set it may get you above the number you need. If you asked nicely at ICC (assuming there isn't already a download), they would probably be able to help you too, and they would have more games around the 2000 mark. The only problem would be standardising approximate ratings across servers, for which there are a number of surveys out there.
David
|
The Chess.com per-player downloads are what I was alluding to with the scraping comment |
I found some resources you can scrape off of (I already wrote this here: #43 (comment)):
- FICS Games: a free resource that offers a large selection of matches and allows you to sort by 2000-2199 ELO, 2200-2399 ELO, etc. https://www.ficsgames.org/
- Chessbase: a paid service that provides access to millions of matches played by higher ELO players. https://database.chessbase.com/
- Lichess: another free option with over 8 million matches, including higher-elo games, sorted by month. https://database.lichess.org/
- Chesstempo: offers over 2 million searchable games and allows you to sort by min and max ELO in advanced settings. https://old.chesstempo.com/game-database.html
EDIT: Some other resources I discovered: https://www.kaggle.com/datasets/datasnaek/chess |
Someone made a higher ELO version of maia on Lichess. Link is below. |
We can't easily combine games from different sources since Elo is not consistent between them. We also need about 12 million games after filtering so most of those sites are much too small. I also need to use sites that have licenses that allow use. I have been running experiments with Lichess, since there are many more games since the last paper, but I don't have anything to publicly release yet. Also, keep in mind this is an academic project. The goal is not just to release better models, the goal is to do unique things and release new ideas. So releasing new models will need to be part of a larger project. |
That appears to be our Maia-1800 weights using MCTS search. As we showed in the original paper, using search reduces humanness of the engine. So that's closer to a weak Leela model than our Maia models. |
Hey :) |
There was some work showing that a modified version of MCTS increases humanness: https://arxiv.org/abs/2112.07544. Estimating Elo is difficult to do, as it's not a fixed number; it's based on the community's interactions with the player. Thus even comparing Elo between different chess servers is non-trivial. I have run experiments that showed that the KL-regularized MCTS increases winrate, but I can't directly calculate an Elo for the resulting model. |
I did some (quick and dirty) experiments to see how Maia behaves when using MCTS and found that each depth increases Elo by approximately 400 points. Now, if you are brave, you could assume that Maia1600d2 ≈ Maia2000 and so on... In terms of humanness, I assume that a Maia on depth 2 or 3 still plays more human-like than a Stockfish of similar strength (if you can even achieve that). I could be wrong though. If there is interest I could provide more information. |
Not to worry! I'm currently training a Maia model targeting an ELO rating of around 2500. I previously trained Leela Chess Zero using supervised training data from the Lichess Elite Database, which has games ranging from 2100 to 2500 ELO rating. The net, EliteLeela, can be found on Lichess as a bot. However, I haven't run the bot in a very long time. The great news is that the Lichess Elite Database has a total of over 19.7 million games, so it should be good to use for training. I'll be letting my computer run for the next couple of days for it to train the model. Once it's done, I'll do the test match and see what rating it could be at using Maia 1900 as a baseline. @reidmcy I'll send you the results and model when it's done, if you want. |
@CallOn84 I'm very interested in a stronger-than-1900 human-like engine, and I want to encourage you to continue your work. I'm simply a user of all these machine learning / AI tools, but a programmer by day, so perhaps I can help in some way. Just one question: when you say "ELO rating of around 2500", do you mean Lichess rating? I'm not trying to be pedantic, it's just that Lichess doesn't use ELO; the rating there is Glicko2, but many players do have FIDE titles and it's not difficult to match their FIDE ELO to their Lichess rating |
Yes, I meant 2500 Glicko2 rating, which is around 2000 ELO rating. The issue I'm facing right now is training data, as there isn't really much training data to make this work. On the Lichess Open Database, I was getting an average of 100,000 blitz, rapid and classical games that were around the 2500 Glicko2 rating area, which isn't enough as I need 12 million+ games. Now, this could be “fixed” by supplementing Lichess games with OTB games and chess.com games, but there's a bit of an issue when it comes to finding them. Chess.com database games aren't open like Lichess's are, and OTB games require extensive research into 2000 ELO-rated players to download their games. So, that's the current issue right now. Otherwise, the training itself isn't that hard to do, except that it's on Linux and I hate running Linux with a passion. |
Surely there are over 100k quality games played on lichess through all the years available, how many games and what criteria do you need? I can download and filter from https://database.lichess.org/ Also, if you point me to a doc I'll be happy to run this on linux (I love running linux with a passion 😂 ) |
There are over 100k games that are around 2500 Glicko2 rating; the issue is that there aren't 12 million of them. Trust me, I spent hours going through each year and only got around 100,000 games per year. |
well I just downloaded last month's archive and found 323k with this criteria:
I filtered them with pgn-extract. Now, I know I'm guesstimating, but it doesn't seem unrealistic to me to find 3.5m~4m games per year (323k per month times 12 is roughly 3.9m), and while the further back we go the fewer games are found, the archive goes all the way back to 2013. If that criteria (elo and time control) satisfies the need, I really think it's feasible to find 12m games in the archive |
Did you filter out the bullet and hyperbullet games as well? |
Yes, it only includes games that start with 3 minutes or more, so Im actually excluding games like 2+10 which is -according to lichess- blitz. |
You also need to remove games where either player is a bot and where there aren't enough moves, after dropping low clock moves. Also, are you limiting the difference in rating between players? We required both players to be of similar rating, i.e., both in the same bin. I've found that doing wide ranges of Elo leads to worse results, i.e., a 2100 only model performs better than a model trained on 2100-2500 rating players even though the training data is a strict superset. |
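The filtering rules described above (no bots, enough moves, both players in the same rating bin) could be sketched roughly as follows. This is not the parser from the Maia repository; the bin bounds and the `min_plies` threshold are hypothetical values chosen for illustration, and the step of dropping low-clock moves before counting is not implemented here. It assumes Lichess-style PGN tags (`WhiteElo`, `BlackElo`, `WhiteTitle`, `BlackTitle`).

```python
import re

HEADER_RE = re.compile(r'^\[(\w+) "(.*)"\]$')
RESULTS = ("1-0", "0-1", "1/2-1/2", "*")


def parse_headers(game_text):
    """Return the PGN tag pairs of one game as a dict."""
    headers = {}
    for line in game_text.splitlines():
        match = HEADER_RE.match(line.strip())
        if match:
            headers[match.group(1)] = match.group(2)
    return headers


def count_plies(game_text):
    """Roughly count half-moves after stripping tags, comments and move numbers."""
    body = "\n".join(l for l in game_text.splitlines() if not l.startswith("["))
    body = re.sub(r"\{[^}]*\}", " ", body)  # drop {comments}, e.g. clock annotations
    body = re.sub(r"\d+\.+", " ", body)     # drop move numbers such as "1." or "1..."
    return len([tok for tok in body.split() if tok not in RESULTS])


def keep_game(game_text, low=2100, high=2199, min_plies=20):
    """Apply the three filters; low/high/min_plies are illustrative assumptions."""
    tags = parse_headers(game_text)
    if "BOT" in (tags.get("WhiteTitle"), tags.get("BlackTitle")):
        return False  # either player is a bot
    try:
        white, black = int(tags["WhiteElo"]), int(tags["BlackElo"])
    except (KeyError, ValueError):
        return False  # missing or non-numeric rating such as "?"
    if not (low <= white <= high and low <= black <= high):
        return False  # both players must sit in the same rating bin
    return count_plies(game_text) >= min_plies
```

Requiring both ratings to fall inside one bin is what rules out pairings such as 2500 vs 1500, which a simple "at least one player above X" filter would let through.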
Did you use pgn-extract to get your games? If so, how did you remove bot players and games with too few moves, and how did you drop the low-clock moves? |
I wrote my own parser, there's an early version of it in this repo. |
On a side thread, what would happen if say you trained 1900 with a large set of games, but continued the training with say a smaller set of games of 2200 rating. Would this increase the rating (maybe not quite to 2200), but simulate someone improving from a base level? This could be another avenue of generating a stronger human-like engine. And what if this was done in steps of various plateau ratings say 1100, 1300, 1700, 1900, would this produce an even more human-like engine?
David
|
Hello @reidmcy, I only saw the rating and time requirement, but let's formalize the requirements. I'm happy to filter further and try to get a useful data set. Would you agree these are accurate:
Anything else? anything to change? |
I think that's all, really. Try to make the rating range between 2500 and 2599, so we can get all players within the 2500s. |
@wickeduk That is unlikely to perform very well. You're changing the target distribution of the NN so you're only going to be able to reuse some set of the weights. NNs don't learn like humans, generalization is very difficult for them. We saw this with our fine-tuning paper, the models all tend to revert back to their base Maia when they're not run on positions similar to those the target player played. |
@purefan You'll want to test the effects of the layer/block/SE counts. One reason we believe the 1900 rated model is below 1900 on Lichess is that it's not complex enough to understand certain moves. In particular you'll likely want at least 16 (8*8) blocks since that's the minimum for the CNN to pass information across the board multiple times. |
Thank you for your input @reidmcy, but I'm just a backend developer. I haven't done any ML/AI myself and don't really know what you're talking about, but I'm happy to try and get some useful data for you experts to build a better Maia. I am making progress in filtering the games with the criteria we talked about. I'm hitting some limits on the Lichess API but making progress for sure, and I'm hoping to post some data this coming weekend and see if it's even feasible to get 12M games :-) |
I can handle training 😊. Just give me the games when you're done, but I have doubts that it will reach over 12 million. We can supplement with OTB games, but it's going to take a very long time to find 2000-rated games. |
Yeah... I have a pretty up-to-date database from The Week In Chess, but it includes chess.com events like Titled Tuesday and I suspect the rating in those events is the chess.com rating. I will keep hammering at the Lichess database and hope for the best. Another interesting metric is the rate at which valuable games are played: maybe we don't have enough quality games up to this month, but in a couple of months we might reach 12M |
Just wanted to report on the progress: so far I have identified 387 bots and 57196 humans, processing from last month's archive backwards to May of this year. I will post again when there's more progress (I keep poking the Lichess API every day but hit throttling limits and then have to wait) |
Hello again! Can someone please test this file? https://maia-help.s3.eu-north-1.amazonaws.com/lichess_2023_01-08.pgn.zip I am having a hard time opening it in Scid vs PC, is it properly filtered? how many games are there? |
Using the classic Notepad++ trick, there are a total of 2,024,918 games. There are 5,011 games that have the “BOT” title for either White or Black or both. There is an issue regarding the ELOs, though. There is a huge rating gap in games where one player has a rating over 2500 and the other player has a rating under 2500, like 1500 for example. You want the two ratings to be as close to each other as possible. As reidmcy has mentioned previously, using wide ranges of ELO leads to worse results. I did train Maia using the Lichess Elite Database and got a net that performed worse than the existing Maia nets, so I can vouch for that. You're probably going to have to look at doing this again and refine the search parameters. In the meantime, a friend on Discord also has a couple of large PGNs of chess.com games we can use as well. |
How many games are required to train maia 2500 ? |
reidmcy said that 12 million games are needed. However, based on my experience training Elite Leela, 12 million games is more of a minimum than anything else. With any neural network, the more quality data you have, the better, until you start to get into over-cramming issues. |
After doing some pgn-extracting to try and get some 2500-2599 games from Lichess, I'm getting more confident that a Maia 2500 is unlikely ever to be produced. The reason is simply that there aren't enough games to make training one possible. With rapid, blitz, and classical, only around 2,000 players play across the time controls weekly; the longer the time control, the lower the weekly player count. Of course, adding OTB and Chess.com games could help, but I don't think it would help enough to get close to the 12 million games I need to get this working. So, sadly, it's likely that Maia 2500 will not be possible. However, the other strengths do look possible. I wouldn't do the Maia 2000 since it doesn't sound like an interesting setup, but the Maia 2200 could be worth looking into. We'll see how it goes. |
I hardly understand why so many games are required |
The best way to think about this is when we learn a language. From newborn, it took us several years and loads of words, sounds, pictures, etc., to read, write, speak, and understand a language since, as a newborn, we had no inherent knowledge about the language in front of us. So, take this analogy and apply it to neural networks. A neural network is like a newborn; it doesn't have any inherent knowledge about what it's playing. It needs data to understand and learn what it's playing, just as I learned chess by watching analysis, playing games, learning openings, etc. The more you get exposed to chess material, the more proficient you become at the game. Furthermore, you'll also be able to answer complex, diverse, and unbalanced situations. That's why a neural network needs loads of games. One game isn't going to teach a neural network all the possible scenarios, nuances, or variations of a game of chess that it needs to. The neural network isn't just learning about what the chess rules are; it's learning nuances as well. And that's not even considering that it's learning an amount of GM-level knowledge that would take 15-20 years in a couple of hours or days. |
With respect... The reason the network needs so many games is that a better method hasn't been found :) One network, with all the concepts stored in a single map, is fed a set of inputs to produce a score. It's convenient to train computers this way and it produces extremely strong engines (because we're incapable of encoding for a computer exactly how to play 3500-level chess), but it's unlikely humans have something similar internally, because no human has seen millions of games to start with. Encoding concepts that we can reason about logically is also, I imagine, a difficult problem (e.g. often prefer a bishop to a knight unless the position will remain closed, the bishop will be useless for controlling key squares, or there is something concrete in the position that favours the knight). That's pretty much how a 2000-level player thinks when faced with a B vs N exchange, and they will reason from maybe 5-10 examples seen before where one was superior to the other. Logically this would cover thousands if not hundreds of thousands of permutations and, while not perfectly accurate, is a better approximation of a human making a decision at that level. If a better way of encoding tactical patterns, positional ideas, opening and endgame theory is found (maybe several smaller nets, maybe backed by heuristics with some sort of controller), then with specially selected concepts there is no way you'd need millions of games or positions (10,000-20,000 games and fragments, maybe) for the 2000-2200 level. I suspect you'd also have a more realistic player that 'knows' specific openings and middlegames well, but flounders a bit in unfamiliar positions or is prone to certain types of errors due to lack of training. Training every single opening and middlegame permutation on masses of games isn't realistic, nor will it probably produce a realistic player (two different players will have different weaknesses, and masses of games may patch this over). Also, mentioning time is completely irrelevant; it's apples and oranges.
Perhaps the next step for research is to look at ways you can work with the data you have (tons of tactical and endgame positions, well known opening lines and master games) and work out how one can produce a realistic approximation of a player at a certain (lower) level with that. Maybe a strong engine can act as a judge (after analysing thousands of games at the level) and decide what to train and what should be made flawed somehow.
David
|
There is a possibility for a Maia 2500 by doing transfer training from a Maia 2200, so I might look into that once I experiment with training a Maia 2200 net. I've always wondered why there hasn't been a Moore's Law-type improvement for deep neural networks. What I mean by a "Moore's Law-type" improvement is a deep neural network architecture that requires significantly less data, but produces a net that has the same strength or even better than a standard deep neural network. As you said, this might be a future research opportunity for the many AI organisations out there. |
@reidmcy I sent you an email regarding one particular issue that I came across regarding the Lichess Database games that I've been gathering. Please reply when you're free. |
@jorditg Good news, I'm very close to training a Maia 2200 net. As always, I'll make it publicly available for download with these kinds of projects. |
@CallOn84 Thanks for all your hard work! Any updates on the training? |
It's been trained and has gone through the move-matching accuracy test. I would like to know the accuracy numbers, but I'm still trying to figure out a way to create a Python script to calculate all of that. Simple testing against Maia 1900 through a policy tournament indicates that it's 38 Elo higher than Maia 1900. I'm not sure if that's right or not. |
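For context on the 38 Elo figure: under the standard logistic Elo model, a rating difference maps to an expected score and back, so a match result can be converted into an estimated gap. The helper names below are hypothetical, and the sketch says nothing about how the tournament in the comment above was actually scored.

```python
import math


def expected_score(diff):
    """Expected score (0..1) for a player rated `diff` Elo above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def elo_diff_from_score(score):
    """Elo difference implied by an average score strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)
```

A 38-point gap corresponds to an expected score of roughly 55%, so whether the estimate is "right" depends mostly on how many games were played: with a short match, a 55% result is hard to distinguish from an even one.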
One more idea for Maias. You could add an appropriate opening book by selecting games from a rating range from the lichess database based on game type. Thus you also then get authentic openings for the level and time controls irrespective of the training (which as said might cover mistakes from one player with accuracy from another).
David
|
The Maia 2200 project is officially done. I finally figured out a way to have both csv.bz2 files read and compared to have their accuracy percentage calculated. I'll be posting the net, as well as the results, in my GitHub. @reidmcy, feel free to double-check my accuracy results if you have time. |
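A move-matching accuracy calculation over two bz2-compressed CSV files might look roughly like the sketch below. This is not the script referred to above; the column names (`game_id`, `ply`, `move`) are hypothetical, and the real files may be laid out differently. It joins the two files on position and reports the percentage of positions where the model's move equals the human's move.

```python
import bz2
import csv


def load_moves(path, key_cols=("game_id", "ply"), move_col="move"):
    """Map (game_id, ply) -> move from a .csv.bz2 file; column names are assumed."""
    with bz2.open(path, "rt", newline="") as handle:
        return {
            tuple(row[col] for col in key_cols): row[move_col]
            for row in csv.DictReader(handle)
        }


def move_matching_accuracy(human_path, model_path):
    """Percentage of shared positions where the model played the human's move."""
    human = load_moves(human_path)
    model = load_moves(model_path)
    shared = human.keys() & model.keys()
    if not shared:
        raise ValueError("the two files have no positions in common")
    matches = sum(human[key] == model[key] for key in shared)
    return 100.0 * matches / len(shared)
```

Only positions present in both files are scored, so a truncated or partially written prediction file shrinks the sample rather than counting as misses.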
It would be interesting to have other more powerful versions having the same human-like style.