About reading Fox Go #752
Comments
|
Thank you for paying attention to my post. By the way, KaTrain v1.5.0 can open the above file. However, you cannot turn off analysis by pressing the spacebar while analysis is on. I'll post this issue on KaTrain's Issues. |
The above SGF is opened successfully on my PC (Linux). So it is difficult for me to fix this issue, unfortunately... |
I did some research to see what's different between when it can be read and when it can't. It seems that the important point is which engine is set when reading the sgf file. ・ Katago-v1.4.2-cuda10.1-windows-x64 If these are set, the sgf file with evaluation value and explanation will be read successfully. ・ Katago-v1.5.0-cuda10.2-windows-x64 If you open an sgf file with evaluation values and explanations with these set, lizzie will freeze. I feel that "cuda 10.2" is the cause, but I don't know at all because I don't have any knowledge. After successfully opening the sgf file with evaluation values and explanations, even if you change to the above two engines in the engine settings, you can analyze normally. |
From your hints, I noticed that more than 5000 GTP commands were sent needlessly to the engine when the above SGF was loaded. I suspect they may put stress on the engine. So I made a quick & dirty patch (7f782ac) to stop them. Would you try it? https://github.com/kaorahi/lizzie/tree/fix752_fox (I've not tested it well and I'm not sure whether it works correctly.) |
This thread raises red flags for me. If that fixes it, I'd caution strongly against concluding that this was the problem and that everything is fine once the commands are no longer sent. If KataGo or Leela Zero receive 5000 GTP commands, they will still parse and handle them correctly, one by one. If Lizzie were to then handle the responses correctly, one by one, there would be no problem. Computer programs send each other thousands of lines of data all the time with no problem. So if no longer sending 5000 useless commands fixes it, then there is probably still an underlying bug: a race condition in Lizzie's handling of commands and responses that is triggered by a large volume of GTP commands and responses getting buffered up in various ways. Presumably Lizzie has a race condition that causes it to lose track of the message and response pairing, so maybe it ends up miscounting and waiting forever on a response, never realizing it was already sent, or something like that. If this is the case, then even though no longer sending these commands is desirable, before eliminating them one would also want to find and fix the true underlying bug that is causing Lizzie's problems. Simply stopping the 5000 commands won't solve this. If there is still an underlying race condition, it will emerge in other situations besides this one, so now is the time to debug it, while it is reproducible. |
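To make the pairing concern concrete, here is a minimal Java sketch of FIFO command/response bookkeeping. It is not Lizzie's code: `GtpPairing`, `onCommandSent`, and `onLine` are hypothetical names, and it assumes the plain GTP convention that each response ends with an empty line (streaming analysis output would need extra handling). Because every method synchronizes on the same object, a writer thread and a reader thread cannot miscount the pending commands.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper (not from Lizzie): pairs GTP commands with responses in FIFO order. */
final class GtpPairing {
    private final Deque<String> pending = new ArrayDeque<>(); // commands awaiting a response
    private boolean inResponse = false; // true once a non-empty response line has been seen

    /** Call just before a command is written to the engine. */
    synchronized void onCommandSent(String command) {
        pending.addLast(command);
    }

    /**
     * Feed one line of engine output.
     * Returns the command whose response this line completes, or null if none.
     */
    synchronized String onLine(String line) {
        if (!line.isEmpty()) {
            inResponse = true; // still inside a response body
            return null;
        }
        if (!inResponse) {
            return null; // stray blank line, nothing to complete
        }
        inResponse = false;
        return pending.pollFirst(); // oldest outstanding command is now answered
    }

    /** Number of commands that have been sent but not yet answered. */
    synchronized int pendingCount() {
        return pending.size();
    }
}
```

With bookkeeping like this, a freeze can be diagnosed by checking whether `pendingCount()` ever returns to zero after a large SGF is loaded.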
I'm not using Lizzie myself now and don't have enough motivation for tough debugging... But I'd like to know what is happening at least. (Is it really freezing, or just too slow?) So, @hope366, would you try...
What do you see on GTP console? If you repeat the experiments, do you get the same result every time? |
If I load the SGF without displaying GtpConsole, Lizzie freezes and accepts no input. But if I load the SGF with GtpConsole displayed in advance, the following is displayed in GtpConsole and the SGF itself loads successfully.
|
Strange...
The suspicious "command queue" is removed in bbb910a. |
thx. If you are willing to help my further debug:
If 1 is "no", you may need to try 2 and 3 several times for confirmation. I'm trying a wrapped engine with artificial random lags. But I still cannot reproduce the issue on my machine (Linux, no external GPU). |
Yes, it always succeeds with the GTP console and always freezes without the GTP console. bbb910a a30bce0 |
thx. Then never mind bbb910a. As a starting point, it's good news that you can reproduce the issue reliably and that I can get information from the title bar in a30bce0. Next, I'd like to find where the program hangs. Would you try https://github.com/kaorahi/lizzie/tree/for752_debug2 (c1304ed) in the same way? |
|
nice! You captured the functions where the program was stopped. The UI thread was in sendCommandToLeelaz(). The reader thread, which receives output from the engine, was in parseLine(). Then let's pin down the exact points in the functions with https://github.com/kaorahi/lizzie/tree/for752_debug3 (f04bb62). |
|
Thanks to your screenshots, we have reached the point. It is an unexpected result for me. @lightvector, would you give me any advice? In Leelaz.java, the reader thread seems to be stopped between these two
But why did the UI thread hang up between these two
|
Yeah, it sounds like it might be some sort of thread deadlock or buffer issue, rather than a race (although maybe it's a race that causes a later deadlock). I haven't done a dive into Lizzie's code in a while, but I could take a look in the next day or two, certainly. |
thx. I guess...
|
I have not dug into the code, but based on your description it sounds like the UI thread is the one that is implemented incorrectly. It is usually not a good idea to hold a lock while you could be blocked indefinitely on something else. Conceptually, if we zoom out and think about what we would like to happen in this situation: we would like the UI thread to just wait and not block anything. The reader thread should continue, the KataGo response processing should continue, and at the end, the UI thread can be updated based on the results after all the commands are processed. So it is a bug if the UI thread is capable of blocking the reader for more than very brief periods of time. Can we make the UI thread not block the reader thread while it waits? The right way to do this might depend on the code structure: whether it is to split the fields of the object and use separate locks for reading-related fields vs. writing-related fields, or to make the UI thread loop on a condition variable so it can drop the lock and sleep while it's not done, or something else similar. |
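As one illustration of keeping a caller from blocking while it holds a lock, here is a minimal Java sketch that hands commands to a dedicated writer thread through a `java.util.concurrent.LinkedBlockingQueue`. This is an assumption-laden sketch, not Lizzie's code: `CommandWriter`, `submit`, and the `sink` callback are hypothetical names, and the real write to the engine's pipe is stood in for by `sink.accept`.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical sketch (not from Lizzie): a dedicated thread performs possibly-blocking writes. */
final class CommandWriter {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread worker;

    /** sink stands in for the real write to the engine's stdin (an assumption of this sketch). */
    CommandWriter(Consumer<String> sink) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    // take() sleeps while the queue is empty; no shared lock is held
                    // while sink.accept() performs the possibly slow write.
                    sink.accept(queue.take());
                }
            } catch (InterruptedException e) {
                // shutdown() was called: leave the loop and let the thread end
            }
        }, "gtp-writer");
        worker.setDaemon(true);
        worker.start();
    }

    /** Called from the UI thread: enqueues and returns at once (the queue is unbounded). */
    void submit(String command) {
        queue.add(command);
    }

    /** Ask the writer thread to stop. */
    void shutdown() {
        worker.interrupt();
    }
}
```

The design choice here is that only the writer thread can ever block on the engine, so a full pipe buffer cannot stall the UI thread or keep the reader thread waiting for a lock.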
If the above trilemma guess is correct, we will be happy if we can move these two lines
in I am considering two plans. Plan A: Plan B: If possible, Plan A is cleaner and needs smaller changes to the existing code. Plan B requires careful rewrites of various parts, which could be broken very easily by casual modifications in future development. At present, Plan A failed because I know almost nothing about Java. Plan B failed because it needs many rewrites together with careful tracing of the existing code. Unfortunately, there are several functions in
and I am not sure what these In my current opinion, I would prefer my quick & dirty patch to Plan B because Plan B seems too fragile against future changes. The code in Lizzie is not tightly controlled. Various developers (including me) have written various things as they like. As for Plan A, I do not know Java well enough to decide whether it is feasible. Can I hear your comments? Personally, I am in "maintenance phase" for Lizzie (though I am not an official staff member of this project). I am willing to fix issues as long as they can be solved by small changes. I do not have the motivation for large rewrites. |
From a professional standpoint, a major modification seems desirable, and I would like to see it realized. But I am satisfied, because the quick & dirty patch is very convenient: the SGF now loads successfully. |
@hope366, thanks for many experiments. Your reports are essential for me to understand the problem. This note seems to help implementation of Plan A, though I've not tried it yet. |
I don't know how to use java myself, so I'd be very grateful if you could just show me a fix. |
I'm trying to implement Plan A. @hope366, could you try https://github.com/kaorahi/lizzie/tree/for752_planA1 (7d3a4d3)? (I hope for a code review by Java programmers. I know nothing about Java.) |
|
@kaorahi - Nice work! If a simple change like this, to move the code to a writer thread fixes the deadlock cycle, then this is excellent and I believe your overall approach looks good. As a review, I can recommend some details to make sure there will be no problems from this kind of change. I do not have as much familiarity with the Lizzie code as you do, so I do not have a good understanding of the dependencies and what different things synchronize on or block what other things. However, this looks like a very locally-understandable and simple change so I think I can make a recommendation. Currently there is added synchronization on two different things, but the pattern for code like this (in any language, not just Java) is generally:
This way of doing the pattern removes any risk of races or spurious wakeup problems. If the condition is already true, the writer thread will never sleep in the first place. If the condition is false, then it will wait, and since the check and the wait are in a single synchronized block, there is never a possibility of the writer thread seeing the condition as false and being just about to wait when the notifier suddenly comes in, makes it true, and sends the notify before the writer has actually started waiting, so that the writer never gets the notify. Based on this, I will shortly leave some comments on your commit that I hope will be helpful. Let me know what you think! |
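For readers following along, a minimal Java sketch of the guarded-wait pattern described in this comment might look like the following. It is not the code from the review or from the commit: `ReadyGate`, `awaitReady`, and `signalReady` are hypothetical names, and the boolean `ready` stands in for whatever condition the writer thread actually waits on.

```java
/** Hypothetical sketch (not from Lizzie): the standard guarded wait on a single lock. */
final class ReadyGate {
    private final Object lock = new Object();
    private boolean ready = false; // only read or written while holding lock

    /** Waiter side: check and wait inside one synchronized block, in a loop. */
    void awaitReady() throws InterruptedException {
        synchronized (lock) {
            while (!ready) { // the loop re-checks after every wakeup, so spurious wakeups are harmless
                lock.wait(); // releases lock while sleeping, re-acquires it before returning
            }
        }
    }

    /** Notifier side: change the condition and notify while holding the same lock. */
    void signalReady() {
        synchronized (lock) {
            ready = true;
            lock.notifyAll();
        }
    }
}
```

Because the condition is tested and changed only under `lock`, a call to `signalReady()` that happens before `awaitReady()` is not lost: the waiter sees `ready == true` and never sleeps.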
Okay, I finished my review and comments on the commit: 7d3a4d3. I'm not an expert in Lizzie's internals, and at the moment I'm not in a position to easily build and test Lizzie. So I'm mostly basing my comments on my general understanding of how to write good multithreaded code, and on reading your commit along with Lizzie's source. So it is possible you will find a bug or misunderstanding, or that the code I suggested doesn't work for some reason. Hopefully, however, my comments also help make clear the principles and understanding behind these suggestions. If you try the suggestions and they don't work, and you think the code is too complicated, then okay, sure, just do the quick and dirty patch. :) That would be the approach if you are just giving up on the code and saying it is too messy and unfamiliar to maintain, and ultimately needs to be rewritten. In that case, it would indeed be legitimate and valid to leave some bugs in there and just try to mitigate them, and wait for featurecat or someone else to come along eventually with something new and shiny to replace it entirely. :) |
Thanks a lot for your kind review! As a programming lover, I enjoy learning a new tesuji from this chance :) Here is the updated version: https://github.com/kaorahi/lizzie/tree/for752_planA3 (0a047eb). @hope366, would you test this? |
planA1 ( By the way, I asked Lizzie's Issues about two new posts and a pull request about GONGJE. |
thx. Then I'll make two pull requests: the revised plan A as the fix of this issue, and the quick & dirty patch for acceleration of loading of huge SGF. Yes. I've noticed your posts. But, unlike 62032e4 e0a49d4 etc., your new issues are not simple mistakes. They are requests of improvements with additional coding actually. I'll send patches if I can find easy ways to fix them. |
Thanks for making lizzie even more useful and easy to use 😄 I'm currently implementing these two separately, is it possible to implement and use them at the same time? |
Yes. You can apply both of them at the same time. |
thx a lot. |
I tried to verify each of the created pull requests. Only this item could not be confirmed. |
thx for the check. The message is shown as the screenshot in #747 (comment) together with the engine's error message in GTP console. We should've implemented this earlier. |
Fox Go game records that contain AI analysis and commentary cannot be read by lizzie. Is there any improvement that makes this possible?