-
-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite without slice-deque (assertion failed at src\reader.rs:553:13 when running example) #17
Comments
Thanks for reporting. It looks like I am making (and asserting) some assumptions that do not hold on Windows. It will be a while (maybe a week) until I'll have a chance to debug this. I'll also make sure to add CI for Windows. |
Any progress on this? I would appreciate even any ideas/pointers as to what may be the issue here. |
Ok, so it looks like we can simply relax the assertion in question. There is a related remaining soundness issue, because calling |
Thanks. Now it seems to be working. Though there is still something wrong. It takes a few minutes for a ~100MiB file (lichess 2013-01). I'm doing exactly what I did before. stats runs just as slow. Full cpu core is being used. looking at disk usage it's 50% reads 50% nothing. Totalling around 300KiB/s average. |
Wow, that's horrible. It'll probably be (at least) another week before I can get my hands on a Windows machine again. |
No worry, it's not urgent. |
Hi, I'm running into the same problem. Sorry to nag you about this, but any progress or workaround? |
No, sorry. I think a proper fix might require a rewrite with an entirely different approach. I want to do that, but no promises on the timeline. |
Hi, I'd like to also mention the issue of using SliceDeque. Current issue I'm facing is getting it to work with websockets, because SliceDeque is built to modify memory directly which is kinda hard to do when your code is essentially getting embedded in a web browser. Code looks not terribly written (thank you), but can you give any insight into why the SliceDeque it is needed for parsing the PGN? I only found one instance of it being defined here /// Internal read ahead buffer.
#[derive(Debug, Clone)]
pub struct Buffer {
inner: SliceDeque<u8>,
} in parse.rs Everything else is just defining as a generic type. From looking at the usage, looks like you're just using it to incrementally read in larger buffers from the PGN file. Can you a bit more into why SliceDeque instead of something like simple |
The idea was that it's very efficient as a ring buffer. Let's say the buffer content is
where
and can then continue to read into the buffer
With SliceDeque this is O(1), with Vec moving the contents to the start of the buffer is O(n). But whether or not it is really fast, is unfortunately platform specific (which I didn't realize when implementing this), and some assumptions about SliceDeque internals (like the assertion) are also fragile. |
So I'm thinking there's a theoretical approximate upper bound to the size of data we will be reading. If you take 24,000,000 games for example, I'm guessing the file size won't be anywhere over 5-10GB. If we consider that modern SSD have write speeds of 500mb/s-1000mb/s, O(n) instead of O(1) might add maximum about 10-30 seconds. Maybe my calculations are completely wrong, and I'll run an actual speed test once I finish porting the code. Thanks for the explanation. In that case, what would be the That's the only one I can't figure out at the moment |
Alternatively, why not use |
This is a high performance parser. Every copy is a measurable pessimisation. Reading line by line is not even remotely viable. Though in my parser I do keep 2 buffers. One being read into asynchronously from a file, second being processed. When nothing more is there to be processed (less than 1 game) the leftover is moved to the begining, wait for async read to complete, append the read data to the processing buffer. While it involves a one full copy of the read data it's ok. At least it's simple and works. |
That seems a bit inefficient if minimising copies is the idea. Are you not copying from one buffer to another unnecessarily? You can just use one buffer that acts like a queue and block the read call if it needs to be filled more. For my purposes, I basically copied your parser and other relevant files and am replacing the SliceDeque with directly reading the BufReader . I also put the responsibility on the caller to create a Seekable object to pass in so they are free to use Cursor, File, etc . Curious where the inefficiency would be if I did that? I am not planning to work with a very large number of games so overall the performance hit will be negligible for end user. |
Fixed via #30. |
I wanted to benchmark this against my implementation to have a baseline and I hit problems when running the examples.
running on windows
rustc 1.38.0 (625451e37 2019-09-23)
cargo 1.38.0 (23ef9a4ef 2019-08-20)
I compile with
cargo build --examples --release
Then I run with
cargo run --release --example validate -- "c:/dev/chess_pos_db/data/lichess_db_standard_rated_2014-12.pgn"
And I get an error
All examples have the same issue. I also tried running the binary directly but no difference. Looks like the buffer is not being filled properly? My knowledge of rust is literally 0 so I can't do much here.
I also tried increasing the MIN_BUFFER_SIZE 4 times, then the assertion passes but the program hangs, no disk read happens at all.
The text was updated successfully, but these errors were encountered: