Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite without slice-deque (assertion failed at src\reader.rs:553:13 when running example) #17

Closed
Sopel97 opened this issue Oct 1, 2019 · 15 comments
Labels

Comments

@Sopel97
Copy link

Sopel97 commented Oct 1, 2019

I wanted to benchmark this against my implementation to have a baseline and I hit problems when running the examples.
running on windows
rustc 1.38.0 (625451e37 2019-09-23)
cargo 1.38.0 (23ef9a4ef 2019-08-20)

I compile with cargo build --examples --release
Then I run with cargo run --release --example validate -- "c:/dev/chess_pos_db/data/lichess_db_standard_rated_2014-12.pgn"
And I get an error

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `65536`,
 right: `16384`', src\reader.rs:553:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
error: process didn't exit successfully: `target\release\examples\validate.exe c
:/dev/chess_pos_db/data/lichess_db_standard_rated_2014-12.pgn` (exit code: 101)

All examples have the same issue. I also tried running the binary directly but no difference. Looks like the buffer is not being filled properly? My knowledge of rust is literally 0 so I can't do much here.
I also tried increasing the MIN_BUFFER_SIZE 4 times, then the assertion passes but the program hangs, no disk read happens at all.

@niklasf
Copy link
Owner

niklasf commented Oct 3, 2019

Thanks for reporting. It looks like I am making (and asserting) some assumptions that do not hold on Windows.

It will be a while (maybe a week) until I'll have a chance to debug this. I'll also make sure to add CI for Windows.

@Sopel97
Copy link
Author

Sopel97 commented Oct 17, 2019

Any progress on this? I would appreciate even any ideas/pointers as to what may be the issue here.

@niklasf
Copy link
Owner

niklasf commented Oct 19, 2019

Ok, so it looks like we can simply relax the assertion in question.

There is a related remaining soundness issue, because calling tail_head_slice() on a not-fully initialized SliceDeque might be insta UB. I'll try to figure that out before making a release.

@Sopel97
Copy link
Author

Sopel97 commented Oct 19, 2019

Thanks. Now it seems to be working. Though there is still something wrong. It takes a few minutes for a ~100MiB file (lichess 2013-01). I'm doing exactly what I did before. stats runs just as slow. Full cpu core is being used. looking at disk usage it's 50% reads 50% nothing. Totalling around 300KiB/s average.
Can you reproduce it on windows?

@niklasf
Copy link
Owner

niklasf commented Oct 19, 2019

Wow, that's horrible. It'll probably be (at least) another week before I can get my hands on a Windows machine again.

@Sopel97
Copy link
Author

Sopel97 commented Oct 19, 2019

No worry, it's not urgent.

@Eugenio-Bruno
Copy link

Hi,

I'm running into the same problem. Sorry to nag you about this, but any progress or workaround?

@niklasf
Copy link
Owner

niklasf commented Oct 29, 2020

No, sorry. I think a proper fix might require a rewrite with an entirely different approach. I want to do that, but no promises on the timeline.

@niklasf niklasf changed the title assertion failed at src\reader.rs:553:13 when running example Rewrite without slice-deque (assertion failed at src\reader.rs:553:13 when running example) Jan 1, 2021
@niklasf niklasf added the bug label Jan 1, 2021
@thesw4rm
Copy link

thesw4rm commented Apr 14, 2021

Hi, I'd like to also mention the issue of using SliceDeque. Current issue I'm facing is getting it to work with websockets, because SliceDeque is built to modify memory directly which is kinda hard to do when your code is essentially getting embedded in a web browser.

Code looks not terribly written (thank you), but can you give any insight into why the SliceDeque it is needed for parsing the PGN? I only found one instance of it being defined here

/// Internal read ahead buffer.
#[derive(Debug, Clone)]
pub struct Buffer {
    inner: SliceDeque<u8>,
}

in parse.rs

Everything else is just defining as a generic type.

From looking at the usage, looks like you're just using it to incrementally read in larger buffers from the PGN file. Can you a bit more into why SliceDeque instead of something like simple vec! before I start delving into changing stuff and of course pull requesting this back once I'm done? Would really appreciate it!

@niklasf
Copy link
Owner

niklasf commented Apr 14, 2021

The idea was that it's very efficient as a ring buffer. Let's say the buffer content is

|_______________ABCDEF|

where _ is some old/invalid/uninitialized content. The tag or whatever is not complete yet, so we have to rotate

|ABCDEF_______________|

and can then continue to read into the buffer

|ABCDEFGH_____________|

With SliceDeque this is O(1), with Vec moving the contents to the start of the buffer is O(n). But whether or not it is really fast, is unfortunately platform specific (which I didn't realize when implementing this), and some assumptions about SliceDeque internals (like the assertion) are also fragile.

@thesw4rm
Copy link

So I'm thinking there's a theoretical approximate upper bound to the size of data we will be reading. If you take 24,000,000 games for example, I'm guessing the file size won't be anywhere over 5-10GB. If we consider that modern SSD have write speeds of 500mb/s-1000mb/s, O(n) instead of O(1) might add maximum about 10-30 seconds. Maybe my calculations are completely wrong, and I'll run an actual speed test once I finish porting the code.

Thanks for the explanation. In that case, what would be the Vec equivalent of move_head and move_tail? Is it simply rotate_left and rotate_right or is there anything else that has to be done?

That's the only one I can't figure out at the moment

@thesw4rm
Copy link

Alternatively, why not use read_line from built in Rust file operations? It doesn't allocate a new String every time because you pass in your own.

@Sopel97
Copy link
Author

Sopel97 commented Apr 14, 2021

This is a high performance parser. Every copy is a measurable pessimisation. Reading line by line is not even remotely viable.

Though in my parser I do keep 2 buffers. One being read into asynchronously from a file, second being processed. When nothing more is there to be processed (less than 1 game) the leftover is moved to the begining, wait for async read to complete, append the read data to the processing buffer. While it involves a one full copy of the read data it's ok. At least it's simple and works.

@thesw4rm
Copy link

thesw4rm commented Apr 14, 2021

That seems a bit inefficient if minimising copies is the idea. Are you not copying from one buffer to another unnecessarily? You can just use one buffer that acts like a queue and block the read call if it needs to be filled more.

For my purposes, I basically copied your parser and other relevant files and am replacing the SliceDeque with directly reading the BufReader . I also put the responsibility on the caller to create a Seekable object to pass in so they are free to use Cursor, File, etc . Curious where the inefficiency would be if I did that?

I am not planning to work with a very large number of games so overall the performance hit will be negligible for end user.

@niklasf
Copy link
Owner

niklasf commented Oct 3, 2022

Fixed via #30.

@niklasf niklasf closed this as completed Oct 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants