Buffer Improvements: add adversarial filesystem test variants for disk buffer v2 #10324

Open
Tracked by #9476
tobz opened this issue Dec 7, 2021 · 2 comments
Labels
domain: buffers Anything related to Vector's memory/disk buffers domain: reliability Anything related to Vector's reliability type: task Generic non-code related tasks

Comments

@tobz
Contributor

tobz commented Dec 7, 2021

Part of the Buffer Improvements RFC (RFC, #9476)

As part of the work on #10143, we opted to defer adding tests that exercise the new disk buffer implementation on top of an "adversarial" filesystem: one that can inject errors which are normally rare in practice, so that we can ensure we catch and handle them.

A first pass should likely be some variant of buffer_perf that stores its data on the aforementioned adversarial filesystem, tracks which writes succeed and which fail, and checks whether what the reader side sees matches. Essentially, the goal becomes: if we got no error back when writing and flushing a record, we should be able to read it back, or correctly detect that the data was modified outside of our control, and we should be able to account for every single attempted write.
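The accounting described above can be sketched roughly as follows. This is a minimal illustration, not code from Vector: `Ledger`, `WriteOutcome`, and `Report` are hypothetical names, and it assumes each record carries a unique numeric ID that both the writer and the reader can observe.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// What the writer was told about one attempted write (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WriteOutcome {
    /// Write and flush both returned without error.
    Acked,
    /// The write or the flush returned an error.
    Failed,
}

/// Result of comparing the writer's view with the reader's view.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Report {
    /// Acknowledged writes that were neither read back nor flagged as corrupt.
    pub lost: Vec<u64>,
    /// IDs the reader reported that were never attempted by the writer.
    pub unknown: Vec<u64>,
}

impl Report {
    pub fn is_clean(&self) -> bool {
        self.lost.is_empty() && self.unknown.is_empty()
    }
}

/// Hypothetical ledger of every attempted write, keyed by record ID.
#[derive(Debug, Default)]
pub struct Ledger {
    attempts: BTreeMap<u64, WriteOutcome>,
}

impl Ledger {
    pub fn record(&mut self, id: u64, outcome: WriteOutcome) {
        self.attempts.insert(id, outcome);
    }

    /// `read_back` holds IDs the reader returned intact; `flagged_corrupt`
    /// holds IDs the reader detected as modified outside of our control.
    pub fn reconcile(&self, read_back: &[u64], flagged_corrupt: &[u64]) -> Report {
        let seen: BTreeSet<u64> = read_back.iter().chain(flagged_corrupt).copied().collect();
        let lost = self
            .attempts
            .iter()
            .filter(|(id, outcome)| **outcome == WriteOutcome::Acked && !seen.contains(*id))
            .map(|(id, _)| *id)
            .collect();
        let unknown = seen
            .iter()
            .filter(|id| !self.attempts.contains_key(*id))
            .copied()
            .collect();
        Report { lost, unknown }
    }
}
```

A harness would call `record` after every write attempt and `reconcile` once the reader has drained the buffer; a non-empty `lost` list would mean an acknowledged write silently disappeared. Failed writes that still show up on the reader side are deliberately not treated as violations in this sketch.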

One option could be to explore CharybdeFS as the adversarial filesystem implementation: it is relatively well-maintained, should be battle-tested since it's written and used by ScyllaDB, and is programmatically controllable via Thrift RPC, which should be easy to integrate into a test harness.

Another option that is slightly easier to integrate would be to use something like fuser to create and control our target filesystem entirely in Rust. This could allow for writing all of the testing code in Rust, potentially as a single binary that is fed a seed for the RNG used to choose which FS operations succeed or fail.
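To make the seeded-RNG idea concrete, here is a rough sketch of a deterministic fault plan. It does not touch fuser at all: `FaultPlan` and `FsOp` are hypothetical names, the PRNG is a hand-rolled SplitMix64 so the example needs only the standard library, and a real FUSE handler would presumably map the injected error to an errno reply rather than an `io::Error`.

```rust
use std::io;

/// Small deterministic PRNG (SplitMix64), used so the sketch has no dependencies.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Filesystem operations the plan can fail (hypothetical, deliberately small).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FsOp {
    Read,
    Write,
    Fsync,
}

/// Decides, per operation, whether to inject a failure.
/// The same seed always yields the same sequence of decisions.
pub struct FaultPlan {
    rng: SplitMix64,
    fail_per_thousand: u64,
}

impl FaultPlan {
    pub fn new(seed: u64, fail_per_thousand: u64) -> Self {
        Self {
            rng: SplitMix64::new(seed),
            fail_per_thousand,
        }
    }

    /// Returns `Ok(())` if the operation should proceed, or an injected error.
    pub fn decide(&mut self, op: FsOp) -> io::Result<()> {
        if self.rng.next_u64() % 1000 < self.fail_per_thousand {
            Err(io::Error::new(
                io::ErrorKind::Other,
                format!("injected fault on {:?}", op),
            ))
        } else {
            Ok(())
        }
    }
}
```

Because every decision flows from the seed, a failing run could be reproduced by re-running the binary with the same seed and the same sequence of operations.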

@tobz tobz added type: task Generic non-code related tasks domain: buffers Anything related to Vector's memory/disk buffers domain: reliability Anything related to Vector's reliability labels Dec 7, 2021
@tobz tobz changed the title chore(buffers): add adversarial filesystem test variants for disk buffer v2 Buffer Improvements: add adversarial filesystem test variants for disk buffer v2 Dec 7, 2021
@tobz
Contributor Author

tobz commented Dec 8, 2021

As a data point: we've encountered a few assertions in specific tests where using real file operations under test leads to indeterminism in, for example, how many times a future has to be polled before it reaches the await point we expect.

An adversarial filesystem would help us root out this indeterminism more easily, so that the tests could be written more robustly. Conditions that never occur locally, where file operations are fast and never fail, are more easily triggered in CI, but CI is not always slower either, so having a filesystem we can deliberately make very slow would potentially be useful for developing more robust tests.
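As a small illustration of the "very slow filesystem" idea, the sketch below wraps any writer and delays every operation. It is only an approximation of the concept: `SlowWriter` is a hypothetical name, it uses blocking `std::io::Write` rather than the async I/O the disk buffer tests would actually exercise, and it slows a single handle instead of a whole filesystem.

```rust
use std::io::{self, Write};
use std::thread;
use std::time::Duration;

/// Wraps a writer and sleeps before every write and flush (hypothetical helper).
pub struct SlowWriter<W> {
    inner: W,
    delay: Duration,
}

impl<W: Write> SlowWriter<W> {
    pub fn new(inner: W, delay: Duration) -> Self {
        Self { inner, delay }
    }

    /// Returns the wrapped writer so its contents can be inspected.
    pub fn into_inner(self) -> W {
        self.inner
    }
}

impl<W: Write> Write for SlowWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Simulate a slow device before handing the bytes to the real writer.
        thread::sleep(self.delay);
        self.inner.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        thread::sleep(self.delay);
        self.inner.flush()
    }
}
```

For example, `SlowWriter::new(file, Duration::from_millis(50))` would make each write and flush take at least 50 ms, which could help surface tests that silently assume I/O completes within a fixed number of polls.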

@tobz tobz mentioned this issue Jan 10, 2022
18 tasks
@tobz
Contributor Author

tobz commented Jan 18, 2022

Chatting about this a little more with @blt, the plan as laid out above is likely unworkable for a single reason: unit tests are meant to assert deterministic scenarios, whereas adversarial filesystems are more amenable to black box testing.

While we should test something like the buffer_perf example binary against an adversarial filesystem, unit tests themselves require too much control. In a unit test, we would essentially be using the filesystem as a way to control what individual I/O operations respond with. That is useful, but it does not tell us what happens to committed writes, i.e. whether they were actually written or not, when arbitrary/randomized errors are thrown back.

Thus, I'm going to transform this issue to cover what sort of test we should run on top of an adversarial filesystem, likely something as described above or as described in the CharybdeFS documentation itself. A new issue will be created to track the work to actually use property-based testing, along with some code refactoring, to meaningfully control both the input operations (read, write, flush, etc.) and the way the "filesystem" should respond, and to find sequences of filesystem operation responses that invalidate our expectations.
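To give a feel for the second approach, here is a toy sketch that searches for a sequence of filesystem responses that breaks an expectation. Everything in it is an assumption made for illustration: `ToyBuffer` is a stand-in model, not the disk buffer; the `ignore_flush_errors` switch is a deliberately planted bug; and exhaustive enumeration of short response scripts stands in for what a property-testing library would generate and shrink.

```rust
/// Input operations driven against the model (hypothetical, minimal set).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Write(u64),
    Flush,
}

/// Toy stand-in for a buffered writer sitting on a scripted "filesystem".
pub struct ToyBuffer {
    ignore_flush_errors: bool, // planted bug: acknowledge even if flush failed
    pending: Vec<u64>,         // written but not yet flushed
    on_disk: Vec<u64>,         // what the "filesystem" really persisted
    acked: Vec<u64>,           // what the writer reported as durable
}

impl ToyBuffer {
    pub fn new(ignore_flush_errors: bool) -> Self {
        Self {
            ignore_flush_errors,
            pending: Vec::new(),
            on_disk: Vec::new(),
            acked: Vec::new(),
        }
    }

    /// Applies one operation; `ok` is the scripted filesystem response.
    pub fn apply(&mut self, op: Op, ok: bool) {
        match (op, ok) {
            (Op::Write(id), true) => self.pending.push(id),
            (Op::Write(_), false) => {}
            (Op::Flush, true) => {
                self.on_disk.extend(self.pending.iter().copied());
                self.acked.append(&mut self.pending);
            }
            (Op::Flush, false) => {
                if self.ignore_flush_errors {
                    self.acked.append(&mut self.pending);
                }
            }
        }
    }

    /// Expectation: everything reported durable is actually on disk.
    pub fn invariant_holds(&self) -> bool {
        self.acked.iter().all(|id| self.on_disk.contains(id))
    }
}

/// Tries every success/failure script for `ops` and returns the first one
/// that violates the invariant (`true` means the operation succeeded).
pub fn find_counterexample(ops: &[Op], ignore_flush_errors: bool) -> Option<Vec<bool>> {
    assert!(ops.len() < 20, "exhaustive search only suits short sequences");
    for mask in 0u32..(1u32 << ops.len()) {
        let script: Vec<bool> = (0..ops.len()).map(|i| (mask & (1 << i)) != 0).collect();
        let mut buffer = ToyBuffer::new(ignore_flush_errors);
        for (op, ok) in ops.iter().zip(&script) {
            buffer.apply(*op, *ok);
        }
        if !buffer.invariant_holds() {
            return Some(script);
        }
    }
    None
}
```

With the planted bug enabled, `find_counterexample(&[Op::Write(1), Op::Flush], true)` returns the script "write succeeds, flush fails", which is exactly the kind of response sequence the new issue is meant to hunt for. A property-testing library would replace the exhaustive loop with generated operation and response sequences.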
