Avoid memory leaks #46

Closed
wants to merge 1 commit

Conversation

Felix-El (Contributor)

This commit replaces the threadpool crate with a handcrafted solution based on scoped threads. This leaves valgrind much happier than before. We also lose some dependency baggage.

Closes #45

@hanna-kruppe (Contributor) left a comment

I started looking into what it would take to replace the threadpool dependency, for overlapping reasons, and then I saw this PR. So, first of all, thank you for writing the patch! I have zero authority to accept or reject it, but I'd also like to see this change done. However, I think the current version has a bug: it doesn't seem to actually (meaningfully) use multiple threads. And while I was at it, I made another suggestion (but again, my opinion carries no weight here).

src/pool.rs Outdated

thread::scope(|scope| {
    let work = || {
        let mut guard = iter.lock().unwrap();
Contributor

This is wrong: one worker thread will acquire the lock here and run all the tasks sequentially, while the other worker threads will be blocked until the end and then see an empty iterator. At least I'm 99% sure that's what will happen -- if there is a subtle reason why it actually works, I'd advocate for making the code less subtle!

The mechanical alternative is to only acquire the lock for the duration of each next() call. This requires desugaring the for-loop a bit. Perhaps extract the "get next task" bit into a dedicated closure and write the workers as while let Some(task) = next_task() { task(); }?
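
A minimal sketch of that shape (assuming std scoped threads; run_tasks and next_task are illustrative names, not the PR's actual code):

use std::sync::Mutex;
use std::thread;

// Illustrative only: drain a shared task iterator from several workers.
fn run_tasks(tasks: Vec<Box<dyn FnOnce() + Send>>, num_threads: usize) {
    // Shared state is declared before the scope so the workers can borrow it.
    let iter = Mutex::new(tasks.into_iter());
    // The lock is held only for the duration of each next() call,
    // so the workers actually interleave.
    let next_task = || iter.lock().unwrap().next();
    thread::scope(|scope| {
        for _ in 0..num_threads {
            scope.spawn(|| {
                while let Some(task) = next_task() {
                    task();
                }
            });
        }
    });
}

The closure around next() matters: a bare while let Some(task) = iter.lock().unwrap().next() would keep the guard alive for the entire loop body, reintroducing the sequential behavior.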

Contributor Author

I agree. Updated to what you propose here.

src/pool.rs Outdated
pub(crate) type Task = dyn FnOnce() + Send;
pub(crate) type BoxedTask = Box<Task>;

pub(crate) struct ScopedPool<I>(I);
Contributor

Since this type is only ever used as ScopedPool(vec_of_tasks).run(num_threads), I would suggest deleting this type and the <I> type parameter. A single function without generics would be sufficient and more straightforward: fn run_all(tasks: Vec<BoxedTask>, num_threads: Option<usize>).

In general this is a stylistic question that people can and will disagree on, but in this specific case I think the extra abstraction obscures what's actually going on. For example, it's useful to know that advancing the iterator doesn't do any significant work on its own (otherwise holding a lock while advancing it may be problematic); that's not true for arbitrary IntoIterators, but it is true for Vec.
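
Spelled out, the suggested signature might look like this (a sketch; the default for num_threads is an assumption, and it delegates to the run_tasks sketch from the earlier comment):

use std::thread;

// Sketch only: a plain function instead of a generic ScopedPool type.
pub(crate) fn run_all(tasks: Vec<BoxedTask>, num_threads: Option<usize>) {
    // Assumed default: fall back to the available parallelism.
    let n = num_threads
        .or_else(|| thread::available_parallelism().ok().map(Into::into))
        .unwrap_or(1);
    run_tasks(tasks, n); // worker loop as sketched earlier
}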

Contributor Author

Right, there is no real need for a structure, so I changed it to a fn. However, I left the IntoIterator for another review round, as I do see a valid reason for it to be generic. See the newly added tests, where we create dummy tasks on demand rather than pre-collecting them into a Vec. True, we expect the operation to be cheap enough.

Contributor Author

If it really makes a difference, I could add a pub(crate) fn version which accepts Vec and delegates to the generic version which becomes private.

Contributor

I'm not sold on those tests (and thus, on keeping a more complicated interface for their sake). Recording thread IDs of everyone who tries to get a task from the iterator is a clever hack, but it is a hack, not a test of behavior that actually matters. The previous buggy version of this patch would have passed those tests, since it did call next() on the tasks iterator from every thread at least once! A hypothetical buggy version that pulled tasks from the iterator but didn't run them would also pass the test. Conversely, the tests nail down current implementation details and would unnecessarily fail if they changed (e.g., if the main thread consumed the iterator itself and pushed all tasks to a multi-consumer channel that the workers pull from). It's just really hard to have good unit tests for thread pools, especially for non-functional requirements such as "actually utilize all the threads".

@Felix-El (Contributor Author)

@hanna-kruppe, many thanks for your review. I appreciate that you took the time to look in depth.
I refactored my approach a couple of times before my initial commit, and at one point I had roughly what you proposed above. Anyway, I hope the update resonates better with you and an authority.

This commit replaces the `threadpool` crate with a handcrafted
solution based on scoped threads. This leaves `valgrind` much
happier than before. We also lose some dependency baggage.
@LukasKalbertodt (Owner) left a comment

Thanks for your PR! Looking at the (lack of) activity of the threadpool crate, it does seem like a good idea to replace the dependency. Especially since doing it ourselves is really not hard.
Thank you @hanna-kruppe for already reviewing this before. Out of curiosity: what made you look into removing the dependency?

I have a few problems with the proposed changes, however. I left a few inline comments with my reasoning. The main functional problem is that the outcomes are only printed after all tests have finished.

However, to avoid another contributor-maintainer roundtrip (which, mainly due to me, tends to take a while), I just applied all changes and rebased your PR as well. That way I can release this change soon.

for test in tests {
let mut tasks: Vec<pool::BoxedTask> = Default::default();

for test in tests.into_iter() {
Owner

Suggested change
-for test in tests.into_iter() {
+for test in tests {

Should work?

Comment on lines +509 to -505
let num_tests = tests.len();
let (sender, receiver) = mpsc::channel();

let num_tests = tests.len();
Owner

Can you move the let num_tests line to its previous position (right above the loop) to reduce the diff size?

let (sender, receiver) = mpsc::channel();

let num_tests = tests.len();
for test in tests {
let mut tasks: Vec<pool::BoxedTask> = Default::default();
Owner

Shouldn't this work?

Suggested change
-let mut tasks: Vec<pool::BoxedTask> = Default::default();
+let mut tasks = Vec::new();

Comment on lines +3 to +4
pub(crate) type Task = dyn FnOnce() + Send;
pub(crate) type BoxedTask = Box<Task>;
Owner

I don't think these type aliases pull their weight. With my suggestion from a previous comment, they are only used in one spot. So please remove the aliases and just write out the type in the scoped_run_tasks function.
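
Written out at the single use site, that would be something like (a sketch; the exact num_threads type is a guess):

// Sketch: the aliases replaced by the spelled-out type.
pub(crate) fn scoped_run_tasks(tasks: Vec<Box<dyn FnOnce() + Send>>, num_threads: usize) {
    // ... spawn scoped workers and drain `tasks`, as sketched earlier ...
}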

Comment on lines +530 to +531
pool::scoped_run_tasks(tasks, num_threads);

Owner

The function is pretty short and only used in a single place (here), so why not inline it? Similar to what @hanna-kruppe said above, I think in this case we lose more by obscuring what actually happens compared to what we gain by black box abstraction.

Further, using thread::scope is actually problematic when looking at the next loop: that expects to run while the worker threads are doing something, in order to print something immediately after a test is finished. thread::scope waits for all threads to be done before returning. So in your code, all finished tests would fill up the channel, and only after all tests are done does the following loop run and print all tests.
This can be fixed by moving the printing loop below into the thread::scope closure, which also requires inlining the function.

Aaand in that case we can also get rid of the first loop, since we already have a vector of "tasks" (the trials) we can iterate over.
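
Roughly this shape (a sketch, not the actual commit; Trial, Outcome, run_trial, and print_outcome are stand-ins):

use std::sync::{mpsc, Mutex};
use std::thread;

struct Trial(&'static str);
struct Outcome(&'static str);

fn run_trial(t: Trial) -> Outcome { Outcome(t.0) } // stand-in
fn print_outcome(o: &Outcome) { println!("test {} ... ok", o.0); } // stand-in

fn run_and_print(trials: Vec<Trial>, num_threads: usize) {
    let num_trials = trials.len();
    let (sender, receiver) = mpsc::channel();
    let iter = Mutex::new(trials.into_iter());
    let next_trial = || iter.lock().unwrap().next();

    thread::scope(|scope| {
        for _ in 0..num_threads {
            let sender = sender.clone();
            let next_trial = &next_trial;
            scope.spawn(move || {
                while let Some(trial) = next_trial() {
                    sender.send(run_trial(trial)).unwrap();
                }
            });
        }
        // The printing loop runs inside the scope, so outcomes are
        // reported while the workers are still going.
        for outcome in receiver.iter().take(num_trials) {
            print_outcome(&outcome);
        }
    });
}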

Contributor

I think the way you inlined it in d194ff9 has a bug: it spawns num_threads scoped threads, each of which only takes one trial from iter and then terminates. So if you have fewer threads than trials, not all tests are executed.
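
In sketch form, the failure mode described here (reconstructed from this comment, not from the commit itself):

use std::sync::Mutex;
use std::thread;

fn main() {
    let trials = vec!["a", "b", "c", "d"];
    let num_threads = 2;
    let iter = Mutex::new(trials.into_iter());
    let next_trial = || iter.lock().unwrap().next();

    thread::scope(|scope| {
        for _ in 0..num_threads {
            scope.spawn(|| {
                // Bug: no loop, so each worker runs at most one trial.
                // With 2 threads and 4 trials, "c" and "d" never run.
                if let Some(trial) = next_trial() {
                    println!("running {trial}");
                }
            });
        }
    });
}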

Owner

Oh no

Owner

Ooof, that's awkward. Thanks for catching it! Should be fixed in d6ac84f. Would you mind reviewing that, to avoid me publishing another avoidable bug?

Contributor

Left a comment on that commit.

Owner

Thanks for reviewing. I removed the outdated comment in e17efb9 and also added additional tests in 41a1856 to catch this kind of bug in the future. Thanks again for catching this, will release momentarily.

@LukasKalbertodt (Owner)

Due to me not being able to push to your branch, I had to merge manually, which (paired with a rebase) GitHub does not recognize. But here, your changes are merged: d194ff9

@hanna-kruppe (Contributor)

> Thank you @hanna-kruppe for already reviewing this before. Out of curiosity: what made you look into removing the dependency?

Every once in a while I like the exercise of reading through the lockfile. I was surprised to see num_cpus in one of my lockfiles, since it's a relic of the bad old days before std::thread::available_parallelism. After tracing it back to threadpool, looking at that crate's lack of activity and open issues, and skimming the implementation, I thought there's gotta be a simpler way to run some tests in parallel.
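
For reference, the std replacement for num_cpus (stable since Rust 1.59):

use std::thread;

fn main() {
    // Number of threads the program can expect to run in parallel.
    let n = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    println!("available parallelism: {n}");
}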

@LukasKalbertodt (Owner)

@hanna-kruppe Ah, that makes sense. I also regularly do that (mostly via cargo tree). I'm simply not personally using libtest-mimic, so I tend to not notice this stuff.

And by the way (for the two of you): I just released v0.8.0 with these changes.


Successfully merging this pull request may close these issues.

Memory leaks when running multiple threads