
Parallel relabel #1137

Closed · wants to merge 4 commits into from

Conversation

cgwalters (Member)

On top of #1124

@@ -2586,6 +2447,7 @@ relabel_in_thread_impl (RpmOstreeContext *self,
NULL, NULL, NULL);

ostree_repo_commit_modifier_set_devino_cache (modifier, cache);
ostree_repo_commit_modifier_set_sepolicy (modifier, self->sepolicy);
Member

To answer your question, I can't think of a good reason right now why it wasn't done this way. Definitely ostreedev/ostree#1165 would've been a blocker, though clearly it could've been fixed back then too. Anyway, this is definitely better!

jlebon (Member) commented Dec 7, 2017

> This was almost straightforward except I spent a lot of time
> debugging what turned out to be calling dnf_package_get_nevra()
> in the worker threads 😢.

Ouch, I know that pain! :(

> [1] ... Write it in Python, where MT isn't useful at all?

For Python at least, it's really easy to do multiprocessing using the multiprocessing module. I make use of it in the PAPR Python rewrite (which I really should pick up again). But yeah, it clearly sucks if you need a lot of coordination.

@rh-atomic-bot

☔ The latest upstream changes (presumably 51c5591) made this pull request unmergeable. Please resolve the merge conflicts.

@cgwalters cgwalters mentioned this pull request Dec 8, 2017
@cgwalters cgwalters changed the title WIP: Parallel relabel Parallel relabel Dec 8, 2017
@cgwalters (Member Author)

Rebased 🏄‍♂️ and lifting WIP.

GCancellable *cancellable,
GError **error)
{
if (g_cancellable_set_error_if_cancelled (cancellable, error))
Member

Don't we also need this in import_in_thread ? Otherwise, we won't even stop new imports, no?

Member Author

We check the cancellable each iteration of reading the archive, and GTask also checks the cancellable before actually running the thread; see g_task_start_task_thread(), so it's probably simplest to just drop the one in importing.

cgwalters (Member Author) commented Dec 11, 2017

Hmm, no: the GTask check only happens if one uses g_task_set_return_on_cancel(); let's be conservative.

@@ -2093,6 +2095,7 @@ rpmostree_context_import_jigdo (RpmOstreeContext *self,

self->async_dnfstate = hifstate;
self->async_running = TRUE;
self->async_cancellable = g_cancellable_new ();
Member

Sorry, I'm still not clear on how this works exactly. We create a new GCancellable here, but when do we actually consult it? We're still passing our own cancellable to rpmostree_importer_run_async so the g_cancellable_set_error_if_cancelled importer fixup you added is not actually using this, right? So, aren't we still going to end up trying to import all the pkgs even if the very first one failed? Or should we be passing self->async_cancellable to rpmostree_importer_run_async down below?

Member Author

No, you're totally right, it doesn't work 😉. I was throwing code at the wall here a bit. Let's do the obvious thing and reuse the source cancellable, which is what ostree-pull does too.

Member Author

Pushed a fixup ⬇️ which I tested by adding:

if (g_random_boolean ())
  return glnx_throw (...)

in the relabel_in_thread() code.

jlebon (Member) left a comment

Hmm, it looks like we're not preserving the original error correctly:

# + ssh -o User=root -o ControlMaster=auto -o ControlPath=/var/tmp/ssh-vmcheck2-1513030065068962438.sock -o ControlPersist=yes vmcheck2 env ASAN_OPTIONS=detect_leaks=false rpm-ostree install test-opt-1.0
# error: Operation was cancelled
+ fatal 'File '\''err.txt'\'' doesn'\''t match regexp '\''See https://github.com/projectatomic/rpm-ostree/issues/233'\'''
+ echo File ''\''err.txt'\''' 'doesn'\''t' match regexp ''\''See' 'https://github.com/projectatomic/rpm-ostree/issues/233'\'''
File 'err.txt' doesn't match regexp 'See https://github.com/projectatomic/rpm-ostree/issues/233'

@@ -2095,7 +2096,7 @@ rpmostree_context_import_jigdo (RpmOstreeContext *self,

self->async_dnfstate = hifstate;
self->async_running = TRUE;
self->async_cancellable = g_cancellable_new ();
self->async_cancellable = cancellable;
Member

Ahh OK, that makes more sense now! :)

To exit earlier if we've been cancelled. Came up in review for parallel
relabeling.

I believe this is a leftover vestige, and it was adding confusion when I was
debugging `rpmostree-core.c` async ops and cancellation.

Now the only cancellables in the daemon are created by transaction ops.

Basically since we're doing internal async ops which set the cancellable on
failure, we still want the first error to win since it'll be more useful. See
the docs for `g_task_set_check_cancellable()` for more.

This is another big task just like importing that greatly benefits
from being parallel.  While here I hit the issue that on error
we didn't wait for pending async tasks to complete; I changed things
for importing so that we do that, and used it here too.

This was almost straightforward except I spent a *lot* of time
debugging what turned out to be calling `dnf_package_get_nevra()`
in the worker threads 😢.

I'm mostly writing this to speed up unified core/jigdo, but it's also obviously
relevant on the client side.
@cgwalters (Member Author)

> Hmm, it looks like we're not preserving the original error correctly:

Yep, added a prep commit to fix that ⬆️

jlebon (Member) commented Dec 12, 2017

@rh-atomic-bot r+ f7e9992

@rh-atomic-bot

⚡ Test exempted: merge already tested.

rh-atomic-bot pushed 3 commits that referenced this pull request Dec 12, 2017, with the commit messages above.

Closes: #1137
Approved by: jlebon
cgwalters added a commit to cgwalters/rpm-ostree that referenced this pull request Jan 2, 2018
Basically the `rpmostree_context_relabel()` call we had in the treecompose path
for unified core didn't actually have any effect as the core code did a relabel
and unset the array.

I think this may actually be a regression from: coreos#1137
though I didn't verify.

Anyways looking at this, the code is a lot simpler if we change the API so that
the "normal" relabeling is folded into `rpmostree_context_assemble()`. Then we
change the public relabel API to be "force relabel" which we use in the unified
core 🌐 treecompose path.

This shrinks the jigdoRPM for FAH from 90MB to 68MB.

Closes: coreos#1172
cgwalters added commits with the same message to cgwalters/rpm-ostree referencing this pull request on Jan 3 and Jan 6, 2018; rh-atomic-bot pushed it Jan 8–9, 2018.

Closes: #1172

Closes: #1173
Approved by: jlebon