Feat/ostree walker #1020

pattivacek · 2018-11-30T13:27:34Z

I wrote most of this while trying to debug the missing OSTree object errors, and I figured it was worth polishing up a bit and submitting. Note that I've never successfully gotten a walk to finish, so I don't know what happens if it does. I suspect it will block indefinitely, but I'm not sure. Hopefully sooner or later I'll be able to figure that out.

codecov-io · 2018-12-03T08:31:09Z

Codecov Report

Merging #1020 into master will decrease coverage by 0.14%.
The diff coverage is 77.82%.

@@            Coverage Diff             @@
##           master    #1020      +/-   ##
==========================================
- Coverage   82.48%   82.33%   -0.15%     
==========================================
  Files         195      196       +1     
  Lines       13793    13858      +65     
==========================================
+ Hits        11377    11410      +33     
- Misses       2416     2448      +32

Impacted Files	Coverage Δ
src/sota_tools/ostree_object.h	`100% <ø> (ø)`	⬆️
src/libaktualizr/http/httpinterface.h	`100% <ø> (ø)`	⬆️
src/sota_tools/treehub_server_test.cc	`95.23% <ø> (ø)`	⬆️
src/sota_tools/ostree_repo.h	`66.66% <ø> (+16.66%)`	⬆️
src/libaktualizr/http/httpclient.cc	`91.13% <ø> (ø)`	⬆️
src/sota_tools/oauth2.cc	`86.66% <ø> (ø)`	⬆️
src/libaktualizr/http/httpclient.h	`100% <ø> (ø)`	⬆️
src/sota_tools/ostree_object_test.cc	`100% <100%> (ø)`	⬆️
src/sota_tools/request_pool.h	`100% <100%> (ø)`	⬆️
src/sota_tools/ostree_dir_repo.h	`100% <100%> (ø)`	⬆️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b9825f0...cd3acde. Read the comment docs.

pattivacek · 2018-12-07T11:11:22Z

By the way, I've solved the halting problem and optimized several parts of the code. I'm looking into more optimizations for garage-deploy, but this work is ready for review. Note that I still haven't gotten to use the garage-push --walk-tree option in full glory yet.

lbonn · 2018-12-07T11:14:30Z

So, did you manage to run a full garage-check --walk-tree?

pattivacek · 2018-12-07T11:17:50Z

So, did you manage to run a full garage-check --walk-tree?

Yes, finally! I've also run a full garage-push --walk-tree, but I still haven't seen a case where it successfully found an uploaded an unexpectedly missing object.

pattivacek · 2018-12-12T16:02:32Z

Added some more work to improve garage-deploy speed.

For a fairly simple change, garage-push (with these changes) took this long (note that this does not include garage-sign, but that is usually trivial), measured with time:

real  0m9,098s
user  0m0,947s
sys   0m0,317s

Using garage-deploy from master to upload the same objects:

real  3m13,168s
user  0m27,168s
sys   0m1,309s

Using garage-deploy from this branch to upload the same objects:

real  1m37,113s
user  0m24,744s
sys   0m1,211s

I haven't been able to compare a larger set of changes/objects because of unreliable connectivity. (I actually had one error with the garage-deploy in this case, but it recovered and was trivial enough that it still outperformed the old garage-deploy by quite a bit.)

I'm also still quite impressed with how fast garage-push is compared with garage-deploy. The advantage of reading from disk instead of over a network is obvious, but I'm still wondering if there are further inefficiencies I haven't yet uncovered.

pattivacek · 2018-12-14T16:30:12Z

I think this should be reviewed and merged now. I've tested that garage-push will successfully push children objects even if the parents exist when running with -w and I've proven that garage-deploy is now substantially faster than before.

I've had to re-read this code probably a dozen times, so maybe this will save me time next time around. Signed-off-by: Patrick Vacek <[email protected]>

Mostly just useful for debugging strange circumstances. WIP. Signed-off-by: Patrick Vacek <[email protected]>

This is shared across the garage tools. This enabled me to fix the remaining TODO in OSTreeObject::CurlDone(). Signed-off-by: Patrick Vacek <[email protected]>

Signed-off-by: Patrick Vacek <[email protected]>

Basically just break if the system is idle, which should only happen once everything is done. Signed-off-by: Patrick Vacek <[email protected]>

Still check the children if we are walking the whole tree, though. Signed-off-by: Patrick Vacek <[email protected]>

@OYTIS

Thanks to @eu-smirnov and @OYTIS for helping disect this and find the room for improvement. Signed-off-by: Patrick Vacek <[email protected]>

Do not wait at all if the timeout return value is 0. Signed-off-by: Patrick Vacek <[email protected]>

* Only print concurrency messages if it changed. * Log fetches at debug level. Trace level has the full log from curl, but that is usually too much, and we printed nothing at debug level. Signed-off-by: Patrick Vacek <[email protected]>

It isn't particularly useful for garage-deploy, since the fetches are on a single thread, but it can't hurt. Signed-off-by: Patrick Vacek <[email protected]>

We know the type when we inspect objects for their children, and if we store it, when can reuse it when querying the server. Previously, we were trying all known object extensions, which resulted in a lot of unnecessary networking calls. Signed-off-by: Patrick Vacek <[email protected]>

Signed-off-by: Patrick Vacek <[email protected]>

And log at debug level if we do. Signed-off-by: Patrick Vacek <[email protected]>

lbonn · 2018-12-17T09:00:15Z

src/sota_tools/deploy.cc

+bool CheckPoolState(const OSTreeObject::ptr &root_object, const RequestPool &request_pool) {
+  if (request_pool.run_mode() == RunMode::kWalkTree || request_pool.run_mode() == RunMode::kPushTree) {
+    return !request_pool.is_idle() && !request_pool.is_stopped();
+  } else {


Clarity: having an exhaustive switch/case on all existing running modes would make this part easier to follow as it wouldn't need to know about the exact enum cases.

lbonn · 2018-12-17T09:15:48Z

src/sota_tools/ostree_object_test.cc

-/* Verify that constructor does not accept a nonexistent repo. */
+/* Verify that constructor does not accept a nonexistent repo.
+ *
+ * Note that this will not fail on release builds! */


Good catch, but isn't that a bit bad, for example if we want to run our test suite on qemu? This could be a FIXME/TODO.

Yeah, I'm actually working on fixing that right now! My current solution is to wrap it around NDEBUG so that it doesn't run on Release builds.

Ok but what does it test then? It's not the actual behaviour of the class.

Currently it fails in debug builds because of an assert that checks that the repo is valid. Would you prefer that instead threw an exception so it failed in all cases?

Either that or just remove this test. If we just test that the constructor has an assert, I don't think it brings much value.

lbonn · 2018-12-17T09:57:24Z

src/sota_tools/ostree_repo.cc

+    return it->second;
+  }
+
+  const std::string exts[] = {".filez", ".dirtree", ".dirmeta", ".commit"};


std::map exts{{OstreeObjectType::OSTREE_OBJECT_TYPE_FILE, "filez"}, ...}?

edit: oh, this is old code which was moved, not urgent...
edit2: oh actually not, this is new code sorry :). I was looking at the commit's diff

Just reducing unnecessary variation and cleaning up minor issues. Signed-off-by: Patrick Vacek <[email protected]>

This also has the benefit of causing the relevant test to fail also in release builds. (It used to just be an assert.) Signed-off-by: Patrick Vacek <[email protected]>

Signed-off-by: Patrick Vacek <[email protected]>

lbonn · 2018-12-17T10:11:50Z

src/sota_tools/request_pool.cc

-    timeout.tv_sec = 0;
-    timeout.tv_usec = 100 * 1000;
+    // If maxfd == -1, then wait the lesser of timeoutms and 100ms. See:
+    // https://curl.haxx.se/libcurl/c/curl_multi_fdset.html


No if?

long nofd_timeoutms = std::min(timeoutms, 100); LOG_DEBUG << "Waiting " << nofd_timeoutms << " ms for curl"; timeout.tv_sec = nofd_timeoutms / 1000; timeout.tv_usec = 1000 * (nofd_timeoutms % 1000);

Oh and actually, tv_sec is then always 0...

lbonn · 2018-12-17T10:21:12Z

src/sota_tools/ostree_object.cc

@@ -241,7 +247,24 @@ void OSTreeObject::Upload(const TreehubServer &push_target, CURLM *curl_multi_ha
  request_start_time_ = std::chrono::steady_clock::now();
 }

-void OSTreeObject::PresenceUnknown(RequestPool &pool, const int64_t rescode) {
+void OSTreeObject::CheckChildren(RequestPool &pool, const long rescode) {  // NOLINT


This clang-tidy lint is a bit annoying https://clang.llvm.org/extra/clang-tidy/checks/google-runtime-int.html. Even if I agree with the intent, it is a bit strict.

Maybe we could disable it. Otherwise, maybe use the specific lint id: // NOLINT(google-runtime-int)

Generally, I like it, it's just because of the curl API that we have the problem. You are right about specifying the lint ID, though.

lbonn

Ok, I've made some style comments. I'll maybe need a bit more time to follow all the logic but since you've run tests on real repos and that it brings good speed improvements, it's already enough from my perspective.

Also refactor CheckPoolState to use a cleaner switch. Signed-off-by: Patrick Vacek <[email protected]>

pattivacek · 2018-12-17T11:26:15Z

I'll maybe need a bit more time to follow all the logic

I don't blame you, this ended up being a lot of work all thrown in one place. At least I did my best to separate the work into meaningful chunks/commits. The only real new feature is the tree walking, but there is a ton of refactoring, bug fixing, and optimization alongside that.

Signed-off-by: Patrick Vacek <[email protected]>

pattivacek force-pushed the feat/ostree-walker branch 3 times, most recently from c8f7764 to 43e91da Compare December 3, 2018 08:31

pattivacek requested review from patriotyk, OYTIS, lbonn and eu-siemann December 3, 2018 12:52

pattivacek force-pushed the feat/ostree-walker branch 4 times, most recently from 16edef4 to 36cf5ce Compare December 7, 2018 10:09

pattivacek force-pushed the feat/ostree-walker branch 5 times, most recently from 5e5be04 to 9ca9ab4 Compare December 12, 2018 15:52

pattivacek force-pushed the feat/ostree-walker branch from 9ca9ab4 to 03ac30f Compare December 14, 2018 14:22

pattivacek added 7 commits December 17, 2018 09:54

Document OstreeObject class better.

44bbd0a

I've had to re-read this code probably a dozen times, so maybe this will save me time next time around. Signed-off-by: Patrick Vacek <[email protected]>

Add --walk-tree option to garage-check to check everything.

1d7b3be

Mostly just useful for debugging strange circumstances. WIP. Signed-off-by: Patrick Vacek <[email protected]>

Refactor dry_run and walk_tree into RunMode.

ad49b71

This is shared across the garage tools. This enabled me to fix the remaining TODO in OSTreeObject::CurlDone(). Signed-off-by: Patrick Vacek <[email protected]>

Add flag to garage-push to push any missing commits in the tree.

f4bf692

Signed-off-by: Patrick Vacek <[email protected]>

Fix tree walking halt condition.

7084f35

Basically just break if the system is idle, which should only happen once everything is done. Signed-off-by: Patrick Vacek <[email protected]>

Do not upload an object if it is known to be present on the server.

7cb3b99

Still check the children if we are walking the whole tree, though. Signed-off-by: Patrick Vacek <[email protected]>

Fix request pool wait logic.

13ce4d5

Thanks to @eu-smirnov and @OYTIS for helping disect this and find the room for improvement. Signed-off-by: Patrick Vacek <[email protected]>

pattivacek added 6 commits December 17, 2018 09:54

Further fixes to request_pool wait logic.

9c6aedf

Do not wait at all if the timeout return value is 0. Signed-off-by: Patrick Vacek <[email protected]>

garage-tools: Better debug-level logging.

7609cac

* Only print concurrency messages if it changed. * Log fetches at debug level. Trace level has the full log from curl, but that is usually too much, and we printed nothing at debug level. Signed-off-by: Patrick Vacek <[email protected]>

Allow concurrent network operations for garage-check and garage-deploy.

a42eb60

It isn't particularly useful for garage-deploy, since the fetches are on a single thread, but it can't hurt. Signed-off-by: Patrick Vacek <[email protected]>

Refactor OSTreeRepo to take advantage of object types.

3853ef7

Signed-off-by: Patrick Vacek <[email protected]>

Don't call sleep_for if we don't want to sleep.

a377c69

And log at debug level if we do. Signed-off-by: Patrick Vacek <[email protected]>

pattivacek force-pushed the feat/ostree-walker branch from 03ac30f to 3be8104 Compare December 17, 2018 08:54

lbonn reviewed Dec 17, 2018

View reviewed changes

pattivacek force-pushed the feat/ostree-walker branch from 3be8104 to 5694746 Compare December 17, 2018 09:51

lbonn reviewed Dec 17, 2018

View reviewed changes

pattivacek added 3 commits December 17, 2018 11:08

Refactor garage tools to align them closer.

e578465

Just reducing unnecessary variation and cleaning up minor issues. Signed-off-by: Patrick Vacek <[email protected]>

Throw error in OstreeObject if input repo is invalid.

6568a2d

This also has the benefit of causing the relevant test to fail also in release builds. (It used to just be an assert.) Signed-off-by: Patrick Vacek <[email protected]>

Add assert to try to catch a logic error.

5838374

Signed-off-by: Patrick Vacek <[email protected]>

pattivacek force-pushed the feat/ostree-walker branch from 5694746 to 6d4e095 Compare December 17, 2018 10:10

lbonn reviewed Dec 17, 2018

View reviewed changes

lbonn approved these changes Dec 17, 2018

View reviewed changes

Require commit to be present on server for pushing the entire tree.

bb3ac91

Also refactor CheckPoolState to use a cleaner switch. Signed-off-by: Patrick Vacek <[email protected]>

pattivacek force-pushed the feat/ostree-walker branch 2 times, most recently from 1c60c85 to a9a59a2 Compare December 17, 2018 11:18

pattivacek force-pushed the feat/ostree-walker branch from a9a59a2 to 842c44c Compare December 17, 2018 12:33

pattivacek added 3 commits December 17, 2018 13:59

Use std::min and save some redundancy.

db37b21

Signed-off-by: Patrick Vacek <[email protected]>

Refactor object type discovery to use a map.

055d4e3

Signed-off-by: Patrick Vacek <[email protected]>

Specify lint ID of the long issue that was often ignore because of curl.

cd3acde

Signed-off-by: Patrick Vacek <[email protected]>

pattivacek force-pushed the feat/ostree-walker branch from 842c44c to cd3acde Compare December 17, 2018 13:00

pattivacek merged commit 24a1ce0 into master Dec 17, 2018

pattivacek deleted the feat/ostree-walker branch December 17, 2018 14:42

pattivacek mentioned this pull request Dec 17, 2018

garage-deploy is slow #1015

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feat/ostree walker #1020

Feat/ostree walker #1020

pattivacek commented Nov 30, 2018

codecov-io commented Dec 3, 2018 •

edited

Loading

pattivacek commented Dec 7, 2018

lbonn commented Dec 7, 2018

pattivacek commented Dec 7, 2018

pattivacek commented Dec 12, 2018 •

edited

Loading

pattivacek commented Dec 14, 2018

lbonn Dec 17, 2018

lbonn Dec 17, 2018

pattivacek Dec 17, 2018

lbonn Dec 17, 2018

pattivacek Dec 17, 2018

lbonn Dec 17, 2018

lbonn Dec 17, 2018 •

edited

Loading

lbonn Dec 17, 2018

lbonn Dec 17, 2018

lbonn Dec 17, 2018

pattivacek Dec 17, 2018

lbonn left a comment

pattivacek commented Dec 17, 2018

Feat/ostree walker #1020

Feat/ostree walker #1020

Conversation

pattivacek commented Nov 30, 2018

codecov-io commented Dec 3, 2018 • edited Loading

Codecov Report

pattivacek commented Dec 7, 2018

lbonn commented Dec 7, 2018

pattivacek commented Dec 7, 2018

pattivacek commented Dec 12, 2018 • edited Loading

pattivacek commented Dec 14, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lbonn Dec 17, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lbonn left a comment

Choose a reason for hiding this comment

pattivacek commented Dec 17, 2018

codecov-io commented Dec 3, 2018 •

edited

Loading

pattivacek commented Dec 12, 2018 •

edited

Loading

lbonn Dec 17, 2018 •

edited

Loading