c8f7764 to 43e91da
Codecov Report
@@            Coverage Diff             @@
##           master    #1020      +/-   ##
==========================================
- Coverage   82.48%   82.33%   -0.15%
==========================================
  Files         195      196       +1
  Lines       13793    13858      +65
==========================================
+ Hits        11377    11410      +33
- Misses       2416     2448      +32
16edef4 to 36cf5ce
By the way, I've solved the halting problem and optimized several parts of the code. I'm looking into more optimizations for garage-deploy, but this work is ready for review. Note that I still haven't gotten to use the garage-push
So, did you manage to run a full
Yes, finally! I've also run a full
5e5be04 to 9ca9ab4
Added some more work to improve garage-deploy speed. For a fairly simple change, garage-push (with these changes) took this long (note that this does not include garage-sign, but that is usually trivial), measured with
Using garage-deploy from master to upload the same objects:
Using garage-deploy from this branch to upload the same objects:
I haven't been able to compare a larger set of changes/objects because of unreliable connectivity. (I actually hit one error with garage-deploy in this case, but it recovered, and the error was minor enough that the new garage-deploy still outperformed the old one by quite a bit.) I'm also still quite impressed with how fast garage-push is compared with garage-deploy. The advantage of reading from disk instead of over a network is obvious, but I'm still wondering if there are further inefficiencies I haven't yet uncovered.
9ca9ab4 to 03ac30f
I think this should be reviewed and merged now. I've tested that garage-push will successfully push children objects even if the parents exist when running with
I've had to re-read this code probably a dozen times, so maybe this will save me time next time around. Signed-off-by: Patrick Vacek <[email protected]>
Mostly just useful for debugging strange circumstances. WIP. Signed-off-by: Patrick Vacek <[email protected]>
This is shared across the garage tools. This enabled me to fix the remaining TODO in OSTreeObject::CurlDone(). Signed-off-by: Patrick Vacek <[email protected]>
Signed-off-by: Patrick Vacek <[email protected]>
Basically just break if the system is idle, which should only happen once everything is done. Signed-off-by: Patrick Vacek <[email protected]>
Still check the children if we are walking the whole tree, though. Signed-off-by: Patrick Vacek <[email protected]>
Thanks to @eu-smirnov and @OYTIS for helping dissect this and find the room for improvement. Signed-off-by: Patrick Vacek <[email protected]>
Do not wait at all if the timeout return value is 0. Signed-off-by: Patrick Vacek <[email protected]>
* Only print concurrency messages if the concurrency changed.
* Log fetches at debug level. Trace level has the full log from curl, but that is usually too much, and we printed nothing at debug level.
Signed-off-by: Patrick Vacek <[email protected]>
It isn't particularly useful for garage-deploy, since the fetches are on a single thread, but it can't hurt. Signed-off-by: Patrick Vacek <[email protected]>
We know the type when we inspect objects for their children, and if we store it, we can reuse it when querying the server. Previously, we were trying all known object extensions, which resulted in a lot of unnecessary network calls. Signed-off-by: Patrick Vacek <[email protected]>
Signed-off-by: Patrick Vacek <[email protected]>
And log at debug level if we do. Signed-off-by: Patrick Vacek <[email protected]>
03ac30f to 3be8104
src/sota_tools/deploy.cc
Outdated
bool CheckPoolState(const OSTreeObject::ptr &root_object, const RequestPool &request_pool) {
  if (request_pool.run_mode() == RunMode::kWalkTree || request_pool.run_mode() == RunMode::kPushTree) {
    return !request_pool.is_idle() && !request_pool.is_stopped();
  } else {
Clarity: an exhaustive switch/case over all existing run modes would make this part easier to follow, since every mode would be handled explicitly instead of falling into a catch-all else branch.
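To illustrate the suggestion, here is a rough sketch of what an exhaustive switch could look like. Only `kWalkTree` and `kPushTree` appear in the diff above; the `kDefault` and `kDryRun` enumerators, the simplified `RequestPool` struct, the `root_is_on_server` flag, and the condition used for the non-walking modes are all assumptions made so the example compiles on its own, since the `else` branch is not visible here.

```cpp
#include <stdexcept>

// Hypothetical stand-ins for the real types in src/sota_tools.
// kDefault and kDryRun are assumed names, not taken from the diff.
enum class RunMode { kDefault, kDryRun, kWalkTree, kPushTree };

struct RequestPool {
  RunMode mode;
  bool idle;
  bool stopped;
  RunMode run_mode() const { return mode; }
  bool is_idle() const { return idle; }
  bool is_stopped() const { return stopped; }
};

// Returns true while the pool should keep running.
bool CheckPoolState(bool root_is_on_server, const RequestPool &pool) {
  // No default label: the compiler can then warn (-Wswitch) when a new
  // RunMode enumerator is added but not handled here.
  switch (pool.run_mode()) {
    case RunMode::kWalkTree:
    case RunMode::kPushTree:
      return !pool.is_idle() && !pool.is_stopped();
    case RunMode::kDefault:
    case RunMode::kDryRun:
      // Assumed condition for the modes hidden behind the else branch.
      return !root_is_on_server && !pool.is_stopped();
  }
  throw std::logic_error("unhandled RunMode");
}
```

Leaving out the `default:` label is the point of the exercise: a newly added run mode then shows up as a compiler warning rather than silently taking the old `else` path.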
src/sota_tools/ostree_object_test.cc
Outdated
-/* Verify that constructor does not accept a nonexistent repo. */
+/* Verify that constructor does not accept a nonexistent repo.
+ *
+ * Note that this will not fail on release builds! */
Good catch, but isn't that a bit bad, for example if we want to run our test suite on qemu? This could be a FIXME/TODO.
Yeah, I'm actually working on fixing that right now! My current solution is to wrap it in an NDEBUG check so that it doesn't run on Release builds.
Ok but what does it test then? It's not the actual behaviour of the class.
Currently it fails in debug builds because of an assert that checks that the repo is valid. Would you prefer that it threw an exception instead, so that it failed in all cases?
Either that or just remove this test. If we just test that the constructor has an assert, I don't think it brings much value.
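For reference, the difference between the two options can be sketched as follows. `FakeRepo`, `ObjectSketch`, and `ConstructorRejects` are hypothetical names invented for this illustration and are not the project's classes; the sketch only shows why a thrown exception stays testable in release builds while an `assert` is compiled out under `NDEBUG`.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a repo handle.
struct FakeRepo {
  bool valid;
  bool LooksValid() const { return valid; }
};

class ObjectSketch {
 public:
  ObjectSketch(const FakeRepo &repo, const std::string &object_name) : name_(object_name) {
    // An assert(repo.LooksValid()) here would disappear when NDEBUG is
    // defined; an explicit throw behaves the same in every build type.
    if (!repo.LooksValid()) {
      throw std::runtime_error("invalid repo for object " + object_name);
    }
  }
  const std::string &name() const { return name_; }

 private:
  std::string name_;
};

// Returns true if constructing an object against this repo is rejected.
bool ConstructorRejects(const FakeRepo &repo) {
  try {
    ObjectSketch obj(repo, "abc");
    (void)obj;
    return false;
  } catch (const std::runtime_error &) {
    return true;
  }
}
```

With this shape, a test can check the rejection directly, and it exercises the actual behaviour of the class rather than the presence of an assert.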
3be8104 to 5694746
src/sota_tools/ostree_repo.cc
Outdated
  return it->second;
}

const std::string exts[] = {".filez", ".dirtree", ".dirmeta", ".commit"};
std::map exts{{OstreeObjectType::OSTREE_OBJECT_TYPE_FILE, "filez"}, ...}?
edit: oh, this is old code which was moved, not urgent...
edit2: oh actually not, this is new code sorry :). I was looking at the commit's diff
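A minimal sketch of the map-based idea, for illustration only. The local `ObjectType` enum stands in for libostree's `OstreeObjectType` so the example compiles without the library, and `Extensions` and `ObjectPath` are hypothetical helpers. The extensions keep the leading dot used in the array above, and the `objects/xx/rest.ext` layout is an assumption about how the path is built.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Local stand-in for OstreeObjectType (assumption, to stay self-contained).
enum class ObjectType { kFile, kDirTree, kDirMeta, kCommit };

// One lookup table instead of an array that has to be tried in order.
const std::map<ObjectType, std::string> &Extensions() {
  static const std::map<ObjectType, std::string> exts{
      {ObjectType::kFile, ".filez"},
      {ObjectType::kDirTree, ".dirtree"},
      {ObjectType::kDirMeta, ".dirmeta"},
      {ObjectType::kCommit, ".commit"},
  };
  return exts;
}

// Builds a relative object path such as "objects/ab/cdef.commit".
std::string ObjectPath(const std::string &hash, ObjectType type) {
  if (hash.size() < 3) {
    throw std::invalid_argument("hash too short: " + hash);
  }
  return "objects/" + hash.substr(0, 2) + "/" + hash.substr(2) + Extensions().at(type);
}
```

Once the type is known, the extension is a single lookup, so there is no need to probe every extension in turn.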
Just reducing unnecessary variation and cleaning up minor issues. Signed-off-by: Patrick Vacek <[email protected]>
This also has the benefit of causing the relevant test to fail also in release builds. (It used to just be an assert.) Signed-off-by: Patrick Vacek <[email protected]>
Signed-off-by: Patrick Vacek <[email protected]>
5694746 to 6d4e095
src/sota_tools/request_pool.cc
Outdated
timeout.tv_sec = 0;
timeout.tv_usec = 100 * 1000;
// If maxfd == -1, then wait the lesser of timeoutms and 100ms. See:
// https://curl.haxx.se/libcurl/c/curl_multi_fdset.html
No if?
long nofd_timeoutms = std::min(timeoutms, 100L);
LOG_DEBUG << "Waiting " << nofd_timeoutms << " ms for curl";
timeout.tv_sec = nofd_timeoutms / 1000;
timeout.tv_usec = 1000 * (nofd_timeoutms % 1000);
Oh and actually, tv_sec is then always 0...
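Putting the two remarks together, the no-file-descriptor wait could be sketched roughly like this. `WaitTime` is a plain stand-in for `struct timeval`, and `NoFdWait` is a hypothetical helper name. The handling of negative values assumes the `curl_multi_timeout` convention that -1 means no timeout is currently set, in which case the sketch falls back to the 100 ms cap.

```cpp
#include <algorithm>

// Plain stand-in for struct timeval so the sketch has no platform headers.
struct WaitTime {
  long tv_sec;
  long tv_usec;
};

// Computes how long to wait when curl reports no file descriptors (maxfd == -1).
// timeoutms: value reported by curl; negative is treated as "no timeout set".
WaitTime NoFdWait(long timeoutms) {
  const long kMaxWaitMs = 100;
  const long wait_ms = (timeoutms < 0) ? kMaxWaitMs : std::min(timeoutms, kMaxWaitMs);
  // The wait is capped at 100 ms, so the seconds field is always zero.
  return WaitTime{0, 1000 * wait_ms};
}
```

A result of zero microseconds means "do not wait at all", which matches the case where curl wants to proceed immediately.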
src/sota_tools/ostree_object.cc
Outdated
@@ -241,7 +247,24 @@ void OSTreeObject::Upload(const TreehubServer &push_target, CURLM *curl_multi_ha
   request_start_time_ = std::chrono::steady_clock::now();
 }

-void OSTreeObject::PresenceUnknown(RequestPool &pool, const int64_t rescode) {
+void OSTreeObject::CheckChildren(RequestPool &pool, const long rescode) {  // NOLINT
This clang-tidy lint is a bit annoying: https://clang.llvm.org/extra/clang-tidy/checks/google-runtime-int.html. I agree with the intent, but it is a bit strict.
Maybe we could disable it. Otherwise, maybe use the specific lint ID: // NOLINT(google-runtime-int)
Generally, I like it, it's just because of the curl API that we have the problem. You are right about specifying the lint ID, though.
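As a small illustration of the scoped suppression, assuming a hypothetical helper `IsSuccess` that is not part of this PR: the parameter stays `long` because curl hands response codes back through a `long` (`CURLINFO_RESPONSE_CODE`), and the comment silences only the one check that this trips.

```cpp
// Hypothetical helper, for illustration only.
// The parameter keeps curl's long type, so only the google-runtime-int check
// is suppressed on this line; other checks still apply to it.
bool IsSuccess(const long rescode) {  // NOLINT(google-runtime-int)
  return rescode >= 200 && rescode < 300;
}
```

A bare `// NOLINT` would hide every clang-tidy diagnostic on that line, which is why naming the check is the narrower choice.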
Ok, I've made some style comments. I may need a bit more time to follow all the logic, but since you've run tests on real repos and it brings good speed improvements, that's already enough from my perspective.
Also refactor CheckPoolState to use a cleaner switch. Signed-off-by: Patrick Vacek <[email protected]>
1c60c85 to a9a59a2
I don't blame you; this ended up being a lot of work all thrown into one place. At least I did my best to separate the work into meaningful chunks/commits. The only real new feature is the tree walking, but there is a ton of refactoring, bug fixing, and optimization alongside that.
a9a59a2 to 842c44c
Signed-off-by: Patrick Vacek <[email protected]>
Signed-off-by: Patrick Vacek <[email protected]>
Signed-off-by: Patrick Vacek <[email protected]>
842c44c to cd3acde
I wrote most of this while trying to debug the missing OSTree object errors, and I figured it was worth polishing up a bit and submitting. Note that I've never successfully gotten a walk to finish, so I don't know what happens if it does. I suspect it will block indefinitely, but I'm not sure. Hopefully sooner or later I'll be able to figure that out.