RO sync #175

Merged
merged 11 commits into from
Jul 8, 2020

Conversation

icristescu
Contributor

An RO instance syncs (1) when opening the store and (2) when calling ro_sync.

Some tests fail because of clear, so I commented them out (to make them pass, we should rebase this onto #168). If you're not calling clear, this should work.

@icristescu icristescu mentioned this pull request Jun 12, 2020
@samoht samoht mentioned this pull request Jun 16, 2020
@icristescu icristescu force-pushed the ro_sync branch 3 times, most recently from f55021e to 68e1027 Compare June 18, 2020 09:17
Member

@craigfe craigfe left a comment

I don't have much to say on the correctness of this PR while the tests are commented out, but here are some initial thoughts regardless.

Will re-review once that situation has been worked out 🙂

@icristescu icristescu mentioned this pull request Jun 18, 2020
@icristescu icristescu force-pushed the ro_sync branch 2 times, most recently from 0ece0f8 to 7eb58aa Compare June 18, 2020 13:58
sync_log t;
find_log_index () )
else raise Not_found)
@~ find_log_index)
Contributor

The case that this retry handled is now omitted entirely.
That means it is still theoretically possible to call ro_sync and hit the same interleaving with merge as before, so that sync_log does not see the disk modifications (which is why we retried before).

Contributor Author

Yes, and it's tricky, because no test fails (I ran them 500 times).

I looked at which interleaving was causing this bug (#118 (comment)), and this particular bug can be fixed at the end of a merge by first setting the generation number and then clearing the log. This can cause a ro_sync to refill a log even when there are no new values, but it can no longer lead to a value not being found. What do you think?
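
A minimal sketch of that ordering, written as a toy in-memory model rather than the actual Index code. The types `disk` and `ro`, the functions `merge`, `ro_sync` and `find`, and the `between` hook are all names assumed here for illustration.

```ocaml
(* Toy in-memory model. All names here are hypothetical, not the Index API. *)
type disk = {
  mutable generation : int64;
  mutable log : (string * string) list; (* write-ahead log *)
  mutable data : (string * string) list; (* merged data file *)
}

type ro = {
  mutable seen : int64; (* last generation observed by the reader *)
  mutable log_cache : (string * string) list;
}

let no_hook () = ()

(* End of a merge: move the log into the data file, bump the generation,
   and only then clear the log. [between] is a test hook that runs in the
   window between the last two steps. *)
let merge ?(between = no_hook) d =
  d.data <- d.log @ d.data;
  d.generation <- Int64.succ d.generation;
  between ();
  d.log <- []

(* The reader reloads its copy of the log when the generation has changed. *)
let ro_sync d r =
  if not (Int64.equal r.seen d.generation) then begin
    r.seen <- d.generation;
    r.log_cache <- d.log
  end

(* Look in the reader's log copy first, then in the merged data. *)
let find d r key =
  match List.assoc_opt key r.log_cache with
  | Some _ as v -> v
  | None -> List.assoc_opt key d.data

let () =
  let d = { generation = 0L; log = [ ("k", "v") ]; data = [] } in
  let r = { seen = 0L; log_cache = [] } in
  (* The reader syncs in the middle of the merge: it refills its log
     redundantly, but the value stays reachable. *)
  merge ~between:(fun () -> ro_sync d r) d;
  assert (r.log_cache = [ ("k", "v") ]);
  assert (find d r "k" = Some "v")
```

In this model, a reader that syncs inside the window sees the new generation together with the not-yet-cleared log, which is the redundant refill described above.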

Member

As discussed offline: we can test this properly with a couple of extra hooks, and we probably should.

Contributor Author

I added the hooks and tested them in commit 8d0f15b.
One test fails in that commit: the RO reads the generation number and the offset to decide whether a change occurred. If, between the two reads, both the generation changed and the log was cleared, then the RO cannot detect the change (see test 8d0f15b#diff-81aec069a46f58216cc2e7046f8d183dR349).
The solution I propose in 0af8704 is to group the generation and offset reads/writes into a single read/write.
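
To illustrate the idea of a single grouped read/write, here is a rough sketch that packs both fields into one buffer. The 16-byte layout, the `header` record and the `encode`, `decode` and `changed` functions are assumptions for this example and are not taken from the commit.

```ocaml
(* Hypothetical header layout: offset (8 bytes) then generation (8 bytes),
   both big-endian. Not the on-disk format used by Index. *)
type header = { offset : int64; generation : int64 }

let header_size = 16

(* Build the whole header in one buffer so it can go out in a single write. *)
let encode { offset; generation } =
  let b = Bytes.create header_size in
  Bytes.set_int64_be b 0 offset;
  Bytes.set_int64_be b 8 generation;
  b

(* Decode both fields from one buffer obtained by a single read, so the
   two values always belong to the same snapshot. *)
let decode b =
  { offset = Bytes.get_int64_be b 0; generation = Bytes.get_int64_be b 8 }

(* A reader detects a change by comparing two complete snapshots. *)
let changed ~seen current =
  not
    (Int64.equal seen.offset current.offset
    && Int64.equal seen.generation current.generation)

let () =
  let before = { offset = 128L; generation = 1L } in
  (* Generation bumped and log cleared: offset goes back to 0. *)
  let after = decode (encode { offset = 0L; generation = 2L }) in
  assert (changed ~seen:before after)
```

Because the reader only ever compares whole snapshots, it cannot pair an old generation with a new offset, which is the situation the failing test exercises.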

Member

Nice catch. Your proposed fix sounds good to me.

You may be interested in craigfe@6fe96bd (now on CraigFe/atomic-reads), which was an experiment as part of #177 (comment) to have batch header operations. At the time, it seemed not worth it, but if we're going to have correctness issues with reading headers individually, it may be.

Contributor Author

I knew you had this somewhere, but didn't know where :) I added some of your modifications in the commit, and added you as co-author.
I did not add the fan_size in the headers, as it is not necessary for the correctness of this PR.

@icristescu icristescu changed the base branch from master to ro-sync June 28, 2020 13:44
@icristescu icristescu force-pushed the ro_sync branch 2 times, most recently from 9c525e7 to a973d87 Compare June 28, 2020 14:01
@icristescu icristescu mentioned this pull request Jun 29, 2020
@icristescu icristescu force-pushed the ro_sync branch 2 times, most recently from bb9af8b to c843634 Compare July 1, 2020 16:19
@icristescu icristescu changed the base branch from ro-sync to master July 2, 2020 16:01
src/stats.ml Outdated
f ();
let span = Mtime_clock.count timer in
let time = Mtime.Span.to_us span in
stats.sync_times <- time :: stats.sync_times
Member

This looks like an unbounded list and should probably be documented somewhere

Contributor Author

Replaced this with a timer for only the latest sync call (instead of storing all previous sync calls). Also, by default every call to sync is timed now, so I removed the sync_with_timer.
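
Roughly, the change amounts to something like the sketch below. The `stats` record, its `last_sync_s` field and the `time_sync` function are hypothetical names, and `Sys.time` (processor time from the standard library) stands in for the `Mtime_clock` counter used in the snippet above, so that the example needs no external library.

```ocaml
(* Hypothetical stats record: only the most recent sync duration is kept. *)
type stats = { mutable last_sync_s : float }

let stats = { last_sync_s = 0. }

(* Run [f] and overwrite the previous measurement instead of consing onto
   a list, so memory use stays constant however often sync is called. *)
let time_sync f =
  let start = Sys.time () in
  f ();
  stats.last_sync_s <- Sys.time () -. start

let () =
  time_sync (fun () -> ignore (List.init 1000 succ));
  Printf.printf "last sync took %f s\n" stats.last_sync_s
```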

Member

samoht commented Jul 2, 2020

That looks good to me and the tests are all green, so feel free to merge.

src/index.ml Outdated
@@ -298,7 +375,7 @@ struct
 append_key_value log.io e.key e.value)
 io;
 IO.sync log.io;
-IO.clear io)
+IO.clear ~generation:0L io)
Member

I believe you're correctly preserving the existing semantics, but why don't we increment the generation number here?

Contributor Author

We only increase the generation when clear is called on the index (as opposed to whenever we call IO.clear).
I changed this line in the last commit so that log_async uses the same generation number as index and log. It doesn't change much, though, since we never read the generation from log_async.
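
As a rough illustration of that split, here is a toy model in which the IO layer stores whatever generation it is given and only the index-level clear increments it. The `IO` module, the `index` record and `clear_index` below are simplified stand-ins assumed for this sketch; only the `~generation` label on `IO.clear` comes from the diff.

```ocaml
(* Simplified stand-in for the IO layer, not the real module. *)
module IO = struct
  type t = { mutable generation : int64; mutable entries : string list }

  (* Clearing at the IO level only records the generation it is given. *)
  let clear ~generation t =
    t.entries <- [];
    t.generation <- generation
end

(* Hypothetical index record holding the two logs. *)
type index = { mutable generation : int64; log : IO.t; log_async : IO.t }

(* Only the index-level clear bumps the generation; both logs then
   receive the same number. *)
let clear_index (t : index) =
  t.generation <- Int64.succ t.generation;
  IO.clear ~generation:t.generation t.log;
  IO.clear ~generation:t.generation t.log_async

let () =
  let mk () = { IO.generation = 0L; entries = [ "x" ] } in
  let t = { generation = 0L; log = mk (); log_async = mk () } in
  clear_index t;
  assert (t.log.IO.generation = t.log_async.IO.generation)
```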

Member

OK 👍 Thanks for the explanation. This will make more sense in the code when we have better distinction between IO and header formats.

Member

craigfe commented Jul 8, 2020

Will merge once Travis passes.

Member

craigfe commented Jul 8, 2020

Thanks!

@craigfe craigfe merged commit 9717194 into mirage:master Jul 8, 2020
@icristescu icristescu deleted the ro_sync branch July 8, 2020 16:44
craigfe added a commit to craigfe/opam-repository that referenced this pull request Oct 21, 2020
CHANGES:

## Added

- Added `flush_callback` parameter to the creation of a store, to register
  a callback before a flush. This callback can be temporarily disabled by
  `~no_callback:()` to `flush`. (mirage/index#189, mirage/index#216)

- Added `Stats.merge_durations` to list the duration of the last 10 merges.
  (mirage/index#193)

- Added `is_merging` to detect if a merge is running. (mirage/index#192)

- New `IO.Header.{get,set}` functions to read and write the file headers
  atomically (mirage/index#175, mirage/index#204, @icristescu, @craigfe, @samoht)

- Added a `throttle` configuration option to select the strategy to use
  when the cache is full and an async merge is already in progress. The
  current behavior is the (default) `` `Block_writes`` strategy. The new
  `` `Overcommit_memory`` does not block but continues to fill the cache instead.
  (mirage/index#209, @samoht)

- Add `IO.exists` obligation for IO implementations, to be used for lazy
  creation of IO instances. (mirage/index#233, @craigfe)

- `Index.close` now takes an `~immediately:()` argument. When passed, this
  causes `close` to terminate any ongoing asynchronous merge operation, rather
  than waiting for it to finish. (mirage/index#185, mirage/index#234)

- Added `Index.Checks.cli`, which provides offline integrity checking of Index
  stores. (mirage/index#236)

## Changed

- `sync` has to be called by the read-only instance to synchronise with the
  files on disk. (mirage/index#175)

- Caching of `Index` instances is now explicit: `Index.Make` requires a cache
  implementation, and `Index.v` may be passed a cache to be used for instance
  sharing. The default behaviour is _not_ to share instances. (mirage/index#188)

## Fixed

- Added values after a clear are found by read-only instances. (mirage/index#168)
- Fix a race between `merge` and `sync` (mirage/index#203, @samoht, @craigfe)
- Fix a potential loss of data if a crash occurs at the end of a merge (mirage/index#232)
craigfe added a commit to craigfe/opam-repository that referenced this pull request Jan 5, 2021
CHANGES:

## Added

- Added `flush_callback` parameter to the creation of a store, to register
  a callback before a flush. This callback can be temporarily disabled by
  `~no_callback:()` to `flush`. (mirage/index#189, mirage/index#216)

- Added `Stats.merge_durations` to list the duration of the last 10 merges.
  (mirage/index#193)

- Added `is_merging` to detect if a merge is running. (mirage/index#192)

- New `IO.Header.{get,set}` functions to read and write the file headers
  atomically (mirage/index#175, mirage/index#204, @icristescu, @craigfe, @samoht)

- Added a `throttle` configuration option to select the strategy to use
  when the cache is full and an async merge is already in progress. The
  current behavior is the (default) `` `Block_writes`` strategy. The new
  `` `Overcommit_memory`` does not block but continues to fill the cache instead.
  (mirage/index#209, @samoht)

- Add `IO.exists` obligation for IO implementations, to be used for lazy
  creation of IO instances. (mirage/index#233, @craigfe)

- `Index.close` now takes an `~immediately:()` argument. When passed, this
  causes `close` to terminate any ongoing asynchronous merge operation, rather
  than waiting for it to finish. (mirage/index#185, mirage/index#234)

- Added `Index.Checks.cli`, which provides offline integrity checking of Index
  stores. (mirage/index#236)

- `Index.replace` now takes a `~overcommit` argument to postpone a merge. (mirage/index#253)

- `Index.merge` is now part of the public API. (mirage/index#253)

- `Index.try_merge` is now part of the public API. `try_merge` is a no-op if
  the number of entries in the write-ahead log is smaller than `log_size`,
  otherwise it's `merge`. (mirage/index#253 @samoht)

## Changed

- `sync` has to be called by the read-only instance to synchronise with the
  files on disk. (mirage/index#175)
- Caching of `Index` instances is now explicit: `Index.Make` requires a cache
  implementation, and `Index.v` may be passed a cache to be used for instance
  sharing. The default behaviour is _not_ to share instances. (mirage/index#188)

## Fixed

- Added values after a clear are found by read-only instances. (mirage/index#168)
- Fix a race between `merge` and `sync` (mirage/index#203, @samoht, @craigfe)
- Fix a potential loss of data if a crash occurs at the end of a merge (mirage/index#232)
- Fix `Index.iter` to only iterate once over elements persisted on the disk
  (mirage/index#260, @samoht, @icristescu)
kit-ty-kate pushed a commit to craigfe/opam-repository that referenced this pull request Jan 6, 2021