Make Index.close merge termination logic opt-in #234

Merged
merged 2 commits into from
Oct 12, 2020

craigfe (Member) commented on Oct 9, 2020

When developing the offline integrity-checking tool for Index, I noticed that the following simple usage of Index is non-deterministic:

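(* Without [rec], the [random ()] calls in the loop refer to an earlier
   definition (presumably a random key/value generator), not to this function. *)
let random () =
  let index = Index.v ~fresh:true ~log_size:100 "data/random" in
  (* More writes than [log_size], so merges are started along the way. *)
  for _ = 1 to 1001 do
    Index.replace index (random ()) (random ())
  done;
  (* May or may not cancel a merge that is still running. *)
  Index.close index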

In particular, the Index.close call may or may not cancel an ongoing merge operation, meaning that we cannot know what files will exist on disk after running this code. After pondering on it for a bit, I think this behaviour of close should be opt-in. This PR does exactly that.
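
To make the proposed semantics concrete, here is a minimal toy model of an opt-in cancelling `close`. It is a sketch only, not the Index implementation: the `store` type, the `create` and `finish_merge` helpers, and the file names are all hypothetical. Only the shape of the optional `~immediately:()` argument follows this PR.

```ocaml
(* Toy model only: NOT the Index implementation. The types, helpers and
   file names below are hypothetical and exist purely for illustration. *)

type merge_state = Idle | Merging

type store = {
  mutable merge : merge_state;
  mutable files : string list; (* files currently "on disk" *)
  mutable closed : bool;
}

(* Build a store, optionally with a merge already in flight. *)
let create ~merging =
  if merging then
    { merge = Merging; files = [ "data"; "log"; "log_async" ]; closed = false }
  else { merge = Idle; files = [ "data"; "log" ]; closed = false }

(* Completing a merge leaves the store in its canonical on-disk layout. *)
let finish_merge t =
  t.files <- [ "data"; "log" ];
  t.merge <- Idle

(* By default [close] lets an ongoing merge finish, so the resulting files
   are predictable. Cancellation is opt-in via [~immediately:()]. *)
let close ?immediately t =
  (match (t.merge, immediately) with
  | Idle, _ -> ()
  | Merging, None -> finish_merge t
  | Merging, Some () -> t.merge <- Idle (* abandon: leftovers stay on disk *));
  t.closed <- true
```

In this model, `close t` always ends with the same file list regardless of whether a merge was running, while `close ~immediately:() t` may leave the intermediate files behind.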

craigfe force-pushed the opt-in-merge-cancellation branch from 2ab566a to 613046e on October 9, 2020 18:18
icristescu (Contributor) left a comment

This looks good. I would just use `abort` instead of `immediately`, but either way is fine.

@craigfe craigfe merged commit a21eba0 into mirage:master Oct 12, 2020
craigfe added a commit to craigfe/opam-repository that referenced this pull request Oct 21, 2020
CHANGES:

## Added

- Added `flush_callback` parameter to the creation of a store, to register
  a callback before a flush. This callback can be temporarily disabled by
  `~no_callback:()` to `flush`. (mirage/index#189, mirage/index#216)

- Added `Stats.merge_durations` to list the duration of the last 10 merges.
  (mirage/index#193)

- Added `is_merging` to detect if a merge is running. (mirage/index#192)

- New `IO.Header.{get,set}` functions to read and write the file headers
  atomically (mirage/index#175, mirage/index#204, @icristescu, @craigfe, @samoht)

- Added a `throttle` configuration option to select the strategy to use
  when the cache is full and an async merge is already in progress. The
  current behavior is the (default) `` `Block_writes`` strategy. The new
  `` `Overcommit_memory`` strategy does not block but continues to fill the
  cache instead. (mirage/index#209, @samoht)

- Add `IO.exists` obligation for IO implementations, to be used for lazy
  creation of IO instances. (mirage/index#233, @craigfe)

- `Index.close` now takes an `~immediately:()` argument. When passed, this
  causes `close` to terminate any ongoing asynchronous merge operation, rather
  than waiting for it to finish. (mirage/index#185, mirage/index#234)

- Added `Index.Checks.cli`, which provides offline integrity checking of Index
  stores. (mirage/index#236)
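
The difference between the two `throttle` strategies listed above can be sketched with a small model. This is an assumption-laden illustration, not Index's code: the `cache` record, the `add` function and its `` `Accepted`` / `` `Must_wait`` results are hypothetical names; only the two strategy tags come from the changelog.

```ocaml
(* Hypothetical model of the two throttle strategies; not Index's code. *)

type throttle = [ `Block_writes | `Overcommit_memory ]

type cache = {
  capacity : int;
  mutable entries : (string * string) list;
  merging : bool; (* an async merge is already in progress *)
}

(* Returns [`Accepted] if the binding was added to the cache, or
   [`Must_wait] if the writer would have to block until the merge ends. *)
let add (throttle : throttle) cache k v =
  let full = List.length cache.entries >= cache.capacity in
  match throttle with
  | `Block_writes when full && cache.merging -> `Must_wait
  | `Block_writes | `Overcommit_memory ->
      (* Overcommitting lets the cache grow past its nominal capacity. *)
      cache.entries <- (k, v) :: cache.entries;
      `Accepted
```

With a full cache and a merge in flight, `` add `Block_writes `` refuses the write in this model, whereas `` add `Overcommit_memory `` accepts it at the cost of extra memory.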

## Changed

- `sync` has to be called by the read-only instance to synchronise with the
  files on disk. (mirage/index#175)

- Caching of `Index` instances is now explicit: `Index.Make` requires a cache
  implementation, and `Index.v` may be passed a cache to be used for instance
  sharing. The default behaviour is _not_ to share instances. (mirage/index#188)

## Fixed

- Added values after a clear are found by read-only instances. (mirage/index#168)
- Fix a race between `merge` and `sync` (mirage/index#203, @samoht, @craigfe)
- Fix a potential loss of data if a crash occurs at the end of a merge (mirage/index#232)
craigfe added a commit to craigfe/opam-repository that referenced this pull request Jan 5, 2021
CHANGES:

## Added

- Added `flush_callback` parameter to the creation of a store, to register
  a callback before a flush. This callback can be temporarily disabled by
  `~no_callback:()` to `flush`. (mirage/index#189, mirage/index#216)

- Added `Stats.merge_durations` to list the duration of the last 10 merges.
  (mirage/index#193)

- Added `is_merging` to detect if a merge is running. (mirage/index#192)

- New `IO.Header.{get,set}` functions to read and write the file headers
  atomically (mirage/index#175, mirage/index#204, @icristescu, @craigfe, @samoht)

- Added a `throttle` configuration option to select the strategy to use
  when the cache is full and an async merge is already in progress. The
  current behavior is the (default) `` `Block_writes`` strategy. The new
  `` `Overcommit_memory`` strategy does not block but continues to fill the
  cache instead. (mirage/index#209, @samoht)

- Add `IO.exists` obligation for IO implementations, to be used for lazy
  creation of IO instances. (mirage/index#233, @craigfe)

- `Index.close` now takes an `~immediately:()` argument. When passed, this
  causes `close` to terminate any ongoing asynchronous merge operation, rather
  than waiting for it to finish. (mirage/index#185, mirage/index#234)

- Added `Index.Checks.cli`, which provides offline integrity checking of Index
  stores. (mirage/index#236)

- `Index.replace` now takes a `~overcommit` argument to postpone a merge. (mirage/index#253)

- `Index.merge` is now part of the public API. (mirage/index#253)

- `Index.try_merge` is now part of the public API. `try_merge` is a no-op if
  the number of entries in the write-ahead log is smaller than `log_size`,
  otherwise it behaves like `merge`. (mirage/index#253, @samoht)
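
The `try_merge` rule described in the list above can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not the library's implementation: the record `t` and the list-based `log` and `merged` fields are hypothetical stand-ins for the write-ahead log and the merged data.

```ocaml
(* Toy in-memory model; the record and its fields are hypothetical. *)

type t = {
  log_size : int;
  mutable log : (string * string) list; (* stand-in for the write-ahead log *)
  mutable merged : (string * string) list; (* stand-in for merged data *)
}

(* Writes always land in the log first. *)
let replace t k v = t.log <- (k, v) :: t.log

(* An unconditional merge moves every log entry into the merged data. *)
let merge t =
  t.merged <- t.log @ t.merged;
  t.log <- []

(* [try_merge] is a no-op while the log holds fewer than [log_size]
   entries; otherwise it behaves like [merge]. *)
let try_merge t = if List.length t.log >= t.log_size then merge t
```

Calling `try_merge` after every write in this model merges only once the log has reached `log_size` entries, while `merge` can be used to force the operation at any time.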

## Changed

- `sync` has to be called by the read-only instance to synchronise with the
  files on disk. (mirage/index#175)
- Caching of `Index` instances is now explicit: `Index.Make` requires a cache
  implementation, and `Index.v` may be passed a cache to be used for instance
  sharing. The default behaviour is _not_ to share instances. (mirage/index#188)

## Fixed

- Added values after a clear are found by read-only instances. (mirage/index#168)
- Fix a race between `merge` and `sync` (mirage/index#203, @samoht, @craigfe)
- Fix a potential loss of data if a crash occurs at the end of a merge (mirage/index#232)
- Fix `Index.iter` to only iterate once over elements persisted on the disk
  (mirage/index#260, @samoht, @icristescu)
kit-ty-kate pushed a commit to craigfe/opam-repository that referenced this pull request Jan 6, 2021