Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an overcommit option in replace to postpone merges #253

Merged
merged 4 commits into from
Dec 17, 2020

Conversation

icristescu
Copy link
Contributor

Experimenting with adding more control over merges from the freeze thread in irmin-pack. For now the irmin-pack benchmarks look promising:

  • the spikes are the freezes that block waiting for previous freeze to complete
    freezes
  • if we remove the spike due to freezes and only look at the time for the rest of the commits:
    wo_freezes

On top of #251.

@samoht
Copy link
Member

samoht commented Dec 7, 2020

Can you also plot the memory usage to see the impact on the memory usage? (I suspect this won't be much but it's worth checking)

@icristescu
Copy link
Contributor Author

Here is the maxrss for these benchmarks
maxrss_irmin_pack

@icristescu icristescu marked this pull request as ready for review December 8, 2020 21:10
@@ -169,6 +169,9 @@ module type S = sig
(** [is_merging t] returns true if [t] is running a merge. Raises
{!RO_not_allowed} if called by a read-only index. *)

val force_merge : t -> unit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Can we just call it merge? I think it's only called force_merge internally because we already had a function called merge.
  • I think I would probably expect this to block; perhaps worth mentioning that it doesn't in the doc-comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks 👍

@icristescu icristescu force-pushed the freeze branch 2 times, most recently from 34b1630 to 29e6841 Compare December 9, 2020 16:13
Comment on lines 176 to 177
(** [merge t] forces a merge for [t]. It is non-blocking, i.e. it returns
immediately, with the merge running concurrently. *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does block if a merge is already running, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. I changed it, let me know if it's clear

@icristescu icristescu force-pushed the freeze branch 4 times, most recently from d252a99 to e5bf9ed Compare December 16, 2020 14:30
@icristescu
Copy link
Contributor Author

icristescu commented Dec 16, 2020

I added a commit with the changes proposed by opam-dune-lint as the linting job was failing. However, if I add the "ppx_repr" {>= dev} to index-bench.opam as suggested by opam-dune-lint, there is a dependency issue

ppxlib -> ppxlib.0.17.0
    ppx_repr 0.1.0 requires >= 0.12.0 & < 0.18.0

I'm not sure how to fix

@talex5
Copy link

talex5 commented Dec 16, 2020

@icristescu : opam-dune-lint is new and might be buggy. What was the full error you got? The bit you pasted doesn't look like an error: it's just saying it chose ppxlib.0.17.0, and warning that isn't the latest version but it had to pick it due to a requirement from ppx_repr. The actual error will be elsewhere.

@icristescu
Copy link
Contributor Author

I added back the problematic commit, the logs now should show the problem.

The bit you pasted doesn't look like an error: it's just saying it chose ppxlib.0.17.0, and warning that isn't the latest version but it had to pick it due to a requirement from ppx_repr. The actual error will be elsewhere.

Sorry about that, that was the last bit before the error, so I wrongly suspected it to be the problem.

@talex5
Copy link

talex5 commented Dec 16, 2020

OK, so the error is:

- index-bench -> (problem)
    Rejected candidates:
      index-bench.dev: Requires ppx_repr >= dev

Perhaps you just need to remove the >= dev constraint? It defaults to using the version you are using as the lower bound, but maybe you don't really need dev? The lint error in CI suggests:

index-bench.opam: changes needed:
  "ppx_repr" {>= 0.1.0}

@icristescu icristescu force-pushed the freeze branch 2 times, most recently from d5d2bba to c3210b0 Compare December 16, 2020 17:28
@icristescu
Copy link
Contributor Author

@talex5 thanks a lot, that fixed the issue

@icristescu
Copy link
Contributor Author

Rebased. If no objection, I'll merge this shortly.

[k].

If [overcommit] is passed, the operation does not triger a merge, even if
the caches are full. *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe somehow add that the default is overcommit=false?

@samoht samoht merged commit 57f0504 into mirage:master Dec 17, 2020
@icristescu icristescu deleted the freeze branch December 17, 2020 22:19
craigfe added a commit to craigfe/opam-repository that referenced this pull request Jan 5, 2021
CHANGES:

## Added

- Added `flush_callback` parameter to the creation of a store, to register
  a callback before a flush. This callback can be temporarily disabled by
  `~no_callback:()` to `flush`. (mirage/index#189, mirage/index#216)

- Added `Stats.merge_durations` to list the duration of the last 10 merges.
  (mirage/index#193)

- Added `is_merging` to detect if a merge is running. (mirage/index#192)

- New `IO.Header.{get,set}` functions to read and write the file headers
  atomically (mirage/index#175, mirage/index#204, @icristescu, @craigfe, @samoht)

- Added a `throttle` configuration option to select the strategy to use
  when the cache are full and an async merge is already in progress. The
  current behavior is the (default) `` `Block_writes`` strategy. The new
  `` `Overcommit_memory`` does not block but continue to fill the cache instead.
  (mirage/index#209, @samoht)

- Add `IO.exists` obligation for IO implementations, to be used for lazy
  creation of IO instances. (mirage/index#233, @craigfe)

- `Index.close` now takes an `~immediately:()` argument. When passed, this
  causes `close` to terminate any ongoing asynchronous merge operation, rather
  than waiting for it to finish. (mirage/index#185, mirage/index#234)

- Added `Index.Checks.cli`, which provides offline integrity checking of Index
  stores. (mirage/index#236)

- `Index.replace` now takes a `~overcommit` argument to postpone a merge. (mirage/index#253)

- `Index.merge` is now part of the public API. (mirage/index#253)

- `Index.try_merge` is now part of the public API. `try_merge' is a no-op if
  the number of entries in the write-ahead log is smaller than `log_size`,
  otherwise it's `merge'. (mirage/index#253 @samoht)

## Changed

- `sync` has to be called by the read-only instance to synchronise with the
  files on disk. (mirage/index#175)
- Caching of `Index` instances is now explicit: `Index.Make` requires a cache
  implementation, and `Index.v` may be passed a cache to be used for instance
  sharing. The default behaviour is _not_ to share instances. (mirage/index#188)

## Fixed

- Added values after a clear are found by read-only instances. (mirage/index#168)
- Fix a race between `merge` and `sync` (mirage/index#203, @samoht, @craigfe)
- Fix a potential loss of data if a crash occurs at the end of a merge (mirage/index#232)
- Fix `Index.iter` to only iterate once over elements persisted on the disk
  (mirage/index#260, @samoht, @icristescu)
kit-ty-kate pushed a commit to craigfe/opam-repository that referenced this pull request Jan 6, 2021
CHANGES:

## Added

- Added `flush_callback` parameter to the creation of a store, to register
  a callback before a flush. This callback can be temporarily disabled by
  `~no_callback:()` to `flush`. (mirage/index#189, mirage/index#216)

- Added `Stats.merge_durations` to list the duration of the last 10 merges.
  (mirage/index#193)

- Added `is_merging` to detect if a merge is running. (mirage/index#192)

- New `IO.Header.{get,set}` functions to read and write the file headers
  atomically (mirage/index#175, mirage/index#204, @icristescu, @craigfe, @samoht)

- Added a `throttle` configuration option to select the strategy to use
  when the cache are full and an async merge is already in progress. The
  current behavior is the (default) `` `Block_writes`` strategy. The new
  `` `Overcommit_memory`` does not block but continue to fill the cache instead.
  (mirage/index#209, @samoht)

- Add `IO.exists` obligation for IO implementations, to be used for lazy
  creation of IO instances. (mirage/index#233, @craigfe)

- `Index.close` now takes an `~immediately:()` argument. When passed, this
  causes `close` to terminate any ongoing asynchronous merge operation, rather
  than waiting for it to finish. (mirage/index#185, mirage/index#234)

- Added `Index.Checks.cli`, which provides offline integrity checking of Index
  stores. (mirage/index#236)

- `Index.replace` now takes a `~overcommit` argument to postpone a merge. (mirage/index#253)

- `Index.merge` is now part of the public API. (mirage/index#253)

- `Index.try_merge` is now part of the public API. `try_merge' is a no-op if
  the number of entries in the write-ahead log is smaller than `log_size`,
  otherwise it's `merge'. (mirage/index#253 @samoht)

## Changed

- `sync` has to be called by the read-only instance to synchronise with the
  files on disk. (mirage/index#175)
- Caching of `Index` instances is now explicit: `Index.Make` requires a cache
  implementation, and `Index.v` may be passed a cache to be used for instance
  sharing. The default behaviour is _not_ to share instances. (mirage/index#188)

## Fixed

- Added values after a clear are found by read-only instances. (mirage/index#168)
- Fix a race between `merge` and `sync` (mirage/index#203, @samoht, @craigfe)
- Fix a potential loss of data if a crash occurs at the end of a merge (mirage/index#232)
- Fix `Index.iter` to only iterate once over elements persisted on the disk
  (mirage/index#260, @samoht, @icristescu)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants