Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stream fixes #133

Merged
merged 2 commits into from
Feb 15, 2022
Merged

Stream fixes #133

merged 2 commits into from
Feb 15, 2022

Conversation

gasche
Copy link
Contributor

@gasche gasche commented Feb 10, 2022

Fixing stream_to_file bugs reported by @tcoopman in #131 (comment). The first commit is a clear bugfix (without it, the function is completely broken; I think it has been since 9310500), the second is rather an improvement in behavior -- the memory usage of the stream_to_{file,channel} functions now corresponds to the size of the largest element in the stream, instead of the total size of all elements of the stream.

@tcoopman
Copy link

One thing I noticed, is that with js_of_ocaml if the output channel is not flushed regularly, the memory consumption also keeps growing - even with the fix in this PR.

As flushing the output_channel seems very costly doing it on every iteration is definitely not ok. I'm not sure if this should be part of yojson or just documented (or if it's a bug in js_of_ocaml), but if you're using this code with js_of_ocaml and don't want to grow your memory consumption, the fix seems to be to flush oc every XX iterations.
See: https://discuss.ocaml.org/t/memory-keeps-growing-with-buffer-and-js-of-ocaml/9299

If anyone knows a better way of handling this, I'd like to know :-)

@gasche
Copy link
Contributor Author

gasche commented Feb 11, 2022

I think the fix for this issue doesn't belong to the Yojson layer -- we could flush regularly on the producer side, but this is a change of behavior that is not justified in general. It may be possible to workaround the js_of_ocaml issue on the client side, by doing something like let seq = Seq.mapi (fun i v -> if i mod n = 0 then flush oc; v) in Yojson.seq_to_channel seq oc. You probably want a fix on the js_of_ocaml side, though.

Copy link
Member

@Leonidas-from-XIV Leonidas-from-XIV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, these are some really nice changes and I really appreciate that you added a test to make sure it doesn't break in the future anymore.

I was thinking whether there's a good way to test that the memory use doesn't balloon out of proportion but that's quite tricky as the test would need to monitor the sizes of the buffers or something like this, which feels rather fragile.

Stream.iter (fun json ->
to_buffer ?std ob json;
Buffer.output_buffer oc ob;
Buffer.clear ob;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this semicolon is superfluous.

in
Sys.remove output_file;
if data <> read_data then
(* TODO: it would be nice to use Alcotest.check,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want you can add an instance for Yojson.Safe.t if you want, it would be a nice improvement and should be quite simple to do (and actually sensible in the long run, I think the reason why there isn't one is because the tests are mostly translated from OUnit. But this is up for you, a nice to have but not necessary.

@Leonidas-from-XIV Leonidas-from-XIV added this to the 2.0.0 milestone Feb 14, 2022
@Leonidas-from-XIV
Copy link
Member

@gasche Can you add an entry to the Changelog? The CI failure is due to 4.14 being added to the test matrix, but it works everywhere else so that will be solved by the Seq migration.

@gasche
Copy link
Contributor Author

gasche commented Feb 15, 2022

Can you add an entry to the Changelog?

Done!

@Leonidas-from-XIV Leonidas-from-XIV merged commit 616a683 into ocaml-community:master Feb 15, 2022
@gasche
Copy link
Contributor Author

gasche commented Feb 15, 2022

Thanks for the merge! I'll work on rebasing #131 next.

panglesd added a commit to panglesd/opam-repository that referenced this pull request Jun 2, 2022
CHANGES:

### Removed

- Removed dependency on easy-format and removed `pretty_format` from
  `Yojson`, `Yojson.Basic`, `Yojson.Safe` and `Yojson.Raw`. (@c-cube, ocaml-community/yojson#90)
- Removed dependency on `biniou`, simplifying the chain of dependencies. This
  changes some APIs:
  * `Bi_outbuf.t` in signatures is replaced with `Buffer.t`
  * `to_outbuf` becomes `to_buffer` and `stream_to_outbuf` becomes
    `stream_to_buffer`
  (@Leonidas-from-XIV, ocaml-community/yojson#74, and @gasche, ocaml-community/yojson#132)
- Removed `yojson-biniou` library
- Removed deprecated `json` type aliasing type `t` which has been available
  since 1.6.0 (@Leonidas-from-XIV, ocaml-community/yojson#100).
- Removed `json_max` type (@Leonidas-from-XIV, ocaml-community/yojson#103)
- Removed constraint that the "root" value being rendered (via either
  `pretty_print` or `to_string`) must be an object or array. (@cemerick, ocaml-community/yojson#121)
- Removed `validate_json` as it only made sense if the type was called `json`.
  (@Leonidas-from-XIV, ocaml-community/yojson#137)

### Add

- Add an opam package `yojson-bench` to deal with benchmarks dependency
  (@tmcgilchrist, ocaml-community/yojson#117)
- Add a benchmark to judge the respective performance of providing a buffer vs
  letting Yojson create an internal (ocaml-community/yojson#134, @Leonidas-from-XIV)
- Add an optional `suf` keyword argument was added to functions that write
  serialized JSON, thus allowing NDJSON output. Most functions default to not
  adding any suffix except for `to_file` (ocaml-community/yojson#124, @panglesd) and functions
  writing sequences of values where the default is `\n` (ocaml-community/yojson#135,
  @Leonidas-from-XIV)

### Change

- The `stream_from_*` and `stream_to_*` functions now use a `Seq.t` instead of a
  `Stream.t`, and they are renamed into `seq_from_*` and `seq_to_*` (@gasche, ocaml-community/yojson#131).

### Fix

- Avoid copying unnecessarily large amounts of strings when parsing (ocaml-community/yojson#85, ocaml-community/yojson#108,
  @Leonidas-from-XIV)
- Fix `stream_to_file` (ocaml-community/yojson#133, @tcoopman and @gasche)
panglesd added a commit to panglesd/opam-repository that referenced this pull request Jun 3, 2022
CHANGES:

### Removed

- Removed dependency on easy-format and removed `pretty_format` from
  `Yojson`, `Yojson.Basic`, `Yojson.Safe` and `Yojson.Raw`. (@c-cube, ocaml-community/yojson#90)
- Removed dependency on `biniou`, simplifying the chain of dependencies. This
  changes some APIs:
  * `Bi_outbuf.t` in signatures is replaced with `Buffer.t`
  * `to_outbuf` becomes `to_buffer` and `stream_to_outbuf` becomes
    `stream_to_buffer`
  (@Leonidas-from-XIV, ocaml-community/yojson#74, and @gasche, ocaml-community/yojson#132)
- Removed `yojson-biniou` library
- Removed deprecated `json` type aliasing type `t` which has been available
  since 1.6.0 (@Leonidas-from-XIV, ocaml-community/yojson#100).
- Removed `json_max` type (@Leonidas-from-XIV, ocaml-community/yojson#103)
- Removed constraint that the "root" value being rendered (via either
  `pretty_print` or `to_string`) must be an object or array. (@cemerick, ocaml-community/yojson#121)
- Removed `validate_json` as it only made sense if the type was called `json`.
  (@Leonidas-from-XIV, ocaml-community/yojson#137)

### Add

- Add an opam package `yojson-bench` to deal with benchmarks dependency
  (@tmcgilchrist, ocaml-community/yojson#117)
- Add a benchmark to judge the respective performance of providing a buffer vs
  letting Yojson create an internal (ocaml-community/yojson#134, @Leonidas-from-XIV)
- Add an optional `suf` keyword argument was added to functions that write
  serialized JSON, thus allowing NDJSON output. Most functions default to not
  adding any suffix except for `to_file` (ocaml-community/yojson#124, @panglesd) and functions
  writing sequences of values where the default is `\n` (ocaml-community/yojson#135,
  @Leonidas-from-XIV)

### Change

- The `stream_from_*` and `stream_to_*` functions now use a `Seq.t` instead of a
  `Stream.t`, and they are renamed into `seq_from_*` and `seq_to_*` (@gasche, ocaml-community/yojson#131).

### Fix

- Avoid copying unnecessarily large amounts of strings when parsing (ocaml-community/yojson#85, ocaml-community/yojson#108,
  @Leonidas-from-XIV)
- Fix `stream_to_file` (ocaml-community/yojson#133, @tcoopman and @gasche)
Leonidas-from-XIV pushed a commit to panglesd/opam-repository that referenced this pull request Jun 9, 2022
CHANGES:

### Removed

- Removed dependency on easy-format and removed `pretty_format` from
  `Yojson`, `Yojson.Basic`, `Yojson.Safe` and `Yojson.Raw`. (@c-cube, ocaml-community/yojson#90)
- Removed dependency on `biniou`, simplifying the chain of dependencies. This
  changes some APIs:
  * `Bi_outbuf.t` in signatures is replaced with `Buffer.t`
  * `to_outbuf` becomes `to_buffer` and `stream_to_outbuf` becomes
    `stream_to_buffer`
  (@Leonidas-from-XIV, ocaml-community/yojson#74, and @gasche, ocaml-community/yojson#132)
- Removed `yojson-biniou` library
- Removed deprecated `json` type aliasing type `t` which has been available
  since 1.6.0 (@Leonidas-from-XIV, ocaml-community/yojson#100).
- Removed `json_max` type (@Leonidas-from-XIV, ocaml-community/yojson#103)
- Removed constraint that the "root" value being rendered (via either
  `pretty_print` or `to_string`) must be an object or array. (@cemerick, ocaml-community/yojson#121)
- Removed `validate_json` as it only made sense if the type was called `json`.
  (@Leonidas-from-XIV, ocaml-community/yojson#137)

### Add

- Add an opam package `yojson-bench` to deal with benchmarks dependency
  (@tmcgilchrist, ocaml-community/yojson#117)
- Add a benchmark to judge the respective performance of providing a buffer vs
  letting Yojson create an internal (ocaml-community/yojson#134, @Leonidas-from-XIV)
- Add an optional `suf` keyword argument was added to functions that write
  serialized JSON, thus allowing NDJSON output. Most functions default to not
  adding any suffix except for `to_file` (ocaml-community/yojson#124, @panglesd) and functions
  writing sequences of values where the default is `\n` (ocaml-community/yojson#135,
  @Leonidas-from-XIV)

### Change

- The `stream_from_*` and `stream_to_*` functions now use a `Seq.t` instead of a
  `Stream.t`, and they are renamed into `seq_from_*` and `seq_to_*` (@gasche, ocaml-community/yojson#131).

### Fix

- Avoid copying unnecessarily large amounts of strings when parsing (ocaml-community/yojson#85, ocaml-community/yojson#108,
  @Leonidas-from-XIV)
- Fix `stream_to_file` (ocaml-community/yojson#133, @tcoopman and @gasche)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants