lib/write: more regular usage protocol for ?buf parameters #132
Intuitively I'd say that clearing on exit is redundant, because it's better to always clear on entry. But it's probably a good defensive practice anyway and is basically free.
I agree, but I stuck to "clear on exit" as this is what the documentation of
To be a bit more concrete, here is an example of behavior that I think is buggy:
$ dune utop
(* build a buffer with some garbage inside *)
# let buf = Buffer.create 42;;
# Buffer.add_string buf "Hello World\n";;
(* pass the buffer to Yojson.to_channel *)
# let oc = open_out "/tmp/foo.txt" in
Fun.protect ~finally:(fun () -> close_out oc) @@ fun () ->
Yojson.to_channel ~buf oc (`List [`String "foo"]);;
(* the garbage ends up included in the file! *)
# Sys.command "cat /tmp/foo.txt && echo";;
Hello World
["foo"]
(* write to a different file now, with the same buffer *)
# let oc = open_out "/tmp/bar.txt" in
Fun.protect ~finally:(fun () -> close_out oc) @@ fun () ->
Yojson.to_channel ~buf oc (`List [`String "bar"]);;
(* oh noes, even more garbage! *)
# Sys.command "cat /tmp/bar.txt && echo";;
Hello World
["foo"]["bar"]
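The session above can be reproduced without Yojson using a small model. This is a sketch under stated assumptions, not Yojson's code: the `json` type and `add_json` serializer are simplified stand-ins, `to_channel_reported` models the reported behavior (no clearing at all), and `to_channel_fixed` models a protocol that clears on both entry and exit, as discussed in this thread. All three names are hypothetical.

```ocaml
(* Simplified stand-in for a JSON value type (not Yojson's actual type). *)
type json = [ `String of string | `List of json list ]

(* Minimal serializer appending [v] to [buf]; strings are NOT escaped. *)
let rec add_json buf (v : json) =
  match v with
  | `String s ->
      Buffer.add_char buf '"';
      Buffer.add_string buf s;
      Buffer.add_char buf '"'
  | `List l ->
      Buffer.add_char buf '[';
      List.iteri
        (fun i x ->
          if i > 0 then Buffer.add_char buf ',';
          add_json buf x)
        l;
      Buffer.add_char buf ']'

(* Model of the reported behavior: [buf] is cleared neither on entry nor on
   exit, so whatever it already holds is written to the channel as well. *)
let to_channel_reported ?(buf = Buffer.create 256) oc v =
  add_json buf v;
  Buffer.output_buffer oc buf

(* Clear on entry, serialize, flush to the channel, clear on exit. *)
let to_channel_fixed ?(buf = Buffer.create 256) oc v =
  Buffer.clear buf;
  add_json buf v;
  Buffer.output_buffer oc buf;
  Buffer.clear buf

(* Helper: run [write] on a fresh temporary file and return its contents. *)
let write_and_read write =
  let path = Filename.temp_file "buf_demo" ".json" in
  let oc = open_out_bin path in
  Fun.protect ~finally:(fun () -> close_out oc) (fun () -> write oc);
  let ic = open_in_bin path in
  Fun.protect ~finally:(fun () -> close_in ic) (fun () ->
      really_input_string ic (in_channel_length ic))
```

Given a buffer that already holds `"Hello World\n"`, `to_channel_reported` produces the same garbage-prefixed files as the utop session, and reusing the buffer accumulates earlier values; `to_channel_fixed` writes only the serialized value and leaves the buffer empty.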
I doubt it, given that there was never a release that had Biniou replaced with Buffer. Unless they pinned the git version, it is quite unlikely, so the behavior can be improved as we see fit.
Ah! I hadn't realized. Then I believe you should merge this PR before the next release :-)
I agree that we should be consistent. Though it would be good to know what we want it to behave like, e.g. why there is a

The way I understand it, what you consider a bug seems to be exactly the behavior it is meant to have: you pass a buffer that you got from somewhere and want it to be written to the

Though I have to say I have a hard time understanding what use case passing such a buffer would have. If you want to mix different sources, it seems to me that it would make more sense to just write to the
The way I immediately interpreted this API, something IO-related with an optional

The content of the buffer is, and should be, totally irrelevant. If that's not the intended use case, then the API and comments need to be super clear about it :)
Yes, that's the most sensible explanation. In that case it is a bug, and we should fix it. Also, I think we should document the parameter as such an optimization, and for that we only need to clear at the beginning. (Sidenote: I'd be curious to see how much this optimization saves. Maybe I should add a benchmark entry for it.)
Intuitively I think it must save a lot if you write many values. It should also have a favorable impact on memory consumption :-)
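The reuse pattern behind this optimization can be sketched as follows, assuming a writer that only clears its buffer on entry. `write_with`, `write_all_fresh` and `write_all_shared` are hypothetical names, and `add_value` stands in for a real serializer. `Buffer.clear` empties a buffer but keeps its internal storage, so a shared buffer stops reallocating once it has grown to fit the largest value, whereas the fresh-buffer variant allocates on every call. How much this saves in practice is what a benchmark would have to measure.

```ocaml
(* Hypothetical writer: clear on entry, serialize, flush to [oc]. *)
let write_with ~buf add_value oc v =
  Buffer.clear buf;            (* ignore whatever the caller left in [buf] *)
  add_value buf v;
  Buffer.output_buffer oc buf

(* One fresh buffer per value: simple, but allocates on every call. *)
let write_all_fresh add_value oc values =
  List.iter (fun v -> write_with ~buf:(Buffer.create 256) add_value oc v) values

(* One buffer shared by all calls: its grown storage is reused each time. *)
let write_all_shared add_value oc values =
  let buf = Buffer.create 256 in
  List.iter (fun v -> write_with ~buf add_value oc v) values
```

Both variants write the same bytes; only their allocation behavior differs.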
Indeed, I believe that the buffer is there to serve as intermediate storage. The specification should be: "we (Yojson) don't assume anything about the content of the buffer coming in, and you (the user) can't assume anything about its content after the call". But
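That contract can be sketched for a `to_string`-like function. This is a minimal illustration, not Yojson's code: `to_string_with` and its `add_value` serializer parameter are hypothetical. It clears the buffer on entry, which is what makes the result independent of the buffer's prior state, and it also clears on exit, which the contract allows but does not require.

```ocaml
(* Hypothetical [to_string]-style function with an optional scratch buffer. *)
let to_string_with ?buf add_value v =
  let buf =
    match buf with
    | Some b -> Buffer.clear b; b   (* assume nothing about the caller's buffer *)
    | None -> Buffer.create 256
  in
  add_value buf v;
  let s = Buffer.contents buf in
  Buffer.clear buf;                 (* do not keep a second copy of the output *)
  s
```

Passing a buffer that still contains `"Hello World\n"` gives the same string as passing no buffer at all.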
Yet another way to see that the current behavior is broken:
# let oc = open_out "/tmp/out.txt";;
# let buf = Buffer.create 42;;
# (* printing "foo" three times *)
for i = 1 to 3 do Yojson.to_channel ~buf oc (`String "foo"); done;;
# close_out oc;;
# Sys.command "cat /tmp/out.txt && echo";;
What output do you expect? Probably not the current one:
Another bug is that the docs say:
No newline is inserted at the moment. I don't really mind that, but maybe we should decide whether to implement it or to update the documentation.
When writing a sequence of JSON values, I think providing a
Though, should it be a separator? Or an end-marker (aka trailing

What I would suggest is to make it consistent:
Newlines are unrelated to the present PR, which is about the use of temporary buffers. That should go in its own PR. (@panglesd implemented 6696f55, so there's a natural candidate for more newline-related work :-)

I'm not convinced that not clearing at the end is a good idea. It would be a fine design if we wrote these functions from scratch, but they are adapted from previous functions using Biniou channels (you chose to adapt the functions instead of removing them completely), and then I think that clearing at the end makes more sense: otherwise the data is duplicated, since it is sent to the output channel and also kept in the buffer. It is also what is documented (and has always been) for

So my vote goes to the PR as it currently is.
@Leonidas-from-XIV good question, but in any case I think it's better to do just like NDJSON :)
@c-cube I looked it up in the (or a?) spec and it states
So yes, trailing newlines. But fair enough, let's make the
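To make the separator versus end-marker question concrete, here is a minimal sketch over values that are already serialized to strings. `write_separated` and `write_terminated` are hypothetical helpers; the optional `suf` argument of the latter only mimics the name the changelog uses for this purpose and is not Yojson's implementation.

```ocaml
(* Separator style: [sep] goes between values, none after the last one. *)
let write_separated ~sep buf (values : string Seq.t) =
  let first = ref true in
  Seq.iter
    (fun v ->
      if not !first then Buffer.add_string buf sep;
      first := false;
      Buffer.add_string buf v)
    values

(* Terminator style (NDJSON): [suf] follows every value, including the last. *)
let write_terminated ?(suf = "\n") buf (values : string Seq.t) =
  Seq.iter
    (fun v ->
      Buffer.add_string buf v;
      Buffer.add_string buf suf)
    values
```

One practical argument for the terminator form: concatenating the output of two `write_terminated` calls still yields one value per line, whereas with a separator the last value of the first batch and the first value of the second end up on the same line unless the caller adds the missing newline.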
@gasche Can you add a changelog entry?
The handling of `?buf` parameters today makes little sense:

- `to_string` and all `stream_*` functions clear the buffer before using it (so: they guarantee that they work in the same way no matter what the input state of the buffer is, and they make no guarantees on its output state)
- but `to_channel` and `to_output` make weird claims about it in the documentation, which don't make sense to me, and appear to *not* clear the buffer and in fact include, in their output, the content of the buffer as it is passed.

I believe that the reason for this difference comes from the previous codebase using Biniou, which was replaced by Buffer in 9310500. The change left the usage of `?buf` in `to_channel` and `to_output` rather incoherent (and I think broke the documentation somewhat), due to the different usage properties of `Bi_outbuf`, which "owns" the underlying output channel and implements its own buffering, and `Buffer`, which is not related to an output channel: the previous API may have made sense for Biniou, but it does not anymore.

What's not completely clear to me is whether some code in the wild could rely on the previous, nonsensical behaviour. This may happen if the code was written with Biniou in the past, and converted to use Buffer in the same systematic (and slightly wrong) way as Yojson itself.

(Note: there are two independent questions: (1) whether the buffer is cleared on entry, and (2) whether the buffer is cleared on exit. Currently for `to_channel` and `to_output` neither is done, and at least for (2) this seems in direct contradiction with what the documentation says, at least with the natural interpretation of "buf is flushed" as "the content is sent to the output channel, and the buffer is cleared". So the lack of (2) is clearly a bug, and the lack of (1) might not be.)
Branch updated from 66016ce to 29f56de.
I amended the Changes entry corresponding to the removal of the Biniou dependency, because this PR is just a continuation of that work.
I'm okay with merging this. The CI is broken for unrelated reasons (opam lint issues), but the code builds fine across the range.
CI is fixed too (thanks, OCaml-CI maintainers); the build is green now.
CHANGES:

### Removed

- Removed dependency on easy-format and removed `pretty_format` from `Yojson`, `Yojson.Basic`, `Yojson.Safe` and `Yojson.Raw`. (@c-cube, ocaml-community/yojson#90)
- Removed dependency on `biniou`, simplifying the chain of dependencies. This changes some APIs:
  * `Bi_outbuf.t` in signatures is replaced with `Buffer.t`
  * `to_outbuf` becomes `to_buffer` and `stream_to_outbuf` becomes `stream_to_buffer`
  (@Leonidas-from-XIV, ocaml-community/yojson#74, and @gasche, ocaml-community/yojson#132)
- Removed `yojson-biniou` library
- Removed deprecated `json` type aliasing type `t` which has been available since 1.6.0 (@Leonidas-from-XIV, ocaml-community/yojson#100).
- Removed `json_max` type (@Leonidas-from-XIV, ocaml-community/yojson#103)
- Removed constraint that the "root" value being rendered (via either `pretty_print` or `to_string`) must be an object or array. (@cemerick, ocaml-community/yojson#121)
- Removed `validate_json` as it only made sense if the type was called `json`. (@Leonidas-from-XIV, ocaml-community/yojson#137)

### Add

- Add an opam package `yojson-bench` to deal with benchmarks dependency (@tmcgilchrist, ocaml-community/yojson#117)
- Add a benchmark to judge the respective performance of providing a buffer vs letting Yojson create an internal one (ocaml-community/yojson#134, @Leonidas-from-XIV)
- Add an optional `suf` keyword argument to functions that write serialized JSON, thus allowing NDJSON output. Most functions default to not adding any suffix, except for `to_file` (ocaml-community/yojson#124, @panglesd) and functions writing sequences of values, where the default is `\n` (ocaml-community/yojson#135, @Leonidas-from-XIV)

### Change

- The `stream_from_*` and `stream_to_*` functions now use a `Seq.t` instead of a `Stream.t`, and they are renamed to `seq_from_*` and `seq_to_*` (@gasche, ocaml-community/yojson#131).

### Fix

- Avoid copying unnecessarily large amounts of strings when parsing (ocaml-community/yojson#85, ocaml-community/yojson#108, @Leonidas-from-XIV)
- Fix `stream_to_file` (ocaml-community/yojson#133, @tcoopman and @gasche)