Behavior of operator>> should more closely resemble that of built-in overloads. #367
Comments
The lines
was executed with About the inputs:
I can understand your thoughts about the behavior of the streams. However, I think it is not a good idea to stop parsing once we found a JSON value, because this would mean that code like
    json j;
    j << "false foo bar";
would run without exception. Am I too strict about this? |
An empty input is not a valid integer either, yet cin>>i will not throw - it will consume the blanks and leave i unchanged. There are two classes of use-cases:
Think of what kind of code a new user of the library would write, assuming they have not read any documentation other than some examples, when they try to read a bunch of JSON values from stdin in a loop. Now make the API work that way, following the "design principle of least astonishment" : ) |
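For reference, the built-in behavior described in the comment above can be checked with the standard library alone. This is a minimal sketch; `read_ints` and `demo_builtin_extraction` are illustrative names, not part of any library.

```cpp
#include <cassert>
#include <sstream>
#include <vector>

// Read ints until the first token that is not an int (the usual
// `while (in >> x)` idiom). Failure is reported via the stream state,
// not via an exception.
std::vector<int> read_ints(std::istream& in) {
    std::vector<int> values;
    int i = 0;
    while (in >> i) {
        values.push_back(i);
    }
    return values;
}

void demo_builtin_extraction() {
    // Blank-only input: the blanks are consumed, extraction fails without
    // throwing, and the variable keeps its previous value.
    std::istringstream blank("   ");
    int i = 7;
    blank >> i;
    assert(blank.fail());
    assert(i == 7);

    // Prefix input: two ints are read, then extraction stops at "foo".
    std::istringstream mixed("12 34 foo");
    const std::vector<int> values = read_ints(mixed);
    assert(values.size() == 2);
    assert(mixed.fail() && !mixed.eof());
}
```

The loop form is what a new user would likely write first, which is the argument for making `operator>>` on JSON values stop after one complete value.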
I understand your point and you answered my "Am I too strict about this?" question with "yes" 😄 It would be nice to hear other opinions about this. I still feel uncomfortable accepting non-valid JSON files by just looking at a prefix... |
Seems reasonable for streaming to stop after a JSON document has been read. If this is a behavior change, it might require a major version bump. Note that both I agree that this seems weird, but I'm not sure it's valid right now, since there is no operator<< that takes a string:
|
Note for the future: |
Allowing to partially parse a stream would be very useful. In fact, that's what I'm struggling with right now. I get input from a WebSocket, and it's basically a stream of non-delimited JSON objects, e.g.
    json msg;
    while (iss >> msg) {
        // ...
    }
Right now I can't see any way to parse this kind of stream. |
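One possible workaround for this situation, independent of the library, is to split the buffer into top-level values before parsing. The sketch below is not the library's parser: it is a plain bracket-depth scanner, `split_concatenated_json` is a hypothetical helper name, and it assumes every top-level value is an object or an array.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split back-to-back JSON objects/arrays into one string per top-level
// value by tracking bracket depth. String literals and escape sequences are
// skipped, so brackets inside strings are not counted. Bare top-level
// scalars (e.g. `42 true`) are NOT handled by this sketch.
// `consumed` receives the offset just past the last complete value.
std::vector<std::string> split_concatenated_json(const std::string& buf,
                                                 std::size_t& consumed) {
    std::vector<std::string> docs;
    std::size_t start = 0;
    int depth = 0;
    bool in_string = false;
    bool escaped = false;
    consumed = 0;

    for (std::size_t pos = 0; pos < buf.size(); ++pos) {
        const char c = buf[pos];
        if (in_string) {
            if (escaped)        escaped = false;
            else if (c == '\\') escaped = true;
            else if (c == '"')  in_string = false;
            continue;
        }
        if (c == '"') {
            in_string = true;
        } else if (c == '{' || c == '[') {
            if (depth == 0) start = pos;
            ++depth;
        } else if (c == '}' || c == ']') {
            if (depth > 0 && --depth == 0) {
                docs.push_back(buf.substr(start, pos - start + 1));
                consumed = pos + 1;
            }
        }
    }
    return docs;
}

void demo_split() {
    std::size_t consumed = 0;
    // Two complete objects followed by an incomplete third one.
    const std::string buf = R"({"a":"}"}{"b":[1,2]}  {"c")";
    const std::vector<std::string> docs = split_concatenated_json(buf, consumed);
    assert(docs.size() == 2);
    assert(docs[0] == R"({"a":"}"})");
    assert(consumed == docs[0].size() + docs[1].size());
}
```

Each returned string can then be handed to the real parser. With a WebSocket, the caller would keep `buf.substr(consumed)` and append the next chunk to it before scanning again.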
+1, this is basically the only thing I wish for here. Is there any way to simulate this behavior with the library as it stands now? |
Unfortunately not. |
I saw someone do it in a way that required a two-stage parse. They parsed the data once, got the position of the parse error, assumed that was the start of the next whole entry, and so parsed again limiting to just before the parse error, and then moving the start position to the error position for the next entry. |
That's exactly what I did some time ago. Try to parse, catch the exception, save the offset, limit stream to that many characters, parse, repeat. I think that required the |
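The two-stage approach described in the last two comments can be sketched as follows. To keep the example self-contained, a toy strict parser stands in for the JSON parser: `strict_parse`, `parse_error`, and `parse_all` are hypothetical names, and the toy grammar (one unsigned integer per value) is an assumption made only for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical error type carrying the byte offset of the failure.
struct parse_error : std::runtime_error {
    std::size_t offset;
    explicit parse_error(std::size_t off)
        : std::runtime_error("unexpected input"), offset(off) {}
};

// Toy stand-in for a strict parser: accepts exactly one unsigned integer
// surrounded by optional whitespace and throws at the first byte that does
// not fit, including trailing content.
unsigned long strict_parse(const std::string& text) {
    std::size_t pos = 0;
    auto is_space = [&] { return std::isspace(static_cast<unsigned char>(text[pos])) != 0; };
    auto is_digit = [&] { return std::isdigit(static_cast<unsigned char>(text[pos])) != 0; };

    while (pos < text.size() && is_space()) ++pos;
    if (pos == text.size() || !is_digit()) throw parse_error(pos);
    unsigned long value = 0;
    while (pos < text.size() && is_digit())
        value = value * 10 + static_cast<unsigned long>(text[pos++] - '0');
    while (pos < text.size() && is_space()) ++pos;
    if (pos != text.size()) throw parse_error(pos);
    return value;
}

// Two-stage loop: try to parse the remainder; on failure, re-parse only the
// bytes before the error offset, then continue from that offset.
std::vector<unsigned long> parse_all(const std::string& input) {
    std::vector<unsigned long> values;
    std::size_t start = 0;
    while (start < input.size()) {
        const std::string rest = input.substr(start);
        try {
            values.push_back(strict_parse(rest));
            break;  // the remainder was a single value
        } catch (const parse_error& e) {
            if (e.offset == 0) throw;  // no progress possible: invalid input
            values.push_back(strict_parse(rest.substr(0, e.offset)));
            start += e.offset;
        }
    }
    return values;
}
```

Every value except the last is parsed twice, which is the main cost of this workaround. It also relies on the parser reporting an accurate error offset.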
This is currently the problem in the preparation of 3.0.0 - a lot of API-changing features depend on each other, and I want to try to avoid releasing a 4.0.0 and 5.0.0 in a short period. |
@krzysztofwos Yeah, I think it was you that I was thinking of, but I couldn't find the post on slack, I think it's beyond the 10K limit. |
I just resorted to signaling the end of a JSON message with |
If you're in control of both producing and consuming json records, another approach is to dump in single-line mode, and, when consuming, first fetch the json document from stream with |
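A minimal sketch of the consuming side of that approach. The comment above is cut off before naming the call it uses, so `std::getline` here is an assumption, and `read_records` is a hypothetical helper name.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Collect one record per non-empty line. Each returned string would then be
// passed to the JSON parser as a complete document.
std::vector<std::string> read_records(std::istream& in) {
    std::vector<std::string> records;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty()) records.push_back(line);
    }
    return records;
}

void demo_read_records() {
    std::istringstream in("{\"a\":1}\n\n[1,2]\n");
    const std::vector<std::string> records = read_records(in);
    assert(records.size() == 2);
    assert(records[0] == "{\"a\":1}");
    assert(records[1] == "[1,2]");
}
```

This works because a JSON string cannot contain a raw newline (it must be escaped as `\n`), so a document dumped without pretty-printing fits on one line.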
The current parser relies on reading input streams with |
Will support be added for parsing multiple JSON objects from strings too? |
@szikra That's exactly what this issue is about. operator>> uses parse(). Although I suppose it's not strictly part of what's needed to support this, as there would need to be some indication of how much of the string was used. |
@gregmarr Thanks. I'm sure my And yes, it would be enough if we had a public method that returns the length parse() read the last time.
    class J {
    public:
        struct C {
            J* j;
            int len;
            operator J&() {
                return *j;
            }
        };
        C parse() {
            return C {this, 123};
        }
    };
RVO would help here to discard length if we are not using it, so it wouldn't have to be stored as last processed amount inside json, right? |
Yeah, that first comment isn't terribly useful. The next one is better, or the one linked in a comment: |
Yeah, I don't want to break existing code. I also hate returning data with pointer arguments, so ugly. :)
    class J {
    public:
        struct C {
            J* j;
            int len;
            operator J&() {
                return *j;
            }
        };
        C parseWithLen() {
            return C {this, 123};
        }
        J& parse() {
            return parseWithLen();
        }
        /// But if people really want, this could also be added as an option:
        J& parse(size_t& len) {
            C c = parseWithLen();
            len = c.len;
            return *c.j;
        }
        J& parse(size_t* len) {
            C c = parseWithLen();
            if (len) *len = c.len;
            return *c.j;
        }
    }; |
There are some subtleties to this. |
@TurpentineDistillery With the fix for #452 the library would indeed reject |
@nlohmann |
I did the restructuring I mentioned in #367 (comment). I did not find the time to test this issue again though. |
The output for the example above (#367 (comment)) is now
This seems reasonable to me. |
There was still a bug left, but now the examples from @TurpentineDistillery and @ceztko behave as expected. I added them as regression tests, see 9e507df. If the CI build succeeds, I think I can close this issue. |
Hello, I just took the test case I posted here and executed it verbatim on VS2017. Note that the test saves the stream to a file and then reads from it. The result is this:
After the first read the stream is actually positioned correctly after the first "}", but it's failing to read the second element. If I create a stringstream with exactly the same content with
Which means it correctly parsed the second element, without exceptions. The stream is positioned correctly after the first "}" on the first read and on EOF on the second read. Behavior may be compiler/library dependent but I am expecting the same result to happen on the file stream. Question: are you parsing the stream character by character or are you doing some kind of caching? |
@ceztko I also noticed different behavior in MSVC as the tests on AppVeyor fail (https://ci.appveyor.com/project/nlohmann/json/build/1965). I haven't had the chance to check the logs though. About caching: yes, I read 16 KB from an input stream into a cache. When the parser is destructed, I call |
It's perfectly reasonable to cache some amount of bytes in advance when reading from any stream. But I am not expecting the seekg to work reliably on any kind of stream (for example: if the stream is a socket). The idea should work on filestream, though, so there may be some bug. |
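The read-ahead-then-rewind idea discussed here can be sketched with the standard library. This is not the library's implementation: `read_one_and_rewind` is a hypothetical helper, and "everything up to the first '}'" stands in for whatever a real parser would consume.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Read up to `cache_size` bytes ahead, pretend a parser used only the bytes
// up to and including the first '}', then reposition the stream so the next
// read starts right after that byte. Returns the consumed text.
// Requires a seekable stream; this cannot work on a socket-like stream.
std::string read_one_and_rewind(std::istream& in, std::size_t cache_size) {
    const std::istream::pos_type begin = in.tellg();
    assert(begin != std::istream::pos_type(-1));  // stream must be good and seekable

    std::string cache(cache_size, '\0');
    in.read(&cache[0], static_cast<std::streamsize>(cache_size));
    cache.resize(static_cast<std::size_t>(in.gcount()));

    const std::size_t end = cache.find('}');
    const std::size_t used = (end == std::string::npos) ? cache.size() : end + 1;

    // A short read sets both eofbit and failbit, and seekg does nothing
    // while failbit is set, so clear the state before repositioning.
    in.clear();
    in.seekg(begin + static_cast<std::streamoff>(used));
    return cache.substr(0, used);
}
```

With a `std::istringstream` holding `{"a":1}{"b":2}`, two successive calls return the first and the second object. On file streams opened in text mode, byte counts and stream positions may not correspond one-to-one on every platform, so opening the file in binary mode is the safer assumption for this kind of arithmetic.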
I may add special overloads for ifstream and istringstream that use caches and remove caching for general istream cases. Would this work? |
For me it's also a workable solution. You may also consider having the caching only on ifstream, since istringstream is already in memory. |
@ceztko I opened a question on Stack Overflow and found out the code I used to "rewind" the stream was not correct. I fixed it in a feature branch and the example from #367 (comment) is now working with MSVC 2015 and MSVC 2017 (see AppVeyor build). |
@nlohmann great news, thanks! Sorry for not having looked at this: I wanted to debug it but I couldn't find the time. |
Merged fd4a0ec which fixes this issue. Thanks everybody for the patience. |
Basically: