
incremental parsing with sax_parser #2030

Closed
AlexandreBossard opened this issue Apr 6, 2020 · 9 comments
Labels: kind: question, state: stale

Comments

@AlexandreBossard

AlexandreBossard commented Apr 6, 2020

I'm reading (possibly big) JSON data from the network using boost::asio. I would have expected something like:

json::json_sax_t* sax_parser_ = new ...;
for (auto it = net::buffer_sequence_begin(buffers); it != net::buffer_sequence_end(buffers); ++it)
{
    net::const_buffer buffer = *it;
    const char* data = static_cast<const char*>(buffer.data());
    nlohmann::json::sax_parse(data, data + buffer.size(), sax_parser_);
}
return sax_parser_->json();

An asio read() returns a sequence of buffers (a container of char arrays), so I need to call json::sax_parse multiple times.

I have not found a default sax_parser implementation: one that just builds a json incrementally with consecutive calls to sax_parse().

Do I have to write my own for such a (I believe) simple task?

Actually, I want to do exactly this: #605 (comment)

@nlohmann
Owner

nlohmann commented Apr 6, 2020

The parser accepts an iterator range as input - as long as you can wrap your input in a type with a begin() and end() function, that should work. The default parser is here: https://github.com/nlohmann/json/blob/develop/include/nlohmann/detail/input/json_sax.hpp#L145

@AlexandreBossard
Author

Why is it in a detail namespace? Is it safe to use regarding API stability?

@nlohmann
Owner

It is in the detail namespace as this is the parser used by the library by default. You can provide your own input adapter and pass it to the parse function.

@AlexandreBossard
Author

I was more concerned about API stability (will it break?) and why it is not documented. Re-implementing a SAX parser seems unnecessary if there is a perfectly good one for incremental parsing.

@nlohmann
Owner

You do not need to reimplement it - sorry for the confusion. You only need to define an input adapter.

@stale

stale bot commented May 13, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

The stale bot added the state: stale label on May 13, 2020; @nlohmann removed it the same day.
@stale

stale bot commented Jun 12, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

The stale bot added the state: stale label on Jun 12, 2020, and closed the issue on Jun 20, 2020.
@jeljeli

jeljeli commented Nov 7, 2024

@nlohmann: First of all, I'd like to thank you for your fantastic work on this library.

I'm reopening this discussion because I'm unsure about the behavior of sax_parse when it is called incrementally. Based on my tests, it seems to work only if each chunk provided to sax_parse is a well-formed JSON object. I'm wondering if this is the expected behavior, or if I might be missing something. In a streaming context, it's not always possible to control the content of each chunk.

Here’s an example to illustrate my point:
Working example:

std::string json1 = R"({"name": "John", "age": 30, "city": "New York"})";  
std::string json2 = R"({"name": "Alice", "age": 20, "city": "Paris"})";  
  
MyJsonSaxHandler handler;  
nlohmann::json::sax_parse(json1, &handler);  
nlohmann::json::sax_parse(json2, &handler);  

Non-working example:

std::string json1 = R"({"name": "John", "age": 30, "city": "New York"})";  
std::string json2_part1 = R"({"name": "Alice", "age": 20)";  
std::string json2_part2 = R"(, "city": "Paris"})";  
  
MyJsonSaxHandler handler;  
nlohmann::json::sax_parse(json1, &handler);  
nlohmann::json::sax_parse(json2_part1, &handler);  
nlohmann::json::sax_parse(json2_part2, &handler);

Displayed errors:

Parse error: [json.exception.parse_error.101] parse error at line 1, column 28: syntax error while parsing object - unexpected end of input; expected '}'  
Parse error: [json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - unexpected ','; expected '[', '{', or a literal

Just as a comparison, when using the SAX API of libxml2, there is xmlParseChunk, which allows for passing XML data that is not necessarily well-formed in each chunk.

Is there any way to achieve similar functionality with the nlohmann/json library?

Thank you in advance!

@nlohmann
Owner

No, this is not possible. Each call to sax_parse tries to parse a complete value. With the strict parameter you can relax the behavior to allow for "unparsed" trailing input (e.g., parsing [1,2] 3 as [1,2], ignoring the 3). However, you cannot resume parsing: the state of the parser is reset when sax_parse returns, and subsequent calls start fresh (hence the second error complaining about the unexpected ',': the parser "forgot" that it was parsing an object).
