JSONPath API #992

Closed
robingustafsson opened this issue Apr 8, 2019 · 9 comments
Labels
evaluation needed (proposal needs to be validated or tested before fully implementing it in k6), feature

Comments

@robingustafsson
Member

k6 currently has the http.Response.json(selector) API to extract individual pieces of a JSON structure. This "selector" API is not JSONPath, but uses a similar syntax and is based on the GJSON Go library.

I propose we switch this API to be JSONPath compatible using a library like [1] or [2].

We should also expose the JSONPath API as a standalone API, not tied to the HTTP response object, with an API like [3]:

import jsonpath from "k6/jsonpath";

// Take a string of JSON, parse it and then run the query
let name = jsonpath.query('{"name": "john doe"}', "$.name");

// Take a JS structure and run a query on it
const names = [
    {"name": "john doe"},
    {"name": "sven svensson"}
];
let secondName = jsonpath.query(names, "$[1].name");

The API should support the full JSONPath syntax, including filter expressions. The following APIs should be implemented:

  • jsonpath.query(objOrStr, pathExpression[, count])
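
To make the proposed signature concrete, here is a minimal sketch in plain JavaScript. It is not k6 code and not any of the linked libraries: the function name `query` and everything inside it are assumptions for illustration. It handles only a small subset of JSONPath (`$`, `.name`, `.*`, `[n]`, `[*]`), has no filter expressions, and assumes the result is an array of matches truncated to `count`, as the npm package in [3] does.

```javascript
// Hypothetical stand-in for the proposed jsonpath.query(objOrStr, pathExpression[, count]).
// Supported subset (an assumption of this sketch): "$", ".name", ".*", "[n]", "[*]".
function query(objOrStr, pathExpression, count) {
  // Accept either a JSON string or an already parsed JS structure.
  const root = typeof objOrStr === "string" ? JSON.parse(objOrStr) : objOrStr;
  if (!pathExpression.startsWith("$")) {
    throw new Error("path expression must start with $");
  }

  // Sticky regex: each exec() must match exactly at lastIndex.
  const step = /\.([A-Za-z_][A-Za-z0-9_]*)|\.(\*)|\[(\d+)\]|\[(\*)\]/y;
  step.lastIndex = 1; // skip the leading "$"

  let matches = [root];
  while (step.lastIndex < pathExpression.length) {
    const pos = step.lastIndex;
    const m = step.exec(pathExpression);
    if (m === null) {
      throw new Error(`unsupported syntax at position ${pos}`);
    }
    const [, name, dotStar, index, bracketStar] = m;
    const next = [];
    for (const node of matches) {
      if (node === null || typeof node !== "object") continue;
      if (dotStar || bracketStar) {
        // Wildcard: every element of an array or every value of an object.
        next.push(...Object.values(node));
      } else if (name !== undefined) {
        if (!Array.isArray(node) && Object.prototype.hasOwnProperty.call(node, name)) {
          next.push(node[name]);
        }
      } else {
        const i = Number(index);
        if (Array.isArray(node) && i < node.length) next.push(node[i]);
      }
    }
    matches = next;
  }
  // Optional limit on the number of returned matches.
  return count === undefined ? matches : matches.slice(0, count);
}

const people = [{ name: "john doe" }, { name: "sven svensson" }];
console.log(query('{"name": "john doe"}', "$.name")); // [ 'john doe' ]
console.log(query(people, "$[1].name"));              // [ 'sven svensson' ]
console.log(query(people, "$[*].name", 1));           // [ 'john doe' ]
```

Returning an array even for a single match keeps the result shape the same for every expression. Whether k6 would return an array or a bare value is left open by the proposal.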

[1] - https://github.com/oliveagle/jsonpath
[2] - https://github.com/PaesslerAG/jsonpath
[3] - https://www.npmjs.com/package/jsonpath

@na--
Member

na-- commented Apr 9, 2019

I'll add an evaluation needed tag here, for a few reasons:

As far as I can see, JSONPath isn't an actual standard, it's just described in a post on some guy's blog from 2007. XPath is way more legit - it has W3C specifications, a Wikipedia page, 3 different standardized versions, etc... Of the 3 libraries you linked above, the first one's README says that it follows "the majority rules in http://goessner.net/articles/JsonPath/ but also with some minor differences" and the last one has a "Differences from Original Implementation" section. The middle one looks robust, but its README is just a few sentences, so I can't know how strictly it follows the "standard". I'm mentioning this since it's not immediately clear to me what the benefits over the currently used GJSON library are.

The second reason I want to further evaluate this before we implement it is that it will be available as a standalone functionality, not just for HTTP responses. Thus, when implementing it, we'll likely want to do it with support for execution segments in mind, so that users would be able to operate on large JSON files in a distributed manner. Consequently, the JSONPath implementation we'll use would have to support streaming JSON parsing, i.e. being able to work and resolve queries without having the whole JSON file parsed or in memory at once.

Neither GJSON, nor the two libraries you linked seem to support this... The Go standard library handles streaming via json.Decoder, but all of these libraries require that you have a full JSON document - either as a string, or as a Go interface{} object... This seems like what we'd need, but it's either abandoned or dead...
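
To illustrate what "streaming" means here, the following plain JavaScript sketch yields the elements of a top-level JSON array one at a time from a sequence of text chunks, so only the element currently being read is buffered. It is an illustration only, not how k6 or any of the libraries above work; `streamArrayElements` is a hypothetical name, and the sketch assumes the input is one well-formed JSON array.

```javascript
// Hypothetical helper: yields each element of a top-level JSON array as soon
// as its text is complete, without parsing the whole document at once.
function* streamArrayElements(chunks) {
  let depth = 0;        // nesting depth of [] and {} outside of strings
  let inString = false; // currently inside a JSON string literal
  let escaped = false;  // previous character was a backslash inside a string
  let buffer = "";      // text of the element being collected

  for (const chunk of chunks) {
    for (const ch of chunk) {
      if (inString) {
        buffer += ch;
        if (escaped) escaped = false;
        else if (ch === "\\") escaped = true;
        else if (ch === '"') inString = false;
        continue;
      }
      if (depth === 0) {
        // Before the outer array starts, only whitespace is allowed.
        if (ch === "[") depth = 1;
        else if (ch.trim() !== "") throw new Error("expected a top-level JSON array");
        continue;
      }
      if (ch === '"') {
        inString = true;
        buffer += ch;
      } else if (ch === "[" || ch === "{") {
        depth++;
        buffer += ch;
      } else if (ch === "]" || ch === "}") {
        depth--;
        if (depth === 0) {
          // End of the outer array: flush the last element, if any.
          if (buffer.trim() !== "") yield JSON.parse(buffer);
          return;
        }
        buffer += ch;
      } else if (ch === "," && depth === 1) {
        // A comma directly inside the outer array separates two elements.
        yield JSON.parse(buffer);
        buffer = "";
      } else {
        buffer += ch;
      }
    }
  }
}

// The chunk boundaries are arbitrary and may fall in the middle of a string.
const inputChunks = ['[{"name": "john', ' doe"}, {"name": "sven, [svensson]"}', ", 3]"];
for (const element of streamArrayElements(inputChunks)) {
  console.log(element);
}
```

A query could then be evaluated against each yielded element in turn, which is the property the libraries that need a full string or a parsed object do not offer.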

(I still haven't created an actual github issue for execution segments from that wall of text in the Slack chat, I'll link it here when I do so)

@na-- added the evaluation needed label Apr 9, 2019
@na--
Member

na-- commented Apr 9, 2019

Here's another Go library that, although not exactly streaming, seems to support jsonpath (like the two @robingustafsson linked), but requires only []byte (like GJSON): https://github.com/bhmj/jsonslice
Even though this isn't exactly a streaming parser, it will likely be easier to adapt to our use case than one of the libraries that require already parsed objects.

btw it also has a Limitations and deviations section in the README 😄

@robingustafsson
Member Author

Right, JSONPath isn't a formalized standard 🙂

Regarding the need for streaming, I'm thinking that's a later iteration. So agree, we need to take that future direction into account, but I'd argue we don't need it from the start. #532 is more important to implement IMO, to make sure we only load the file once into memory.

With "execution segments" are you thinking we should implement support for data partitioning in a first iteration or just keep it in mind for future extendability as with streaming?

@na--
Member

na-- commented Apr 9, 2019

With "execution segments" are you thinking we should implement support for data partitioning in a first iteration or just keep it in mind for future extendability as with streaming?

Seems like I have to convert that slack wall-of-text to an actual execution segments github issue sooner rather than later 😄 To answer the question: no, data partitioning will come as a second iteration, simply because execution segment support in the new schedulers will be done very soon (™® 😊 ), while we haven't yet started working on shared memory/data partitioning/data "streaming"/etc.

My desire to use a streaming JSON parser in the beginning is mainly due to the fact that all of those JSONPath parsers are slightly different, each with their own quirks and "limitations and deviations" from the original "spec". So if we start using one of those in the beginning and we have to switch to another when we add data partitioning, we'd be introducing unnecessary backwards incompatibility.

And during my work on k6, I've developed an aversion to writing code that I know we're going to throw away in the future. It will require more effort initially, but it will also save us from the annoyance of spending tons of time refactoring these things and working around old user-facing APIs before every new feature that we want to add...

Regarding the need for streaming, I'm thinking that's a later iteration. So agree, we need to take that future direction into account, but I'd argue we don't need it from the start. #532 is more important to implement IMO, to make sure we only load the file once into memory.

Agree, though I'm hesitant to go ahead with #532, knowing the limitations of the approach and its UX problems when a load test is executed on multiple k6 instances. If we already "know" a way to avoid those issues and get the same benefits, regardless if the test is executed on one or more machines (i.e. execution segments), shouldn't we proceed directly with that?

If we go ahead with #532 without execution segments, it will take less time, but when we decide to add proper data partitioning, it will take much longer since we'll have to work around the old code, the API will likely be inconsistent, and there might be breaking changes...

@mstoykov
Contributor

mstoykov commented Apr 9, 2019

I too agree that this should be done after we evaluate the libraries in question, and that we should get the streaming working from the get-go. I would not like to have to explain to everybody why our jsonpath, which was already different from all other jsonpaths, is now different from itself as well ...

👍 to the posting of the execution segments proposal

I would also like to see something like https://github.com/dchester/jsonpath tested first and benchmarked against the future k6 internal implementation. (This one also has subtle differences, but reading through them, all of them seem to be okay, completely sane and for the best.)

new schedulers will be done very soon (™® 😊 ),

🤣

@robingustafsson
Member Author

👍 ok, let's evaluate our options in regards to streaming, data partitioning and #532 before we start implementing this.

@na--
Member

na-- commented Apr 9, 2019

Added the execution segments mega-issue: #997

@na--
Member

na-- commented May 16, 2019

@robingustafsson, here I've described some of the issues that we need to solve before we implement the JSONPath feature: #1021 (comment)

I think all of those apply to the JSONPath implementation as well, plus the fact that with JSONPath we also have data filtering and counting, operations that also have to be considered through the prism of segmentation and data sharing and partitioning.
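
One way to read the concern about filtering and counting is sketched below in plain JavaScript. It is only an illustration under assumptions: `partition` and `firstMatches` are hypothetical helpers, the contiguous split is one of many possible partitioning schemes, and nothing here reflects how execution segments are actually implemented. The point is that a "first `count` matches" limit applied per partition does not give the same answer as the same limit applied to the whole data set.

```javascript
// Hypothetical: split records into `parts` contiguous slices, one per k6 instance.
function partition(records, parts) {
  const size = Math.ceil(records.length / parts);
  const slices = [];
  for (let i = 0; i < parts; i++) {
    slices.push(records.slice(i * size, (i + 1) * size));
  }
  return slices;
}

// Hypothetical: filter, then keep at most `count` matches (like a query with a count limit).
function firstMatches(records, predicate, count) {
  return records.filter(predicate).slice(0, count);
}

const users = [
  { name: "a", age: 30 },
  { name: "b", age: 17 },
  { name: "c", age: 45 },
  { name: "d", age: 52 },
];
const isAdult = (user) => user.age >= 18;

// On a single machine the limit applies to the whole data set: a and c.
const wholeResult = firstMatches(users, isAdult, 2);

// With two partitions each instance applies the limit to its own slice:
// the first instance returns a, the second returns c and d.
const perInstance = partition(users, 2).map((slice) => firstMatches(slice, isAdult, 2));

console.log(wholeResult.map((u) => u.name));        // [ 'a', 'c' ]
console.log(perInstance.flat().map((u) => u.name)); // [ 'a', 'c', 'd' ]
```

So the semantics of a count limit would need to be defined explicitly once the data is split across instances.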

@oleiade
Member

oleiade commented Dec 4, 2023

We discussed the topic with the maintenance team as part of our effort to make our backlog of issues more consistent.

There is no apparent demand to switch to JSONPath, and as k6 already supports GJSON, we've decided to close this issue. We might come back to it later if there's demand for it, or if the team judges that the switch would provide a better developer experience.

@oleiade closed this as completed Dec 4, 2023

4 participants