Question about flatten and unflatten #1989

Closed
dota17 opened this issue Mar 17, 2020 · 14 comments

Labels
kind: question, state: help needed, state: stale

Comments

@dota17
Contributor

dota17 commented Mar 17, 2020

json j = R"({
    "bad": {
        "0": "one",
        "1": "two",
        "2": "three"
    },
    "foo": "bar"
})"_json;
json j_flatten = {
    {"/bad/0", "one"},
    {"/bad/1", "two"},
    {"/bad/2", "three"},
    {"/foo",   "bar"}
};

I tested that j.flatten() == j_flatten is true, but j_flatten.unflatten() == j is false.
From the source code, unflatten() seems to look only at the first pointer: if its reference token is "0", it starts a new array. Because of this behavior, unflatten() cannot be used in cases like the following:

    json j_flatten =
            {
                    {"/bad/0", "one"},
                    {"/bad/1", "two"},
                    {"/bad/t", "three"},
                    {"/foo",   "bar"}
            };

Calling unflatten() on this value raises [json.exception.parse_error.109] parse error: array index 't' is not a number.
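The behavior described above can be modeled without the library. The following is only a simplified sketch of a single-pass decision, not the library's actual code: `is_number` and `first_conflict` are hypothetical names, only the last reference token of each pointer is examined, and the pointers are assumed to arrive in sorted key order.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// True if the token consists only of decimal digits.
bool is_number(const std::string& token) {
    return !token.empty() &&
           std::all_of(token.begin(), token.end(),
                       [](unsigned char c) { return std::isdigit(c) != 0; });
}

// Simplified model of a single-pass unflatten: the first child seen under a
// parent fixes the container kind ("0" starts an array, anything else an
// object). Returns the first pointer that does not fit the kind chosen for
// its parent, or an empty string if every pointer fits.
std::string first_conflict(const std::vector<std::string>& pointers) {
    std::map<std::string, bool> parent_is_array;
    for (const std::string& pointer : pointers) {
        const std::size_t slash = pointer.rfind('/');
        if (slash == std::string::npos) continue;  // no reference token at all
        const std::string parent = pointer.substr(0, slash);
        const std::string token = pointer.substr(slash + 1);
        // emplace inserts only when this is the first child seen under `parent`.
        const auto result = parent_is_array.emplace(parent, token == "0");
        const bool is_array = result.first->second;
        if (is_array && !is_number(token)) return pointer;
    }
    return "";
}
```

With the pointers from the failing example, the model reports "/bad/t" as the conflicting key, because "/bad/0" has already committed "bad" to being an array.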

@dota17
Contributor Author

dota17 commented Mar 17, 2020

Related issue: #1575
But it seems that there is no exact solution.

IMHO, we could change the array to an object when we get [json.exception.parse_error.109] parse error: array index 't' is not a number.
Or unflatten() could consider only objects, because a JSON array can be represented as an object.

@nlohmann
Owner

Yes, there is no exact solution for this. The case you are describing (object keys which can be translated into integers) is ambiguous in this setting. The problem with your described approach is that it would require multi-pass parsing: one pass to collect all keys for all objects/arrays, and another to create the final types. I am not sure whether this would fix the issue. But maybe I am missing something.
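A rough sketch of what such a two-pass approach might look like, using only the standard library. The names `is_array_index`, `collect_children`, and `should_be_array` are hypothetical and not part of the library, and `~0`/`~1` escapes in pointers are ignored for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <vector>

// True for "0" or a digit string without a leading zero.
bool is_array_index(const std::string& token) {
    if (token.empty()) return false;
    if (token.size() > 1 && token[0] == '0') return false;
    return std::all_of(token.begin(), token.end(),
                       [](unsigned char c) { return std::isdigit(c) != 0; });
}

// Pass 1: map every parent pointer ("" is the root) to the set of child
// tokens that appear directly below it.
std::map<std::string, std::set<std::string>> collect_children(
        const std::vector<std::string>& pointers) {
    std::map<std::string, std::set<std::string>> children;
    for (const std::string& pointer : pointers) {
        std::size_t slash = pointer.find('/');
        while (slash != std::string::npos) {
            const std::size_t next = pointer.find('/', slash + 1);
            const std::size_t length =
                next == std::string::npos ? std::string::npos : next - slash - 1;
            children[pointer.substr(0, slash)].insert(
                pointer.substr(slash + 1, length));
            slash = next;
        }
    }
    return children;
}

// Pass 2: a container may become an array only if every child token is an index.
bool should_be_array(const std::set<std::string>& tokens) {
    return !tokens.empty() &&
           std::all_of(tokens.begin(), tokens.end(), is_array_index);
}
```

This sketch only checks that all keys look like indices; it does not check for gaps between them, so it leaves the ambiguity itself unresolved.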

@dota17
Contributor Author

dota17 commented Mar 23, 2020

> object keys which can be translated into integers

json j_flatten =
            {
                    {"/bad/0", "one"},
                    {"/bad/1", "two"},
                    {"/bad/t", "three"},
                    {"/foo",   "bar"}
            };

In this case, integers can be object keys.
What I want to say is that when we get a 0, it may not be an array.
Of course, if all keys are integers, an array can be our first choice.
But if not all keys are integers, it should not raise an exception. Maybe we can try an object.

@dota17
Contributor Author

dota17 commented Mar 23, 2020

I tried to fix this and ran into some trouble.

  • Trouble 1: How to convert the array to an object and free the array's memory?
  • Trouble 2: The null value problem between 0 and 11. This is the example:
json j_flatten =
            {
                    {"/bad/0", "one"},
                    {"/bad/11", "two"},
                    {"/bad/t", "three"},
                    {"/foo",   "bar"}
            };

Anyway, filling up the array with null values when the given index is out of range is not a good choice for this case.

@nlohmann
Owner

All in all, I fear there is no proper solution for this due to the ambiguity that JSON Pointers have in this regard.

@dota17
Contributor Author

dota17 commented Mar 25, 2020

From #1575 (comment)

> As the flatten and in particular the unflatten function are not really standardized, one could think about adding a parameter to not interpret numbers as strings.

I think we can add a parameter to unflatten() to decide which to consider first, array or object.

My idea is that unflatten() considers objects as the default choice, and that we also provide a method people can call to convert object items to arrays wherever possible.
That method could be applied in other cases too, not just unflatten.

// Proposed API only; the object parameter and translateMethod() do not exist yet.

// original behavior; the problem still exists
json.unflatten();
json.unflatten(object = false);

// everything is an object
json_unflatten = json.unflatten(object = true);

// provide a method to convert object items to arrays as much as possible
json_unflatten.translateMethod();

Example:

json = {
    "/bad/0": "one",
    "/bad/1": "two",
    "/bad/t": "three",
    "/foo/0": "one",
    "/foo/1": "two",
    "/foo/2": "three"
}
// json_unflatten = json.unflatten(object = true);
json_unflatten = {
    "bad": {
        "0": "one",
        "1": "two",
        "t": "three"
    },
    "foo": {
        "0": "one",
        "1": "two",
        "2": "three"
    }
}
// json_unflatten.translateMethod();
{
    "bad": {
        "0": "one",
        "1": "two",
        "t": "three"
    },
    "foo": ["one", "two", "three"]
}

@dota17
Contributor Author

dota17 commented Apr 8, 2020

Keep active.
Does anyone have new ideas?

@nlohmann
Owner

The problem is as follows: once the value 0 is handled, an array is created. A solution would be to sort the keys such that all non-digit keys are treated first; an object would then be created if such keys exist. Only then are the digit keys processed: if we are already processing an object, they are treated as string keys. If not, we would start an array, knowing that only digit keys would follow.

The difficulty is sorting a list of JSON pointers in that setting, because the choice whether to start an object or an array has to be made after every /.

For the example, the order should be

"/bad/t"
"/bad/0"
"/bad/1"
"/foo/0"
"/foo/1"
"/foo/2"

@nlohmann nlohmann added the state: help needed the issue needs help to proceed label Apr 13, 2020
@dota17
Contributor Author

dota17 commented May 9, 2020

> A solution would be to sort the keys such that first all non-digit keys are treated; then an object would be created if such keys exist.

I tried to solve it, but Trouble 2 from above still exists.

Trouble 2: The null value problem between 0 and 11. This is the example:
json j_flatten =
{
"/bad/0": "one",
"/bad/11": "two",
"/foo": "bar"
};

IMO, it is OK for unflatten() to consider only objects. I prefer using an object to represent an array over filling it with null values.

@dota17
Contributor Author

dota17 commented May 9, 2020

Considering only objects in unflatten() brings the following problem:

json j = [0,1,2,3]
json j_flatten = 
{
    "/0": 0,
    "/1": 1,
    "/2": 2,
    "/3": 3
}
json j_flatten_unflatten = 
{
    "0": 0,
    "1": 1,
    "2": 2,
    "3": 3
}

But I think this is acceptable.

@nlohmann
Owner

nlohmann commented May 9, 2020

Now I think I understand what you mean by the "null value" problem.

Considering the JSON value

{
    "/bad/0": "one",
    "/bad/11": "two",
    "/foo": "bar"
}

I think the only unflattened value that makes sense is

{
    "bad": {
        "0": "one",
        "11": "two"
    },
    "foo": "bar"
}

because although "0" and "11" could be valid array indices, flattening an array would list every element, including the null ones. This means

{
    "bad": ["one", null, null, null, null, null, null, null, null, null, null, "two"],
    "foo": "bar"
}

cannot be the unflattened value for the JSON above.

This means we must not just create an array when we encounter a number, but also make sure that all numbers are contiguous.
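Such a check could be sketched as follows, given the set of keys collected for one container. `is_contiguous_index_set` is a hypothetical helper, not a library function, and treating an empty key set as "not an array" is an arbitrary choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// True when `keys` is exactly {"0", "1", ..., "n-1"} for n = keys.size().
// Since the set holds n distinct keys, finding all n expected index strings
// means there is no room left for any other key.
bool is_contiguous_index_set(const std::set<std::string>& keys) {
    if (keys.empty()) return false;  // nothing indicates an array
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (keys.count(std::to_string(i)) == 0) return false;
    }
    return true;
}
```

For the keys "0" and "11" the check fails at index 1, so "bad" would stay an object instead of being padded with null values.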

I have the feeling that the effort to fix this is not really worth it... Any ideas?

@dota17
Contributor Author

dota17 commented May 11, 2020

My idea is that unflatten() only creates objects, which means that when we meet contiguous numbers, we still unflatten to an object that represents the actual array. The advantage of this idea is that flatten followed by unflatten will essentially never throw an exception. The only downside is that an array value becomes an object value after flatten and unflatten. That is acceptable.

@dota17
Contributor Author

dota17 commented May 11, 2020

Alternatively, building on the idea above, we could convert objects with contiguous numeric keys to arrays after calling get_and_create in unflatten(). That would address the one downside, namely that an array value becomes an object value after flatten and unflatten.

@stale

stale bot commented Jun 10, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Jun 10, 2020
@stale stale bot closed this as completed Jun 17, 2020