Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(#887): complete strictMode for JSONArray #888

Merged
merged 24 commits into from
May 21, 2024
Merged

Conversation

rikkarth
Copy link
Contributor

@rikkarth rikkarth commented Apr 14, 2024

Description

If in strict mode, and having reached the end of the JSONArray construction life-cycle with no errors, we then perform an input validation.
image

In this validation we explicitly check if (1.) the last bracket was found and (2.) if more characters exist after that.

If the next character is not the end of the array (0), then it means we have more characters after the final closing square bracket, so we throw an error message informing the array is not compliant and we give back the complete input to the user for easy debugging.

image

To ensure this logic doesn't affect 2D arrays I also created a new test for it.
image

From the examples provided in the issue created: #887

Additional Note

Since the logic is fully encapsulated inside validateInput() which is only run if strictMode is true, it is guaranteed that previous implementations are not affected when strictMode is false and that this implementation is modular and can be maintained, improved or removed by simply removing validateInput().

Test cases added

image

image

@rikkarth rikkarth marked this pull request as ready for review April 14, 2024 22:17
@stleary
Copy link
Owner

stleary commented Apr 19, 2024

@rikkarth Sorry for taking so long to complete the review. Comments added.

@rikkarth
Copy link
Contributor Author

rikkarth commented Apr 21, 2024

  1. Fixed issue with double array breaking JSONTokener.nextValue() which now forwards jsonParserConfiguration object when creating a new JSONArray.
    image

  2. Changed input validation in JSONArray, removing collection of complete input in favor of leveraging x.syntaxError().
    image

image

@stleary
Copy link
Owner

stleary commented Apr 21, 2024

@rikkarth Please merge from master to pick up the latest changes:
image

@stleary
Copy link
Owner

stleary commented Apr 23, 2024

@rikkarth Looks like there is a regression when parsing array elements with non-string simple values. Please add this unit test:

    @Test
    public void shouldHandleNumericArray() {
        String expected = "[10]";
        JSONArray jsonArray = new JSONArray(expected, new JSONParserConfiguration().withStrictMode(true));
        assertEquals(expected, jsonArray.toString());
    }

This test passes when strict mode is false, but fails with this output when strict mode is true:
image

Similar problems occur in strict mode when the array elements are true/false, and null.

@rikkarth
Copy link
Contributor Author

Also boolean values are producing the same behaviour, I will fix it.
image

@rikkarth
Copy link
Contributor Author

rikkarth commented Apr 23, 2024

I changed how I validated numbers, boolean and string values using JSONObject.stringToValue existing logic, I've also added what I believe a few, more robust, test cases to prevent more regressions.

I hope the implementation looks cleaner now.

Sorry I didn't catch this earlier.

Correction - regression when parsing array elements with non-string simple values.

    /**
     * Parses unquoted text from the JSON input. This could be the values true, false, or null, or it can be a number.
     * Non-standard forms are also accepted. Characters are accumulated until the end of the text or a formatting
     * character is reached.
     *
     * @param c The starting character.
     * @return The parsed object.
     * @throws JSONException If the parsed string is empty.
     */
    private Object parsedUnquotedText(char c, boolean strictMode) {
        StringBuilder sb = new StringBuilder();
        while (c >= ' ' && ",:]}/\\\"[{;=#".indexOf(c) < 0) {
            sb.append(c);
            c = this.next();
        }
        if (!this.eof) {
            this.back();
        }

        String string = sb.toString().trim();

        if (string.isEmpty()) {
            throw this.syntaxError("Missing value");
        }

        Object stringToValue = JSONObject.stringToValue(string);

        return strictMode ? getValidNumberOrBooleanFromObject(stringToValue) : stringToValue;
    }

    private Object getValidNumberOrBooleanFromObject(Object value) {
        if (value instanceof Number || value instanceof Boolean) {
            return value;
        }

        throw new JSONException(String.format("Value is not surrounded by quotes: %s", value));
    }

@stleary
Copy link
Owner

stleary commented Apr 23, 2024

null without quotes is a valid JSON value, just like true and false (see https://json.org). Strict mode should allow this value when it appears without quotes. Please add this test case for coverage:

    @Test
    public void allowNullInStrictMode() {
        String expected = "[null]";
        JSONArray jsonArray = new JSONArray(expected, new JSONParserConfiguration().withStrictMode(true));
        assertEquals(expected, jsonArray.toString());
    }

@rikkarth
Copy link
Contributor Author

rikkarth commented Apr 23, 2024

Done as requested. Now it was simple as adding it to the this method:

    private Object getValidNumberBooleanOrNullFromObject(Object value) {
        if (value instanceof Number || value instanceof Boolean || value.equals(JSONObject.NULL)) {
            return value;
        }

        throw new JSONException(String.format("Value is not surrounded by quotes: %s", value));
    }

@stleary
Copy link
Owner

stleary commented Apr 25, 2024

@rikkarth, This string fails in strict mode because there are chars after the end of an empty array:
String s = "[[]]";
This string is allowed in strict mode because the chars at the end are not validated:
String s = "[1],";
You might be able to address the first problem by making the parser code block starting at line 160 exactly the same as the code block starting at line 134, as I mentioned sometime back. But it will probably still need to be reworked to handle values after the closing bracket, like a closing JSONObject brace. The second problem means that the parser is still not reliably detecting invalid trailing chars. If the current implementation is unable to do that, it might be necessary to revisit the design for this feature.

@rikkarth
Copy link
Contributor Author

@rikkarth, This string fails in strict mode because there are chars after the end of an empty array: String s = "[[]]"; This string is allowed in strict mode because the chars at the end are not validated: String s = "[1],"; You might be able to address the first problem by making the parser code block starting at line 160 exactly the same as the code block starting at line 134, as I mentioned sometime back. But it will probably still need to be reworked to handle values after the closing bracket, like a closing JSONObject brace. The second problem means that the parser is still not reliably detecting invalid trailing chars. If the current implementation is unable to do that, it might be necessary to revisit the design for this feature.

Yes you are absolutely right. I will come back to you with a more solid, robust solution and design.

Thank you for your patience.

@stleary
Copy link
Owner

stleary commented Apr 28, 2024

Branch strict-mode-unit-tests contains all of your code from this branch. In the JSONParserConfiguration default constructor, I forced strictMode to be true, and then modified the broken unit tests or replaced @test with @ignore.

  • Most of the broken tests were due to inadvertently forgetting to use double quotes in otherwise valid JSON docs. You can ignore those changes.
  • A few tests were set to @ignore because they intentionally used unquoted text. They will be restored later, and are not a concern.
  • There were a few grey areas, e.g. when an invalid numeric value is supposed to be converted to string, but instead throws an exception. It could be argued either way - should strictMode allow the conversion or not? No decision has been made yet, no action needed at this time.
  • There is a regression in single quote handling within double-quote delimited strings that must be fixed in strictMode.
  • There are also several errors in JSONTokenerTest that suggest a regression.

You can fix these issues on your current branch, or you can take a new fork off of the strict-mode-unit-tests branch.

@rikkarth
Copy link
Contributor Author

rikkarth commented Apr 28, 2024

I will review all this points and will come back to you with new feedback.

At first glance, this should be relatively easy to achieve/fix.

Thank you for this analysis.

@rikkarth
Copy link
Contributor Author

rikkarth commented Apr 28, 2024

@stleary

Update:
I fixed all the test cases in JSONTokener apart from the last which I require more feedback:

image

The Bug

The issue was, I didn't perform the most simple of validations which checks for end of file after finding ']' at 2nd, which I included now. After doing it, all tests pass accordingly.
image

As for this test case

@Test
public void testValid() {
    // ...
    checkValid("1 2", String.class);
}

throws JSONException which is indeed valid, since you manually set strictMode true in JSONParserConfiguration.

The test case value is not inside an array;
The test case value is not inside quotes;
The test case value is not valid according to strictMode guidelines (or is it?)

What should I do with this case?

org.json.JSONException: Value '1 2' is not surrounded by quotes at 3 [character 4 line 1]

@rikkarth
Copy link
Contributor Author

rikkarth commented Apr 28, 2024

This test is failing because the expected testCase is actually malformed, strictMode working accordingly. Not a regression.
Test name: toJSONObjectToJSONArray
image
image

image

inline
image

I'm not sure if you'd like me to keep the tests like they currently are (even if malformed).

In the mean time I will isolate the tests as requested. I will correct and externalize the test cases into JSON files which I believe will be more compliant since modern IDEs check for syntax errors for us in cases like this.

- JSONArray now evaluates EOF accordingly for empty Array inputs.
- JSONTokener fixed indentation
- externalized two JSONMLTest cases
@rikkarth
Copy link
Contributor Author

Branch strict-mode-unit-tests contains all of your code from this branch. In the JSONParserConfiguration default constructor, I forced strictMode to be true, and then modified the broken unit tests or replaced @test with @ignore.

All of the points you've highlighted were fixed. There are a few cases (e.g checkValid("1 2", String.class);) where I don't exactly know what I should do there, and will require some more feedback on this.

I removed the TODO's, comments, etc that you provided in strict-mode-unit-tests as I fixed each issue.

I left some of the "important" TODO's the one's you wrote "TBD later" or "will revert later" so you could easily navigate back to them later.

There is a regression in single quote handling within double-quote delimited strings that must be fixed in strictMode.

Fixed.

There are also several errors in JSONTokenerTest that suggest a regression.

Fixed.

You can fix these issues on your current branch, or you can take a new fork off of the strict-mode-unit-tests branch.

I merged them into this branch to keep a linear history of what was affected in this PR.

@rikkarth
Copy link
Contributor Author

I added this TODO's because I strongly believe the test cases here are super bloated and could be hugely simplified.

I will make my case in a future PR, leaving this here for reference.
image

@stleary
Copy link
Owner

stleary commented Apr 29, 2024

@rikkarth Nice work.
The 1 2 test result signals a regression - in this case, a change in the behavior of a public API, which is always a concern. Might be good to see if the existing behavior can be retained.
Will review the code, and add some comments.

@rikkarth
Copy link
Contributor Author

rikkarth commented Apr 29, 2024

@rikkarth Nice work. The 1 2 test result signals a regression - in this case, a change in the behavior of a public API, which is always a concern. Might be good to see if the existing behavior can be retained. Will review the code, and add some comments.

The test for this use-case passes in strictMode=false just not in strictMode=true which follows JSON rules.

The tokener is trying to understand if the value is enclosed within an array or object.
image

And then the tokener in strictMode sees this value is not a Number, null or boolean ... it's a string without quotes.
image

The question is: Why should this test pass in strictMode? The only answer I could get was - it shouldn't.

The return when strictMode=false.

checkValid("1 2", String.class); //strictMode false

output

1 2

@rikkarth
Copy link
Contributor Author

rikkarth commented May 2, 2024

Could the label for this PR be updated? It's no longer pending redesign but pending review.

Thank you.

@stleary
Copy link
Owner

stleary commented May 2, 2024

Could the label for this PR be updated? It's no longer pending redesign but pending review.

I think the label still applies. The goal of issue #887 was just to catch trailing chars at the end of the JSON doc. There might be a simpler solution that hasn't been found yet. For example, adding properties to the tokener does not seem like the right direction, or the way that JSONArray() was rewritten.

@rikkarth
Copy link
Contributor Author

rikkarth commented May 2, 2024

Fair enough.

Did you have time to read my previous comment?

It is the last point I have to complete the points you've defined previously.

I can make this last test pass, I'm just wondering why should this test pass in strict mode?

Do you have an opinion on it?

@stleary
Copy link
Owner

stleary commented May 2, 2024

Did you have time to read my previous comment?
It is the last point I have to complete the points you've defined previously.
I can make this last test pass, I'm just wondering why should this test pass in strict mode?
Do you have an opinion on it?

Do you mean the "1 2" validity test? Do you think the result difference is due to code changes to enforce quote-delimited strings, or to detecting trailing chars? In the former case, no additional changes are needed. In the latter case, it should be handled by the redesign. In either case, you don't need to make it work now.

@rikkarth
Copy link
Contributor Author

rikkarth commented May 3, 2024

Did you have time to read my previous comment?
It is the last point I have to complete the points you've defined previously.
I can make this last test pass, I'm just wondering why should this test pass in strict mode?
Do you have an opinion on it?

Do you mean the "1 2" validity test? Do you think the result difference is due to code changes to enforce quote-delimited strings, or to detecting trailing chars? In the former case, no additional changes are needed. In the latter case, it should be handled by the redesign. In either case, you don't need to make it work now.

Yes, the error is exactly that. Quote-delimited strings, which is coherent with the design of strictMode.

If you change the test and put quotes around the string, while strictMode is true, then it passes the test.

Please check screenshots bellow.

image

image

@stleary
Copy link
Owner

stleary commented May 17, 2024

What problem does this code solve?
Complete the implementation of strict mode for JSONArray

Does the code still compile with Java6?
Yes

Risks
Low

Changes to the API?
No

Will this require a new release?
No

Should the documentation be updated?
No

Does it break the unit tests?
No, new unit tests were added

Was any code refactored in this commit?
No

Review status
APPROVED

Starting 3-day comment window

@rikkarth
Copy link
Contributor Author

Cleaned up strictMode true flag as instructed.

@stleary stleary merged commit 14f7127 into stleary:master May 21, 2024
7 checks passed
stleary pushed a commit that referenced this pull request Nov 3, 2024
This reverts commit 14f7127, reversing
changes made to 054786e.
@stleary stleary mentioned this pull request Nov 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants