Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sink: Force JSON encoding for .json files #3934

Merged
merged 6 commits into from
Jun 4, 2021

Conversation

yhrn
Copy link
Contributor

@yhrn yhrn commented May 31, 2021

The main issue this PR addresses is when a .json file containing long strings is read by kpt fn source and then, potentially after passing through some functions, get written back to a .json file by kpt fn sink as invalid JSON because of newlines added to the long strings. This PR could maybe be seen as the followup mentioned in #2546.

As a side note, it looks a bit problematic to get this change into kpt because it's at kyaml v0.10.17 and v0.10.20 comes with breaking changes. I could easily address all but one that was inside kustomize v2.0.3 which gets pulled in transitively via kubectl. Not sure how to deal with that.

@k8s-ci-robot
Copy link
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
  • If you have done the above and are still having issues with the CLA being reported as unsigned, please log a ticket with the Linux Foundation Helpdesk: https://support.linuxfoundation.org/
  • Should you encounter any issues with the Linux Foundation Helpdesk, send a message to the backup e-mail support address at: [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 31, 2021
@k8s-ci-robot
Copy link
Contributor

Welcome @yhrn!

It looks like this is your first PR to kubernetes-sigs/kustomize 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/kustomize has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Copy link
Contributor

Hi @yhrn. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot requested review from droot and justinsb May 31, 2021 06:12
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 31, 2021
Comment on lines 146 to 151
if !forceJSON {
str, err := rNode.String()
if err != nil {
return errors.Wrap(err)
}
useJSONEncoder = json.Valid([]byte(str))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this stuff is still relevant. Maybe it's useful for getting more nicely formatted JSON compatible output when there is no path annotation?

@yhrn
Copy link
Contributor Author

yhrn commented May 31, 2021

CLA signed

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels May 31, 2021
@yhrn
Copy link
Contributor Author

yhrn commented Jun 1, 2021

I just realized that JSON encoding an item will break multi doc output. This is really an old bug but it was hidden since the json.Valid check would almost always return false because the temporary annotations added won't be quoted at all and therefore, probably unintentionally, prevent JSON encoding in most non sink cases.

I'll update the PR to address this.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 1, 2021
@yhrn
Copy link
Contributor Author

yhrn commented Jun 1, 2021

Second attempt: the idea now is that JSON encoding should be used if and only if if there is only a single item, the item has a *.json valued path annotation and the it's not going to be wrapped in a YAML list. I believe this always makes sense and that the previous strategy of using a JSON encoder if the YAML encoding of the item was valid JSON was both confusing and broken:

  • Confusing because it would yield different results for items originally encoded in JSON, e.g. depending on the length of their string values.
  • Broken because if any item in a a list of multiple items was determined to be valid JSON then YAML document delimiters would be skipped (non-wrapping mode)

I have also manually validated that running an kpt fn source | kpt fn run | kpt fn sink pipeline on a mixed JSON/YAML directory with a patched version of kpt v0.39.2 yields the desired result.

@natasha41575
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 2, 2021
@natasha41575
Copy link
Contributor

Thanks for the PR! It looks like TestFormatFileOrDirectory_YamlExtFileWithJson is failing, please make sure it passes.

You can run make prow-presubmit-check locally to run all the tests.

@yhrn
Copy link
Contributor Author

yhrn commented Jun 2, 2021

@natasha41575 I can get that test to pass by adding back the "if the YAML encoding of the node is JSON compatible then use JSON encoder" logic. However, I'm questioning a bit if this makes sense or if the test should just be removed?

Why would you have JSON in a .yaml file and expect it to be formatted like JSON? Is that worth encoding most nodes twice for? And like I've described higher up, this check is also flaky, if you have a long string value it's going to fail.

@natasha41575
Copy link
Contributor

natasha41575 commented Jun 2, 2021

@monopole @Shell32-Natsu do you have any context on TestFormatFileOrDirectory_YamlExtFileWithJson and if it's necessary to keep the test?

@yhrn yhrn force-pushed the fix-json-sink branch from 95e8ec5 to 942f112 Compare June 3, 2021 06:31
@yhrn
Copy link
Contributor Author

yhrn commented Jun 3, 2021

I just pushed a commit removing the test we discussed but I can add it back along with the code to make it pass if that ends up being the decision. I just wanted to see if I can get the test check to pass here because locally it fails after running the "normal" test suite but I suspect it's unrelated to my changes:

On linux, and not on remote CI.  Running expensive tests.
make[1]: Entering directory '/home/mohrn/repo/github.com/kubernetes-sigs/kustomize'
make[1]: '/home/mohrn/go/bin/helmV3' is up to date.
make[1]: Leaving directory '/home/mohrn/repo/github.com/kubernetes-sigs/kustomize'
Removing kustomize HEAD
Example tests passed against HEAD.
./hack/testExamplesAgainstKustomize.sh [email protected]
Installing kustomize [email protected]
On linux, and not on remote CI.  Running expensive tests.
make[1]: Entering directory '/home/mohrn/repo/github.com/kubernetes-sigs/kustomize'
make[1]: '/home/mohrn/go/bin/helmV3' is up to date.
make[1]: Leaving directory '/home/mohrn/repo/github.com/kubernetes-sigs/kustomize'
----------------------------------------------------------------------
echo "Error @showBase (block #8 in testHelm) of /home/mohrn/repo/github.com/kubernetes-sigs/kustomize/examples/chart.md"

kustomizeIt base
----------------------------------------------------------------------

stdOut capture:
----------------------------------------------------------------------

----------------------------------------------------------------------

stdErr capture:
----------------------------------------------------------------------
Error: json: unknown field "includeCRDs"
----------------------------------------------------------------------
F0603 08:03:45.958211 2047080 main.go:45] exit status 1
goroutine 1 [running]:
github.com/golang/glog.stacks(0xc000254100, 0xc00000ad00, 0x38, 0xfa)
        /home/mohrn/go/pkg/mod/github.com/golang/[email protected]/glog.go:769 +0xb9
github.com/golang/glog.(*loggingT).output(0xb69980, 0xc000000003, 0xc00017ca10, 0x992ebf, 0x7, 0x2d, 0x0)
        /home/mohrn/go/pkg/mod/github.com/golang/[email protected]/glog.go:720 +0x3b3
github.com/golang/glog.(*loggingT).printDepth(0xb69980, 0x7fff00000003, 0x1, 0xc000157e70, 0x1, 0x1)
        /home/mohrn/go/pkg/mod/github.com/golang/[email protected]/glog.go:646 +0x12d
github.com/golang/glog.(*loggingT).print(...)
        /home/mohrn/go/pkg/mod/github.com/golang/[email protected]/glog.go:637
github.com/golang/glog.Fatal(...)
        /home/mohrn/go/pkg/mod/github.com/golang/[email protected]/glog.go:1128
main.trueMain(0xc000074520, 0x0, 0x0)
        /home/mohrn/go/pkg/mod/github.com/monopole/[email protected]/main.go:45 +0x5a6
main.main()
        /home/mohrn/go/pkg/mod/github.com/monopole/[email protected]/main.go:70 +0x46
make: *** [Makefile:280: test-examples-kustomize-against-4.1] Error 255

I don't understand what tests are running here and I also don't see how my changes would cause Error: json: unknown field "includeCRDs". The first line of the output I included here leads me to suspect that these tests are rarely run if most devs are on Mac.

"config.kubernetes.io/path": "test.json"
}
}
}`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably add a test case for an unformatted valid json resulting in formatted valid json. This will sort of cover the test for the deleted test case (TestFormatFileOrDirectory_YamlExtFileWithJson which tests unformatted json to formatted yaml which doesn't makes sense).

@droot
Copy link
Contributor

droot commented Jun 3, 2021

@natasha41575 I can get that test to pass by adding back the "if the YAML encoding of the node is JSON compatible then use JSON encoder" logic. However, I'm questioning a bit if this makes sense or if the test should just be removed?

Why would you have JSON in a .yaml file and expect it to be formatted like JSON? Is that worth encoding most nodes twice for? And like I've described higher up, this check is also flaky, if you have a long string value it's going to fail.

I agree, I don't see a real use case for that scenario. I commented the deleted test case was really testing unformatted-json-to-formatted, we can introduce a unit test-case to cover that scenario, then we should be good. WDYT ?

Copy link
Contributor

@monopole monopole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

// YAML flow style encoding may not be compatible because of unquoted strings and newlines
// introduced by the YAML marshaller in long string values. These newlines are insignificant
// when interpreted as YAML but invalid when interpreted as JSON.
jsonEncodeSingleNode := false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment should explain the rationale behind the node slice length == 1 check

So if we have 3 nodes, and the middle one came from a json file, while the other 2 did not,
we don't json encode? comment should explain

Also, please throw this code and comment into a func, e.g.

forceJsonEncoding := shouldForceJsonEncoding(nodes)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, but I'll name the func and var slightly differently to make it clear why it's fine to assume that the node slice has a single element when special casing further down.

"sigs.k8s.io/kustomize/kyaml/errors"
"sigs.k8s.io/kustomize/kyaml/kio/kioutil"
"sigs.k8s.io/kustomize/kyaml/yaml"
)

// Writer writes ResourceNodes to bytes.
// ByteWriter writes ResourceNodes to bytes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful to have the JSON vs YAML behavior (as a function of file name annotations) described up here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since you touched the line :)

thanks BTW

@monopole
Copy link
Contributor

monopole commented Jun 3, 2021

@natasha41575 I can get that test to pass by adding back the "if the YAML encoding of the node is JSON compatible then use JSON encoder" logic. However, I'm questioning a bit if this makes sense or if the test should just be removed?
Why would you have JSON in a .yaml file and expect it to be formatted like JSON? Is that worth encoding most nodes twice for? And like I've described higher up, this check is also flaky, if you have a long string value it's going to fail.

I agree, I don't see a real use case for that scenario. I commented the deleted test case was really testing unformatted-json-to-formatted, we can introduce a unit test-case to cover that scenario, then we should be good. WDYT ?

if i'm reading it correctly, the comment in the test was wrong.

what the test was actually doing was confirming that FormatFileOrDirectory didn't care that the file had a .yaml extension.

If the file contained json on input, it would contain (formatted) json on output.

which is a useful thing to have regression coverage for.

@yhrn
Copy link
Contributor Author

yhrn commented Jun 3, 2021

@natasha41575 I can get that test to pass by adding back the "if the YAML encoding of the node is JSON compatible then use JSON encoder" logic. However, I'm questioning a bit if this makes sense or if the test should just be removed?
Why would you have JSON in a .yaml file and expect it to be formatted like JSON? Is that worth encoding most nodes twice for? And like I've described higher up, this check is also flaky, if you have a long string value it's going to fail.

I agree, I don't see a real use case for that scenario. I commented the deleted test case was really testing unformatted-json-to-formatted, we can introduce a unit test-case to cover that scenario, then we should be good. WDYT ?

if i'm reading it correctly, the comment in the test was wrong.

what the test was actually doing was confirming that FormatFileOrDirectory didn't care that the file had a .yaml extension.

If the file contained json on input, it would contain (formatted) json on output.

which is a useful thing to have regression coverage for.

I have a hard time understanding if the comment were wrong or just confusing. FormatFileOrDirectory completely ignores .json files, I observed this by changing the extension in the test, so it's not about that.

The reason I felt that deleting the test was the right option is that these tests verify specific formatting and in the particular case of dropping a full JSON document into a .yaml file the formatting will not turn out to be very good looking so the test would be of the annoying, unmaintainable kind that is testing for something much more specific than what you really care about. Full flow style YAML won't be formatted nicely because the YAML encoder settings seem to be geared towards mostly block style with potentially some flow style scalar arrays and maps.

Put another way, accepting the changes to kyaml/kio/byteio_writer.go as they are right now means accepting that we no longer try to make full flow style YAML look nice, because the inconsistency drawbacks of the current approach are not worth it, so then we shouldn't be testing against a specific style that we don't really care about.

But if you feel strongly the other way then I'll add the test back with amended comments and changed expected text to the not so pretty flow style formatted YAML.

@monopole
Copy link
Contributor

monopole commented Jun 3, 2021

I have a hard time understanding if the comment were wrong or just confusing. FormatFileOrDirectory completely ignores .json files, I observed this by changing the extension in the test, so it's not about that.

forehead slap. yes, comments were way off.

Please add a test proving that (write json to .json file, call format, read json and show nothing changed).

Tests in this repo as are much about covering odd behavior and bugs (with a comment saying so) as they are about showing correct behavior. So if something gets fixed, the test must change, confirming the fix is real.

The reason I felt that deleting the test was the right option is that these tests verify specific formatting and in the particular case of dropping a full JSON document into a .yaml file the formatting will not turn out to be very good looking so the test would be of the annoying, unmaintainable kind

That's certainly the correct sentiment in general, except in this repo we must be sensitive about specific YAML and JSON formatting and have loads of tests that do straight up string compares of formatted text so we know when we might break someone downstream and why.

@yhrn
Copy link
Contributor Author

yhrn commented Jun 4, 2021

I believe I have addressed all comments now - I added the requested tests and comments and moved the "should JSON encode?" logic to a separate func.

Copy link
Contributor

@monopole monopole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 4, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: monopole, yhrn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 4, 2021
@k8s-ci-robot k8s-ci-robot merged commit 5993eae into kubernetes-sigs:master Jun 4, 2021
@yhrn yhrn deleted the fix-json-sink branch June 7, 2021 06:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants