Stabilize the JSON format #101
Comments
Thanks for the request @munntjlx. Yes, this is a worthy task. See #72 for a related issue. Up to this point I have tried, with mixed success, to only add fields to the JSON format. With a lenient parser (one that ignores unknown additional fields), the core information should still be parseable in the presence of additions, without having to change the parser. It may take another release or three, but this is in my plans. (Relatedly, I'd like to also stabilize the SQL schema for the datastore, but that will likely take longer.)
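A lenient parser of the kind described above could look roughly like the following. This is only a sketch: the `Finding` class and its field names (`rule_name`, `num_matches`) are hypothetical and chosen for illustration, not taken from the actual Nosey Parker report format.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Iterable, Iterator


@dataclass
class Finding:
    # Hypothetical field names for illustration only; they are not
    # taken from the actual Nosey Parker report format.
    rule_name: str
    num_matches: int = 0


def parse_lenient(record: dict[str, Any]) -> Finding:
    """Build a Finding from the known keys only, ignoring any others."""
    known = {f.name for f in fields(Finding)}
    return Finding(**{k: v for k, v in record.items() if k in known})


def parse_jsonl(lines: Iterable[str]) -> Iterator[Finding]:
    """Parse one JSON object per non-empty line."""
    for line in lines:
        if line.strip():
            yield parse_lenient(json.loads(line))


if __name__ == "__main__":
    # Made-up records: the second one carries a field added "later".
    old = '{"rule_name": "example rule", "num_matches": 2}'
    new = '{"rule_name": "example rule", "num_matches": 2, "added_later": true}'
    first, second = parse_jsonl([old, new])
    print(first == second)
```

Because unknown keys are dropped before the object is built, a report that only gains fields parses to the same result. A renamed or removed field would still break this parser, which is the case a schema is meant to catch.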
P.S. @munntjlx out of curiosity, are your DefectDojo parser scripts proprietary, or are they part of the DefectDojo project?
They will eventually be part of the main project. Tanvi is working on these, and we hope to mainline them once they get around to importing the parser.
Anyway, aside from trying hard not to change the JSON output between releases, the place to start with stabilization is to write a schema and validate against that (at least in testing): #72.
@munntjlx there are more changes to the JSON report coming in the next release (#122). Some fundamentals have changed in the Nosey Parker data model that require some visible changes to the JSON report format. I'm hoping that these changes are the last major ones, and that future format changes will be infrequent and involve only adding new fields. I'm also hoping to put together a JSON Schema definition of the report format for the next release. This should help both with documentation of the format and also help with identifying changes of that format.
This is a big PR that makes a number of significant changes to the Nosey Parker data model.

- The minimum supported Rust version has been changed from 1.70 to 1.76.
- The data model and datastore have been significantly overhauled:
  - The rules used during scanning are now explicitly recorded in the datastore. Each rule is additionally accompanied by a content-based identifier that uniquely identifies the rule based on its pattern.
  - Each match is now associated with the rule that produced it, rather than just the rule's name (which can change as rules are modified).
  - Each match is now assigned a unique content-based identifier.
  - Findings (i.e., groups of matches with the same capture groups, produced by the same rule) are now represented explicitly in the datastore. Each finding is assigned a unique content-based identifier.
  - Now, each time a rule matches, a single match object is produced. Each match in the datastore is now associated with an array of capture groups. Previously, a rule whose pattern had multiple capture groups would produce one match object for each group, with each one being associated with a single capture group.
  - Provenance metadata for blobs is recorded in a much simpler way than before. The new representation explicitly records file and git-based provenance, but also adds explicit support for _extensible_ provenance. This change will make it possible in the future to have Nosey Parker scan and usefully report blobs produced by custom input data enumerators (e.g., a Python script that lists files from the Common Crawl WARC files).
  - Scores are now associated with matches instead of findings.
  - Comments can now be associated with both matches and findings, instead of just findings.
- The JSON and JSONL report formats have changed. These will stabilize in a future release ([#101](#101)).
  - The `matching_input` field for matches has been removed and replaced with a new `groups` field, which contains an array of base64-encoded bytestrings.
  - Each match now includes additional `rule_text_id`, `rule_structural_id`, and `structural_id` fields.
  - The `provenance` field of each match is now slightly different.
- Schema migration of older Nosey Parker datastores is no longer performed. Previously, this would automatically and silently be done when opening a datastore from an older version. Explicit support for datastore migration may be added back in a future release.
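For consumers of the report, the switch from `matching_input` to `groups` means decoding base64 before using the captured bytes. A minimal sketch, assuming each match is a JSON object whose `groups` value is a list of base64-encoded strings as described above; the helper name `captured_groups` and the rest of the record layout are assumptions:

```python
import base64
from typing import Any


def captured_groups(match: dict[str, Any]) -> list[bytes]:
    """Return the capture groups of a match record as raw bytes.

    Assumes `groups` is a list of base64-encoded strings; a record
    without the field yields an empty list.
    """
    return [base64.b64decode(g) for g in match.get("groups", [])]


if __name__ == "__main__":
    # Made-up record containing only the field used here.
    example = {"groups": [base64.b64encode(b"secret-token").decode("ascii")]}
    print(captured_groups(example))
```

The result is kept as `bytes` rather than decoded to `str`, since matched content is not guaranteed to be valid UTF-8.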
Thanks for the heads up. We have been standardizing on 'specific' versions to help reduce the workload associated with schema changes. A JSON schema would go a long way toward helping us 'stabilize' final version formats.
Thomas
Possible changes to the JSON format will become less frequent as Nosey Parker becomes more mature. Though I'm not going to promise not to change the JSON format at this point, I will provide an updated JSON schema and an announcement in the release notes when this does happen. The JSON schemas between versions could be diffed to understand what has changed. (You can get the JSON schema from the v0.17.0 releases, or using the new `noseyparker generate json-schema` command.) I don't have any changes that I want to make to the JSON format at the moment. I suspect that future modifications to it would be in the form of additional data rather than renaming fields or changing its organization.
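Diffing two schema versions, as suggested above, could be sketched as follows. The code assumes both schemas have already been loaded into dictionaries (for example with `json.load`) and compares only the property names reachable through `properties` and `items`. The function names are hypothetical, and the example schemas are made up, borrowing the `matching_input` and `groups` field names from the thread.

```python
from typing import Any


def property_paths(schema: Any, prefix: str = "") -> set[str]:
    """Collect dotted paths of every name under `properties`, recursively.

    Only `properties` and `items` are followed; `$ref`, `definitions`,
    and combinators such as `anyOf` are not resolved in this sketch.
    """
    paths: set[str] = set()
    if not isinstance(schema, dict):
        return paths
    for name, sub in schema.get("properties", {}).items():
        path = prefix + name
        paths.add(path)
        paths |= property_paths(sub, path + ".")
    items = schema.get("items")
    if isinstance(items, dict):
        paths |= property_paths(items, prefix.rstrip(".") + "[].")
    return paths


def diff_schemas(old: dict[str, Any], new: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Return (added, removed) property paths between two schemas."""
    before, after = property_paths(old), property_paths(new)
    return sorted(after - before), sorted(before - after)


if __name__ == "__main__":
    # Made-up schemas for illustration, not the real report schema.
    old = {"properties": {"matching_input": {"type": "string"}}}
    new = {"properties": {"groups": {"type": "array", "items": {"type": "string"}}}}
    added, removed = diff_schemas(old, new)
    print("added:", added)
    print("removed:", removed)
```

A non-empty `removed` list signals a breaking change for strict and lenient parsers alike, while additions alone should be safe for a lenient one. A schema that relies on `$ref` would need those references resolved first.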
Our Nosey Parker decoder has been added to the DefectDojo main project. We probably need to get it beyond v0.16.
Additions would be preferable!
Is your feature request related to a problem? Please describe.
We are using Nosey Parker as part of a 'defect dojo' secret tracking and discovery process. We use the JSONL format, but it seems to change arbitrarily between even minor versions. This forces us to modify our parser scripts (for DefectDojo), since our unit tests fail when the JSON format changes.
Describe the solution you'd like
Can we stabilize the JSON format or agree to only change it on x versions? This would give derivative works or programs a 'stable' base upon which to build.
Describe alternatives you've considered
Perhaps 'we change the JSON format every .2 or .4 increments of a new version'? Or an 'odd' vs 'even' scheme, which gives us some stability on the JSON format?
Additional context
Just to note this is a GREAT project and we really appreciate the work you are doing!